George Mason University
  Department of Computer Science

CS 484 - Data Mining

Dr. Jessica Lin

Spring 2013

 



 News & Announcements
HW1 posted. Due date 3/4.
Project guidelines posted.
HW2 posted. Due date 4/8.
4/9: HW3 posted. Due dates are 4/15 and 4/22. Here are the datasets.
See the new final exam schedule below.

Course Description

Basic principles and methods for data analysis and knowledge discovery. Emphasizes developing basic skills for modeling and prediction, on one side, and performance evaluation, on the other. Topics include system design; data quality, preprocessing, and association; event classification; clustering; biometrics; business intelligence; and mining complex types of data.

Instructor:

Dr. Jessica Lin 

Office: Engineering Building 4419
Phone: 703-993-4693
Email: jessica [AT] cs [DOT] gmu [DOT] edu
Office Hours: Thursday 2-4pm

TA

 Tanwistha Saha

 Office: Engineering Building 4456
 Office Hours: Monday 3-5pm
 Email: tsaha [AT] cs [DOT] gmu [DOT] edu
 

Classes

Monday/Wednesday
1:30-2:45pm
Innovation Hall 206

Course Outcomes

  • The ability to apply computing principles, probability and statistics relevant to the data mining discipline to analyze data.
  • A thorough understanding of model programming with data mining tools and of algorithms for estimation, prediction, and pattern discovery.
  • The ability to analyze a problem, identifying and defining the computing requirements appropriate to its solution: data collection and preparation, functional requirements, selection of models and prediction algorithms, software, and performance evaluation.
  • The ability to understand performance metrics used in the data mining field to interpret the results of applying an algorithm or model, to compare methods and to reach conclusions about data.
  • The ability to communicate effectively to an audience the steps followed and the results obtained in solving a data mining problem (through a term project).

Prerequisites:

  Grade of C or better in CS 310 and STAT 344

Grading

Assignments: 15%
Class Participation: 5%
Project: 30%

Midterm: 20%
Final: 30%

Exams

Exams will be open-book and open-note. If you cannot attend an exam, you must make arrangements with the instructor in advance. Missed exams cannot be made up.

Honor Code Statement

Please be familiar with the GMU Honor Code. In addition, the CS department has its own Honor Code policies. Any deviation from these policies is considered an Honor Code violation.

Textbooks

  Required: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar

 Additional handouts and reading materials may be given in class.

Topics
 
Ch.1: Introduction
Ch.2: Data
Ch.4: Classification
Ch.5: Classification: Alternative Techniques
Ch.6: Association Analysis: Basic Concepts and Algorithms
Ch.7: Association Analysis: Advanced Concepts
Ch.8: Cluster Analysis: Basic Concepts and Algorithms
Ch.9: Cluster Analysis: Additional Issues and Algorithms
Ch.10: Anomaly Detection



 Tentative Schedule 
  
No  Dates       Topics                                                  Notes
1   1/21, 1/23  No class; Introduction 1 (Ch. 1)
2   1/28, 1/30  Introduction 2; Data 1
3   2/4, 2/6    Data 2; Data 3
4   2/11, 2/13  Data 4; Classification 1 & 2                            HW1 posted
5   2/18, 2/20  Classification 3; Classification 4
6   2/25, 2/27  Classification 5; Classification 6                      Project posted
7   3/4, 3/6    Classification 7; Snow Day
8   3/11, 3/13  Spring Break
9   3/18, 3/20  Clustering 1; Midterm                                   HW1 & Project Proposal due (3/18); HW2 posted
10  3/25, 3/27  Clustering 2; Post-midterm review/Clustering 3
11  4/1, 4/3    Clustering 4; Association Analysis 1                    HW3 posted
12  4/8, 4/10   Association Analysis 2; Association Analysis 2, cont'd  HW2 due
13  4/15, 4/17  Association Analysis 3                                  HW3 Part 1 due
14  4/22, 4/24  Anomaly Detection; Recommendation Systems               HW3 Part 2 due
15  4/29, 5/1   Review; Final Exam
16  5/6, 5/8    No class; Presentations (1:30-4:15pm)