George Mason University
  Department of Computer Science

CS 484 - Data Mining

Dr. Jessica Lin

Fall 2015

 



 News & Announcements
9/4: HW1 posted. Due date 9/15.
9/18: HW2 posted. Due date 9/24.
10/2: HW3 posted. Due date 10/15. Datasets are available here.
10/4: Project Guidelines posted. Proposal due date 10/27.
10/23: HW4 posted. Due date 11/12.
11/4: HW4 due date postponed to 11/17.
11/17: HW5 posted. Due date 11/24.
12/4: Final topics
12/5: HW5 solution posted on Blackboard.

Course Description

Basic principles and methods for data analysis and knowledge discovery. Emphasizes developing basic skills for modeling and prediction, on one side, and performance evaluation, on the other. Topics include system design; data quality, preprocessing, and association; event classification; clustering; biometrics; business intelligence; and mining complex types of data.

Instructor:

Dr. Jessica Lin 

Office: Engineering Building 4419
Phone: 703-993-4693
Email: jessica [AT] cs [DOT] gmu [DOT] edu
Office Hours:  Wednesday 2-3pm, Thursday 1:30-2:30pm

TA

 Jatin Mistry
 jmistry2 [AT] gmu [DOT] edu
 Office Hours: Monday/Tuesday 3-4pm
 Location: Engineering Building 5321

Classes

Tuesday/Thursday
12-1:15pm
Art & Design Building 2026

Course Outcomes

  • The ability to apply computing principles, probability and statistics relevant to the data mining discipline to analyze data.
  • A thorough understanding of model programming with data mining tools and of algorithms for estimation, prediction, and pattern discovery.
  • The ability to analyze a problem, identifying and defining the computing requirements appropriate to its solution: data collection and preparation, functional requirements, selection of models and prediction algorithms, software, and performance evaluation.
  • The ability to understand performance metrics used in the data mining field to interpret the results of applying an algorithm or model, to compare methods and to reach conclusions about data.
  • The ability to communicate effectively to an audience the steps followed and the results obtained in solving a data mining problem (through a term project).

Prerequisites:

 Grade of C or better in CS 310 and STAT 344

Grading

Assignments: 20%
Project: 20%

Midterms: 30%
Final: 30%

Exams

There will be two midterm exams and a final exam covering lectures and readings. All exams will be in class and closed book. The final exam is comprehensive. Exams must be taken at the scheduled time and place unless prior arrangement has been made with the instructor. Missed exams cannot be made up.

Honor Code Statement

The GMU Honor Code is in effect at all times. In addition, the CS Department has further honor code policies regarding programming projects, which are detailed here. Any deviation from the GMU or the CS department Honor Code is considered an Honor Code violation.

Textbooks

  Required: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar

 Recommended: Data Mining and Analysis by Mohammed Zaki (Here is the online PDF version.)

Topics
 
Ch.1: Introduction
Ch.2: Data
Ch.4: Classification
Ch.5: Classification: Alternative Techniques
Ch.6: Association Analysis: Basic Concepts and Algorithms
Ch.7: Association Analysis: Advanced Concepts
Ch.8: Cluster Analysis: Basic Concepts and Algorithms
Ch.9: Cluster Analysis: Additional Issues and Algorithms
Ch.10: Anomaly Detection



 Tentative Schedule 
  
No | Dates | Topics | Reading | Due | Notes
---|-------|--------|---------|-----|------
1 | 9/1, 9/3 | Introduction 1; Introduction 2 | Ch. 1 | | HW1 posted (due 9/15)
2 | 9/8, 9/10 | Data 1; Data 2 | Ch. 2 | |
3 | 9/15, 9/17 | Classification 1; Classification 2 | Ch. 4 | HW1 due | HW2 posted (due 9/24)
4 | 9/22, 9/24 | Classification 3; Classification 4 / Exercise | | HW2 due |
5 | 9/29, 10/1 | Midterm 1; Classification 5 (updated 10/7) | Ch. 4.5 | | HW3 and project guideline posted
6 | 10/6, 10/8 | Post-midterm review / Classification 5; Classification 6 | Ch. 5.2-5.7 | |
7 | 10/13, 10/15 | No Class; Classification 7 | | HW3 due |
8 | 10/20, 10/22 | Clustering 1; Clustering 2 | Ch. 8.1; Ch. 8.2 | | HW4 posted
9 | 10/27, 10/29 | Clustering 3; Clustering 4 | Ch. 8.3 | Proposal |
10 | 11/3, 11/5 | Midterm 2; Clustering 5 | Ch. 8.4 | |
11 | 11/10, 11/12 | Clustering 6; Association Analysis 1 | Ch. 8.5; Ch. 6.1 | |
12 | 11/17, 11/19 | Association Analysis 2; cont'd | Ch. 6.2-6.3 | HW4 due | HW5 posted
13 | 11/24, 11/26 | Review; No class (Thanksgiving) | | HW5 due |
14 | 12/1, 12/3 | Association Analysis 3; Anomaly Detection / Review | Ch. 6.4, 5, 7; Ch. 10 | |
15 | 12/8, 12/10 | Final Exam (note the new date!); Project Presentations | | |
16 | 12/15, 12/17 | No class; Presentations (10:30am - 1:15pm) | | Project report |