Professor Harry Wechsler

Department of Computer Science

George Mason University

Fairfax, VA 22030

e-mail : wechsler@cs.gmu.edu

web : http://cs.gmu.edu/~wechsler/

(703) 993-1533 (office)

(703) 993-1530 (sec)

(703) 993-1710 (fax)

 

GEORGE MASON UNIVERSITY

SPRING 2007

CS 484 -- DATA MINING

Class Information

001 14512 M 1:30 p.m. - 4:15 p.m. STII 15

Office Hours

M 12:30 - 1:15 p.m. or by appointment (SITE II - Rm. 461)

 

  Textbook

Data Mining: Concepts and Techniques (2nd edition), Han and Kamber, Elsevier, 2006

Web site for textbook slides: http://www-faculty.cs.uiuc.edu/~hanj/bk2/

 

References:

 

WEKA web site for data mining software: http://www.togaware.com/datamining/survivor/Weka.html

  

UCI Machine Learning Repository Content Summary: http://www.ics.uci.edu/~mlearn/MLSummary.html

 

Course Description

Basic principles and methods for data analysis and knowledge discovery. Emphasis is on developing the basic skills needed for modeling and prediction on the one hand, and for performance evaluation on the other. Topics include system design, data quality, data preprocessing and transformation, data association, classification, clustering, biometrics, and social networks and communities.

Motivation

The explosive growth in the generation, collection, and storage of data has created an urgent need for new techniques and automated tools that can intelligently assist us in transforming vast amounts of data into useful information and knowledge. Data mining, a multidisciplinary field, supports the automated extraction of regularities that represent knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. The course focuses on the feasibility, usefulness, and efficiency of automated techniques for discovering patterns hidden in large databases.

 

Schedule

 

First Day of Classes: January 22, 2007

Spring Break: March 12, 2007

Midterm: March 19, 2007

Last Day of Classes: April 30, 2007

Final: May 14, 2007

Grading

Homework: 40%

Midterm (March 19): 20%

Final (Monday, May 14): 40%

 

Tentative Schedule

January 22

Ch. 1: Introduction - Motivation and Functionalities; the Semantic Web

(see http://www.w3.org/2001/sw/)

January 29 - February 5

Ch. 2: Data Preprocessing; Decision-Making and Pattern Recognition

February 12

Ch. 3: Data Warehouse and OLAP Technology

February 19

Ch. 4: Data Cube Computation and Data Generalization; Performance Evaluation

 

February 26 - March 5

Ch. 5: Mining Frequent Patterns, Associations, and Correlations; Causality

 

-- REVIEW for MIDTERM --

March 12

Spring Break

March 19

Midterm Exam

March 26 - April 2

Ch. 6: Classification and Prediction

April 9 - April 16

Ch. 7: Cluster Analysis

April 23

Ch. 9.2: Social Network Analysis

April 30

Biometrics

 

-- REVIEW for FINAL --