Professor Harry Wechsler

Department of Computer Science

George Mason University

Fairfax, VA 22030

e-mail :

web :

(703) 993-1533 (office)

(703) 993-1530 (sec)

(703) 993-1710 (fax)



SPRING 2007


Class Information

001 14512 M 1:30 p.m. - 4:15 p.m. STII 15

Office Hours

M 12:30 - 1:15 PM or by appointment (SITE II - Rm. 461)



Data Mining: Concepts and Techniques (2nd edition), Han and Kamber, Elsevier, 2006

web site for textbook slides




 WEKA web site for data mining software


UCI Machine Learning Repository Content Summary


Course Description

Basic principles and methods for data analysis and knowledge discovery. The emphasis is on developing the basic skills needed for modeling and prediction on the one hand, and for performance evaluation on the other. Topics include system design, data quality, data preprocessing and transformation, data association, classification, clustering, biometrics, and social networks and communities.


The explosive growth in the generation, collection, and storage of data has created an urgent need for new techniques and automated tools that can intelligently assist in transforming vast amounts of data into useful information and knowledge. Data mining, a multidisciplinary field, supports the automated extraction of regularities that represent knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. The course focuses on the feasibility, usefulness, and efficiency of automated techniques for discovering patterns hidden in large databases.




First Day of Classes: January 22, 2007

Spring Break: March 12, 2007

Mid Term: March 19, 2007

Last Day of Classes: April 30, 2007

Final: May 14, 2007


Homework: 40%

Midterm (March 19): 20%

Final (Monday, May 14): 40%


Tentative Schedule

January 22

Ch. 1: Introduction - Motivation and Functionalities; the Semantic Web


January 29 - February 5

Ch. 2: Data Preprocessing; Decision-Making and Pattern Recognition

February 12

Ch. 3: Data Warehouse and OLAP Technology

February 19

Ch. 4: Data Cube Computation and Data Generalization; Performance Evaluation


February 26 - March 5

Ch. 5: Mining Frequent Patterns, Associations, and Correlations; Causality



March 12

Spring Break

March 19

Mid Term Exam

March 26 - April 2

Ch. 6: Classification and Prediction

April 9 - April 16

Ch. 7: Cluster Analysis

April 23

Ch. 9.2: Social Network Analysis

April 30



-- REVIEW for FINAL --