Professor Harry Wechsler

Department of Computer Science

George Mason University

Fairfax, VA 22030

e-mail : wechsler@cs.gmu.edu

web : http://cs.gmu.edu/~wechsler/

(703) 993-1533 (office)

(703) 993-1530 (sec)

(703) 993-1710 (fax)

 

GEORGE MASON UNIVERSITY

SPRING 2007

       CS 484 -- DATA MINING

       Class Information

001 14512 M 1:30 p.m. – 4:15 p.m. STII 15

Office Hours

M 12:30 – 1:15 PM or by appointment (SITE II - Rm. 461)

 

            Textbook

Data Mining: Concepts and Techniques (2nd edition), Han and Kamber, Elsevier, 2006

Web site for textbook slides: http://www-faculty.cs.uiuc.edu/~hanj/bk2/

 

References:

 

 WEKA web site for data mining software

 

http://www.togaware.com/datamining/survivor/Weka.html

  

UCI Machine Learning Repository Content Summary

 

http://www.ics.uci.edu/~mlearn/MLSummary.html

 

          Course Description

Basic principles and methods for data analysis and knowledge discovery. The emphasis is on developing the basic skills needed for modeling and prediction on the one hand, and for performance evaluation on the other. Topics include system design, data quality, data preprocessing and transformation, data association, classification, clustering, biometrics, and social networks and communities.

         Motivation

The explosive growth in the generation, collection, and storage of data has created an urgent need for new techniques and automated tools that can help transform vast amounts of data into useful information and knowledge. Data mining, a multidisciplinary field, supports the automated extraction of regularities that represent knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. The course focuses on the feasibility, usefulness, and efficiency of automated techniques for discovering patterns hidden in large databases.

 

Schedule

 

First Day of Classes: January 22, 2007

Spring Break: March 12, 2007

Midterm: March 19, 2007

Last Day of Classes: April 30, 2007

Final: May 14, 2007

Grading

Homework: 40%

Midterm (March 19): 20%

Final (Monday, May 14): 40%
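As a rough illustration of how the three weights above combine into a course score, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that homework is reported as a single aggregate score; the syllabus specifies neither. The function name `final_grade` is hypothetical.

```python
# Weights taken from the Grading section (percent of the course grade).
WEIGHTS = {"homework": 40, "midterm": 20, "final": 40}


def final_grade(homework, midterm, final):
    """Weighted course score; assumes every input is on a 0-100 scale."""
    scores = {"homework": homework, "midterm": midterm, "final": final}
    # Multiply each score by its percentage weight, then divide by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


print(final_grade(homework=90, midterm=80, final=85))  # 86.0
```

Because homework and the final each carry 40%, a point gained on either counts twice as much as a point gained on the midterm.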

 

Tentative Schedule

January 22

Ch. 1: Introduction – Motivation and Functionalities; the Semantic Web

(see http://www.w3.org/2001/sw/) 

January 29 – February 5

Ch. 2: Data Preprocessing; Decision-Making and Pattern Recognition    

February 12

Ch. 3:  Data Warehouse and OLAP Technology 

February 19

Ch. 4: Data Cube Computation and Data Generalization; Performance Evaluation

 

February 26 – March 5

Ch. 5: Mining Frequent Patterns, Associations, and Correlations; Causality

 

-- REVIEW for MIDTERM --

March 12

Spring Break

March 19

Midterm Exam

March 26 – April 2

Ch. 6: Classification and Prediction

April 9 – April 16

Ch. 7: Cluster Analysis

April 23

Ch. 9.2: Social Network Analysis

April 30

Biometrics

 

-- REVIEW for FINAL --