Professor Harry Wechsler

Department of Computer Science

George Mason University

Fairfax, VA 22030

e-mail : wechsler@cs.gmu.edu

web : http://cs.gmu.edu/~wechsler/

(703) 993-1533 (office)

(703) 993-1530 (sec)

(703) 993-1710 (fax)

 

GEORGE MASON UNIVERSITY

        SPRING 2006

       CS 499 Special Topics in Computer Science - DATA MINING

       Class Information

Section 003 (14364), M 4:30 p.m. – 7:10 p.m., T 107

Prerequisites

CS 450 (“databases”) or permission of instructor

Office Hours

M 3:15 – 4:00 PM or by appointment (SITE II - Rm. 461)

 

            Textbook

1. Data Mining: Concepts and Techniques, Han and Kamber, Morgan Kaufmann, 2001. Web site for textbook slides: http://www.cs.sfu.ca/~han/bk

1a. Web site for data mining software: http://www.togaware.com/datamining/survivor/Weka.html

 

          Course Description

Basic principles and methods for data analysis and knowledge discovery. Emphasis is on developing the basic skills needed for modeling and prediction on the one hand, and for performance evaluation on the other. Topics include system design, data quality, data preprocessing and transformation, data association, event classification, clustering, biometrics, business intelligence, and mining complex types of data.

         Motivation

The explosive growth in the generation, collection, and storage of data has created an urgent need for new techniques and automated tools that can intelligently assist us in transforming vast amounts of data into useful information and knowledge. Data mining, a multidisciplinary field, helps with the automated extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. The course focuses on issues related to the feasibility, usefulness, efficiency, and scalability of automated techniques for the discovery of patterns hidden in large databases.

Grading

Homework → 15%

Midterm → March 20 → 20%

(Team) Term Project → 35%

Final → Monday, May 15 → 30%
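
As a quick arithmetic check, the four weights above sum to 100%. A minimal sketch of how a weighted course score could be computed from them, assuming each component is scored on a 0-100 scale (the syllabus does not state the scale) and using hypothetical example scores:

```python
# Weights in percent, taken from the grading breakdown above.
WEIGHTS = {"homework": 15, "midterm": 20, "project": 35, "final": 30}


def course_score(scores):
    """Combine per-component scores (assumed 0-100 each) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Multiply by integer percentages first, then divide once by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {"homework": 90, "midterm": 80, "project": 85, "final": 70}
total = course_score(example_scores)
print(f"weighted course score: {total:.2f}")
```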

Term Project

Students work in teams on the term project.
The scope of the project must be agreed upon with the instructor.
The task must involve meaningful functionality and a significant amount of data.
The project includes the following STEPS:


1. Problem definition, requirements analysis and conceptual design.
2. Data selection / sampling.
3. Cleaning and integration / Preprocessing.
4. Data transformation / Data Reduction.
5. Data Mining.
6. Modeling, test & evaluation, and performance assessment.
7. Visualization and knowledge discovery.
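
STEPS 2 through 6 above could be sketched on toy data roughly as follows. This is only an illustrative outline, not a required design: the records, the function names, the choice of min-max scaling, and the 1-nearest-neighbor classifier are all assumptions made for the example. STEP 1 and STEP 7 are omitted because they are design and presentation activities.

```python
import random

# Hypothetical toy records: ((attribute values), class label); None marks a missing value.
RAW = [
    ((1.0, 2.0), "low"), ((1.5, 1.8), "low"), ((None, 2.1), "low"), ((1.2, 2.2), "low"),
    ((8.0, 9.0), "high"), ((8.5, 9.5), "high"), ((9.0, 8.7), "high"), ((8.2, None), "high"),
]


def select_sample(records, fraction, seed=0):
    """STEP 2: draw a reproducible random sample (without replacement)."""
    rng = random.Random(seed)
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)


def clean(records):
    """STEP 3: drop records that have a missing attribute value."""
    return [(features, label) for features, label in records if None not in features]


def fit_min_max(records):
    """STEP 4 (fit): per-attribute minimum and maximum over the given records."""
    columns = list(zip(*(features for features, _ in records)))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(records, lows, highs):
    """STEP 4 (apply): rescale attributes using previously fitted ranges."""
    def scale(x, lo, hi):
        return (x - lo) / (hi - lo) if hi > lo else 0.0
    return [(tuple(scale(x, lo, hi) for x, lo, hi in zip(features, lows, highs)), label)
            for features, label in records]


def nearest_neighbor(train, query):
    """STEP 5: label of the training record closest to the query (1-NN)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda record: squared_distance(record[0], query))[1]


def accuracy(train, test):
    """STEP 6: fraction of held-out records classified correctly."""
    hits = sum(nearest_neighbor(train, features) == label for features, label in test)
    return hits / len(test)


sample = select_sample(RAW, fraction=1.0)  # toy data is tiny, so keep all of it (shuffled)
records = clean(sample)
train, test = records[:4], records[4:]     # simple hold-out split
lows, highs = fit_min_max(train)           # fit the scaling on training data only
train_scaled = apply_min_max(train, lows, highs)
test_scaled = apply_min_max(test, lows, highs)
print("hold-out accuracy:", accuracy(train_scaled, test_scaled))
```

Fitting the scaling ranges on the training records only, and then applying them to the held-out records, keeps the evaluation in STEP 6 from being influenced by the test data.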

Reviews and class presentations are conducted stepwise
throughout the course. 

Final (In Class) Project Presentation (SLIDES) (about 30 minutes)

1.    Survey / Literature Review of (a) application and (b) task / functionality, data mining (STEP 5), and model selection (“training strategy”).

2.    Brief Description of STEPS 1 – 7.

3.    Performance Evaluation and Assessment of your project.

Final Project Report (HARD COPY) (at most 15 pages)

         Submit a Technical Report (TR) that covers your Final Project Presentation.

 

Tentative Schedule

January 23

Ch. 1: Introduction – Data Warehouses, Databases, Data Mining and Knowledge Discovery, and the Semantic Web

(see http://www.w3.org/2001/sw/) 

January 30

Ch. 2: Data Warehouses and OLAP Technology.    

February 6

Ch. 3: Data Transformation and Preprocessing. STEP 1

February 13

Ch. 4:  System Architecture. Machine Learning. 

February 20 - 27

Ch. 5: Concept Description. STEPS 2 - 3

March 6

Performance Assessment: Training (and Validation), Testing and Evaluation;

Ch. 6: Mining Association Rules: Apriori Algorithm;

STEP 4

March 13

Spring Break

March 20

Midterm Exam

March 27

Ch. 7: Classification and Prediction.

April 3

Ch. 8: Cluster Analysis. STEP 5

April 10

Ch. 9: Mining Complex Types of Data

STEPS 6 - 7

April 17

Biometrics

April 24

FINAL PROJECT PRESENTATIONS

May 1

FINAL PROJECT PRESENTATIONS