Professor Harry Wechsler
Department of Computer Science
e-mail: wechsler@cs.gmu.edu
web: http://cs.gmu.edu/~wechsler/
(703) 993-1533 (office)
(703) 993-1530 (sec)
(703) 993-1710 (fax)
SPRING 2006
CS 499 Special Topics in Computer Science: DATA MINING
Class Information
003, 14364, Mondays 4:30 p.m. – 7:10 p.m., T 107
Prerequisites
CS 450 (“databases”) or permission of instructor
Office Hours
Mondays 3:15 – 4:00 p.m., or by appointment (SITE II, Rm. 461)
Textbook
1. Data Mining: Concepts and Techniques, Han and Kamber,
Morgan Kaufmann, 2001. Web site for the textbook slides:
http://www.cs.sfu.ca/~han/bk
1a. Web site for data mining software (Weka):
http://www.togaware.com/datamining/survivor/Weka.html
Course Description
Basic principles and methods for data
analysis and knowledge discovery. Emphasis is on developing the basic
skills needed for modeling and prediction on the one hand, and for performance
evaluation on the other. Topics
include system design, data quality, data preprocessing and transformation,
data association, event classification, clustering, biometrics, business
intelligence, and mining complex types of data.
Motivation
The explosive growth in generating, collecting, and
storing data has created an urgent need for new techniques and automated
tools that can intelligently assist us in transforming the vast amounts of data
into useful information and knowledge. Data mining, a multidisciplinary field,
helps with the automated extraction of patterns representing knowledge
implicitly stored in large databases, data warehouses, and other massive
information repositories. The course
focuses on issues related to the feasibility, usefulness, efficiency, and
scalability of automated techniques for the discovery of patterns hidden in
large databases.
Grading
Homework: 15%
Midterm (March 20): 20%
(Team) Term Project: 35%
Final (Monday, May 15): 30%
Term Project
Students work in teams on the term project.
The scope and range of the project must be agreed upon with the instructor.
The task should involve meaningful functionality and significant amounts of data.
The project includes the following STEPS:
1. Problem definition, requirements analysis, and conceptual design.
2. Data selection / sampling.
3. Cleaning and integration / Preprocessing.
4. Data transformation / Data Reduction.
5. Data Mining.
6. Modeling, test & evaluation, and performance assessment.
7. Visualization and knowledge discovery.
Reviews and class presentations are conducted stepwise
throughout the course.
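As a rough illustration of how STEPS 3 to 6 chain together, here is a minimal Python sketch. The toy records, the even/odd holdout split, min-max scaling, and the nearest-centroid classifier are all illustrative assumptions, not methods prescribed for the project; a real project would substitute its own data, preprocessing, and mining technique.

```python
# Hypothetical toy records: (feature_1, feature_2, class_label).
# None marks a missing value; the data is invented for illustration only.
RAW = [
    (1.0, 2.0, "A"), (1.2, 1.8, "A"), (0.9, None, "A"), (1.1, 2.2, "A"),
    (5.0, 6.0, "B"), (5.2, 5.8, "B"), (4.8, 6.1, "B"), (None, 6.3, "B"),
]


def clean(rows):
    """STEP 3: drop any record with a missing field."""
    return [r for r in rows if None not in r]


def holdout_split(rows):
    """Deterministic holdout (an assumption): even positions train, odd positions test."""
    return rows[0::2], rows[1::2]


def fit_min_max(rows):
    """STEP 4: learn per-feature (min, max) bounds from the training rows only."""
    cols = list(zip(*(r[:-1] for r in rows)))
    return [(min(c), max(c)) for c in cols]


def apply_min_max(rows, bounds):
    """Rescale features with the training bounds; test values may fall outside [0, 1]."""
    out = []
    for r in rows:
        feats = tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                      for v, (lo, hi) in zip(r[:-1], bounds))
        out.append(feats + (r[-1],))
    return out


def fit_centroids(rows):
    """STEP 5: mine one mean vector (centroid) per class label."""
    groups = {}
    for r in rows:
        groups.setdefault(r[-1], []).append(r[:-1])
    return {label: tuple(sum(c) / len(c) for c in zip(*pts))
            for label, pts in groups.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids, rows):
    """STEP 6: fraction of held-out records labeled correctly."""
    hits = sum(predict(centroids, r[:-1]) == r[-1] for r in rows)
    return hits / len(rows)


if __name__ == "__main__":
    rows = clean(RAW)
    train, test = holdout_split(rows)
    bounds = fit_min_max(train)
    model = fit_centroids(apply_min_max(train, bounds))
    print("test accuracy:", accuracy(model, apply_min_max(test, bounds)))
```

Note that the scaling bounds are learned from the training rows alone and then reused on the test rows, so the test data does not influence the model before it is evaluated.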
Final (In Class)
Project Presentation (SLIDES) (about 30 minutes)
1. Survey / literature review of (a) the application
and (b) the task / functionality, data mining (STEP 5),
and model selection (“training strategy”).
2. Brief description of STEPS 1 – 7.
3. Performance evaluation and assessment of your project.
Final Project Report (HARD COPY) (at most 15 pages)
Submit a Technical Report (TR) that covers your final project presentation.
Tentative Schedule
Date | Topic
January 23 |
January 30 |
February 6 |
February 13 |
February 20 – 27 |
March 6 | Performance Assessment: Training (and Validation), Testing and Evaluation; STEP 4
March 13 | Spring Break
March 20 | Midterm Exam
March 27 |
April 3 |
April 10 | STEPS 6 – 7
April 17 | Biometrics
April 24 | FINAL PROJECT PRESENTATIONS
May 1 | FINAL PROJECT PRESENTATIONS