Professor Harry Wechsler
Department of Computer Science
e-mail : wechsler@cs.gmu.edu
web : http://cs.gmu.edu/~wechsler/
(703) 993-1533 (office)
(703) 993-1530 (sec)
(703) 993-1710 (fax)
FALL 2005
CS 750 Theory and Applications of Data Mining
Class Information
001 70103 R 4:30 p.m. – 7:10 p.m. ENT 274
Prerequisites
CS 450 (“databases”), CS 580 (“AI”) or permission of instructor
Office Hours
Thursday 3:15 p.m. – 4:00 p.m. or by appointment (SITE II - Rm. 461)
Textbook
Introduction to Data Mining, Tan, Steinbach and Kumar, Pearson Addison Wesley, 2006
web site for textbook slides : http://www-users.cs.umn.edu/~kumar/dmbook/
Reference
Data Mining: Concepts and Techniques, Han and Kamber, Morgan Kaufmann, 2001
web site for textbook slides : http://www.cs.sfu.ca/~han/bk
WEKA web site for data mining software
http://www.togaware.com/datamining/survivor/Weka.html
Background for Pattern Recognition and Classification
http://research.cs.tamu.edu/prism/lectures.htm
UCI Machine Learning Repository Content Summary
http://www.ics.uci.edu/~mlearn/MLSummary.html
References
1. V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods, John Wiley, 1999.
2. D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999.
3. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
4. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
Course Description
Concepts and techniques in data mining and their multidisciplinary
applications. Topics include data warehousing and databases, data cleaning and
transformation, pattern transformation and data compression, concept
description, association and correlation rules, data classification and
predictive modeling, clustering, performance analysis and scalability, data
mining in advanced database systems including text, audio and images, and
emerging themes and future challenges related to biometrics and the semantic
web. A team term project and a topical review are required.
Motivation
The explosive growth in generating, collecting and storing data has created an
urgent need for new techniques and automated tools that can intelligently assist
us in transforming the vast amounts of data into useful information and knowledge.
Data mining is a multidisciplinary field, drawing from areas including AI,
database technology, data visualization, information retrieval, high performance
computing, machine learning, mathematical programming, neural networks, pattern
recognition, statistical learning theory, and statistics. The course gives graduate students the
opportunity to learn about the management and use of large data repositories
based upon a multidisciplinary approach.
Goals
The objective of this course is to introduce graduate students to
current research, technological advances and trends in data mining. Data mining, which supports knowledge discovery
in databases (KDD), helps with the automated extraction of patterns
representing knowledge implicitly stored in large databases, data warehouses,
and other massive information repositories.
The course focuses on issues related to the feasibility, usefulness,
efficiency, and scalability of automated techniques for the discovery of
patterns hidden in large databases.
Students will be exposed to the above topics via lectures and reading
assignments, including recent journal and conference papers. Students are
expected to complete a term project and to make an in-depth presentation on a
topic related to data mining. As data mining has matured, the field is now
advancing on three new fronts: (i) mining data in real time; (ii) predictive
analysis rather than merely explaining past trends; and (iii) analyzing messy
“unstructured” data.
Follow-Up Studies with Professor Wechsler
1. CS 667 – Biometrics – Spring 2006
2. CS 775 / IT 844 – Pattern Recognition – Spring 2007
3. Certificate in Biometrics
4. PhD dissertation
Grading
(Team) Term Project: 50%
Science and Technology REVIEW and Class Participation: 25%
Final Exam (December 15): 25%
Term Project
Students work in teams on the term project.
The scope and range of the project have to be agreed upon with the instructor.
The task involves meaningful functionality and significant amounts of data.
The project includes the following STEPS:
1. Problem definition, requirements analysis and conceptual design.
2. Data selection / sampling // visualization //
3. Cleaning and integration / Preprocessing // visualization //
4. Data transformation / Data Reduction // visualization //
5. Data Mining // visualization //
6. Modeling, test & evaluation, and performance assessment // visualization //
7. Knowledge discovery // visualization //
Use domain knowledge and visualization for all the steps.
Iteratively refine the quality and scope of your project.
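As a rough illustration only, STEPS 2 – 6 could be sketched in Python as shown below. The toy data, the function names, and the choice of a nearest-centroid classifier are all assumptions made for brevity and are not prescribed by the course; an actual project might rely on a tool such as WEKA and on data agreed with the instructor. STEPS 1 and 7 and the visualization at each step are not shown.

```python
import random
import statistics


def select_sample(rows, k, seed=0):
    """STEP 2: draw a reproducible random sample of at most k rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def clean(rows):
    """STEP 3: drop rows that contain a missing (None) feature value."""
    return [row for row in rows if None not in row[0]]


def standardize(rows):
    """STEP 4: rescale every feature to zero mean and unit variance."""
    columns = list(zip(*(features for features, _ in rows)))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        (tuple((x - m) / s for x, m, s in zip(features, means, stdevs)), label)
        for features, label in rows
    ]


def fit_centroids(train):
    """STEP 5: nearest-centroid model, i.e. one mean vector per class."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(statistics.mean(col) for col in zip(*members))
        for label, members in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, test):
    """STEP 6: fraction of held-out rows that are classified correctly."""
    hits = sum(predict(centroids, f) == label for f, label in test)
    return hits / len(test)


# Hypothetical toy data: (features, class label); None marks a missing value.
raw = [
    ((1.0, 2.0), "a"), ((1.2, 1.8), "a"), ((None, 2.1), "a"), ((0.9, 2.2), "a"),
    ((5.0, 8.0), "b"), ((5.2, 7.9), "b"), ((4.8, 8.1), "b"), ((5.1, None), "b"),
]

shuffled = select_sample(raw, k=len(raw))   # STEP 2
cleaned = clean(shuffled)                   # STEP 3
scaled = standardize(cleaned)               # STEP 4
train, test = scaled[:4], scaled[4:]        # simple hold-out split
model = fit_centroids(train)                # STEP 5
score = accuracy(model, test)               # STEP 6
print(f"hold-out accuracy: {score:.2f}")
```

For simplicity the sketch standardizes the data before splitting it; in a real project the scaling parameters would normally be estimated on the training portion only, so that the test rows do not influence the model.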
Reviews and class presentations are conducted stepwise throughout the course
(see tentative schedule). A first draft of each step is expected the week the
STEP is listed in the tentative schedule below. Based upon feedback received
in class, the same step is completed and presented again the following week.
Final (In Class) Project Presentation (SLIDES) (about 30 minutes)
1. Survey / Literature Review of (a) application and (b) task / functionality, data mining (STEP 5) and model selection (“training strategy”).
2. Brief Description of STEPS 1 – 7.
3. Performance Evaluation and Assessment of your project.
Final Project Report (HARD COPY) (at most 12 pages)
Submit a Technical Report (TR) that covers your Final Project Presentation.
Tentative Schedule
September 1 | Appendix C – Probability and Statistics
September 8 | Appendix A – Linear Algebra
September 15 | Appendix E – Optimization
September 22 – September 29 | Appendix D – Regression; STEPS 2 – 3 <September 22>
October 6 |
October 13 | Feature extraction and selection & Data reduction; Appendix B – Dimensionality Reduction
October 20 | STEP 4
October 27 |
November 3 - 10 | STEP 5 <November 10>
November 17 | STEPS 6 - 7
November 17 | Biometrics
November 24 | Thanksgiving
December 1 | FINAL PROJECT PRESENTATION
December 8 | FINAL PROJECT PRESENTATION - REVIEW for FINAL EXAM