Professor Harry Wechsler
CS 750 Theory and Application of Data Mining (3)
A01 5/19 40482 MWF 3:45 p.m. – 6:50 p.m. SITE I 206
INFS 755 Data Warehousing and Mining (3)
A01 5/19 40483 MWF 3:45 p.m. – 6:50 p.m. SITE I 206
Students are expected to meet the prerequisites; attending the class
is taken as confirmation that they do.
Office Hours
M-W-F 3:00 – 3:30 PM (SITE II - Rm. 461)
Textbook
Introduction to Data Mining, Tan, Steinbach and Kumar,
Pearson / Addison Wesley, 2006
web site for textbook slides
http://www-users.cs.umn.edu/~kumar/dmbook/
Reference
Data Mining: Concepts and Techniques (2nd edition), Han and Kamber, Elsevier, 2006
web site for textbook slides
http://www-faculty.cs.uiuc.edu/~hanj/bk2/
WEKA web site for data mining software
http://www.togaware.com/datamining/survivor/Weka.html
UCI Machine Learning Repository Content Summary
http://www.ics.uci.edu/~mlearn/MLSummary.html
Additional References
1. V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods (2nd ed.), John Wiley, 2007.
2. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
Course Description
Concepts and techniques in data mining and multidisciplinary applications. Topics include databases and data warehousing; data cleaning and transformation; concept description; association and correlation rules; data classification and predictive modeling; performance analysis and scalability; data mining in advanced database systems, including text, audio, and images; and emerging themes and trends. A term team project and a topical review are required.
Motivation
The explosive growth in the generation, collection, and storage of data has created an urgent need for new techniques and automated tools that can intelligently assist in transforming vast amounts of data into useful information and knowledge. Data mining is a multidisciplinary field, drawing from areas including AI, database technology, data visualization, information retrieval, high performance computing, machine learning, mathematical programming, neural networks, pattern recognition, statistical learning theory, and statistics. The course gives graduate students the opportunity to learn about the management and use of large data repositories through a multidisciplinary approach.
Goals
The objective of this course is to introduce graduate students to data mining basics, current research, technological advances, and trends in data mining. Data mining, which supports knowledge discovery in databases (KDD), helps with the automated extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. The course focuses on issues related to the feasibility, usefulness, efficiency, and scalability of automated techniques for the discovery of patterns hidden in large databases. Students will be exposed to these topics via lectures and reading assignments, including recent journal and conference papers. Students are expected to complete a term project and to make an in-depth presentation on a topic related to data mining. As data mining has matured, the field is now advancing on three new fronts: (i) the ability to mine data in real time; (ii) predictive analysis rather than merely explaining past trends; and (iii) the ability to analyze messy “unstructured” data.
Follow-Up Studies with Professor Wechsler:
CS 668 – Pattern Recognition (Fall 08)
CS 667 / 778 – Biometrics
:
PhD dissertation
Grading
Quizzes – 3 (one hour each) x 10% each (5/23, 6/2, 6/13): 30%
Midterm – June 6: 30%
Case Study: 10%
Final – (Team) Term Project: 30%
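The weights above combine into a single course score as a weighted sum. A minimal sketch of that arithmetic follows; it assumes each component is scored on a 0 to 100 scale (the syllabus does not state the scale), and the component names and example scores are hypothetical.

```python
# Weights taken from the grading breakdown in this syllabus.
WEIGHTS = {
    "quiz_1": 0.10,
    "quiz_2": 0.10,
    "quiz_3": 0.10,
    "midterm": 0.30,
    "case_study": 0.10,
    "term_project": 0.30,
}


def course_score(scores):
    """Return the weighted course score; every graded component must be present.

    Assumption: each entry in `scores` is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "quiz_1": 80,
    "quiz_2": 90,
    "quiz_3": 70,
    "midterm": 85,
    "case_study": 100,
    "term_project": 90,
}
print(round(course_score(example), 1))
```

How the resulting score maps to a letter grade is not specified here and is left to the instructor.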
Term Project
Students work in teams on the term project. The scope and range of the project must be agreed upon with the instructor. The task must involve meaningful functionality and significant amounts of data. The project includes the following STEPS:
1. Problem definition, requirements analysis and conceptual design
2. Data selection / sampling (with visualization)
3. Data cleaning and integration (with visualization)
4. Preprocessing: data transformation / data reduction (with visualization)
5. Data mining (with visualization)
6. Modeling, testing & evaluation, performance assessment (with visualization)
7. Knowledge discovery (with visualization)
Use domain knowledge and visualization for all the steps.
Iteratively refine the quality and scope of your project
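As a rough illustration of how STEPS 2 through 6 chain together, here is a minimal sketch in standard-library Python. Everything in it is an assumption made for illustration: the synthetic two-class data, the function names, and the nearest-centroid classifier, which stands in for whatever mining method a team actually selects. Visualization, which the project requires at every step, is omitted.

```python
import random
import statistics


def make_toy_data(n=200, seed=0):
    """STEP 2 stand-in: synthetic (x, y, label) records, a few with x missing."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 3.0
        x, y = rng.gauss(center, 1.0), rng.gauss(center, 1.0)
        if rng.random() < 0.05:
            x = None  # simulate a missing measurement
        rows.append((x, y, label))
    return rows


def clean(rows):
    """STEP 3: drop records with missing values."""
    return [r for r in rows if r[0] is not None and r[1] is not None]


def fit_scaler(train):
    """STEP 4: learn per-feature mean and standard deviation on training data only."""
    xs = [r[0] for r in train]
    ys = [r[1] for r in train]
    return ((statistics.mean(xs), statistics.pstdev(xs)),
            (statistics.mean(ys), statistics.pstdev(ys)))


def scale(rows, scaler):
    """STEP 4: apply z-score standardization using the fitted statistics."""
    (mx, sx), (my, sy) = scaler
    return [((x - mx) / sx, (y - my) / sy, label) for x, y, label in rows]


def fit_centroids(train):
    """STEP 5: nearest-centroid model, one mean point per class."""
    centroids = {}
    for label in sorted({r[2] for r in train}):
        pts = [r for r in train if r[2] == label]
        centroids[label] = (statistics.mean(p[0] for p in pts),
                            statistics.mean(p[1] for p in pts))
    return centroids


def predict(centroids, x, y):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)


def accuracy(centroids, rows):
    """STEP 6: fraction of records classified correctly."""
    return sum(predict(centroids, x, y) == label for x, y, label in rows) / len(rows)


rows = clean(make_toy_data())          # STEPS 2-3
random.Random(1).shuffle(rows)
split = int(0.7 * len(rows))           # hold out 30% for testing
scaler = fit_scaler(rows[:split])      # STEP 4, fitted on the training part
train, test = scale(rows[:split], scaler), scale(rows[split:], scaler)
model = fit_centroids(train)           # STEP 5
acc = accuracy(model, test)            # STEP 6
print(f"held-out accuracy: {acc:.2f}")
```

The scaler is fitted on the training portion only so that the held-out records do not influence preprocessing; a real project would replace each function with the methods and data agreed upon with the instructor.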
Reviews and class presentations are conducted stepwise throughout the course. A first draft of each step is due at the lecture for which that STEP is listed in the tentative schedule below. Based on the feedback received in class, the same step is then completed and presented again at the following lecture.
Project Presentation (SLIDES) (about 45 minutes)
1. Survey / Literature Review of (a) application and (b) task / functionality, model selection (“training strategy”) and data mining (STEP 5).
2. Brief Description of STEPS 1 – 7.
3. Performance Evaluation and Assessment of your project.
Final Project Report (HARD COPY) (at most 10 pages)
Submit a Technical Report (TR) that covers your Final Project Presentation.
Tentative Schedule
May 19 | Ch. 1: Introduction – Data Warehouses, Databases, Data Mining and Knowledge Discovery, and the Semantic Web (http://www.w3.org/2001/sw) - Appendix C – Probability and Statistics - |
May 21 | Ch. 2: Data - STEP 1 - Appendix A – Linear Algebra - |
May 23 | Ch. 3: Exploring Data - Appendix E – Optimization - QUIZ #1 |
May 26 | Memorial Day – no class |
May 28 | Data reduction & transformation - STEPS 2 & 3 - Performance Evaluation - Appendix B – Dimensionality Reduction |
May 30 | Ch. 4: Classification – Basics (Part I) - Appendix D – Regression |
June 2 | Ch. 6: Associations – Basics (Part I) - STEP 4 - QUIZ #2 |
June 4 | Ch. 8: Clustering – Basics (Part I) - REVIEW for Mid-Term |
June 6 | Mid-Term - closed books and notes; bring a blue book and a calculator |
June 9 – 11 – 13 | Chs. 4/5, 6/7, 8/9: Advanced Topics – Classification – Association – Clustering - Model Selection - Ch. 10: Anomaly Detection - Biometrics - STEP 5 (June 9) - STEPS 6 – 7 (June 11) - QUIZ #3 (June 13) |
June 16 | FINAL PROJECT PRESENTATIONS |
June 18 | FINAL PROJECT PRESENTATIONS |