Prerequisites: Grade of C or better in CS 310 and STAT 344. Prerequisite enforced by registration system (see http://catalog.gmu.edu/preview_course_nopop.php?catoid=17&coid=108697)
Instructor: Prof. Harry Wechsler firstname.lastname@example.org
Course Description – Basic principles and methods for data analysis and knowledge discovery. Emphasizes developing basic skills for modeling and prediction, on one side, and performance evaluation, on the other. Topics include system design; data quality, preprocessing, and association; event classification; clustering; biometrics; business intelligence; and mining complex types of data.
Course (ABET) Outcomes:
1. The ability to apply computing principles, probability and statistics relevant to the data mining discipline to analyze data.
2. A thorough understanding of model programming with data mining tools, algorithms for estimation, prediction, and pattern discovery.
3. The ability to analyze a problem, identifying and defining the computing requirements appropriate to its solution: data collection and preparation, functional requirements, selection of models and prediction algorithms, software, and performance evaluation.
4. The ability to understand performance metrics used in the data mining field to interpret the results of applying an algorithm or model, to compare methods and to reach conclusions about data.
5. The ability to communicate effectively to an audience the steps and results followed in solving a data mining problem (through a term project).
Time, Day, and Venue: TR – Tuesday/Thursday, 12:00 – 1:15 pm, Innovation Hall 136
Office Hours: TR – Thursday, 1:30 – 2:15 pm or by appointment, ENGR 4448.
First day of classes: Tuesday, January 20
Spring break: no class on Tuesday, March 10 and Thursday, March 12
Last day of classes: Thursday, April 30
Final Exam: Thursday, May 7, 10:30 am – 1:15 pm
Required Textbook: P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2006. http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Complementary Textbook 1: I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.), Morgan Kaufmann, 2011. http://www.cs.waikato.ac.nz/ml/weka/book.html
Complementary Textbook 2: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning (2nd ed.), Springer, 2009. http://statweb.stanford.edu/~tibs/ElemStatLearn/
Complementary Textbook 3: A. Rajaraman, J. Leskovec, and J. D. Ullman, Mining of Massive Datasets (2nd ed.), Cambridge University Press, 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf
Software and Data:
UCI Machine Learning Repository is a repository of databases and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. http://archive.ics.uci.edu/ml/
Kaggle is the home of data science and data mining competitions. http://www.kaggle.com/
Resources: Software and Data. http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. http://mallet.cs.umass.edu/
R – a programming language for statistical computing and graphics. http://en.wikipedia.org/wiki/R_%28programming_language%29 and http://www.r-project.org/
MATLAB and Toolboxes – The Language of Technical Computing. http://www.mathworks.com/products/matlab/
CLOSED BOOK EXAMINATIONS
· Homework – 20%
· Class participation and Quiz(zes) – 10%
· (Non-Cumulative) Midterm 1 and Midterm 2 – Thursday, February 27 & Thursday, March 26 – 20%
· Term Project – April 28 and 30 – 30%
· (Cumulative) Final – May 7 – 30%
You are expected to abide by the GMU honor code. Homework assignments and exams are individual efforts. Information on the university honor code can be found at
Additional departmental CS information: http://cs.gmu.edu/wiki/pmwiki.php/HonorCode/CSHonorCodePolicies