CS 584 Fall 2015

Data Mining   CRN: 78055 CS 584 - 001

Prerequisites: Grade of C or better in CS 310 and STAT 344.

Instructor:  Prof. Harry Wechsler wechsler@gmu.edu

Email correspondence: from / to GMU accounts with subject: CS 584

Course Description: Concepts and techniques in data mining and multidisciplinary applications. Topics include databases; data cleaning and transformation; concept description; association and correlation rules; data classification and predictive modeling; performance analysis and scalability; data mining in advanced database systems, including text, audio, and images; and emerging themes and future challenges.

Goals: Critical Thinking; Model Selection and Predictive Analytics Using Cross-Validation; Meaningful (size and scope) Data Mining Applications (to find novel and useful patterns); Experimental Design, Metrics and Performance Evaluation; Theory vs. Practice.

Time, Day, and Venue: W 7:20 pm - 10:00 pm

Robinson Hall B203

Office Hours: W 6:00 - 7:00 pm or by appointment, ENGR 4448.


First day of classes: W, September 2, 2015

No class on W, November 25, 2015 (Thanksgiving recess)

Last day of classes: W, December 9, 2015


Final Exam: W, December 16, 2015, 7:30 pm - 10:15 pm

Required Textbook: P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2006 (including slides). http://www-users.cs.umn.edu/~kumar/dmbook/index.php

Complementary Textbook 1: J. Han and M. Kamber, Data Mining (3rd Ed.) Morgan Kaufmann, 2011 (including slides). http://web.engr.illinois.edu/~hanj/bk3/bk3_slidesindex.htm

Complementary Textbook 2: I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.), Morgan Kaufmann, 2011 (including slides). http://www.cs.waikato.ac.nz/ml/weka/book.html

Complementary Textbook 3: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning (2nd ed.), Springer, 2009 (including slides). http://statweb.stanford.edu/~tibs/ElemStatLearn/

Complementary Textbook 4: A. Rajaraman, J. Leskovec, and J. D. Ullman, Mining of Massive Datasets (2nd ed.), Cambridge University Press, 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf

Software and Data:

UCI Machine Learning Repository is a repository of databases and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. http://archive.ics.uci.edu/ml/

UCI Knowledge Discovery in Databases Archive is an online repository of large data sets that encompasses a wide variety of data types, analysis tasks, and application areas. http://kdd.ics.uci.edu/

Kaggle is the home of data science and data mining competitions. http://www.kaggle.com/

Resources: Software and Data. http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

WEKA http://www.cs.waikato.ac.nz/ml/weka/

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. http://mallet.cs.umass.edu/

SVM light and LibSVM are two popular implementations of support vector machine (SVM) algorithms. http://svmlight.joachims.org/ and http://www.csie.ntu.edu.tw/~cjlin/libsvm/

R is a programming language for statistical computing and graphics. http://en.wikipedia.org/wiki/R_%28programming_language%29 and http://www.r-project.org/

MATLAB and Toolboxes: The Language of Technical Computing. http://www.mathworks.com/products/matlab/


Grading Composition (100 points)

         Homework: 25 % (late homework not accepted)

         Midterm (Tentative), W, October 14, 2015: 20 %

         Team Term Projects and FINAL Review, December 2, 2015 and December 9, 2015: 25 %

         (Cumulative) Final, December 16, 2015: 30 %

Grading Scale


Computing Resources


University email Policy


Per university policy 1315 (http://universitypolicy.gmu.edu/policies/employees-electronic-communications/), you must use university email for all Mason-related email. Failure to do so puts us at risk of violating FERPA and could expose your entire personal email correspondence to legal discovery in the event of any legal action that involves you.

Honor Code

You are expected to abide by the GMU honor code. Homework assignments and exams are individual efforts. Information on the university honor code can be found at


Additional departmental CS information: http://cs.gmu.edu/wiki/pmwiki.php/HonorCode/CSHonorCodePolicies