**Prerequisites:** Grade of C or better in CS 310 and STAT 344.

**Instructor**: Prof. Harry Wechsler, **wechsler@gmu.edu**

**Email correspondence**: from and to GMU accounts, with the subject line: CS 584

**Course Description**: Concepts and
techniques in data mining and multidisciplinary applications. Topics include
databases; data cleaning and transformation; concept description; association
and correlation rules; data classification and predictive modeling; performance
analysis and scalability; data mining in advanced database systems, including
text, audio, and images; and emerging themes and future challenges.

**Goals**: Critical Thinking; Model Selection and Predictive
Analytics Using Cross-Validation; Meaningful (size and scope) Data Mining
Applications (to find novel and useful patterns); Experimental Design, Metrics
and Performance Evaluation; Theory vs. Practice.

**Time, Day, and Venue**: W 7:20 pm – 10:00 pm, Robinson Hall B203

**Office Hours:** W 6:00 – 7:00 pm or by appointment, ENGR 4448.

http://registrar.gmu.edu/calendars/fall-2015/

First day of classes: **W**, September 2, 2015

No class on **W**, November 25, 2015 (Thanksgiving recess)

Last day of classes: **W**, December 9, 2015

http://registrar.gmu.edu/calendars/fall-2015/exams/

**Final Exam**: **W**, December 16, 2015, 7:30 pm – 10:15 pm

**Required Textbook:** P. N. Tan, M. Steinbach, and V. Kumar, *Introduction to Data Mining*, Addison Wesley, 2006 (including slides). http://www-users.cs.umn.edu/~kumar/dmbook/index.php

**Complementary Textbook 1:** J. Han and M. Kamber, *Data Mining* (3rd ed.), Morgan Kaufmann, 2011 (including slides). http://web.engr.illinois.edu/~hanj/bk3/bk3_slidesindex.htm

**Complementary Textbook 2:** I. H. Witten, E.
Frank, and M. A. Hall, *Data Mining:
Practical Machine Learning Tools and Techniques* (3rd ed.), Morgan Kaufmann,
2011 (including slides). http://www.cs.waikato.ac.nz/ml/weka/book.html

**Complementary Textbook 3:** T. Hastie, R. Tibshirani, and J. Friedman, *The Elements of Statistical Learning* (2nd ed.), Springer, 2009 (including slides). http://statweb.stanford.edu/~tibs/ElemStatLearn/

**Complementary Textbook 4:** A. Rajaraman, J. Leskovec, and J. D. Ullman, *Mining of Massive Datasets* (2nd ed.), Cambridge University Press, 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf

**Software and Data:**

**UCI Machine Learning Repository** is a repository of databases and data
generators that are used by the machine learning community for the empirical
analysis of machine learning algorithms. http://archive.ics.uci.edu/ml/

**UCI Knowledge Discovery in Databases Archive** is an online repository of
large data sets that encompasses a wide variety of data types, analysis tasks,
and application areas. http://kdd.ics.uci.edu/

**Kaggle** is the home of data science and data mining
competitions. http://www.kaggle.com/

**Resources: Software and Data**. http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

**WEKA** http://www.cs.waikato.ac.nz/ml/weka/

**MALLET** is a Java-based
package for statistical natural language processing, document classification,
clustering, topic modeling, information extraction, and other machine learning
applications to text. http://mallet.cs.umass.edu/

**SVM light** and **LibSVM** are two popular
implementations of support vector machine (SVM) algorithms. http://svmlight.joachims.org/ and http://www.csie.ntu.edu.tw/~cjlin/libsvm/

**R** – Programming language for statistical computing and graphics. http://en.wikipedia.org/wiki/R_%28programming_language%29 and http://www.r-project.org/

**MATLAB and Toolboxes – The Language of Technical Computing.** http://www.mathworks.com/products/matlab/

**CLOSED BOOK EXAMINATIONS**

· Homework – 25% (late homework not accepted)

· Midterm – (Tentative)

· Team Term Projects and FINAL Review – December 2, 2015 and December 9, 2015 – 25%

· (Cumulative) Final – December 16, 2015

**http://www.fcps.edu/southcountyhs/sservices/gradescale.html**

**Computing Resources**

**http://labs.vse.gmu.edu/uploads/FacultyFAQ/StudentWelcome.pdf**

Per university policy 1315 (http://universitypolicy.gmu.edu/policies/employees-electronic-communications/), you must use
university email for all Mason-related email. Failure to do so puts us at risk
of a violation of FERPA and could expose your entire personal email
communications to legal discovery actions in the event of any legal actions
that involve you.

You are expected to abide by the GMU honor code. Homework assignments and exams are individual efforts. Information on the university honor code can be found at

**http://oai.gmu.edu/the-mason-honor-code/**

Additional departmental CS information: **http://cs.gmu.edu/wiki/pmwiki.php/HonorCode/CSHonorCodePolicies**