**Prerequisites:** Grade of C or better in CS 310 and STAT 344.

**Instructor**: Prof. Harry Wechsler, **wechsler@gmu.edu**

**Email correspondence**: send from GMU accounts only, with the subject line "CS 659".

**Course Description**: Concepts and
techniques in data mining and multidisciplinary applications. Topics include databases;
data cleaning and transformation; concept description; association and
correlation rules; data classification and predictive modeling; performance
analysis and scalability; data mining in advanced database systems, including
text, audio, and images; and emerging themes and future challenges.

**Goals**: Critical Thinking (look for Pitfalls); Model Selection
and Predictive Analytics Using Cross-Validation and Training; Meaningful (size
and scope) Data Mining Application (to find useful patterns); Experimental
Design, Metrics and Performance Evaluation; Theory vs. Practice.

**Time, Day, and Venue**: MWF, 3:45 pm – 6:45 pm, Nguyen Engineering Building 1107

**Office Hours:** MWF 2:45 – 3:30 pm or by appointment, ENGR 4448.

http://summer.gmu.edu/dates-2015/

First day of classes: June 29, 2015

No class on Friday, July 3, 2015

Last day of classes: Wednesday, July 29, 2015

**Final Exam**: Friday, July 31, 2015

**Required Textbook:** P. N. Tan, M. Steinbach, and V. Kumar, *Introduction to Data Mining*, Addison-Wesley, 2006. http://www-users.cs.umn.edu/~kumar/dmbook/index.php

**Complementary Textbook 1:** J. Han and M. Kamber,
*Data Mining* (3rd ed.) Morgan
Kaufmann, 2011. http://web.engr.illinois.edu/~hanj/bk3/bk3_slidesindex.htm

**Complementary Textbook 2:** I. H. Witten, E.
Frank, and M. A. Hall, *Data Mining:
Practical Machine Learning Tools and Techniques* (3rd ed.), Morgan Kaufmann,
2011. http://www.cs.waikato.ac.nz/ml/weka/book.html

**Complementary Textbook 3:** T. Hastie, R. Tibshirani, and J. Friedman, *The Elements of Statistical Learning* (2nd ed.), Springer, 2009. http://statweb.stanford.edu/~tibs/ElemStatLearn/

**Complementary Textbook 4:** A. Rajaraman, J.
Leskovec, and J. D. Ullman, *Mining of
Massive Datasets* (2nd ed.), Cambridge University Press, 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf

**Software and Data:**

**UCI Machine Learning Repository** is a repository of databases and data
generators that are used by the machine learning community for the empirical
analysis of machine learning algorithms. http://archive.ics.uci.edu/ml/

**UCI Knowledge Discovery in Databases Archive** is an online repository of
large data sets that encompasses a wide variety of data types, analysis tasks,
and application areas. http://kdd.ics.uci.edu/

**Kaggle** is the home of data science and data mining
competitions. http://www.kaggle.com/

**Resources: Software and Data**. http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

**WEKA** http://www.cs.waikato.ac.nz/ml/weka/

**MALLET** is a Java-based
package for statistical natural language processing, document classification,
clustering, topic modeling, information extraction, and other machine learning
applications to text. http://mallet.cs.umass.edu/

**SVM light** and **LibSVM** are two popular
implementations of various support vector machine (SVM) algorithms. http://svmlight.joachims.org/ and http://www.csie.ntu.edu.tw/~cjlin/libsvm/

**R** is a programming language for statistical computing and
graphics. http://en.wikipedia.org/wiki/R_%28programming_language%29 and http://www.r-project.org/

**MATLAB and Toolboxes – The Language of Technical Computing.** http://www.mathworks.com/products/matlab/

**CLOSED BOOK EXAMINATIONS**

- Homework – 20% (late homework not accepted)
- Midterm – Monday, July 13, 2015
- Team Term Project and Final Review – July 27, 2015 __and__ July 29, 2015 – 20%
- (Cumulative) Final – July 31, 2015

**http://www.fcps.edu/southcountyhs/sservices/gradescale.html**

You are expected to abide by the GMU honor code. Homework assignments and exams are individual efforts. Information on the university honor code can be found at

**http://oai.gmu.edu/the-mason-honor-code/**

Additional departmental CS information: **http://cs.gmu.edu/wiki/pmwiki.php/HonorCode/CSHonorCodePolicies**