**Prerequisites:** Grade of C or better in CS 310 and STAT 344. Prerequisites are enforced
by the registration system (see http://catalog.gmu.edu/preview_course_nopop.php?catoid=17&coid=108697).

**Instructor**: Prof. Harry Wechsler, **wechsler@gmu.edu**

**Email correspondence:** from GMU accounts, with the subject line "CS 484"

**Course Description:** Basic principles and
methods for data analysis and knowledge discovery. Emphasizes developing basic
skills for modeling and prediction, on one side, and performance evaluation, on
the other. Topics include system design; data quality, preprocessing, and
association; event classification; clustering; biometrics; business intelligence;
and mining complex types of data.

**Course (ABET) Outcomes:**

1. The ability to apply computing principles, probability and statistics relevant to
the data mining discipline to analyze data.

2. A thorough understanding of model programming with data mining tools, algorithms
for estimation, prediction, and pattern discovery.

3. The ability to analyze a problem, identifying and defining the computing
requirements appropriate to its solution: data collection and preparation,
functional requirements, selection of models and prediction algorithms,
software, and performance evaluation.

4. The ability to understand performance metrics used in the data mining field to
interpret the results of applying an algorithm or model, to compare methods, and
to reach conclusions about data.

5. The ability to communicate effectively to an audience the steps and results
followed in solving a data mining problem (through a term project).

**Time, Day, and Venue**: TR (Tuesday/Thursday), 12:00 – 1:15 pm, Innovation Hall 136

**Office Hours:** TR – Thursday, 1:30 – 2:15 pm or by
appointment, ENGR 4448.

http://registrar.gmu.edu/calendars/spring-2015/

First day of classes: Tuesday, January 20

Spring break: no class on Tuesday, March 10 and Thursday, March 12

Last day of classes: Thursday, April 30

http://registrar.gmu.edu/calendars/spring-2015/final-exam/

**Final Exam**: Thursday, May 7, 10:30 am – 1:15 pm

**Required Textbook:** P. N. Tan, M. Steinbach, and V. Kumar, *Introduction
to Data Mining*, Addison Wesley, 2006. http://www-users.cs.umn.edu/~kumar/dmbook/index.php

**Complementary Textbook 1:** I. H. Witten, E.
Frank, and M. A. Hall, *Data Mining:
Practical Machine Learning Tools and Techniques* (3rd ed.), Morgan Kaufmann,
2011. http://www.cs.waikato.ac.nz/ml/weka/book.html

**Complementary Textbook 2:** T. Hastie, R. Tibshirani, and J. Friedman, *The Elements
of Statistical Learning* (2nd ed.), Springer, 2009. http://statweb.stanford.edu/~tibs/ElemStatLearn/

**Complementary Textbook 3:** A. Rajaraman, J. Leskovec, and J. D. Ullman, *Mining of
Massive Datasets* (2nd ed.), Cambridge University Press, 2014.
http://infolab.stanford.edu/~ullman/mmds/book.pdf

**Software and Data:**

**UCI Machine Learning Repository** is a repository of
databases and data generators that are used by the machine learning community
for the empirical analysis of machine learning algorithms. http://archive.ics.uci.edu/ml/

**UCI Knowledge Discovery in Databases Archive** is an online repository of large data sets
that encompasses a wide variety of data types, analysis tasks, and application
areas. http://kdd.ics.uci.edu/

**Kaggle** is the home of data science and data mining competitions.
http://www.kaggle.com/

**Resources: Software and Data**. http://www-users.cs.umn.edu/~kumar/dmbook/resources.htm

**WEKA** http://www.cs.waikato.ac.nz/ml/weka/

**MALLET** is a Java-based
package for statistical natural language processing, document classification,
clustering, topic modeling, information extraction, and other machine learning
applications to text. http://mallet.cs.umass.edu/

**SVM light** and **LibSVM** are two popular
implementations of various support vector machine (SVM) algorithms. http://svmlight.joachims.org/ and http://www.csie.ntu.edu.tw/~cjlin/libsvm/

**R** is a programming language for statistical
computing and graphics. http://en.wikipedia.org/wiki/R_%28programming_language%29
and http://www.r-project.org/

**MATLAB and Toolboxes – The Language of Technical Computing.**
http://www.mathworks.com/products/matlab/

**CLOSED BOOK EXAMINATIONS**

· Homework – 20%

· Class participation and Quiz(zes) – 10%

· (Non-Cumulative) MidTerm1 and MidTerm2 – Thursday, February 26 & Thursday, March 26

· Term Project – April 28 and 30

· (Cumulative) Final – May 7

**http://www.fcps.edu/southcountyhs/sservices/gradescale.html**

You are expected to abide by the GMU honor code. Homework assignments and exams are individual efforts. Information on the university honor code can be found at

**http://oai.gmu.edu/the-mason-honor-code/**

Additional departmental CS information: **http://cs.gmu.edu/wiki/pmwiki.php/HonorCode/CSHonorCodePolicies**