CS 504 Principles of Data Management and Mining
Course Description (From Catalog)
Techniques to store, manage, and use data including databases, relational model, schemas, queries and transactions. On Line Transaction Processing, Data Warehousing, star schema, On Line Analytical Processing. MOLAP, HOLAP, and hybrid systems. Overview of Data Mining principles, models, supervised and unsupervised learning, pattern finding. Massively parallel architectures and Hadoop.
Instructor: James J. Nolan, Ph.D.
Contact: jnolan5@gmu.edu
Day/Time: Wednesday, 4:30-7:10pm
Location: Arlington: Founders Hall 466
Office Hours: By appointment
Prerequisites
Graduate Standing
Note: This course cannot be taken for credit by students of the MS CS, MS ISA, MS SWE, MS IS, CS PhD or IT PhD programs.
Honor Code Statement
Please be familiar with the GMU Honor Code. In addition, the CS department has its own Honor Code policies. Any deviation from these policies is considered an Honor Code violation.
Disability Accommodations
If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services (ODS) at 993-2474, http://ods.gmu.edu. All academic accommodations must be arranged through the ODS.
Textbooks
Required (available in Safari Books):
Data Science for Business: What You Need To Know About Data Mining and Data-Analytic Thinking (Foster Provost and Tom Fawcett)
Making Sense of NoSQL: A Guide for Managers and the Rest of Us (Dan McCreary and Ann Kelly)
Various reading materials will also be given in class.
Optional (available in Safari Books):
Hadoop: The Definitive Guide, 4th Edition (Tom White)
Grading Policies
Homework: 15%
Project: 25%
Midterm: 25%
Final: 35%
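As an illustration of how the four weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale, which the syllabus does not state; the function name `course_score` and the sample scores are hypothetical, and nothing here implies a particular rounding or letter-grade policy.

```python
# Weights taken from the Grading Policies list above.
WEIGHTS = {"homework": 0.15, "project": 0.25, "midterm": 0.25, "final": 0.35}


def course_score(scores):
    """Return the weighted course score.

    Assumption (not stated in the syllabus): every component score
    is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "project": 80, "midterm": 70, "final": 85}
print(round(course_score(example), 2))
```

With these sample scores the weighted total is 13.5 + 20 + 17.5 + 29.75 = 80.75.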
Class Schedule (subject to change at Instructor’s discretion)
Class # | Date | Topic | Notes
1 | 9/2 | Introduction |
2 | 9/9 | ER |
3 | 9/16 | Relational Model 1 |
4 | 9/23 | Relational Model 2 |
5 | 9/30 | SQL | Guest Instructor: Jim is at Strata
6 | 10/7 | Data Warehousing |
7 | 10/14 | NoSQL/MapReduce | Project proposal and presentations due
8 | 10/21 | Midterm |
9 | 10/28 | Data Mining 1 |
10 | 11/4 | Data Mining 2 |
11 | 11/11 | Data Mining 3 |
12 | 11/18 | Data Mining 4 |
13 | 12/2 | Project Results Presentations | Project Results Presentation due
14 | 12/9 | Course Review | Project Final Report due
15 | 12/16 | Final Exam |
Class Project
The project is worth 25% of the final grade. You solve a data-science problem end to end, from data preparation to data product.
· Project Proposal Paper: 2 pages maximum plus a 5-minute in-class pitch, due on 10/14.
  o Should include answers to the following questions:
    1. What is the problem?
    2. Why is it interesting and important?
    3. Why is it hard? Why have previous approaches failed?
    4. What are the key components of your approach?
    5. What data sets and metrics will be used to validate the approach?
  o Project Results Presentation: 10-minute presentation, due on 12/2.
  o Final report: 6 pages maximum, due on 12/9.
    § For guidance on writing the final report, see slide 70 of Eamonn Keogh's KDD'09 Tutorial on How to do good research, get it published in SIGKDD and get it cited!
    § Follow ACM formatting guidelines.