CS 504 Principles of Data Management and Mining
Course Description (From Catalog)
Techniques to store, manage, and use data including databases, relational model, schemas, queries and transactions. On Line Transaction Processing, Data Warehousing, star schema, On Line Analytical Processing. MOLAP, HOLAP, and hybrid systems. Overview of Data Mining principles, models, supervised and unsupervised learning, pattern finding. Massively parallel architectures and Hadoop.
Instructor:
Arthur T. Conroy, Ph.D.
Contact: aconroy2@gmu.edu
Day/Time: Tuesday, 4:30-7:10pm
Location: Arlington: Founders Hall 121
Office Hours: By appointment and Tuesday (one hour before and/or after class)
Prerequisites
Graduate Standing
Note: This course cannot be taken for credit by students of the MS CS, MS ISA, MS SWE, MS IS, CS PhD or IT PhD programs.
Honor Code Statement
Please be familiar with the GMU Honor Code. In addition, the CS department has its own Honor Code policies. Any deviation from these policies is considered an Honor Code violation.
Disability Accommodations
If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services (ODS) at 993-2474, http://ods.gmu.edu. All academic accommodations must be arranged through the ODS.
Textbooks: Required (available in Safari Books):
Data Science for Business: What You Need To Know About Data Mining and Data-Analytic Thinking (Foster Provost and Tom Fawcett)
Making Sense of NoSQL: A Guide for Managers and the Rest of Us (Dan McCreary and Ann Kelly)
Various reading materials will also be given in class.
Textbooks: Optional (available in Safari Books):
Hadoop: The Definitive Guide, 4th Edition (Tom White)
Grading Policies
Homework: 15%
Project: 25%
Midterm: 25%
Final: 35%
Class Schedule (subject to change)
Class # | Date | Topic | Notes |
1 | 1/19/16 | Introduction | |
2 | 1/26/16 | Entity Relationship Models | |
3 | 2/2/16 | Relational Model 1 | |
4 | 2/9/16 | Relational Model 2 | |
5 | 2/16/16 | Structured Query Language (SQL) | |
6 | 2/23/16 | Data Warehousing | |
7 | 3/1/16 | NoSQL/MapReduce | Project proposal presentations due |
8 | 3/15/16 | Midterm (week after spring break) | |
9 | 3/22/16 | Data Mining 1 | |
10 | 3/29/16 | Data Mining 2 | |
11 | 4/5/16 | Data Mining 3 | |
12 | 4/12/16 | Data Mining 4 | |
13 | 4/19/16 | Project Results Presentations | Project Results Presentation due |
14 | 4/26/16 | Course Review | Project Final Report due |
15 | 5/3/16 | Final Exam | |
Class Project
25% of the final grade. You will solve a data-science problem end to end, from data preparation to data product.
• Project Proposal Paper - 2 pages maximum plus a 5-minute in-class pitch -- due on 3/1/16.
   o Should include answers to the following questions:
      1. What is the problem?
      2. Why is it interesting and important?
      3. Why is it hard? Why have previous approaches failed?
      4. What are the key components of your approach?
      5. What data sets and metrics will be used to validate the approach?
• Project Results Presentation - 10-minute presentation -- due on 4/19/16.
• Final Report - 6 pages maximum -- due on 4/26/16.
   o For guidance on writing the final report, see slide 70 of Eamonn Keogh's KDD'09 tutorial, "How to do good research, get it published in SIGKDD and get it cited!"
   o Follow ACM formatting guidelines.