CS 504 Principles of Data Management and Mining

Course Description (From Catalog)

Techniques to store, manage, and use data including databases, relational model, schemas, queries and transactions. On Line Transaction Processing, Data Warehousing, star schema, On Line Analytical Processing. MOLAP, HOLAP, and hybrid systems. Overview of Data Mining principles, models, supervised and unsupervised learning, pattern finding. Massively parallel architectures and Hadoop.

 

Instructor: Arthur T. Conroy, Ph.D.

Contact: aconroy2@gmu.edu

Day/Time: Tuesday, 4:30-7:10pm

Location: Arlington: Founders Hall 121

Office Hours: By appointment & Tuesday (one hour before and/or after class)

 

Prerequisites

Graduate Standing

Note: This course cannot be taken for credit by students of the MS CS, MS ISA, MS SWE, MS IS, CS PhD or IT PhD programs.

 

Honor Code Statement

Please be familiar with the GMU Honor Code. In addition, the CS department has its own Honor Code policies. Any deviation from these policies is considered an Honor Code violation.

 

Disability Accommodations

If you are a student with a disability and you need academic accommodations, please see me and contact the Office of Disability Services (ODS) at 993-2474, http://ods.gmu.edu. All academic accommodations must be arranged through the ODS.


Textbooks: Required (available in Safari Books): 

Data Science for Business: What You Need To Know About Data Mining and Data-Analytic Thinking (Foster Provost and Tom Fawcett) 

Making Sense of NoSQL: A Guide for Managers and the Rest of Us (Dan McCreary and Ann Kelly)

Various reading materials will also be distributed in class.

Textbooks: Optional (available in Safari Books)

Hadoop: The Definitive Guide, 4th Edition (Tom White)

 

Grading Policies

Homework: 15%

Project: 25%

Midterm: 25%

Final: 35%

Class Schedule (subject to change)

Class #   Date      Topic                                Notes
1         1/19/16   Introduction
2         1/26/16   Entity Relationship Models
3         2/2/16    Relational Model 1
4         2/9/16    Relational Model 2
5         2/16/16   Structured Query Language (SQL)
6         2/23/16   Data Warehousing
7         3/1/16    NoSQL/MapReduce                      Project proposal presentations due
8         3/15/16   Midterm (week after spring break)
9         3/22/16   Data Mining 1
10        3/29/16   Data Mining 2
11        4/5/16    Data Mining 3
12        4/12/16   Data Mining 4
13        4/19/16   Project Results Presentations        Project Results Presentation due
14        4/26/16   Course Review                        Project Final Report due
15        5/3/16    Final Exam

Class Project

25% of the final grade. You will solve a data-science problem end to end, from data preparation to data product.

o    Project Proposal Paper: 2 pages maximum plus a 5-minute in-class pitch, due on 3/1/16 (class 7).

     The proposal should answer the following questions:

1.      What is the problem?

2.      Why is it interesting and important?

3.      Why is it hard? Why have previous approaches failed?

4.      What are the key components of your approach?

5.      What data sets and metrics will be used to validate the approach?

o    Project Results Presentation: 10-minute presentation, due on 4/19/16 (class 13).

o    Final Report: 6 pages maximum, due on 4/26/16 (class 14).

  For guidance on writing the final report, see slide 70 of Eamonn Keogh's KDD'09 tutorial, "How to do good research, get it published in SIGKDD and get it cited!"

  Follow ACM formatting guidelines.