CS 584: Data Mining (Syllabus)

Class Information
Class/Sec: CS 584 (001): Data Mining
Instructor: Huzefa Rangwala, Room #4423, Engineering Building, rangwala@gmu.edu
Class Time & Location: Tue 4:30-7:10 pm EST Online, Streaming (Synchronous)
Text Book: Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Introduction to Data Mining (Second Edition). Book's companion website
Teaching Assistant: TBD TBD
Office Hours: Instructor: Online via Zoom (Monday 10-11 am EST or by appointment)
Communication and Class Link: Piazza Link: Piazza
Automated Data Mining Hackathon Host: Miner (Only ON Campus or VPN)

Please note that the syllabus is subject to change to enrich the student's learning experience :). Feel free to email rangwala@cs.gmu.edu with questions or concerns, or even just to say hi.

About the Course
Course Description
Over the past decade there has been an exponential increase in the amount of data. This has led to the development of techniques to discover useful and interesting information in large collections of data. This course aims to provide an overview of the key data mining methods and techniques, such as classification, clustering, and association rule mining. The course will also present application examples of data mining, especially in the fields of social media analysis, text analysis, and learning analytics, and will provide an in-depth discussion of the ethical and societal promises and perils that these technologies pose for decision making in today's world.
Course Prerequisites
Programming experience in Python is preferred. Java or C will work as well, but assignments will use the Python framework. Students should be familiar with basic probability and statistics concepts, and with linear algebra. Please expect lots of programming in all the assignments and class projects.
Course Format
Lectures will be given by the instructor. Besides material from the textbook, topics not discussed in the book may also be covered. Research papers and handouts of material not covered in the book will be made available. Grading will be based on homework assignments, exams, and a project. Homework assignments will require intensive programming, with solutions to data mining challenges developed in an automated, competition-style setting. Exams and homework assignments must be done on an individual basis unless stated otherwise. Any deviation from this policy will be considered a violation of the GMU Honor Code.
Course Outcomes
As an outcome of taking this class, a student will be able to
  • Understand the theory behind, and implement, various classification, clustering, and association rule-mining algorithms.
  • Apply the data mining techniques learned to real-world scientific and/or industrial applications.
  • Consider the impact of these algorithms on society and develop a justice, ethics, diversity and inclusivity (JEDI) mindset.

Topics

Introduction
Data and Its Various Forms
Classification: Models, Methods and Applications
Clustering: Methods and Applications
Ethics, Fairness, Accountability, and Transparency
Association Rule Mining
Applications: Biological Data Mining
Applications: Recommender Systems
Applications: Learning Analytics
Applications: Advanced Supervised Learning
Anomalies, Outliers
Assignments/Exams
Deliverable                                             Deadline   Grade Weight
HW0                                                     Feb 2      0%
HW1                                                     Feb 16     10%
HW2                                                     Mar 2      10%
HW3                                                     Mar 16     10%
HW4                                                     Apr 6      10%
Fairness, Accountability and Transparency Discussion    TBD        10%
Quiz (In-Class)                                         Multiple   10%
Project Pitch                                           Feb 23     0%
Project Proposal                                        Mar 9      5%
Video Project Presentation                              Apr 25     10%
Project Report                                          Apr 30     25%
Extra Credit (Competition Winners): If you win a competition (HW1 to HW4), you receive an extra 1% added to the final grade.
Grade Distribution
Grade   Score Range
A       > 96
A-      92-96
B+      88-92
B       84-88
B-      80-84
C+      76-80
C       72-76
C-      68-72
F       < 68
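
As a rough illustration of how the grade weights and score ranges above combine, here is a minimal Python sketch. It is not an official grading tool. The names WEIGHTS, CUTOFFS, final_score, and letter_grade are hypothetical. The sketch assumes that each deliverable is scored from 0 to 100, that a score exactly on a boundary (for example 92) falls in the higher band except for A, which the table lists as strictly greater than 96, and that each competition win adds one percentage point. The syllabus does not state how boundaries or multiple wins are handled, so check with the instructor.

```python
# Weights copied from the Assignments/Exams table, as fractions of the final grade.
# HW0 and the Project Pitch carry 0% and are omitted.
WEIGHTS = {
    "HW1": 0.10,
    "HW2": 0.10,
    "HW3": 0.10,
    "HW4": 0.10,
    "FAT Discussion": 0.10,
    "Quiz": 0.10,
    "Project Proposal": 0.05,
    "Video Project Presentation": 0.10,
    "Project Report": 0.25,
}

# (lower bound, letter) pairs from the Grade Distribution table.
# Assumption: lower bounds are inclusive; the syllabus does not say.
CUTOFFS = [
    (92, "A-"), (88, "B+"), (84, "B"), (80, "B-"),
    (76, "C+"), (72, "C"), (68, "C-"),
]


def final_score(scores, competitions_won=0):
    """Weighted sum of 0-100 scores, plus 1 point per competition won (assumed)."""
    weighted = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return weighted + 1.0 * competitions_won


def letter_grade(score):
    """Map a final score to a letter using the table's ranges."""
    if score > 96:  # the table lists A as strictly above 96
        return "A"
    for lower, letter in CUTOFFS:
        if score >= lower:
            return letter
    return "F"


if __name__ == "__main__":
    example = {name: 90 for name in WEIGHTS}  # hypothetical scores
    total = final_score(example, competitions_won=1)
    print(round(total, 2), letter_grade(total))
```

The weights sum to 100%, so a student's final score is on the same 0-100 scale as the individual deliverables before any extra credit is added.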
Policies:
Technology Requirements
This class is being offered entirely online. Technology requirements to successfully complete this class include a computer (desktop/laptop) that can access class materials posted on Piazza, Internet access sufficient to attend synchronous class virtually through Zoom (primary)/Blackboard Collaborate Ultra (backup), and the use of a working microphone and webcam to allow full participation in class activities. VPN is needed to access Miner2.vsnet.gmu.edu and submit solutions to assignments. Blackboard will be used for accepting code submissions and project-related files. In-class quizzes will be run via itempool.com and other technologies.
Attendance
Attendance is highly recommended for doing well in the class. This class has lots of active learning exercises, and they will be a lot of fun. Some quizzes will be graded.
Assignment Submission
Please ensure that the assignments are submitted on time. No late submissions are allowed. There will be several assignments, and there may be dependencies among consecutive assignments. The assignments are structured so that you can make multiple attempts at a solution; there is no single correct solution to these challenging real-world problems. They are designed to simulate real-world data analytics. Assignments will be accepted for HW1-4 via Miner2 and Gradescope File Requests.
Make-Up Exams & Incompletes
Make-up exams and incompletes will not be given for this class.
Academic Honesty and GMU Honor Code
Please visit the GMU Honor Code, and do not copy assignment solutions from your peers, the internet, or any other source unless stated in the assignment description. The project done for this class has to be exclusive to this class and cannot be used for other classes.
Disability Statement
If you have a documented learning disability or other condition that may affect academic performance, you should: 1) make sure this documentation is on file with the Office of Disability Services (SUB I, Rm. 222; 993-2474; www.gmu.edu/student/drc) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs.