Class/Sec: | CS 584 (001): Data Mining |
Instructor: | Huzefa Rangwala Room #4423 Engineering Building, rangwala@gmu.edu |
Class Time & Location: | Tue 4:30-7:10 pm EST Online, Streaming (Synchronous) |
Text Book: | Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Introduction to Data Mining (Second Edition). Book's companion website |
Teaching Assistant: | TBD |
Office Hours: | Instructor: Online via Zoom (Monday 10-11 am EST or by appointment) |
Communication and Class Link: | Piazza Link: Piazza |
Automated Data Mining Hackathon Host: | Miner (accessible only on campus or via VPN) |
Please note that the syllabus is subject to change to enrich the students' learning experience :). Feel free to email rangwala@cs.gmu.edu with questions or concerns, or even just to say hi.
Course Description |
---|
Over the past decade, the amount of data has increased exponentially. This has led to the development of techniques for discovering useful and interesting information in large collections of data. This course aims to provide an overview of key data mining methods and techniques such as classification, clustering, and association rule mining. The course will also present interesting application examples of data mining, especially in the fields of social media analysis, text analysis, and learning analytics. It will also provide an in-depth discussion of the ethical and societal promises and perils that these technologies pose for decision making in today's world. |
Course Prerequisites |
Programming experience in Python is preferred. Java or C will work as well, but assignments will use the Python framework. Students should be familiar with basic probability and statistics concepts and with linear algebra. Please expect lots of programming in all the assignments and class projects. |
Course Format |
Lectures will be given by the instructor. Besides material from the textbook, topics not discussed in the book may also be covered. Research papers and handouts of material not covered in the book will be made available. Grading will be based on homework assignments, an exam, and a project. Homework assignments will require intensive programming and use an automated, competition-style approach to developing solutions for data mining challenges. Exams and homework assignments must be done on an individual basis unless stated otherwise. Any deviation from this policy will be considered a violation of the GMU Honor Code. |
Course Outcomes |
As an outcome of taking this class, a student will be able to
|
Topics
Introduction |
Data and Its Various Forms |
Classification: Models, Methods and Applications |
Clustering: Methods and Applications |
Ethics, Fairness, Accountability, and Transparency |
Association Rule Mining |
Applications: Biological Data Mining |
Applications: Recommender Systems |
Applications: Learning Analytics |
Applications: Advanced Supervised Learning |
Anomalies, Outliers |
Deliverable | Deadline | Grade Weights |
---|---|---|
HW0 | Feb 2 | 0% |
HW1 | Feb 16 | 10% |
HW2 | Mar 2 | 10% |
HW3 | Mar 16 | 10% |
HW4 | Apr 6 | 10% |
Fairness, Accountability and Transparency Discussion | TBD | 10% |
Quiz (In-Class) | Multiple | 10% |
Project Pitch | Feb 23 | 0% |
Project Proposal | Mar 9 | 5% |
Video Project Presentation | Apr 25 | 10% |
Project Report | Apr 30 | 25% |
Extra Credit: Competition Winners | If you win a competition (HW1 to HW4), you receive an extra 1% added to your final grade |
Grade | Score Range |
---|---|
A | >96 |
A- | 92-96 |
B+ | 88-92 |
B | 84-88 |
B- | 80-84 |
C+ | 76-80 |
C | 72-76 |
C- | 68-72 |
F | < 68 |
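As a rough illustration of how the grade weights and the score ranges above combine, here is a minimal Python sketch. It is not an official grade calculator. The function and variable names are hypothetical, and two points are assumptions rather than stated policy: a score that falls exactly on a shared boundary (for example 92) is given the higher grade, and the 1% extra credit is applied once per competition won.

```python
# Grade weights (percent of the final grade), copied from the deliverables table.
WEIGHTS = {
    "HW0": 0, "HW1": 10, "HW2": 10, "HW3": 10, "HW4": 10,
    "Fairness, Accountability and Transparency Discussion": 10,
    "Quiz (In-Class)": 10,
    "Project Pitch": 0,
    "Project Proposal": 5,
    "Video Project Presentation": 10,
    "Project Report": 25,
}

# Lower bound of each letter grade below A, taken from the score-range table.
# Assumption: a score on a shared boundary (e.g. 92) receives the higher grade.
CUTOFFS = [(92, "A-"), (88, "B+"), (84, "B"), (80, "B-"),
           (76, "C+"), (72, "C"), (68, "C-")]


def final_score(scores, competition_wins=0):
    """Weighted final score in percent.

    scores maps a deliverable name to the percent earned on it (0-100);
    deliverables missing from the mapping count as 0.
    Assumption: each competition win adds 1 percentage point.
    """
    weighted = sum(WEIGHTS[name] * scores.get(name, 0) / 100 for name in WEIGHTS)
    return weighted + competition_wins


def letter_grade(score):
    """Map a final score to a letter grade using the table above."""
    if score > 96:  # the table lists A as strictly greater than 96
        return "A"
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    return "F"  # below 68


# Example: 90% on every deliverable plus one competition win.
example = {name: 90 for name in WEIGHTS}
total = final_score(example, competition_wins=1)
print(total, letter_grade(total))
```

The weights sum to 100, so a student who earns the same percentage on every deliverable ends up with that percentage as the final score before extra credit.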
Technology Requirements |
---|
This class is being offered entirely online. Technology requirements to successfully complete this class include a computer (desktop/laptop) that can access class materials posted on Piazza, Internet access sufficient to attend the synchronous class virtually through Zoom (primary)/Blackboard Collaborate Ultra (backup), and the use of a working microphone and webcam to allow full participation in class activities. VPN is needed to access Miner2.vsnet.gmu.edu and submit solutions to assignments. Blackboard will be used for accepting code submissions and project-related files. In-class quizzes will be run via itempool.com and other technologies. |
Attendance |
Attendance is highly recommended for doing well in the class. This class has lots of active learning exercises, and they will be a lot of fun. Some quizzes will be graded. |
Assignment Submission |
Please ensure that the assignments are submitted on time. No late submissions are allowed. There will be several assignments, and there may be dependencies among consecutive assignments. The assignments are structured so that you can make multiple attempts at a solution, and there is no single correct solution to these challenging real-world problems. They are designed to simulate real-world data analytics. Assignments will be accepted for HW1-4 via Miner2 and Gradescope File Requests. |
Make-Up Exams & Incompletes |
Make-up exams and incompletes will not be given for this class. |
Academic Honesty and GMU Honor Code |
Please review the GMU Honor Code, and do not copy assignment solutions from your peers, the internet, or any other source unless the assignment description explicitly permits it. The project done for this class must be exclusive to this class and cannot be submitted for other classes. |
Disability Statement |
If you have a documented learning disability or other condition that may affect academic performance, you should: 1) make sure this documentation is on file with the Office of Disability Services (SUB I, Rm. 222; 993-2474; www.gmu.edu/student/drc) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs. |