INFS 755 Data Mining (Fall 2008)
Welcome to INFS 755 Data Mining.
Announcements
- 12.09.2008: Folks, the grades are on courses.gmu.edu. Thanks for a great semester & happy holidays.
- 12.08.2008: Again, wonderful presentations. The winner of today's contest is Joshua Church. Congrats.
- 12.02.2008: Thanks for the wonderful presentations yesterday. Wahed Sadek wins the best presentation contest for his presentation on Business Intelligence (Day 1). Congrats.
- 11.24.2008: Grades uploaded on blackboard. Please log into http://courses.gmu.edu and go to INFS755. In the class click on "My Grades" and you should see your grades and the total sum as of now. This is out of 55 points and there are a combined 45 points at stake with the MT2 and final project. Please let me know if anything looks amiss.
- 11.17.2008: Presenters on 12/1: it is OK if your projects are not completed by then, i.e., 12/1 is not the due date for your project. If you don't have results, that is fine; show me the expected results and whatever results you have so far. Also mention the experiments you will be performing in the following week.
- 11.15.2008: Please send me your slides as a PDF a day before your presentation. We will use this order of presentation:
- 12/1
- Title: Spam mail detection
Team Members: Mayur Bhot, Niveditha Nagare, Lavanya Lingala
- David Tsicone
- Wahid
- Netflix by Sushil and Manasa
- Ankit and John
- Yumn, Solomon and Jasson.
- Tao
- 12/8
- Ned Warner
- WikiClust by Salman, Sarah, and Hesham
- Johanna Koh
- Document Clustering by Joshua Church
- Automatic Text Categorization Using Author and Genre by Priya Antony and Nishitha Seri
- Functional association rule miner by Jill
- Khalid and Madhukar
- 11.15.2008: Allocated presentation time will be 25 minutes for groups of more than one person and 20 minutes for individual presenters. Leave 5 minutes for questions. Timing is really important.
- 11.08.2008: Syllabus for Mid-Term 2 includes
Sections 4.1, 4.2, 4.3, 4.4, 4.5, 5.1, 5.2, 5.3, 5.5, 5.6, 5.7, 8.1, 8.2, 8.3, 8.4, 8.5, 6.1, 6.3, 6.4, 6.5, 6.6, 6.7, 10.1, 10.2, 10.3, 10.4, 10.5. It will be a closed-book, closed-notes, 2-hour exam. You will be allowed a cheat sheet: one page, one side, A4 size. Write anything you want on it. Bring your calculators. The exam is on 11/24/2008.
- 11.08.2008: Please look at the Assignments page for project guidelines along with grading criteria.
- 11.03.2008: Please sign up for a presentation slot, either 12/1 or 12/8
- 10.29.2008: Mid-Term 2 will be held on 11/24/2008.
- 10.27.2008: Thanks for participating in the jigsaw today. I will be covering maximal and closed frequent itemsets next time. HW2 will be given out then too.
- 10.23.2008: Quick note about Assignment 3: Part 3 is optional (extra credit :). Do it if you feel like it. The assignment will be graded out of 80 points.
- 10.20.2008: Assignment 3 is UP!
- 10.14.2008: Software for Frequent Pattern Mining
- 10.14.2008: We will review several clustering algorithms using the JIGSAW ACTIVITY.
- 10.14.2008: A Comprehensive Survey on Clustering Algorithms.
- 10.07.2008: Bayesian Net Comment
In a causal graph, a variable X is conditionally independent of all its non-descendants given its parents. For example, in a graph where G (Gene), S (Smoking) and L (Lung Cancer) are the variables, I can represent a causal graph as
L <---- G ---> S
Here, we can say that L is conditionally independent of its non-descendant S given the parent node G.
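The statement above can be checked numerically. The following is a minimal sketch, not part of the course material: the probability values are hypothetical and chosen only for illustration, and the helper names (`bernoulli`, `prob`, `cond_indep_given_g`, `marginally_independent`) are made up for this example. It builds the joint distribution implied by L <---- G ---> S and verifies that P(L, S | G) = P(L | G) P(S | G), while L and S are not independent once G is ignored.

```python
from itertools import product

# Hypothetical numbers, for illustration only (not from the course).
p_g = {1: 0.2, 0: 0.8}            # P(G = g)
p_s_given_g = {1: 0.6, 0: 0.2}    # P(S = 1 | G = g)
p_l_given_g = {1: 0.3, 0: 0.05}   # P(L = 1 | G = g)


def bernoulli(p_one, value):
    """Probability of a binary value when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Joint distribution from the factorization P(G) P(S|G) P(L|G),
# which is what the graph L <---- G ---> S encodes.
joint = {
    (g, s, l): p_g[g]
    * bernoulli(p_s_given_g[g], s)
    * bernoulli(p_l_given_g[g], l)
    for g, s, l in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of the assignments given as g=, s=, l=."""
    total = 0.0
    for (g, s, l), p in joint.items():
        values = {"g": g, "s": s, "l": l}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


def cond_indep_given_g(tol=1e-12):
    """Check P(L, S | G) == P(L | G) * P(S | G) for every assignment."""
    for g, s, l in product((0, 1), repeat=3):
        pg = prob(g=g)
        lhs = prob(g=g, s=s, l=l) / pg
        rhs = (prob(g=g, l=l) / pg) * (prob(g=g, s=s) / pg)
        if abs(lhs - rhs) > tol:
            return False
    return True


def marginally_independent(tol=1e-12):
    """Check P(L, S) == P(L) * P(S) without conditioning on G."""
    for s, l in product((0, 1), repeat=2):
        if abs(prob(s=s, l=l) - prob(s=s) * prob(l=l)) > tol:
            return False
    return True


if __name__ == "__main__":
    print("L independent of S given G:", cond_indep_given_g())
    print("L independent of S marginally:", marginally_independent())
```

With these numbers the first check holds and the second fails, which is the point of the comment: S carries information about L only through the shared parent G.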
- 10.06.2008: Remember next week's class will be on Tuesday 10.14.2008. The deadline for Assignment 2 has been extended to 10.14.2008
- 10.06.2008: Typo in Assignment 2 (Part 2, Q2) The "=" sign should be "-".
- 10.01.2008: A Data Mining Blog
- 10.01.2008: Note, it should be fairly evident that survey projects cannot be done in teams. Those projects can only be done individually, since you have to read a whole set of papers, analyze them and explain in your report your opinions about them. Only application projects can be done in teams.
- 09.30.2008: Anyone looking for a project partner? Khalid is interested in teaming up ... please email me at chaudry@naeyc.org.
- 09.26.2008: We will have a closed book/closed notes/no cheat sheet exam for mid-term 1. The syllabus includes everything discussed in class, and material from chapters 1, 2, 3.1, 3.2, 3.4, 4, and 5.7, plus the reading on data warehousing.
- 09.24.2008: Check out the class I am offering next Spring. It's about bioinformatics, with lots of dynamic programming related to sequence analysis & data mining for several problems.
- 09.23.2008: The TA will not be holding office hours on Wednesday (10/01/2008). She will be available for additional appointments via email.
- 09.19.2008: Projects page updated with new ideas. All ideas for projects will be updated there.
- 09.18.2008: Remember to cc your TA for questions on the assignment. I am traveling this weekend in the Midwest :).
- 09.18.2008: Assignment 2 is up. See the Assignments page. Due on 10/13/2008
- 09.17.2008: If you are having difficulties finding a project, I can recommend something in the bioinformatics arena. One of my works involves annotating protein residues for various functional properties. The project, called prosat, is here, with a web interface called MONSTER.
- 09.17.2008: There are a whole bunch of papers/talks on what people have done for the Netflix competition. Please see the link.
- 09.15.2008: Updated Readings Section, Slides. For Assignment 1 (Part 2), Question 3 b): With Weka, attribute selection can be achieved either from the specific Select attributes tab, or within the Preprocess tab. List only one of the different options in Weka for selecting attributes, with a short explanation of the corresponding method.
- 09.14.2008: Project proposal guidelines uploaded
- 09.14.2008: Posted some clarifications for Assignment 1 on the forum as well as in the assignment file. Please re-download the pdf file.
- 09.08.2008: Interesting Data Mining Application Talk this Thursday in room 330A STII, at 11:00 am. Automated Annotation of Drosophila Gene Expression Patterns Using a Controlled Vocabulary
- 09.08.2008: Some more data mining contests: the ICDM 2007 contest, the ECML 2008 Spam Detection in Social Bookmark contest, and the ECML 2006 Spam Email Detection Challenge.
- 09.08.2008: We discussed a lot of linear algebra for PCA; see if the material interests you. Download this tutorial. You will not be asked to prove anything in the examination.
- 08.26.2008: Assignment 1 is up. See the Assignments page.
- 08.25.2008: Another resource link: Here is an excellent WEKA tutorial. I would recommend trying this out on your computer.
- 08.25.2008: I will add some more specifications to the Project Ideas link. An example of a project - Take a crack at the Netflix competition or this year's KDD CUP 2008 or the previous KDD cup competitions.
- 08.22.2008: Please sign up for the class website to access features like discussion forums and data mining journal feeds.
- 08.22.2008: Start playing with Weka. We will use this in our assignments and projects. If you prefer other tools like MATLAB or want to write your own code, you are most welcome.
- 08.15.2008: Class Website and Syllabus are Up! Have a great semester ahead.

