INFS 795/IT 803

Special Topics in Data Mining Applications: 

Mining Temporal Data

Dr. Jessica Lin

FALL 2007


HOME


Instructor:

Dr. Jessica Lin 

Office: Science & Technology II, Room 453

Phone: 703-993-4693

Email: jessica [AT] ise [DOT] gmu [DOT] edu

Office Hours: TBA

Classes

Thursdays
7:20-10:00pm
Science & Technology II 15

Prerequisite:

INFS 755 or equivalent. Some programming skills required for the final project.

Textbook (optional):

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.

Course Description:

A time series, traditionally a sequence of measurements taken over time, is perhaps the most commonly encountered data type, arising in almost every area of human endeavor, including medicine, finance, aerospace, industry, and science. Although time series data present special challenges to researchers because of their unique characteristics, the past decade has seen an explosion of work in time series data mining. This seminar provides an overview of state-of-the-art research on mining temporal data. Topics covered include data representation, similarity search, clustering, classification, anomaly detection, rule discovery, motif discovery, and visualization. Sequential pattern discovery on discrete temporal data (web logs, customer transactions, etc.) and mining of streaming time series will also be discussed.

Course Format:

The course will include lectures by the instructor, presentations from students, and class discussion. You will be asked to read research papers published in major conferences and/or journals (paper list TBA).

Grading

Grading will be based on participation, a presentation, quizzes (or paper summaries), and a final project. Each week you are required to read two papers, one of which will be presented by a student.

 Participation/Attendance: 5%
 Quizzes/Assignments: 20%
 Presentation: 25%
 Project Proposal: 15%
 Project: 35%

Schedule

Assigned papers should be read prior to the class meeting (e.g., read papers #1 and #2 for the 9/6 class). The papers in boldface will be presented by the instructor.

Week | Date  | Topic                                             | Papers | Presenter(s)
1    | 8/30  | Introduction [slides]                             |        |
2    | 9/6   | Time Series Similarity Search/Indexing I [slides] | 1, 2   |
3    | 9/13  | Time Series Similarity Search/Indexing II [slides] | 3, 4   |
4    | 9/20  | Symbolic Representation [SAX]                     | 5      |
5    | 9/27  | Classification [classification]                   | 6, 7   | Yun-Sheng Wang
6    | 10/4  | Clustering/Rule Discovery [meaningless]           | 8, 9   | Fei Qu
7    | 10/11 | Anomaly Detection [HOT SAX]                       | 10, 11 |
8    | 10/18 | Motif Discovery                                   | 12, 13 | Joseph Jinn
9    | 10/25 | Burst/Periodicity Detection                       | 14, 15 | Yun-Sheng Wang
10   | 11/1  | Trajectories                                      | 18, 19 | Burt Wagner
11   | 11/8  | Visualization                                     | 16, 17 | Fei Qu
12   | 11/15 | Sequential Pattern Mining                         | 20, 21 | Burt Wagner, Hong Chai
13   | 11/22 | Thanksgiving - No Class                           |        |
14   | 11/29 | Streaming Time Series/Data Streams                | 22, 23 | Joseph Jinn
15   | 12/6  | Project Presentation                              |        |


Paper List

 

1. Rakesh Agrawal, Christos Faloutsos and Arun Swami. Efficient Similarity Search in Sequence Databases. FODO Conference, Evanston, Illinois, Oct. 13-15, 1993.

2. Christos Faloutsos, M. Ranganathan and Yannis Manolopoulos. Fast Subsequence Matching in Time-Series Databases. Proc. ACM SIGMOD, Minneapolis, MN, May 25-27, 1994, pp. 419-429.

3. Chan, K. & Fu, A. W. (1999). Efficient time series matching by wavelets. In proceedings of the 15th IEEE Int'l Conference on Data Engineering. Sydney, Australia, Mar 23-26. pp 126-133. 

4. Keogh, E. and Kasetty, S. (2002). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23 - 26, 2002. Edmonton, Alberta, Canada. pp 102-111. 

5. Lin, J., Keogh, E., Li, W. & Lonardi, S. (2007). Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Mining and Knowledge Discovery Journal. To Appear.

6. Geurts, P. 2001. Pattern Extraction for Time Series Classification. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (September 03 - 05, 2001). L. D. Raedt and A. Siebes, Eds. Lecture Notes In Computer Science, vol. 2168. Springer-Verlag, London, 115-127.

7. Li Wei and Eamonn Keogh (2006). Semi-Supervised Time Series Classification. SIGKDD 2006.

8. Keogh, E., Lin, J. & Truppel, W. (2003). Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003). Melbourne, FL. Nov 19-22. p.115-122.   

9. Gavrilov, M., Anguelov, D., Indyk, P. & Motwani, R. (2000). Mining the stock market: which measure is best? In proceedings of the 6th ACM Int'l Conference on Knowledge Discovery and Data Mining. Boston, MA, Aug 20-23. pp 487-496.

10. D. Dasgupta and S. Forrest, "Novelty Detection in Time Series Data Using Ideas from Immunology", Proceedings of the 5th International Conference on Intelligent Systems, Reno, June, 1996.

11. Keogh, E., Lin, J. & Fu, A. (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In the 5th IEEE International Conference on Data Mining. New Orleans, LA. Nov 27-30.

12. Chiu, B., Keogh, E., and Lonardi, S. 2003. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24 - 27, 2003). KDD '03. ACM Press, New York, NY, 493-498.

13. Minnen, D., Essa, I., Isbell, C. L. & Starner, T. 2007. Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery. In Proceedings of the 2007 IEEE International Conference on Data Mining. Omaha, NE. Oct 28-31. To Appear.

14. Vlachos, M., Meek, C., Vagena, Z., and Gunopulos, D. 2004. Identifying similarities, periodicities and bursts for online search queries. In Proceedings of the 2004 ACM SIGMOD international Conference on Management of Data (Paris, France, June 13 - 18, 2004). SIGMOD '04. ACM Press, New York, NY, 131-142.

15. Michail Vlachos, Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu: Fast Burst Correlation of Financial Data. PKDD 2005: 368-379

16. Lin, J., Keogh, E., Lonardi, S., Lankford, J. P., and Nystrom, D. M. 2004. Visually mining and monitoring massive time series. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22 - 25, 2004). KDD '04. ACM Press, New York, NY, 460-469.

17. Hochheiser, H. & Shneiderman, B. 2004. Dynamic Query Tools for Time Series Data Sets, Timebox Wedges for Interactive Exploration. Information Visualization Journal 3, 1. pp. 1-18.

18. Anagnostopoulos, A., Vlachos, M., Hadjieleftheriou, M., Keogh, E. & Yu, P. S. 2006. Global Distance-Based Segmentation of Trajectories. In Proceedings of the 12th ACM KDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA. Aug 20-23. 

19. Giannotti, F., Nanni, M., Pinelli, F. & Pedreschi, D. 2007. Trajectory Pattern Mining. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Jose, CA, USA, August 12 - 15, 2007).

20. Agrawal, R. and Srikant, R. 1995. Mining Sequential Patterns. In Proceedings of the Eleventh international Conference on Data Engineering (March 06 - 10, 1995). P. S. Yu and A. L. Chen, Eds. ICDE. IEEE Computer Society, Washington, DC, 3-14.

21. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1(3): 259-289, November 1997.

22. Gao, L., Yao, Z., and Wang, X. S. 2002. Evaluating continuous nearest neighbor queries for streaming time series via pre-fetching. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM Press, New York, NY, 485-492. 

23. Zhu, Y. and Shasha, D. 2003. Efficient elastic burst detection in data streams. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24 - 27, 2003). KDD '03. ACM Press, New York, NY, 336-345.