
INFS 795 / IT 803 Special Topics in Data Mining Applications
Instructor:
Lectures: Thursday 7:20-10:00pm, Innovation Hall 136
Prerequisites: INFS755 or equivalent knowledge. Some programming skills required for the final project.
Textbook (optional): Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006. ISBN 1558609016.
Course Description: Time series, in the traditional sense of measurements taken over time, are perhaps the most commonly encountered data type, arising in almost every human endeavor, including medicine, finance, aerospace, industry, and science. While time series data present special challenges to researchers due to their unique characteristics, the past decade has seen an explosion of research in time series data mining. This seminar provides an overview of state-of-the-art research on mining temporal data. Topics covered include data representation, similarity search, clustering, classification, anomaly detection, and rule discovery. Sequential pattern discovery on discrete temporal data (web logs, customer transactions, etc.) and mining of streaming time series will also be discussed.
Course Format:
Grading: Grading will be based on participation, a presentation, quizzes, and a final project. Each week you are required to read two papers, one of which will be presented by a student. You will be quizzed on both papers the following week. The presenting student will make up two simple quiz questions on the paper he or she presents.
Participation/Attendance: 5%
Quizzes: 20%
Presentation: 25%
Project Proposal: 15%
Project: 35%
Honor Code Statement: Please be familiar with the GMU Honor Code. Any deviation from this is considered an Honor Code violation. All assignments (written and programming) for this class are individual unless otherwise specified.
Tentative Schedule (TBA):
List of Papers (under construction):
1. Rakesh Agrawal, Christos Faloutsos and Arun Swami, Efficient Similarity Search In Sequence Databases. FODO conference.
2. Christos Faloutsos, M. Ranganathan and Yannis Manolopoulos, Fast Subsequence Matching in Time-Series Databases. Proc. ACM SIGMOD, Minneapolis MN, May 25-27, 1994, pp. 419-429.
3. Chan, K. & Fu, A. W. (1999). Efficient time series matching by wavelets. In proceedings of the 15th IEEE Int'l Conference on Data Engineering.
4. Keogh, E. and Kasetty, S. (2002). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23-26, 2002.
5. Geurts, P. 2001. Pattern Extraction for Time Series Classification. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (September 03-05, 2001). L. D. Raedt and A. Siebes, Eds. Lecture Notes In Computer Science, vol. 2168. Springer-Verlag.
6. Li Wei and Eamonn Keogh (2006). Semi-Supervised Time Series Classification. SIGKDD 2006.
7. Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth: Rule Discovery from Time Series. KDD 1998: 16-22.
8. Keogh, E., Lin, J. & Truppel, W. (2003). Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003).
9. Gavrilov, M., Anguelov, D., Indyk, P. & Motwani, R. (2000). Mining the stock market: which measure is best? In proceedings of the 6th ACM Int'l Conference on Knowledge Discovery and Data Mining.
10. Bagnall, A.J. and Janacek, G.J., Clustering time series from ARMA models with clipped data. In proceedings of the 10th International Conference on Knowledge Discovery in Data and Data Mining (ACM SIGKDD 2004), Seattle, USA, pp. 49-58, 2004.
11. D. Dasgupta and S. Forrest, "Novelty Detection in Time Series Data Using Ideas from Immunology", Proceedings of the 5th International Conference on Intelligent Systems.
12. Keogh, E., Lin, J. & Fu, A. (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In the 5th IEEE International Conference on Data Mining.
13. Chiu, B., Keogh, E., and Lonardi, S. 2003. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining.
14. Lin, J., Keogh, E., Lonardi, S., Lankford, J. P., and Nystrom, D. M. 2004. Visually mining and monitoring massive time series. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining.
15. Vlachos, M., Meek, C., Vagena, Z., and Gunopulos, D. 2004. Identifying similarities, periodicities and bursts for online search queries. In Proceedings of the 2004 ACM SIGMOD international Conference on Management of Data.
16. Michail Vlachos, Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu: Fast Burst Correlation of Financial Data. PKDD 2005: 368-379.
17. Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., and Keogh, E. 2003. Indexing multi-dimensional time-series with support for multiple distance measures. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining.
18. Cai, Y. and Ng, R. 2004. Indexing spatio-temporal trajectories with Chebyshev polynomials. In Proceedings of the 2004 ACM SIGMOD international Conference on Management of Data.
19. Agrawal, R. and Srikant, R. 1995. Mining Sequential Patterns. In Proceedings of the Eleventh international Conference on Data Engineering (March 06-10, 1995). P. S. Yu and A. L. Chen, Eds. ICDE. IEEE Computer Society.
20. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1(3): 259-289, November 1997.
21. Gao, L.,
22. Zhu, Y. and Shasha, D. 2003. Efficient elastic burst detection in data streams. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining.
* (optional) Keogh, E. & Pazzani, M. (1999). Relevance feedback retrieval of time series data. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 183-190.

