Instructor: Dr. Jessica Lin
Office: Science & Technology II, Room 453
Phone: 703-993-4693
Email: jessica [AT] ise [DOT] gmu [DOT] edu
Office Hours: TBA
Classes: Thursdays 7:20-10:00pm, Science & Technology II 15
Prerequisite: INFS 755 or equivalent. Some programming skills are required for the final project.
Textbook (optional): Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.

Course Description: Time series, or measurements taken
over time in the traditional sense, is perhaps the most commonly
encountered data type, arising in almost every area of human endeavor,
including medicine, finance, aerospace, industry, and science. While
time series data present special challenges to researchers because of
their unique characteristics, the past decade has seen an explosion of
interest in time series data mining. This seminar provides an overview
of state-of-the-art research on mining temporal data. Topics covered
include data representation, similarity search, clustering,
classification, anomaly detection, rule discovery, motif discovery, and
visualization. Sequential pattern discovery on discrete temporal data
(web logs, customer transactions, etc.) and mining of streaming time
series will also be discussed.

Course Format: The course will include lectures by
the instructor, presentations from students, and class discussion. You
will be asked to read research papers published in major conferences
and/or journals (paper list TBA).

Grading
Grading will be based on participation, a presentation, quizzes (or
paper summaries), and a final project. Each week you are required to
read two papers, one of which will be presented by a student.

Participation/Attendance: 5%
Quizzes/Assignments: 20%
Presentation: 25%
Project Proposal: 15%
Project: 35%

Schedule
Assigned papers should be read prior to the class meeting (e.g. read
papers #1 and #2 for the 9/13 class). The papers in bold face will be
presented by the instructor.

| Weeks | Dates | Topics | Papers | Presenter(s) |
| 1 | 8/30 | Introduction [slides] | | |
| 2 | 9/6 | Time Series Similarity Search/Indexing I [slides] | 1, 2 | |
| 3 | 9/13 | Time Series Similarity Search/Indexing II [slides] | 3, 4 | |
| 4 | 9/20 | Symbolic Representation [SAX] | 5 | |
| 5 | 9/27 | Classification [classification] | 6, 7 | Yun-Sheng Wang |
| 6 | 10/4 | Clustering/Rule Discovery [meaningless] | 8, 9 | Fei Qu |
| 7 | 10/11 | Anomaly Detection [HOT SAX] | 10, 11 | |
| 8 | 10/18 | Motif Discovery | 12, 13 | Joseph Jinn |
| 9 | 10/25 | Burst/Periodicity Detection | 14, 15 | Yun-Sheng Wang |
| 10 | 11/1 | Trajectories | 18, 19 | Burt Wagner |
| 11 | 11/8 | Visualization | 16, 17 | Fei Qu |
| 12 | 11/15 | Sequential Pattern Mining | 20, 21 | Burt Wagner, Hong Chai |
| 13 | 11/22 | Thanksgiving - No Class | | |
| 14 | 11/29 | Streaming Time Series/Data Streams | 22, 23 | Joseph Jinn |
| 15 | 12/6 | Project Presentation | | |

Paper List
1. Rakesh Agrawal, Christos Faloutsos and Arun Swami. Efficient Similarity Search in Sequence Databases. FODO Conference, Evanston, Illinois, Oct. 13-15, 1993.
2. Christos Faloutsos, M. Ranganathan and Yannis Manolopoulos. Fast Subsequence Matching in Time-Series Databases. Proc. ACM SIGMOD, Minneapolis, MN, May 25-27, 1994, pp. 419-429.
3. Chan, K. & Fu, A. W. (1999). Efficient Time Series Matching by Wavelets. In Proceedings of the 15th IEEE Int'l Conference on Data Engineering. Sydney, Australia, Mar 23-26. pp. 126-133.
4. Keogh, E. and Kasetty, S. (2002). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada. July 23-26, 2002. pp. 102-111.
5. Lin, J., Keogh, E., Li, W. & Lonardi, S. (2007). Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Mining and Knowledge Discovery Journal. To appear.
6. Geurts, P. (2001). Pattern Extraction for Time Series Classification. In Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery (September 3-5, 2001). L. D. Raedt and A. Siebes, Eds. Lecture Notes in Computer Science, vol. 2168. Springer-Verlag, London. pp. 115-127.
7. Li Wei and Eamonn Keogh (2006). Semi-Supervised Time Series Classification. SIGKDD 2006.
8. Keogh, E., Lin, J. & Truppel, W. (2003). Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003). Melbourne, FL. Nov 19-22. pp. 115-122.
9. Gavrilov, M., Anguelov, D., Indyk, P. & Motwani, R. (2000). Mining the Stock Market: Which Measure Is Best? In Proceedings of the 6th ACM Int'l Conference on Knowledge Discovery and Data Mining. Boston, MA. Aug 20-23. pp. 487-496.
10. D. Dasgupta and S. Forrest. Novelty Detection in Time Series Data Using Ideas from Immunology. Proceedings of the 5th International Conference on Intelligent Systems, Reno, June 1996.
11. Keogh, E., Lin, J. & Fu, A. (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. In the 5th IEEE International Conference on Data Mining. New Orleans, LA. Nov 27-30.
12. Chiu, B., Keogh, E., and Lonardi, S. (2003). Probabilistic Discovery of Time Series Motifs. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24-27, 2003). KDD '03. ACM Press, New York, NY. pp. 493-498.
13. Minnen, D., Essa, I., Isbell, C. L. & Starner, T. (2007). Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery. In Proceedings of the 2007 IEEE International Conference on Data Mining. Omaha, NE. Oct 28-31. To appear.
14. Vlachos, M., Meek, C., Vagena, Z., and Gunopulos, D. (2004). Identifying Similarities, Periodicities and Bursts for Online Search Queries. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France, June 13-18, 2004). SIGMOD '04. ACM Press, New York, NY. pp. 131-142.
15. Michail Vlachos, Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu. Fast Burst Correlation of Financial Data. PKDD 2005. pp. 368-379.
16. Lin, J., Keogh, E., Lonardi, S., Lankford, J. P., and Nystrom, D. M. (2004). Visually Mining and Monitoring Massive Time Series. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22-25, 2004). KDD '04. ACM Press, New York, NY. pp. 460-469.
17. Hochheiser, H. & Shneiderman, B. (2004). Dynamic Query Tools for Time Series Data Sets: Timebox Widgets for Interactive Exploration. Information Visualization Journal 3, 1. pp. 1-18.
18. Anagnostopoulos, A., Vlachos, M., Hadjieleftheriou, M., Keogh, E. & Yu, P. S. (2006). Global Distance-Based Segmentation of Trajectories. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA. Aug 20-23.
19. Giannotti, F., Nanni, M., Pinelli, F. & Pedreschi, D. (2007). Trajectory Pattern Mining. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Jose, CA, USA, August 12-15, 2007).
20. Agrawal, R. and Srikant, R. (1995). Mining Sequential Patterns. In Proceedings of the Eleventh International Conference on Data Engineering (March 6-10, 1995). P. S. Yu and A. L. Chen, Eds. ICDE. IEEE Computer Society, Washington, DC. pp. 3-14.
21. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery 1(3): 259-289, November 1997.
22. Gao, L., Yao, Z., and Wang, X. S. (2002). Evaluating Continuous Nearest Neighbor Queries for Streaming Time Series via Pre-fetching. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (McLean, Virginia, USA, November 4-9, 2002). CIKM '02. ACM Press, New York, NY. pp. 485-492.
23. Zhu, Y. and Shasha, D. (2003). Efficient Elastic Burst Detection in Data Streams. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24-27, 2003). KDD '03. ACM Press, New York, NY. pp. 336-345.