INFS 795 / IT 803 Special Topics in Data Mining Applications: Data Mining on Multimedia and High-Dimensional Data
Instructor:
Lectures: Monday 7:20-10:00pm, Innovation Hall 208
Prerequisites: INFS-755 or equivalent knowledge. Some programming skills are required for the final project.
Course Description: The vast growth of disk technology in the past decade has enabled the generation and storage of large multimedia datasets. Such data, including audio, video, and text, is ubiquitous and arises in diverse domains. Its massive size and high dimensionality pose great challenges for researchers and practitioners. In addition, the unique characteristics of each data type mean that specialized solutions are needed. This seminar provides an overview of state-of-the-art research on mining multimedia and high-dimensional data, and discusses issues related to handling such data types, including feature extraction, high-dimensional indexing, interactive search and information retrieval, pattern discovery, and scalability to large datasets. Mining techniques and data types to be covered include the following:
· Images
· Video sequences/surveillance
· Texts/Web mining
· Time series
· DNA data
· Spatial/Temporal/Spatial-temporal data
Course Format:
Grading: Grading will be based on participation, a presentation, quizzes, and a final project. Each week you are required to read two papers, one of which will be presented by a student. You will be quizzed on both papers the following week. The presenting student will prepare two simple quiz questions on the paper he or she presents.
· Participation/Attendance: 15%
· Quizzes: 15%
· Presentation: 25%
· Project Proposal: 10%
· Project: 35%
Schedule:
Week | Date | Topic | Papers | Presenter
1 | Jan 22 | Introduction I | |
2 | Jan 29 | Introduction II | 1, 2 |
3 | Feb 5 | Text/Web Mining I | 3, 4 |
4 | Feb 12 | Text/Web Mining II | 5, 6 | Steven Vincent
5 | Feb 19 | Text/Web Mining III | 7, 8 | Marcos Vieira
6 | Feb 26 | Time Series I | 9, 10 |
7 | Mar 5 | Time Series II | 11, 12 |
8 | Mar 12 | Spring Break (No Class) | |
9 | Mar 22 | Audio | 13, 14, 15 | Raimi Rufai
10 | Mar 26 | Images I | 16, 17 | Puttikan Prapai
11 | Apr 2 | Images II | 18, 19 | Joseph Jinn
12 | Apr 9 | Video | 20, 21 | David Etter
13 | Apr 16 | DNA | 22, 23 |
14 | Apr 23 | Spatio-Temporal | 24, 25 | Indar Bhatia
15 | Apr 30 | Data Streams | 26, 27 |
16 | May 7 | Project Presentations | |
Paper List (TBA):
Week | Topic | Paper
2 | Intro | 1. Beyer, K. S., Goldstein, J., Ramakrishnan, R., and Shaft, U. 1999. When Is "Nearest Neighbor" Meaningful? In Proceedings of the 7th International Conference on Database Theory, Jan. 10-12, 1999.
2 | Intro | 2. Faloutsos, C. and Lin, K. 1995. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (San Jose, California, United States, May 22-25, 1995).
3 | Text I | 3. Hearst, M. A. 1999. Untangling text data mining. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (College Park, Maryland, June 20-26, 1999). Annual Meeting of the ACL.
3 | Text I | 4. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
4 | Text II | 5. Bingham, E. and Mannila, H. 2001. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
4 | Text II | 6. Yang, Y. and Pedersen, J. O. A Comparative Study on Feature Selection in Text Categorization. Proc. of the 14th International Conference on Machine Learning (ICML'97), pp. 412-420, 1997.
5 | Text III | 7. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International Conference on World Wide Web 7.
5 | Text III | 8. F. Radlinski and T. Joachims. Query Chains: Learning to Rank from Implicit Feedback. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2005.
6 | Time Series I | 9. R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. of the Fourth Int'l Conference on Foundations of Data Organization and Algorithms.
6 | Time Series I | 10. Lin, J., Keogh, E., Li, W. & Lonardi, S. (2007). Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Mining and Knowledge Discovery Journal. To Appear.
7 | Time Series II | 11. Sripada, S. G., Reiter, E., Hunter, J., and Yu, J. 2003. Generating English summaries of time series data using the Gricean maxims. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
7 | Time Series II | 12. Keogh, E., Lin, J. & Truppel, W. (2003). Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003).
9 | Audio | 13. Matt Welsh, Nikita Borisov, Jason Hill, Rob von Behren, and Alec Woo. Querying large collections of music for similarity. Technical Report UCB/CSD-00-1096, U.C. Berkeley Computer Science Division, 1999.
9 | Audio | 14. Berenzweig, A., Logan, B., Ellis, D., Whitman, B.: A Large-Scale Evaluation of Acoustic and Subjective Music Similarity Measures. In: Proc. of the 4th International Symposium on Music Information Retrieval, 2003.
9 | Audio | 15. J. Haitsma and T. Kalker. A Highly Robust Audio Fingerprinting System. In Proceedings of the 3rd International Conference on Music Information Retrieval.
10 | Image I | 16. Christos Faloutsos, Ron Barber, Myron Flickner, Wayne Niblack, Dragutin Petkovic, and William Equitz. Efficient and effective querying by image content. J. of Intelligent Information Systems, 3(3/4):231-262, July 1994.
10 | Image I | 17. Yong Rui, Thomas S. Huang, and Shih-Fu Chang. Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, Vol. 10, no. 4, pp. 39-62, 1999.
11 | Image II | 18. Charles Jacobs, Adam Finkelstein, David Salesin. Fast Multiresolution Image Querying. Computer Graphics, Annual Conference Series (SIGGRAPH'95 Proceedings), pp. 277-286.
11 | Image II | 19. Mori, G., Belongie, S., Malik, J. Shape contexts enable efficient retrieval of similar shapes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
12 | Video | 20. J.-Y. Chen, C. Taskiran, A. Albiol, C. A. Bouman, and E. J. Delp. ViBE: A video indexing and browsing environment. Proceedings of the SPIE Conference on Multimedia Storage and Archiving Systems IV, vol. 3846, September 1999, Boston, MA, pp. 148-164.
12 | Video | 21. Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis. Journal of Visual Communication and Image Representation, 14(2):150-183, June 2003.
13 | DNA | 22. J. Buhler and M. Tompa. Finding Motifs Using Random Projections. In Proc. RECOMB'01, pages 69-76. ACM, 2001.
13 | DNA | 23. Y. Cheng and
14 | Spatio-Temporal | 24. Cao, H., Mamoulis, N., and Cheung, D. W. 2005. Mining Frequent Spatio-Temporal Sequential Patterns. In Proceedings of the Fifth IEEE International Conference on Data Mining (November 27-30, 2005).
14 | Spatio-Temporal | 25. P. Kalnis, N. Mamoulis, and S. Bakiras. On Discovering Moving Clusters in Spatio-temporal Data. In Proc. of 9th Int. Symposium on Advances in Spatial and Temporal Databases (SSTD'2005), number 3633 in LNCS, pages 364-381, Angra dos Reis, Brazil, Aug. 2005. Springer.
15 | Data Streams | 26. Gaber, M. M., Zaslavsky, A., and Krishnaswamy, S. 2005. Mining data streams: a review. SIGMOD Rec. 34, 2 (Jun. 2005), 18-26.
15 | Data Streams | 27. Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2004. On demand classification of data streams. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.