INFS 795 / IT 803

Special Topics in Data Mining Applications:

Data Mining on Multimedia and High-Dimensional Data

 

 

Instructor:


Dr. Jessica Lin
Office: Science & Technology II, Room 453
Phone: 703-993-4693
Email:  jessica@ise.gmu.edu

 

Lectures:

 

Monday 7:20-10:00pm, Innovation Hall 208

Prerequisite:

 

INFS-755 or equivalent knowledge. Some programming skills are required for the final project.

 

Course Description:

 

The rapid growth of disk technology over the past decade has enabled the generation and storage of large multimedia datasets. Such data, including audio, video, and text, are ubiquitous and can be found in diverse domains. Their massive size and high dimensionality pose great challenges for researchers and practitioners. In addition, the unique characteristics of each data type mean that specialized solutions are needed. This seminar provides an overview of state-of-the-art research on mining multimedia and high-dimensional data, and discusses issues in handling such data, including feature extraction, high-dimensional indexing, interactive search and information retrieval, pattern discovery, and scalability to large datasets. Mining techniques and data types to be covered include the following:

 

        Images

        Video sequences/surveillance

        Texts/Web mining

        Time series

        DNA data

        Spatial/Temporal/Spatial-temporal data

 

Course Format:


The course will include lectures by the instructor, presentations from students, and class discussion. You will be asked to read research papers published in major conferences and/or journals (paper list TBA).

 

Grading:

 

Grading will be based on participation, a presentation, quizzes, and a final project. Each week you are required to read two papers, one of which will be presented by a student. You will be quizzed on both papers the following week. The presenting student will write two simple quiz questions on the paper he or she presents.

 

Participation/Attendance: 15%

Quizzes: 15%

Presentation: 25%

Project Proposal: 10%

Project: 35%

 

Schedule:

 

 

 

Week  Dates   Topics                   Papers      Presenter
1     Jan 22  Introduction I
2     Jan 29  Introduction II          1, 2
3     Feb 5   Text/Web Mining I        3, 4
4     Feb 12  Text/Web Mining II       5, 6        Steven Vincent
5     Feb 19  Text/Web Mining III      7, 8        Marcos Vieira
6     Feb 26  Time Series I            9, 10
7     Mar 5   Time Series II           11, 12
8     Mar 12  Spring Break (No Class)
9     Mar 22  Audio                    13, 14, 15  Raimi Rufai
10    Mar 26  Images I                 16, 17      Puttikan Prapai
11    Apr 2   Images II                18, 19      Joseph Jinn
12    Apr 9   Video                    20, 21      David Etter
13    Apr 16  DNA                      22, 23
14    Apr 23  Spatio-Temporal          24, 25      Indar Bhatia
15    Apr 30  Data Streams             26, 27
16    May 7   Project Presentations

 

 

 

Paper List (TBA):

 

Week  Topic            Paper
2     Intro            1. Beyer, K. S., Goldstein, J., Ramakrishnan, R., and Shaft, U. 1999. When Is "Nearest Neighbor" Meaningful? In Proceedings of the 7th International Conference on Database Theory. Jan 10-12, 1999.
2     Intro            2. Faloutsos, C. and Lin, K. 1995. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the 1995 ACM SIGMOD international Conference on Management of Data (San Jose, California, United States, May 22 - 25, 1995).
3     Text I           3. Hearst, M. A. 1999. Untangling text data mining. In Proceedings of the 37th Annual Meeting of the Association For Computational Linguistics on Computational Linguistics (College Park, Maryland, June 20 - 26, 1999). Annual Meeting of the ACL.
3     Text I           4. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
4     Text II          5. Bingham, E. and Mannila, H. 2001. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the Seventh ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (San Francisco, California, August 26 - 29, 2001). KDD '01.
4     Text II          6. Yang, Y., Pedersen, J.O., A Comparative Study on Feature Selection in Text Categorization, Proc. of the 14th International Conference on Machine Learning ICML97, pp. 412-420, 1997.
5     Text III         7. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh international Conference on World Wide Web 7 (Brisbane, Australia). P. H. Enslow and A. Ellis, Eds. Elsevier Science Publishers B. V., Amsterdam, The Netherlands.
5     Text III         8. F. Radlinski and T. Joachims, Query Chains: Learning to Rank from Implicit Feedback, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2005.
6     Time Series I    9. R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. of the Fourth Int'l Conference on Foundations of Data Organization and Algorithms, Chicago, October 1993.
6     Time Series I    10. Lin, J., Keogh, E., Li, W. & Lonardi, S. (2007). Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Mining and Knowledge Discovery Journal. To Appear.
7     Time Series II   11. Sripada, S. G., Reiter, E., Hunter, J., and Yu, J. 2003. Generating English summaries of time series data using the Gricean maxims. In Proceedings of the Ninth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Washington, D.C., August 24 - 27, 2003). KDD '03.
7     Time Series II   12. Keogh, E., Lin, J. & Truppel, W. (2003). Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM 2003). Melbourne, FL. Nov 19-22. pp. 115-122.
9     Audio            13. Matt Welsh, Nikita Borisov, Jason Hill, Rob von Behren, and Alec Woo. Querying large collections of music for similarity. Technical Report UCB/CSD-00-1096, U.C. Berkeley Computer Science Division. 1999.
9     Audio            14. Berenzweig, A., Logan, B., Ellis, D., Whitman, B.: A Large-Scale Evaluation of Acoustic and Subjective Music Similarity Measures. In: Proc. of the 4th International Symposium on Music Information Retrieval. 2003.
9     Audio            15. J. Haitsma and T. Kalker. A Highly Robust Audio Fingerprinting System. In Proceedings of the 3rd International Conference on Music Information Retrieval. Paris, France. Oct 13-17, 2002.
10    Image I          16. Christos Faloutsos, Ron Barber, Myron Flickner, Wayne Niblack, Dragutin Petkovic, and William Equitz. Efficient and effective querying by image content. J. of Intelligent Information Systems, 3(3/4):231-262, July 1994.
10    Image I          17. Yong Rui, Thomas S. Huang, and Shih-Fu Chang. Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, Vol. 10, no. 4, pp. 39-62. 1999.
11    Image II         18. Charles Jacobs, Adam Finkelstein, David Salesin. Fast Multiresolution Image Querying. Computer Graphics, Annual Conference Series (Siggraph'95 Proceedings), pp. 277-286.
11    Image II         19. Mori, G., Belongie, S., Malik, J. Shape contexts enable efficient retrieval of similar shapes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai Marriott, Hawaii. Dec 9-14, 2001.
12    Video            20. J.-Y. Chen, C. Taskiran, A. Albiol, C. A. Bouman, and E. J. Delp. ViBE: A video indexing and browsing environment. Proceedings of the SPIE Conference on Multimedia Storage and Archiving Systems IV, vol. 3846, September 1999, Boston, MA, pp. 148-164.
12    Video            21. Hualu Wang, Ajay Divakaran, Anthony Vetro, Shih-Fu Chang, Huifang Sun. Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis. Journal of Visual Communication and Image Representation, 14(2):150-183, June 2003.
13    DNA              22. J. Buhler and M. Tompa. Finding Motifs Using Random Projections. In Proc. RECOMB'01, Montreal, pages 69-76. ACM, 2001.
13    DNA              23. Y. Cheng and G.M. Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 93-103, 2000.
14    Spatio-Temporal  24. Cao, H., Mamoulis, N., and Cheung, D. W. 2005. Mining Frequent Spatio-Temporal Sequential Patterns. In Proceedings of the Fifth IEEE international Conference on Data Mining (November 27 - 30, 2005).
14    Spatio-Temporal  25. P. Kalnis, N. Mamoulis, and S. Bakiras. On Discovering Moving Clusters in Spatio-temporal Data. In Proc. of 9th Int. Symposium on Advances in Spatial and Temporal Databases (SSTD'2005), number 3633 in LNCS, pages 364-381, Angra dos Reis, Brazil, Aug. 2005. Springer.
15    Data Streams     26. Gaber, M. M., Zaslavsky, A., and Krishnaswamy, S. 2005. Mining data streams: a review. SIGMOD Rec. 34, 2 (Jun. 2005), 18-26.
15    Data Streams     27. Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2004. On demand classification of data streams. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22 - 25, 2004). KDD '04.