GRAND seminar
12:00 noon, Thursday, Sep 13, 2007, by Jessica Lin
ST2, 430A

Discovering Unusual Patterns in Massive Time Series Data

Abstract

Time series are perhaps the most commonly encountered data type, touching almost every aspect of human life, including medicine, finance, aerospace, and entertainment. Apart from the obvious problem of handling the typically massive size of time series databases (gigabytes or even terabytes are not uncommon), most classic machine learning and data mining algorithms do not work well on time series because of their unique structure. In particular, the high dimensionality, very high feature correlation, and (typically) large amount of noise that characterize time series data present a difficult challenge. Previous work in time series data mining has concentrated mostly on identifying previously known patterns (i.e., query by content). The emphasis of this talk is on the discovery of interesting, unknown patterns in massive time series data, and on how these patterns can be identified using a symbolic representation of time series we call SAX (Symbolic Aggregate approXimation).

Biography

Jessica Lin is an Assistant Professor in the Department of Information and Software Engineering/Computer Science at George Mason University. She received her Ph.D. degree from the University of California, Riverside, in June 2005. Her research interests are in data mining, databases, and machine learning. In the past few years, she has worked on a variety of data mining problems, including indexing, classification, clustering, motif discovery, anomaly detection, and visualization, on different data types such as time series, images, and text. More specifically, her work focuses on efficiently manipulating, indexing, and discovering unusual patterns in large temporal datasets.




Department of Computer Science
Volgenau School of Information Technology and Engineering
George Mason University