Probabilistic Topic Modeling for Text Data

12:00 noon, Tuesday, Feb 12, 2008, by Loulwah AlSumait, ST2, 430A

Abstract

A probabilistic topic model is a statistical generative model that automatically extracts a representation of documents as a collection of topics, based on the Latent Dirichlet Allocation (LDA) model. LDA is a hierarchical Bayesian network that relates words and documents through latent topics. The basic idea is that each document is a mixture of topics, where each topic is a probability distribution over words. The generative process of the topic model specifies a probabilistic sampling procedure that describes how the words in a document can be generated from the hidden topics. In this talk, I will give a brief overview of probabilistic topic models and their application to text mining. I will then introduce an extension of LDA for on-line topic modeling of text streams.
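
As an illustration of the sampling procedure mentioned in the abstract, the following is a minimal sketch of the standard LDA generative process, not the speaker's own code or the on-line extension. It assumes symmetric Dirichlet priors; the function names, the toy vocabulary, and the hyperparameter values (alpha, beta) are hypothetical choices made for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_length, vocab, num_topics,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate documents by following the LDA generative process.

    alpha and beta are assumed symmetric Dirichlet hyperparameters
    (illustrative values, not taken from the talk).
    """
    rng = random.Random(seed)
    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng)
              for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Each document has its own mixture over topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            # Pick a hidden topic for this word position, then pick a word from it.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topics[z])[0]
            words.append(w)
        docs.append(words)
    return topics, docs


# Hypothetical toy vocabulary, used only for illustration.
vocab = ["gene", "cell", "protein", "market", "stock", "trade"]
topics, docs = generate_corpus(num_docs=3, doc_length=8,
                               vocab=vocab, num_topics=2)
for doc in docs:
    print(" ".join(doc))
```

Topic modeling in practice runs this process in reverse: given only the observed words, inference estimates the hidden topic distributions and the per-document topic mixtures.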

Short Bio

Loulwah AlSumait received a BS and an MS in Computer Science from Kuwait University, Kuwait, in 1995 and 1999, respectively. She is now pursuing a PhD in Computer Science at George Mason University. Her research interests include data mining and pattern recognition, with applications in text mining.