•   When: Friday, September 26, 2014 from 11:00 AM to 12:00 PM
  •   Speakers: David Blei
  •   Location: Research Hall, Room 163
  •   Export to iCal

Abstract

Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents. Topic modeling ideas have been adapted to many domains, including images, music, networks, genomics, and neuroscience.

Traditional topic modeling algorithms analyze a document collection and estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users and for understanding how a collection is organized.

I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. > We studied collaborative topic models on scientists' libraries (from Mendeley) and scientists' click data (from the arXiv). Collaborative topic models enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, they help organize the articles according to the discovered patterns of readership. For example, we can identify articles that are important within a field, and articles that transcend disciplinary boundaries.

More broadly, topic modeling is a case study in the large field of applied probabilistic modeling. I will survey some recent advances in this field. I will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.

Speaker's Biography

David Blei is a Professor of Statistics and Computer Science at Columbia University. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.

David earned his Bachelor's degree in Computer Science and Mathematics from Brown University (1997) and his PhD in Computer Science from the University of California, Berkeley (2004). Before arriving to Columbia, he was an Associate Professor of Computer Science at Princeton University. He has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013).

Posted 3 years, 2 months ago