•   When: Monday, February 03, 2020 from 11:00 AM to 12:00 PM
•   Speakers: Grigory Yaroslavtsev
•   Location: Engineering Building 4201

Abstract: “Hierarchical clustering plays an important role in data analysis, yet compared to the highly successful flat clustering methods (e.g. k-means) it lacked rigorous algorithmic study until recently, owing to the absence of rigorous objectives. Since 2016, a sequence of works has emerged that gives novel algorithms for this problem. This was enabled by a breakthrough by Dasgupta, who introduced a formal optimization objective into the study of hierarchical clustering.

In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable to both arbitrary and high-dimensional vector data, including embedding vectors arising from deep learning. I will first discuss various linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce 1) a new projection-based approximation algorithm for vector data and 2) a new partitioning-based algorithm for arbitrary data. I will also discuss scalable implementations using projected gradient descent and experimental results on large-scale vector embedding datasets from deep learning and other methods. The talk will be self-contained and does not assume prior knowledge of clustering methods.”
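For readers unfamiliar with the objective mentioned in the abstract, here is a minimal sketch that is not taken from the talk. It assumes the similarity-based form of Dasgupta's cost, in which each pair of points (i, j) pays its similarity w[i][j] times the number of leaves under the pair's lowest common ancestor in the tree, and lower cost is better. The function names, the nested-tuple tree encoding, and the toy similarity matrix are all hypothetical choices made for illustration. The sketch also includes a naive average-linkage procedure, one of the linkage-based algorithms the abstract names; it is not meant to be scalable.

```python
from itertools import product


def leaves(tree):
    """Return the leaf labels of a binary tree given as nested 2-tuples of ints."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dasgupta_cost(tree, w):
    """Sum over leaf pairs (i, j) of w[i][j] times the leaf count under their LCA."""
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    l, r = leaves(left), leaves(right)
    # Pairs split at this node have this node as their lowest common ancestor.
    split = sum(w[i][j] for i, j in product(l, r)) * (len(l) + len(r))
    return split + dasgupta_cost(left, w) + dasgupta_cost(right, w)


def average_linkage(w):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the highest average pairwise similarity."""
    clusters = list(range(len(w)))
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                la, lb = leaves(clusters[a]), leaves(clusters[b])
                avg = sum(w[i][j] for i, j in product(la, lb)) / (len(la) * len(lb))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = (clusters[a], clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


# Toy similarities (assumed data): points 0 and 1 are similar, as are 2 and 3.
w = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]

good_tree = average_linkage(w)   # groups {0, 1} and {2, 3} before the final merge
bad_tree = ((0, 2), (1, 3))      # separates the similar pairs at the root

print(good_tree, dasgupta_cost(good_tree, w))
print(bad_tree, dasgupta_cost(bad_tree, w))
```

On this toy input the tree that keeps similar points together until late merges receives the lower cost, which is the behaviour the objective is designed to reward.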

Bio: "Grigory Yaroslavtsev (http://grigory.us) is an assistant professor of Computer Science at Indiana University and an adjunct assistant professor of Statistics (by courtesy). He is the founding director of the Center for Algorithms and Machine Learning at IU (http://caml.indiana.edu/). Previously, Grigory held a visiting position at the Alan Turing Institute (London, UK) and postdoctoral fellowships at the Warren Center for Network and Data Sciences at the University of Pennsylvania and at ICERM, Brown University. Grigory received his Ph.D. in theoretical computer science in 2014 from Penn State. He works on foundational questions in scalable algorithms for machine learning, data science, and private data release. His work is supported by an NSF CRII Award and a Facebook Faculty Research Award."

Posted 4 years, 4 months ago