- When: Friday, October 07, 2022 from 01:00 PM to 02:00 PM
- Speakers: Srinivasan Parthasarathy
- Location: ENGR 4201 (CS Conference Room)
Abstract: Recently there has been a surge of interest in designing graph embedding methods. Few, if any, can scale to a large graph with millions of nodes, due to both computational complexity and memory requirements. In this talk, I will present an approach to redress this limitation by introducing the MultI-Level Embedding (MILE) framework, a generic methodology that allows contemporary graph embedding methods to scale to large graphs. MILE repeatedly coarsens the graph into smaller ones using a hybrid matching technique that maintains the backbone structure of the graph. It then applies existing embedding methods to the coarsest graph and refines the embeddings back to the original graph through a graph convolutional neural network that it learns. Time permitting, I will then describe one of several natural extensions to MILE: a distributed setting (DistMILE) that further improves the scalability of graph embedding, or mechanisms to learn fair graph representations (FairMILE).
The proposed MILE framework and its variants (DistMILE, FairMILE) are agnostic to the underlying graph embedding techniques and to their implementation language, and can be applied to many existing graph embedding methods without modifying them. Experimental results on five large-scale datasets demonstrate that MILE speeds up graph embedding by an order of magnitude while generating embeddings of better quality for the task of node classification. MILE can comfortably scale to a graph with 9 million nodes and 40 million edges, on which existing methods run out of memory or take too long to compute on a modern workstation. Our experiments demonstrate that DistMILE learns representations of similar quality to other baselines while reducing the time to learn embeddings even further (up to a 40x speedup over MILE). FairMILE similarly learns fair representations of the data while reducing the time to learn embeddings.
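For readers unfamiliar with multi-level methods, the coarsen, embed, refine loop described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the speakers' implementation: the function names (`coarsen`, `base_embed`, `refine`, `multilevel_embed`) are hypothetical, the greedy heaviest-edge matching stands in for MILE's hybrid matching, the random vectors stand in for an existing embedding method, and the fixed neighbor-averaging step stands in for the learned graph convolutional network.

```python
import random


def coarsen(adj):
    """One coarsening level (assumption: greedy heaviest-edge matching,
    a simplified stand-in for MILE's hybrid matching).
    adj maps node -> {neighbor: weight} and is assumed symmetric.
    Returns (coarse_adj, node_to_super)."""
    node_to_super = {}
    next_id = 0
    for u in sorted(adj):
        if u in node_to_super:
            continue
        # Unmatched neighbors of u, as (weight, node) pairs.
        candidates = [(w, v) for v, w in adj[u].items()
                      if v != u and v not in node_to_super]
        node_to_super[u] = next_id
        if candidates:
            _, v = max(candidates)
            node_to_super[v] = next_id  # merge u and v into one supernode
        next_id += 1
    # Build the coarse graph by summing edge weights between supernodes.
    coarse = {s: {} for s in range(next_id)}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            su, sv = node_to_super[u], node_to_super[v]
            if su != sv:
                coarse[su][sv] = coarse[su].get(sv, 0.0) + w
    return coarse, node_to_super


def base_embed(adj, dim, seed=0):
    """Placeholder for any existing embedding method: here just seeded
    random vectors, so the sketch stays dependency-free."""
    rng = random.Random(seed)
    return {u: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for u in sorted(adj)}


def refine(adj, node_to_super, coarse_emb):
    """Project supernode embeddings back to the finer graph, then apply one
    weighted neighbor-averaging step (assumption: an unlearned stand-in for
    the learned graph convolutional refinement)."""
    emb = {u: list(coarse_emb[node_to_super[u]]) for u in adj}
    refined = {}
    for u, nbrs in adj.items():
        total = sum(nbrs.values())
        if total == 0:
            refined[u] = emb[u]
            continue
        dim = len(emb[u])
        mean = [sum(w * emb[v][k] for v, w in nbrs.items()) / total
                for k in range(dim)]
        refined[u] = [(a + b) / 2 for a, b in zip(emb[u], mean)]
    return refined


def multilevel_embed(adj, levels, dim):
    """Coarsen `levels` times, embed the coarsest graph, refine back down."""
    graphs, mappings = [adj], []
    for _ in range(levels):
        coarse, mapping = coarsen(graphs[-1])
        graphs.append(coarse)
        mappings.append(mapping)
    emb = base_embed(graphs[-1], dim)
    for graph, mapping in zip(reversed(graphs[:-1]), reversed(mappings)):
        emb = refine(graph, mapping, emb)
    return emb


# Toy usage: an 8-node path graph with unit edge weights.
path = {i: {} for i in range(8)}
for i in range(7):
    path[i][i + 1] = 1.0
    path[i + 1][i] = 1.0
embeddings = multilevel_embed(path, levels=2, dim=4)
```

The point of the structure is that the expensive embedding step runs only on the smallest graph, while the base method itself is treated as a black box, which is what lets the framework stay agnostic to the underlying technique.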
Joint work with Jiongqian Liang (Google Brain), S. Gurukar (OSU), and Yuntian He (OSU).
Bio: Srinivasan Parthasarathy is a Professor of Computer Science and Engineering and the director of the data mining research laboratory at Ohio State. His research interests span data analytics, databases, and high-performance computing. He is among a handful of researchers nationwide to have won both the Department of Energy and National Science Foundation Career awards. He and his students have won multiple best paper awards or "best of" nominations from leading forums in the field, including SIAM Data Mining, ACM SIGKDD, VLDB, ISMB, WWW, ICDM, and ACM Bioinformatics. He chaired the SIAM data mining conference steering committee (elected) from 2012 to 2019, and has served on the boards of several journals in parallel computing, machine learning, and data mining. Since 2012 he has also helped lead the creation of OSU's first-of-a-kind nationwide (US) undergraduate major in data analytics and serves as one of its founding directors.