•   When: Monday, March 14, 2022 from 11:00 AM to 12:00 PM
  •   Speakers: Xiaolong Ma
  •   Location: Zoom only

Abstract: Deep learning, or the deep neural network (DNN), as one of the most powerful machine learning techniques, has become a fundamental element and core enabler of artificial intelligence. Many incredible, bleeding-edge applications, such as community/shared virtual reality experiences and self-driving cars, will crucially rely on the ubiquitous availability and real-time executability of high-quality deep learning models. Among the variety of AI-associated platforms, mobile and embedded computing devices have become key carriers of deep learning, facilitating the widespread adoption of machine intelligence. In this talk, I will first focus on a compression-compilation co-design method that deploys a unique sparse model on an off-the-shelf mobile device with real-time execution speed. This method advances the state of the art by introducing a new dimension, fine-grained pruning patterns inside coarse-grained structures, revealing a previously unknown point in the design space. The designed patterns are interpretable and can be obtained by a fully automatic pattern-aware pruning framework that performs pattern library extraction, pattern selection, pattern assignment (pruning), and weight training simultaneously. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency. We then take a step forward by considering a more practical scenario in which the deploy-then-execute mode for AI tasks no longer satisfies user preferences, so that enabling training on the edge becomes inevitable: it promotes much better personalized intelligent services while strengthening users' privacy by avoiding data egress from their devices. To this end, I will demonstrate my approaches that use sparsity to achieve fast and efficient training on edge devices. I will present a high-accuracy, low-cost dynamic sparse training framework that makes edge training possible; it incorporates pattern-based sparsity into sparse training and also exploits data-level sparsity to further improve acceleration. I will conclude by applying our sparse training method to a distributed training scenario, demonstrating state-of-the-art accuracy and great flexibility for modern AI model training.
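
A minimal sketch may help make the idea of fine-grained pruning patterns inside coarse-grained structures concrete. The snippet below assigns each 3x3 convolution kernel the binary pattern from a small library that retains the most weight magnitude, then zeroes out the remaining weights. The pattern library, the magnitude-based selection heuristic, and the names (PATTERNS, assign_patterns) are illustrative assumptions, not the speaker's actual pattern-aware pruning framework.

    # Illustrative sketch of pattern-based (kernel-level) pruning.
    # The 3x3 pattern library and the magnitude heuristic are assumptions,
    # not the automatic pattern-aware framework described in the talk.
    import numpy as np

    # Hypothetical library: each pattern keeps 4 of the 9 weights in a
    # 3x3 kernel (fine-grained sparsity inside a coarse-grained unit).
    PATTERNS = np.array([
        [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
        [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
        [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
        [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    ], dtype=np.float32)

    def assign_patterns(weights):
        """For each 3x3 kernel, pick the library pattern that preserves
        the most L1 magnitude, then prune the weights outside it.
        weights: conv tensor of shape (out_ch, in_ch, 3, 3)."""
        mag = np.abs(weights)                          # (O, I, 3, 3)
        scores = np.einsum('oihw,phw->oip', mag, PATTERNS)
        idx = scores.argmax(axis=-1)                   # best pattern per kernel
        masks = PATTERNS[idx]                          # (O, I, 3, 3)
        return weights * masks, idx

    w = np.random.randn(16, 8, 3, 3).astype(np.float32)
    pruned, idx = assign_patterns(w)
    print('kept fraction:', (pruned != 0).mean())      # ~4/9 of weights remain

Because every kernel then carries one of only a few known patterns, a compiler can specialize its generated code per pattern, which is how the regularity of the patterns can be turned back into hardware efficiency on a mobile device.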
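
In the same spirit, one step of a drop-and-grow update, as used by dynamic sparse training methods such as SET, can be sketched as follows; the function drop_and_grow and its parameters are hypothetical illustrations, not the framework presented in the talk.

    # Illustrative drop-and-grow step for dynamic sparse training:
    # prune the weakest active weights, then grow an equal number of
    # new connections, so overall sparsity stays fixed during training.
    import numpy as np

    def drop_and_grow(weights, mask, drop_frac=0.1, rng=np.random):
        active = np.flatnonzero(mask)
        n_drop = int(drop_frac * active.size)
        if n_drop == 0:
            return mask
        # Drop: deactivate the n_drop smallest-magnitude active weights.
        w_flat = np.abs(weights.ravel())
        drop_idx = active[np.argsort(w_flat[active])[:n_drop]]
        mask.ravel()[drop_idx] = 0
        # Grow: activate the same number of currently inactive positions
        # (random growth; gradient-based growth is another common choice).
        inactive = np.flatnonzero(mask.ravel() == 0)
        grow_idx = rng.choice(inactive, size=n_drop, replace=False)
        mask.ravel()[grow_idx] = 1
        weights.ravel()[grow_idx] = 0.0   # new connections start from zero
        return mask

    w = np.random.randn(64, 64)
    m = (np.random.rand(64, 64) < 0.1).astype(np.float32)  # ~90% sparse
    m = drop_and_grow(w, m, drop_frac=0.2)
    print('density:', m.mean())                            # stays ~0.1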

Bio: Xiaolong Ma is a Ph.D. candidate in the Department of Electrical and Computer Engineering at Northeastern University. His research on model compression for real-time inference on mobile devices was selected as a Contributed Article in Communications of the ACM (CACM) in 2021. His highly efficient dynamic sparse training framework won the Best Paper Award at the ICLR Workshop on Hardware Aware Efficient Training (HAET) and also received a Spotlight Paper Award at NeurIPS 2021. His work on efficient machine learning with a stochastic number generator was nominated for the Best Paper Award at ISQED 2017. He has published in top conferences including NeurIPS, ICML, ICLR, ECCV, AAAI, IJCAI, ASPLOS, ISCA, MICRO, DAC, ICS, and PACT, and in top journals such as TPAMI, TNNLS, and CACM.
