- When: Monday, February 28, 2022 from 02:00 PM to 03:00 PM
- Speakers: Yapeng Tian
- Location: Zoom only
Abstract:
Understanding surrounding scenes, i.e., recognizing objects, sounds, and human activities, is a fundamental capability of human intelligence. Similarly, developing computational models that can understand scenes is a central problem in AI. Humans integrate multiple cooperating senses to understand a scene. For example, hearing helps us locate a racing car behind us, and seeing people's talking faces strengthens our perception of their speech. However, most existing scene understanding algorithms rely solely on either the visual or the auditory modality, and it remains largely unexplored whether joint audio-visual learning can facilitate understanding scenes in videos. In this talk, I will introduce a series of our works on audio-visual scene understanding toward building machines with unified, explainable, and robust multisensory perception capabilities. At the end of the talk, I will discuss some remaining challenges and future directions.
Bio:
Yapeng Tian is a Ph.D. candidate in the Department of Computer Science at the University of Rochester. He received his master's degree from Tsinghua University in 2017 and his bachelor's degree from Xidian University in 2013. His research interests center on solving core computer vision and computer audition problems and applying the resulting learning approaches to broad AI applications in multisensory perception, computational photography, AR/VR, and HCI. He has published more than 20 papers in peer-reviewed conferences and journals, including CVPR, ICCV, ECCV, and IEEE TPAMI. His work has helped pioneer research in computational multisensory perception and low-level vision. He has organized two tutorials on audio-visual scene understanding, at WACV 2021 and CVPR 2021.