•   When: Thursday, October 24, 2024, from 11:00 AM to 12:00 PM
•   Speaker: Xiang Yue
•   Location: Nguyen Engineering Bldg., Room 4801

Abstract: Large language models (LLMs) have made significant strides, yet challenges remain in advancing their reasoning capabilities, particularly in expert-level and multimodal contexts. In this talk, I will present our research on evaluating and improving LLM reasoning. First, I will introduce the MMMU and MMMU-Pro benchmarks, designed to rigorously evaluate multimodal reasoning across a wide range of disciplines and subjects. These benchmarks expose key limitations in current models’ reasoning abilities and highlight critical paths for improvement. Next, I will discuss the MAmmoTH series, a set of methods aimed at enhancing reasoning within LLMs. MAmmoTH-1 focuses on Hybrid of Thoughts, combining programming and natural language approaches to teach models more flexible reasoning. MAmmoTH-2 scales up high-quality synthetic reasoning data from the web to enhance the depth and breadth of reasoning abilities. Finally, I will briefly discuss our future efforts toward building and measuring next-generation AI systems.

Brief bio: Xiang Yue (https://xiangyue9607.github.io/index.html) is a Postdoctoral Fellow at Carnegie Mellon University. He received his PhD from The Ohio State University in 2023. His research focuses on understanding and enhancing the reasoning capabilities of large language models. His honors include a postdoctoral fellowship from the Carnegie Bosch Institute, a Generative AI Rising Star recognition from the University of Massachusetts Amherst, a Best Paper Finalist selection at CVPR 2024, a Best Paper Honorable Mention at ACL 2023, a Best Paper Award at IEEE BIBM 2021, and two research awards from The Ohio State University. Xiang's recent work on developing the MMMU evaluation benchmark has garnered attention beyond academia and has been featured in the releases of OpenAI o1/GPT-4o, Google Gemini, and Anthropic Claude.
