- When: Thursday, November 07, 2024 from 11:00 AM to 12:00 PM
- Speaker: Shi Feng
- Location: Nguyen Engineering Bldg., Room 4801
Abstract:
As AIs are deployed to solve increasingly complex problems, reliable human oversight becomes a major challenge: AIs are also getting better at producing outputs that look correct to humans but are in fact subtly flawed. To support effective oversight, approaches such as debate, constitutional AI, and reward modeling all involve using AIs to assist human evaluators. Although promising, these approaches can create new risks, as AIs are being used to evaluate themselves. In this talk, I will discuss three failure modes in AI-assisted evaluation and in training with flawed supervision. I will also discuss preliminary work on mitigating these risks.
Bio:
Shi (https://ihsgnef.github.io/) is an assistant professor at the George Washington University. He received his PhD from UMD, advised by Jordan Boyd-Graber, and did postdocs at UChicago with Chenhao Tan and at NYU in the Alignment Research Group with Sam Bowman and He He. Shi works on AI safety, in particular scalable oversight, as an extension of his work on human-AI collaboration, interpretability evaluation, and adversarial robustness. His most recent work focuses on a meta-evaluation of risks in scalable oversight methods and evaluations.
Zoom (https://gmu.zoom.us/j/95380267114?pwd=InbkKjByEnJpDAYYxfen3YkAPLc1VT.1)
Meeting ID: 953 8026 7114
Passcode: 184133