You Are What You Train On: Discovering Challenges for Natural Language Interfaces | George Mason Department of Computer Science

When: Wednesday, February 17, 2021 from 11:00 AM to 12:00 PM
Speakers: Jonathan K. Kummerfeld, Postdoctoral Research Fellow in Computer Science and Engineering at the University of Michigan
Location: Zoom

Abstract
Natural Language Interfaces like Siri and Alexa help people do things more efficiently, but they are brittle, unable to handle the full range of ways people naturally express themselves. Each of their actions is manually defined by developers, with limited ability to compose actions to make more sophisticated ones. The choice of action is made by a statistical model that is limited by the range of data seen in training. Despite steady progress in the accuracy of these systems, the true scope of remaining challenges has been obscured by the way researchers collect and prepare data.

In this talk, I will describe two key challenges I discovered, revealing previously unknown limitations of natural language interfaces. First, I will show that systems for converting questions to SQL queries have limited generalizability beyond examples seen in training (ACL 2018). I propose a new model and a new way to split data into training and test sets that together make this limitation clear. Second, I will show that standard crowd-worker data collection processes miss the long and heavy tail of ways people speak (ACL 2017). I propose an outlier-based data collection workflow (NAACL 2019), and a complementary taboo list workflow (EMNLP 2020), that improve data diversity and reduce the cost of data cleaning. I will conclude by outlining a research agenda for fundamentally changing the capabilities of these systems. Today we use these systems to do simple tasks we could do ourselves, e.g., “start a 5 minute timer”. My work will lead to systems that do complex tasks we cannot efficiently do on our own, e.g., “tell me how many of these job applicants mention more than 10 papers in their research statement”.

Bio
Jonathan K. Kummerfeld is a Postdoctoral Research Fellow in Computer Science and Engineering at the University of Michigan. He completed his Ph.D. at the University of California, Berkeley, advised by Prof. Dan Klein. Jonathan’s research has revealed new challenges in syntactic parsing, coreference resolution, and dialogue. He has proposed models and algorithms to address these challenges, improving the speed and accuracy of natural language processing systems. He has been on the program committee for 55 conferences and workshops, including Area Chair at ACL and Shared Task Coordinator for the DSTC workshops. He currently serves as a standing reviewer for the Computational Linguistics journal and the Transactions of the Association for Computational Linguistics journal. For more details, see his website: https://www.jkk.name

