- When: Tuesday, November 28, 2023 from 10:00 AM to 12:00 PM
- Speakers: Yimeng Li
- Location: ENGR 3507
In the past decade, computer vision has made remarkable strides in extracting semantic and geometric information from images. These advances, driven by deep learning techniques and large datasets, have led to effective approaches for object detection and semantic parsing. Recent work in Embodied Artificial Intelligence explores different avenues for end-to-end learning of effective representations for real-world mobile agents (e.g., robots) across a variety of tasks. In this thesis, we demonstrate how to integrate existing semantic and geometric representations with the agent's decision-making capabilities. This integration significantly improves the sample efficiency of end-to-end navigation models and allows existing representations to be reused across multiple tasks. The thesis makes three primary contributions. First, we study the problem of visual servoing in the reinforcement learning framework and introduce a trainable end-to-end visual servoing model for target-object and image-goal navigation tasks. Second, we present a novel approach for detecting unknown, out-of-distribution objects that are not covered by the training data, leveraging the pixel-level predictions of semantic segmentation models. Finally, we consider the problem of time-limited robotic exploration, in which the agent must explore a previously unseen environment within a predefined amount of time. We propose a novel exploration approach based on learning-augmented model-based planning, in which we exploit semantic mapping to estimate frontier properties.