- When: Tuesday, June 05, 2018 from 02:00 PM to 04:00 PM
- Speakers: Arsalan Mousavian
- Location: ENGR 4201
In recent years, new deep learning architectures and their associated training algorithms have led to impressive results and new applications in computer vision, speech, and natural language processing. These advances were largely enabled by the abundance of labeled data and the availability of new computational resources and paradigms, which made it feasible to train large machine learning models with millions of parameters. This opens up exciting opportunities for the design of general-purpose AI robotic agents. This thesis presents several building blocks of an autonomous robotic agent, where individual components are learned in a data-driven fashion and evaluated in a variety of environments. Traditionally, perception, planning, and control are among the key building blocks of robotic architectures. On the perception side, we revisit the classical computer vision problems of semantic parsing and 3D reconstruction of the scene. We present a model that simultaneously classifies each pixel into one of a set of semantic categories (e.g., chair, floor, ceiling, microwave) and estimates its depth value from a single image. To support reasoning about objects and their interactions, we propose a model for estimating the orientation, translation, and physical dimensions of objects from a single image. In both approaches, we leverage recent advances in deep learning and propose novel models and optimization frameworks that yield significant performance gains over the state of the art.
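The joint per-pixel classification and depth estimation described above can be pictured as a shared feature extractor feeding two task heads. The sketch below is a minimal illustration of that structure, not the thesis's actual architecture: the "encoder" output is a random feature map, and each head is a per-pixel linear map (a 1x1 convolution); all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, F = 8, 8, 16        # feature-map height/width and channel count (hypothetical)
NUM_CLASSES = 4           # e.g. chair, floor, ceiling, microwave

# Stand-in for a shared encoder's output; in practice a CNN would produce this.
features = rng.standard_normal((H, W, F))

# Two task heads on the same features (1x1 convolutions = per-pixel linear maps).
W_seg = rng.standard_normal((F, NUM_CLASSES))   # semantic-segmentation head
W_depth = rng.standard_normal((F, 1))           # depth-regression head

seg_logits = features @ W_seg                   # (H, W, NUM_CLASSES)
seg_labels = seg_logits.argmax(axis=-1)         # per-pixel class prediction, (H, W)
depth = np.log1p(np.exp(features @ W_depth))    # softplus keeps depth positive, (H, W, 1)
```

Because both heads read the same features, the two tasks share representation capacity, which is what makes predicting them jointly from a single image attractive.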
Finally, we introduce a novel method for semantic target-driven navigation, which ties the previously learned representations to planning and control. While in traditional robotic systems planning and control are separate modules that require access to a model of the environment, we present an approach that learns the mapping from semantic representations of images to actions directly, in an end-to-end fashion. Since abstract semantic representations do not change between real and simulated data, the proposed approach to navigation also introduces a new paradigm for training complex models using a mixture of simulations and real-world experiments.
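The key idea above, mapping a semantic representation of the current view directly to an action, can be sketched in a few lines. This is only an illustration under assumed names: the action set, the histogram summary of the segmentation, and the random linear policy are all hypothetical stand-ins for the learned end-to-end model; the point is that the policy's input is the abstract semantic map, which looks the same whether it came from simulation or a real camera.

```python
import numpy as np

rng = np.random.default_rng(1)

ACTIONS = ["forward", "turn_left", "turn_right", "stop"]   # hypothetical action set
NUM_CLASSES = 4                                            # semantic categories

# Hypothetical semantic segmentation of the current view (per-pixel class ids).
seg = rng.integers(0, NUM_CLASSES, size=(8, 8))

# Summarize the view as a normalized histogram over semantic classes.
state = np.bincount(seg.ravel(), minlength=NUM_CLASSES) / seg.size

# A linear policy over that state (learned end-to-end in practice; random here).
W_policy = rng.standard_normal((NUM_CLASSES, len(ACTIONS)))
action = ACTIONS[int((state @ W_policy).argmax())]
```

Training such a policy on simulated segmentations and deploying it on real ones is what the mixed simulation/real-world paradigm in the abstract refers to.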