- When: Wednesday, February 21, 2018 from 01:30 PM to 03:30 PM
- Speaker: Andeep Singh Toor
- Location: ENGR 4801
This dissertation describes advances we have made toward the Visual Turing Test (VTT) in general, and image understanding through Visual Question Answering (VQA) in particular. The visual world poses challenges such as uncertainty, incompleteness, and complexity. Additionally, the multi-modal queries submitted to a VQA system may or may not contain relevant information, which is yet another real-world challenge. The novelty of this dissertation lies in addressing VQA with a collaborative and context-aware methodology in which the content of queries is parsed to assess their relevance, if any, and iteratively refined toward their ultimate resolution. The proposed Collaborative Context-Aware Visual Question Answering (C2VQA) methodology encompasses convolutional neural networks and deep learning, joint visual-text embedding, recurrence and sequencing, and memory models to interpret queries and answer them as well as possible.
The feasibility and utility of C2VQA are demonstrated across a number of diverse applications involving single images, sets of images, and videos. These applications include data fusion of biometrics and forensics, content-based image retrieval, novel security protocols for access and authentication, biometric surveillance, query relevance and editing, ranking, and triage.
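As a rough illustration of the joint visual-text embedding idea mentioned in the abstract, the sketch below scores whether a question is relevant to an image at all before any answering takes place. It is a minimal, hypothetical example, not the C2VQA model from the dissertation; the layer sizes, fusion choice, and relevance head are all assumptions.

```python
# Illustrative sketch (PyTorch): joint visual-text embedding used as a
# question-relevance classifier. All dimensions and names are assumptions,
# not the dissertation's actual architecture.
import torch
import torch.nn as nn

class RelevanceScorer(nn.Module):
    def __init__(self, vocab_size=10000, text_dim=300, img_dim=2048, joint_dim=512):
        super().__init__()
        # Text branch: embed question tokens, summarize with a GRU.
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.gru = nn.GRU(text_dim, joint_dim, batch_first=True)
        # Visual branch: project pre-extracted CNN features (e.g. a pooled
        # 2048-d vector) into the same joint space.
        self.img_proj = nn.Linear(img_dim, joint_dim)
        # Relevance head: is the question about this image at all?
        self.classifier = nn.Linear(joint_dim, 2)

    def forward(self, question_tokens, image_features):
        # question_tokens: (batch, seq_len) token ids
        # image_features:  (batch, img_dim) CNN features
        _, h = self.gru(self.embed(question_tokens))  # h: (1, batch, joint_dim)
        q = h.squeeze(0)
        v = self.img_proj(image_features)
        joint = q * v                                  # element-wise fusion
        return self.classifier(joint)                  # relevant vs. irrelevant logits

# Usage with random inputs:
# logits = RelevanceScorer()(torch.randint(0, 10000, (4, 12)), torch.randn(4, 2048))
```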