CS 747 Deep Learning Assignment 1

CS 747

Project Guidelines

For the project, you need to choose a topic that explores in greater detail some of the topics covered in class, involves additional reading and implementation of the chosen methods, and demonstrates them in action.

You can form teams, ideally of two or three people. A one-page project description is due on April 2nd. The template for the project description can be found here: project template
Implementation or demo: Find a research paper related to the topics covered in class and implement their method. Apply existing methods to new datasets. Compare and contrast several methods, adapt or modify them. If feasible, create a demo that can be shown in class.

Kaggle competition: Find a competition on Kaggle and implement a deep learning system to enter in it. Here are some options: Deep learning competitions

Paper: Write a survey or tutorial paper on the topic of your lecture (or a different topic if you insist). Here is an example of a survey paper on Variational Autoencoders. If the topic you have chosen already has a good recent tutorial like the one above, a paper would probably not be the best choice (unless you feel you can write a significantly different tutorial that offers independent value). The paper should be 5-6 pages in length (single-spaced, single column, 11pt font, 1 inch margins) and typeset in LaTeX.
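
The formatting requirements above can be met with a short preamble. The following is a minimal sketch, not an official course template: it assumes the standard article class and the geometry package, and the title and author are placeholders.

```latex
% Minimal sketch (assumption: standard article class, not a course-provided template)
\documentclass[11pt]{article}      % 11pt font; article is single column and single-spaced by default
\usepackage[margin=1in]{geometry}  % 1 inch margins on all sides

\title{Placeholder Title}          % hypothetical title
\author{Placeholder Author}        % hypothetical author

\begin{document}
\maketitle

\section{Introduction}
Body text goes here.

\end{document}
```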

Past projects:
State Farm Distracted Driver detection (Kaggle)
Learning Latent Representations using Conditional Variational Autoencoders
Deep Learning framework for protein folding (NN for predicting protein structures)
Sentiment Analysis of movie reviews using RNNs
Facial Emotion Recognition
Licence plate number detection (Kaggle)
Capturing Upper Torso Movement using MarkerLess Pose Estimation
Tweet Sentiment Extraction (Kaggle)
Show and Tell Image Captioning System
Detecting Pneumonia in X-ray Images
End-to-End Recovery of Human Shape and Pose
Implementation of Real-Time Seamless Single Shot 6D Object Pose Prediction
Fine-grained categorization
Style transfer on Soccer Images
Tracking Customer Flow using YOLO
Underwater Trash Detection Using Faster-RCNN

Commonly used models for computer vision problems

Image Classification: [Krizhevsky et al.], [Russakovsky et al.], [Szegedy et al.], [Simonyan et al.], [He et al.], [Huang et al.], [Hu et al.], [Zoph et al.]

Object detection: [Girshick et al.], [Ren et al.], [He et al.]

Image segmentation: [Long et al.], [Noh et al.], [Chen et al.]

Video classification: [Karpathy et al.], [Simonyan and Zisserman], [Tran et al.], [Carreira et al.], [Wang et al.]

Scene classification: [Zhou et al.]

Face recognition: [Taigman et al.], [Schroff et al.], [Parkhi et al.]

Depth estimation: [Eigen et al.]

Image-to-sentence generation: [Karpathy and Fei-Fei], [Donahue et al.], [Vinyals et al.], [Xu et al.], [Johnson et al.]

Visualization and optimization: [Szegedy et al.], [Nguyen et al.], [Zeiler and Fergus], [Goodfellow et al.], [Schaul et al.]

You might also gain inspiration by taking a look at some popular computer vision datasets:

Computer vision, Computer Vision and Language Datasets

Meta Pointer: A large collection organized by CV Datasets.

Yet another Meta pointer

ImageNet: a large-scale image dataset for visual recognition organized by WordNet hierarchy

SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects

Places Database: a scene-centric database with 205 scene categories and 2.5 million labelled images

NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes

Microsoft COCO: a new benchmark for image recognition, segmentation and captioning

Flickr100M: 100 million creative commons Flickr images

Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs

Human Pose Dataset: a benchmark for articulated human pose estimation

YouTube Faces DB: a face video dataset for unconstrained face recognition in videos

UCF101: an action recognition data set of realistic action videos with 101 action categories

HMDB-51: a large human motion dataset of 51 action classes

ActivityNet: A large-scale video dataset for human activity understanding

Moments in Time: A dataset of one million 3-second videos

Text, NLP datasets

Text classification, generation, question answering