Machine Learning in Computer Vision

In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will read up on various Computer Vision problems and the state-of-the-art techniques involving different neural architectures, and brainstorm about promising new directions.

Please sign up here at the beginning of class.

This class is a graduate seminar course in computer vision. The class will cover a diverse set of topics in Computer Vision and various machine learning approaches. It will be an interactive course where we will discuss interesting topics on demand and the latest research buzz. The goal of the class is to learn about different domains of vision, to understand and analyze the main challenges and what works and what doesn't, and to identify interesting new directions for future research.

Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need some additional reading), and basic programming skills are required for projects.

Time and Location

Winter 2018

Day: Wed
Time: 12-2pm
Room: GB 244

Instructor

Sanja Fidler

Email: fidler@cs dot toronto dot edu
Homepage: http://www.cs.toronto.edu/~fidler
Office hours: by appointment (send email)

When emailing me, please put CSC2548 in the subject line.

Forum

This class uses Piazza, where we will post announcements and assignments. Students will also be able to post questions and discussions in a forum-style manner, either to their instructors or to their peers.

Each student will need to write two paper reviews each week, present once in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).

Grading

The final grade will consist of the following:
`Participation` (attendance, participation in discussions, reviews)	15%
`Presentation` (presentation of papers in class)	25%
`Project` (proposal, final report)	60%

Detailed Requirements

Paper reviewing

Every week (except for the first two) we will read 3 to 4 papers. The success of the class discussion will thus depend on how well prepared the students come to class. Each student is expected to read all the papers that will be discussed and write detailed reviews of two selected papers. Depending on enrollment, each student will also need to present a paper in class. When you present, you do not need to hand in the review.

Deadline: The reviews will be due one day before the class.

Structure of the review
`Short summary of the paper`
`Main contributions`
`Positive and negative points`
`How strong is the evaluation?`
`Possible directions for future work`

Presentation

Depending on enrollment, each student will need to present a few papers in class. The presentation should be clear and practiced and the student should read the assigned paper and related work in enough detail to be able to lead a discussion and answer questions. Extra credit will be given to students who also prepare a simple experimental demo highlighting how the method works in practice.

A presentation should be roughly 20 minutes long (please time it beforehand so that you do not go overtime). Typically this is about 15 to 20 slides. You are allowed to take some material from presentations on the web as long as you cite the source fairly. In the presentation, also provide the citation to the paper you present and to any other related work you reference.

Deadline: The presentation should be handed in one day before the class (or before if you want feedback).

Structure of presentation:
`High-level overview with contributions`
`Main motivation`
`Clear statement of the problem`
`Overview of the technical approach`
`Strengths/weaknesses of the approach`
`Overview of the experimental evaluation`
`Strengths/weaknesses of evaluation`
`Discussion: future direction, links to other work`

Project

Each student will need to write a short project proposal at the beginning of the class (in January). The projects will be research oriented. In the middle of the semester you will need to hand in a progress report. One week prior to the end of the class the final project report will need to be handed in, and the project will be presented in the last lecture of the class (April). This will be a short presentation of roughly 15-20 minutes.

The students can work on projects individually or in pairs. The project can be on an interesting topic that the student comes up with on their own or with the help of the instructor. The grade will depend on the ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are and how thoughtful your conclusions are.

The first class will present a short overview of various machine learning techniques; the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision. The course will be interactive: we will add interesting topics on demand and the latest research buzz.

Tentative Syllabus

Machine Learning
`convolutional neural networks`
`recurrent neural networks`
`neural networks on graphs`
`generative models (GAN, variational autoencoders)`
`reinforcement learning`
`graphical models`
Computer Vision
`object detection`
`semantic, instance segmentation`
`action recognition`
`stereo / flow`
`tracking`
`captioning, VQA, retrieval`
`3D scene understanding`
`image/video generation, style transfer`

Schedule

Date	Topic	Reading / Material	Speaker	Slides
Jan 10	Admin & Introduction		Sanja Fidler	admin
Convolutional Neural Networks
Jan 10	Convolutional Neural Nets (tutorial)	Resources: Stanford's CS231n class, VGG's Practical CNN Tutorial; Code: CNN Tutorial for TensorFlow, Tutorial for Caffe, CNN Tutorial for Theano	Amlan Kar, Chaoqi Wang	[pdf]
Jan 17	CNNs, Detection	Dynamic Routing Between Capsules [PDF] Sara Sabour, Nicholas Frosst, Geoffrey E Hinton	Sara Sabour, Nicholas Frosst (invited)
		Overview of Object Detection	Bin Yang (invited)
Jan 24	CNNs	Deformable Convolutional Networks [PDF] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei	Robin Swanson	[pdf]
	Detection	YOLO9000: Better, Faster, Stronger [PDF] Joseph Redmon, Ali Farhadi	Haris Khan	[pdf]
	Segmentation			[pdf]
Jan 31	CNNs, Segmentation	Multi-Scale Context Aggregation by Dilated Convolutions [PDF] Fisher Yu, Vladlen Koltun	Najmus Ibrahim	[pdf]
	Instance Segmentation	Deep Watershed Transform for Instance Segmentation [PDF] Min Bai, Raquel Urtasun	Min Bai (invited)
Feb 7	Instance Segmentation	Mask R-CNN [PDF] Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick	Aditya Sanghi	[pdf]
	Memory efficient DL	The Reversible Residual Network: Backpropagation Without Storing Activations [PDF] Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse; In-Place Activated BatchNorm for Memory-Optimized Training of DNNs [PDF] Samuel Rota Bulo, Lorenzo Porzi, Peter Kontschieder	Harris Chan	[pdf]
	Stereo	Efficient Deep Learning for Stereo Matching [PDF] Wenjie Luo, Alexander G. Schwing, Raquel Urtasun; End-to-End Learning of Geometry and Context for Deep Stereo Regression [PDF] Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, Adam Bry	Dominic Cheng	[pdf]

Tutorials, related courses:

Introduction to Neural Networks, CSC321 course at University of Toronto
Course on Convolutional Neural Networks, CS231n course at Stanford University
Course on Probabilistic Graphical Models, CSC412 course at University of Toronto, advanced machine learning course

Software:

Caffe: Deep learning for image classification
TensorFlow: Open Source Software Library for Machine Intelligence (good software for deep learning)
Theano: Deep learning library
MXNet: Deep learning library
Torch: Scientific computing framework with wide support for machine learning algorithms
LIBSVM: A Library for Support Vector Machines (Matlab, Python)
scikit-learn: Machine learning in Python

Popular datasets:

ImageNet: Large-scale object dataset
Microsoft COCO: Large-scale image recognition, segmentation, and captioning dataset
Cityscapes: Autonomous driving dataset
PASCAL VOC: Object recognition dataset
KITTI: Autonomous driving dataset
NYUv2: Indoor RGB-D dataset
LSUN: Large-scale Scene Understanding challenge
VQA: Visual question answering dataset
Madlibs: Visual Madlibs (question answering)
Flickr30K: Image captioning dataset
Flickr30K Entities: Flickr30K with phrase-to-region correspondences
MovieDescription: a dataset for automatic description of movie clips
Action datasets: a list of action recognition datasets
MPI Sintel Dataset: optical flow dataset
BookCorpus: a corpus of 11,000 books
MNIST: handwritten digits

Online demos:

Lots of cool Toronto Deep Learning Demos: image classification and captioning demos
Lots of cool demos for ConvNets by Andrej Karpathy
Reinforcement Learning with Neural Nets (read paper for more info)
Places: scene classification with neural nets
CRF as RNN: Semantic Image Segmentation
drawNet: visualization of ConvNet activations
Visualization of ConvNets for digit classification
AI-painter: modify your photo in a certain style (e.g., Van Gogh); uses neural nets as explained in this paper

Main conferences:

NIPS (Neural Information Processing Systems)
ICML (International Conference on Machine Learning)
ICLR (International Conference on Learning Representations)
AISTATS (International Conference on Artificial Intelligence and Statistics)
CVPR (IEEE Conference on Computer Vision and Pattern Recognition)
ICCV (International Conference on Computer Vision)
ECCV (European Conference on Computer Vision)
ACL (Association for Computational Linguistics)
EMNLP (Conference on Empirical Methods in Natural Language Processing)