CSC 2547, Winter 2020:

Machine Learning for Machine Vision as Inverse Graphics

Department of Computer Science

University of Toronto



Convolutional neural networks have achieved astounding breakthroughs on a number of machine vision tasks, especially object classification.  However, unlike people, they can require vast amounts of data to train, and their (sometimes comical) mistakes show that they do not truly understand what they see. This limits their abilities and leaves them short of the full promise of Artificial Intelligence.

To fully understand a scene, a computer must have a rich, 3-dimensional representation of the world.  It must be able to infer what objects are in a scene; their position, orientation, size, shape, color, texture, and category; what parts they are composed of; their relationships to other objects in the scene; and the illumination of the scene and the position and viewing angle of the camera.  In other words, a scene understanding program must be able to represent the world in much the same way as a computer graphics program does. The main difference is that computer graphics generates a 2-dimensional image from a 3-dimensional representation, while scene understanding aims to do the reverse: to infer a 3-dimensional representation of a scene from a 2-dimensional image.  Note that once a 3-dimensional representation has been inferred, it should be possible to answer many common-sense questions about an image. It should also be possible to use a graphics program to regenerate the image from the 3-dimensional representation, and moreover, to generate modified versions of the image, in which objects have been moved or rotated and the illumination or camera position has changed.

This view of scene understanding is known as inverse graphics. Inverting the graphics process to recover a 3-dimensional representation of a scene from an image is a difficult, non-deterministic problem. This course approaches the problem with machine learning. That is, we investigate techniques for learning programs that do inverse graphics, as well as related techniques for overcoming the limitations of convolutional neural networks for vision.
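As a toy illustration of the relationship between graphics and inverse graphics, the sketch below renders a known square at a 3-dimensional position and then recovers that position from the 2-dimensional image points alone. It is not taken from the course materials: the names `render`, `loss`, and `infer_scene`, the pinhole camera model, and all numeric values are assumptions chosen for illustration. The inversion uses plain finite-difference gradient descent as a stand-in for the learned inference methods the course studies.

```python
# Toy "graphics": pinhole projection of a known square at an unknown 3-D position.
# Toy "inverse graphics": recover that position from the 2-D image points alone.
# All names and numbers here are hypothetical, chosen only for illustration.

FOCAL = 1.0
# Corners of a unit square in object coordinates (X, Y); the square lies in the plane Z = 0.
SQUARE = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]


def render(scene):
    """Graphics direction: 3-D position (tx, ty, tz) -> list of 2-D image points."""
    tx, ty, tz = scene
    return [(FOCAL * (x + tx) / tz, FOCAL * (y + ty) / tz) for x, y in SQUARE]


def loss(scene, image):
    """Squared distance between the rendered guess and the observed image."""
    return sum((u - uo) ** 2 + (v - vo) ** 2
               for (u, v), (uo, vo) in zip(render(scene), image))


def infer_scene(image, start=(0.0, 0.0, 5.0), lr=1.0, steps=5000, eps=1e-6):
    """Inverse direction: gradient descent on the scene, treating render() as a black box."""
    scene = list(start)
    for _ in range(steps):
        grad = []
        for i in range(3):
            # Central finite difference of the loss with respect to parameter i.
            hi = scene[:]
            hi[i] += eps
            lo = scene[:]
            lo[i] -= eps
            grad.append((loss(hi, image) - loss(lo, image)) / (2 * eps))
        scene = [p - lr * g for p, g in zip(scene, grad)]
    return tuple(scene)


TRUE_SCENE = (0.3, -0.2, 4.0)      # the 3-D representation we pretend not to know
observed = render(TRUE_SCENE)      # the 2-D image
estimate = infer_scene(observed)   # the inferred 3-D representation
print("true:     ", TRUE_SCENE)
print("estimated:", tuple(round(p, 3) for p in estimate))
```

The recovery is possible here only because the size of the square is assumed known; without that assumption, a small nearby square and a large distant one produce the same image, which is one simple way to see why the inverse problem is hard in general.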

This is an advanced graduate course in machine learning. It is primarily a seminar course in which students will read and present papers from the literature. There will also be a major course project. The goal is to bring students to the state of the art in this exciting field. Tentative topics include generative and discriminative models for vision, convolutional and deconvolutional neural nets, variational inference and autoencoders, capsule networks, group symmetries and equivariance, visual attention mechanisms, differentiable renderers, and applications.


Prerequisites:

A solid introduction to machine learning (such as csc411 or a graduate course in ML), especially neural nets; a solid knowledge of linear algebra; the basics of multivariate calculus and probability; and programming skills, especially programming with vectors and matrices.  Mathematical maturity will be assumed.



Teaching Assistants:

Course Structure

The course is organized along the lines of csc2547: Learning to Search, given by David Duvenaud last semester, though the course content is quite different.

Paper presentations:


Marking Scheme:

Tentative Schedule


Student Presentations:

Project Presentations: