Visual Recognition, Winter 2012



Developing autonomous systems that can assist us in everyday tasks is one of the grand challenges of modern computer science. While a variety of novel sensors have been developed in the past few years, in this class we will focus on extracting this knowledge from visual information alone. One of the most remarkable examples of a successful recognition system is the human visual system, which is able to extract high-level information from very noisy and ambiguous data. Unfortunately, despite decades of research effort, machines still perform far below human level. In this class we will study why this is the case.

The goal of this graduate class is to understand the different visual recognition tasks as well as the techniques employed to solve them. Statistical learning will be a strong component of the course, as it plays a key role in almost every modern visual recognition system. We will cover all stages of the recognition pipeline: low-level (e.g., features), mid-level (e.g., segmentation), and high-level reasoning (e.g., scene understanding). Knowledge of machine learning and computer vision is not required, but is highly recommended. The theoretical aspects of visual recognition will be covered in the lectures. The class will have a strong practical component, as students will build the different recognition components during the homework sessions.


Summary of the class

General information

Lecture: Tuesday and Thursday 10:30 - 11:50
Room: TTIC 530 (6045 S. Kenwood, 5th floor)

Instructor: Raquel Urtasun
E-mail:

Grading: exam (35%) + project (65%)


  1. Classification: features, bag of words (BOW), similarity between images, learning features, as well as hashing schemes and retrieval.
  2. Detection: sliding-window approaches, branch and bound, structured prediction, Hough voting and nearest-neighbor (NN) approaches, hierarchical models.
  3. Segmentation: classical approaches as well as modern structured prediction approaches, including message passing and graph cuts for inference, and CRFs and structured SVMs for learning.
  4. Pose estimation: pictorial structures (2D) as well as 3D pose estimation, including particle-filter-based approaches.
  5. Modern 3D geometry and 3D scene understanding: stereo, scene layout (e.g., 3D box for indoor scenes, road layout for outdoor scenes).
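As a taste of topic 1, the following is a minimal, self-contained sketch of a bag-of-words image representation in NumPy. The random toy descriptors, the plain k-means codebook, and the histogram-intersection similarity are all illustrative assumptions for this sketch, not the specific pipeline used in the lectures or homeworks.

```python
# Bag-of-words (BOW) sketch: quantize local descriptors against a learned
# codebook of "visual words", represent each image as a normalized word
# histogram, and compare images by histogram intersection.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means; returns the k visual-word centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize descriptors against the codebook; L1-normalize the counts."""
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def intersection_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return np.minimum(h1, h2).sum()

# Toy data standing in for SIFT-like descriptors: two "images" drawn from
# the same distribution and one from a shifted distribution.
img_a = rng.normal(0.0, 1.0, size=(200, 8))
img_b = rng.normal(0.0, 1.0, size=(200, 8))
img_c = rng.normal(3.0, 1.0, size=(200, 8))

codebook = kmeans(np.vstack([img_a, img_b, img_c]), k=16)
ha, hb, hc = (bow_histogram(x, codebook) for x in (img_a, img_b, img_c))

# Images from the same distribution should score higher under BOW.
print(intersection_similarity(ha, hb), intersection_similarity(ha, hc))
```

The same quantize-and-count structure underlies the retrieval and hashing schemes of topic 1; only the codebook construction and the similarity measure change.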


Date      Topic                                    Slides                        Reading
Jan 3     Introduction                             intro                         Chapter 1 of R. Szeliski's book
Jan 5     Image Formation                          formation                     Chapter 2 of R. Szeliski's book
Jan 10    Image Filtering                          filtering                     Chapters 2 and 3 of R. Szeliski's book
Jan 12    Midwest Vision Workshop
Jan 17    Transformations + features               transformations
Jan 19    Interest points + descriptors            features
Jan 24    Instance + category-level recognition    instance
Jan 26    Sliding-window approaches                sliding window
Jan 31    Deformable part-based models             latent svm
Feb 2     Poselets                                 poselet
Feb 7     More on part-based models                part-based models
Feb 9     NO CLASS
Feb 14    Combining features                       combinations
Feb 16    Learning representations I               learning representations
Feb 21    Learning representations II              sparse coding + topic models
Feb 23    Graphical models: inference              learning
Feb 28    Graphical models: inference + learning   inference
March 1   Segmentation                             segmentation
March 6   Attributes + descriptions + context      attributes
March 8   Scene understanding                      scene