Everyone has large photo collections these days. How can you intelligently find all pictures in which your dog appears? How can you find all pictures in which you are frowning? Can we make cars smart, e.g., can the car drive you to school while you finish your last homework? How can a home robot understand the environment, e.g., switch on a TV when told to and serve you dinner? If you take a few pictures of your living room, can you reconstruct it in 3D (which allows you to render it from any new viewpoint and thus create a "virtual tour" of your room)? Can you reconstruct it from one image alone? How can you efficiently browse your home movie collection, e.g., find all shots in which Tom Cruise is chasing a bad guy?

This class is an introduction to fundamental concepts in image understanding, the subdiscipline of artificial intelligence that tries to make computers "see". It will survey a variety of interesting vision problems and techniques. Specifically, the course will cover image formation, features, object and scene recognition and learning, multi-view geometry and video processing. Since Kinect is popular these days, we will also try to squeeze recognition with RGB-D data into the schedule. The goal of the class is to grasp a number of computer vision problems and understand basic approaches to tackling them in real-world applications.

Prerequisites: A second-year course in data structures (e.g., CSC263H), first-year calculus (e.g., MAT135Y), and linear algebra (e.g., MAT223H) are required. Students who have not taken CSC320H will be expected to do some extra reading (e.g., on image gradients). Matlab will be used extensively in the programming exercises, so any prior exposure to it is a plus (but not a requirement).


When emailing us, please put CSC420 in the subject line.

Information Sheet

The information sheet for the class is available here.

Programming Language(s)

You are expected to do some programming assignments for the class. You can code in Matlab, Python or C. However, in class we will provide the examples and functions in Matlab. Note also that most Computer Vision code online is in Matlab, so it's useful to learn it. Knowing C is a plus, since you can interface your C code to Matlab via "mex".

Please make sure you have access to MATLAB with the Image Processing Toolbox installed.


This class uses Piazza. We will post announcements and assignments there. Students will also be able to post questions and discussions in a forum-style manner, either to their instructors or to their peers.

Please sign up here at the beginning of class.


We will have several speakers presenting in the course, including robotics, vision and ML professors, as well as last year's CSC420 students who are now doing Computer Vision in grad school.

PhD Students:


We will not directly follow any textbook; however, we will require some reading from the textbook below. Additional readings and material will be posted in the schedule table as well as the resources section.


Each student is expected to complete five assignments which will be in the form of problem sets and programming problems, and complete a project.


Assignments will be given every two weeks. They will consist of problem sets and programming problems with the goal of deepening your understanding of the material covered in class. All solutions and programming should be done individually. There will be five assignments altogether, each worth 12% of the final grade.

Submission: Solutions to the assignments should be submitted through CDF. The preferred format is PDF, but we will also accept Word. Unless stated otherwise in the assignment's instructions, include the code (for exercises that ask for code) within the solution document. An ideal example of how the code can be included can be found here. We also don't mind if you include screenshots of your Matlab functions, as long as they are of good enough quality to be read. If your code calls Matlab's built-in functions, you do not need to include those functions, but do include all of your own code.

Deadline: The solutions to the assignments should be submitted by 11.59pm on the date they are due. Anything from 1 minute late to 24 hours will count as one late day.

Lateness policy: Each student will be given a total of 3 free late days. This means that you can hand in three of your assignments one day late, or one assignment three days late. It is up to you to plan your work well. After you have used your 3-day budget, your late assignments will not be accepted.
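As an illustration only, the late-day arithmetic above can be sketched in Python. The function names and the `FREE_LATE_DAYS` constant are hypothetical, and the sketch assumes lateness is measured from the exact deadline timestamp; the instructors' actual bookkeeping may differ.

```python
import math
from datetime import datetime

FREE_LATE_DAYS = 3  # total budget stated in the lateness policy


def late_days_used(deadline, submitted):
    """Late days charged for one submission.

    Assumption: every started 24-hour period after the deadline
    counts as one full late day (1 minute late == 1 late day).
    """
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / (24 * 60 * 60))


def is_accepted(days_already_used, deadline, submitted):
    """True if the submission still fits within the free late-day budget."""
    total = days_already_used + late_days_used(deadline, submitted)
    return total <= FREE_LATE_DAYS


# Hypothetical example: a deadline of Oct 3, 11.59pm, submitted one minute late.
deadline = datetime(2015, 10, 3, 23, 59)
one_minute_late = datetime(2015, 10, 4, 0, 0)
days = late_days_used(deadline, one_minute_late)
```

Under these assumptions, a submission one minute past the deadline and one exactly 24 hours past it are both charged a single late day.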

Plagiarism: We take plagiarism very seriously. Everything you hand in to be marked, namely assignments and projects, must represent your own work. Read How not to plagiarize.


Each student will be given a topic for the project. You will be able to choose from a list of projects, or propose your own project, which will need to be discussed with and approved by your instructor. You will need to hand in a report, which will count for 25% of your grade. Each student will also need to present and be able to defend his/her work. The presentation will count for 15% of the grade.

The final grade will be computed as follows:

Assignments: 60% (5 assignments, each worth 12%)
Project: 40% (report: 25%, presentation: 15%)


The course will cover image formation, feature representation and detection, object and scene recognition and learning, multi-view geometry and video processing. Since Kinect is popular these days, we will also try to squeeze recognition with RGB-D data into the schedule.

Image Processing
Linear filters
Edge detection
Features and matching
Keypoint detection
Local descriptors
Low-level and Mid-level grouping
Region proposals
Hough voting
Face detection and recognition
Object recognition
Object detection
Part-based models
Image labeling
Image formation
Multi-view reconstruction
Video processing
Action recognition
Tentative Schedule


Date | Topic | Reading | Slides | Additional material | Assignments
Sept 15 | Course Introduction | | lecture1.pdf | |

Image Processing
Sept 17 | Linear Filters | Szeliski book, Ch 3.2 | lecture2.pdf | code: finding Waldo, smoothing, convolution |
Sept 22 | Edge Detection | Szeliski book, Ch 4.2 | lecture3.pdf | code: edges with Gaussian derivatives | Assignment 1: due Oct 3, 11.59pm, 2015
Sept 24 | Edge Detection cont. | Szeliski book, Ch 4.2 | lecture4.pdf | |
Sept 29 | Image Pyramids | Szeliski book, Ch 3.5 | lecture5.pdf | |

Features and Matching
Oct 1 | Keypoint Detection: Harris Corner Detector | Szeliski book, Ch 4.1.1, pages 209-215 | | |
Oct 6 | Keypoint Detection: Scale Invariant Keypoints | Szeliski book, Ch 4.1.1, pages 216-222 | | |
Oct 8 | Keypoint Detection: Scale Invariant Keypoints | | lecture7 continued | | Assignment 2: due Oct 18, 11.59pm, 2015
Oct 13 | Local Descriptors: SIFT | Szeliski book, Ch 4.1.2; Lowe's SIFT paper | lecture8.pdf | code: compiled SIFT code, VLFeat's SIFT code |
Oct 15 | Robust Matching, Homographies | Szeliski book, Ch 6.1 | lecture9.pdf | code: Soccer and screen homography |
Oct 20 | Camera Models | Szeliski, 2.1.5, pp. 46-54; Zisserman & Hartley, 153-158 | (hi-res version) | |
Oct 22 | Camera Models | | lecture10 continued | | Assignment 3: due Nov 6, 11.59pm, 2015
Oct 27 | Homography revisited | | lecture11.pdf (hi-res version) | | Projects: due Dec 4, 11.59pm, 2015
Oct 29 | Stereo: Parallel Optics | | lecture12.pdf (hi-res version) | code: Yamaguchi et al. |
Nov 3 | Stereo: General Case | Szeliski book, Ch. 11.1; Zisserman & Hartley, 239-261 | (hi-res version) | |
Nov 5 | Recognition: Overview | Grauman & Leibe, Visual Object Recognition | lecture14.pdf (hi-res version) | | Assignment 4: due Nov 17, 11.59pm, 2015
Nov 12 | All You Wanted To Know About Neural Networks | Invited lecture: Alex Schwing | lecture15.pdf | Sara Sabour's tutorial on NNs (for Python) |
Nov 17 | Fast Retrieval | Sivic & Zisserman, Video Google | lecture16.pdf (hi-res version) | |
Nov 19 | Implicit Shape Model | B. Leibe et al., Robust Object Detection with Interleaved Categorization and Segmentation | lecture17.pdf (hi-res version) | Yukun Zhu's tutorial on Caffe and classification with NNs | Assignment 5: due Nov 28, 11.59pm, 2015
Nov 24 | The HOG Detector | HOG paper | lecture18.pdf (hi-res version) | Jialiang Wang's tutorial on classification (HOG+SVM) |
Nov 26 | Deformable Part-based Model | DPM paper | lecture19.pdf (hi-res version) | |
Dec 1 | Segmentation | SLIC; Felzenszwalb & Huttenlocher | (hi-res version) | |


Whether you are enrolled in the class or just casually browsing the webpage, please leave feedback about the class / material. You can do it here. Thanks!

If you want to hear about a particular topic that is not planned in the regular course, please post it here.


Assignment 1: Finding City Skylines and Seam Carving

The exercise was to remove horizontal and/or vertical seams, i.e. paths with the smallest sum of gradients. We followed Avidan and Shamir's "Seam Carving for Content-Aware Image Resizing" paper.

The second exercise was to trace the skyline of a city. Most solutions used the seam carving approach to do this, while one of the solutions tried to segment out the sky.
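For readers unfamiliar with the idea, a minimum-gradient seam of the kind described above can be found with dynamic programming. The following is a minimal pure-Python sketch, not the course's reference solution or any student's submission. The function names are hypothetical, the image is assumed to be a grayscale list of rows, and the energy is assumed to be the sum of absolute horizontal and vertical central differences.

```python
def gradient_energy(gray):
    """Energy per pixel: |horizontal difference| + |vertical difference|."""
    h, w = len(gray), len(gray[0])
    energy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Central differences, clamped at the image border.
            dx = gray[r][min(c + 1, w - 1)] - gray[r][max(c - 1, 0)]
            dy = gray[min(r + 1, h - 1)][c] - gray[max(r - 1, 0)][c]
            energy[r][c] = abs(dx) + abs(dy)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam."""
    h, w = len(energy), len(energy[0])
    # cost[r][c] = cheapest total energy of any seam ending at pixel (r, c).
    cost = [list(energy[0])]
    for r in range(1, h):
        prev = cost[-1]
        row = []
        for c in range(w):
            # A seam may come from the upper-left, upper, or upper-right pixel.
            best = min(prev[max(c - 1, 0):min(c + 2, w)])
            row.append(energy[r][c] + best)
        cost.append(row)
    # Backtrack from the cheapest pixel in the last row.
    seam = [0] * h
    seam[-1] = min(range(w), key=cost[-1].__getitem__)
    for r in range(h - 2, -1, -1):
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam[r] = min(range(lo, hi), key=cost[r].__getitem__)
    return seam


def remove_vertical_seam(image, seam):
    """Delete one pixel per row, making the image one column narrower."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]


# Tiny hand-made energy map, used only to illustrate the calls.
example_energy = [[1, 0, 5],
                  [4, 2, 0],
                  [3, 0, 6]]
example_seam = find_vertical_seam(example_energy)
carved = remove_vertical_seam([[0, 1, 2], [3, 4, 5], [6, 7, 8]], example_seam)
```

A horizontal seam can be obtained by running the same functions on the transposed image. For skyline tracing, one plausible variant is to define the energy so that the boundary between sky and buildings is cheap to follow, but that choice is an assumption here.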

Sophia Huynh
John Armstrong
Tamara Lipowski
Yawen Ma
Arthur Brzuzy
Bao Xin Chen
Seyed Kamyar Seyed Ghasemipour
Ge Ya Xu

Assignment 2: Window Detection

The exercise was to detect all frontal windows in a given image. Even more extra credit was given to solutions that also detected the floors of each building. We got some really great solutions!

Noel Prasad D'Souza
Bao Xin Chen
Tamara Lipowski
Ning Ning Lin
Te Chen
Sophia Huynh
