Everyone has a large photo collection these days. How can you intelligently find all pictures in which your dog appears? How can you find all pictures in which you are frowning? Can we make cars smart, e.g., can a car drive you to school while you finish your homework? How can a home robot understand its environment, e.g., switch on the TV when asked, or serve you dinner? If you take a few pictures of your living room, can you reconstruct it in 3D (allowing you to render it from any new viewpoint and thus create a "virtual tour" of the room)? Can you reconstruct it from a single image? How can you efficiently browse your home movie collection, e.g., find all shots in which Tom Cruise is chasing a bad guy?
Prerequisites: A second-year course in data structures (e.g., CSC263H), first-year calculus (e.g., MAT135Y), and linear algebra (e.g., MAT223H) are required. Students who have not taken CSC320H will be expected to do some extra reading (e.g., on image gradients). Matlab will be used extensively in the programming exercises, so any prior exposure to it is a plus (but not a requirement).
The information sheet for the class is available here.
You are expected to do some programming assignments for the class. You can code in Python, Matlab, or C. The tutorials will be given in Python.
We will use Quercus for announcements, posting of assignments, Q&A, and discussions.
We will not directly follow any textbook; however, some readings will be required from the textbook below. Additional readings and material will be posted in the schedule table as well as in the resources section.
The textbook is freely available online and provides a great introduction to computer vision.
We will be reading the Sept 3, 2010 version.
Each student is expected to complete four assignments, in the form of problem sets and programming problems, as well as a project.
Assignments will be given every two weeks. They will consist of problem sets and programming problems with the goal of deepening your understanding of the material covered in class. All solutions and programming should be done individually. There will be four assignments altogether, each worth 15% of the final grade.
Deadline: The solutions to the assignments should be submitted by 11.59pm on the date they are due. Anything from one minute to 24 hours late counts as one late day.
Lateness policy: Each student will be given a total of 3 free late days. This means that you can hand in three of your assignments one day late, or one assignment three days late. It is up to you to plan your work accordingly. After you have used up your 3-day budget, your late assignments will not be accepted.
Plagiarism: We take plagiarism very seriously. Everything you hand in to be marked, namely assignments and projects, must represent your own work. Read How not to plagiarize.
Each student will complete a project. You will be able to choose a topic from a list of projects, or propose your own project, which will need to be discussed with and approved by your instructor. You will need to hand in a report and give a presentation. During the presentation the instructor will ask questions about the class material as well as the individual assignments. The grade will depend heavily on how well the project is defended and how well the student understands the class material.
Component | Weight |
---|---|
Assignments | 60% (4 assignments, each worth 15%) |
Project and Oral Exam | 40% (report and presentation: 30%; oral exam: 10%) |
The course will cover image formation, feature representation and detection, object and scene recognition and learning, multi-view geometry, and video processing. Since the Kinect is popular these days, we will also try to squeeze recognition with RGB-D data into the schedule. The topics are outlined below; a short illustrative code sketch follows the outline.
Image Processing
- Linear filters
- Edge detection

Features and matching
- Keypoint detection
- Local descriptors
- Matching

Low-level and mid-level grouping
- Segmentation
- Region proposals
- Hough voting

Recognition
- Face detection and recognition
- Object recognition
- Object detection
- Part-based models
- Image labeling

Geometry
- Image formation
- Stereo
- Multi-view reconstruction
- Kinect

Video processing
- Motion
- Action recognition
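As a taste of the very first topic above, here is a minimal Python sketch of a linear filter: Gaussian smoothing implemented as an explicit 2D convolution. This is an illustrative sketch only, not course-provided code; the function names, parameters, and the toy random image are our own assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2D Gaussian kernel (a classic linear filter)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def convolve2d(image, kernel):
    """Naive 2D convolution with zero padding (written for clarity, not speed)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the neighborhood under the (flipped) kernel
            out[i, j] = (padded[i:i + kh, j:j + kw] * flipped).sum()
    return out

# Smooth a random grayscale "image"; a real assignment would load a photo.
image = np.random.rand(64, 64)
smoothed = convolve2d(image, gaussian_kernel(size=5, sigma=1.0))
```

In practice you would reach for a library routine such as scipy.ndimage.gaussian_filter or OpenCV's cv2.GaussianBlur; the explicit loops above just make the definition of a linear filter concrete.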
Date | Topic | Reading | Slides | Additional material | Assignments |
---|---|---|---|---|---|
Jan 10 | Course Introduction | | lecture1.pdf tutorial1.zip | | |
Image Processing | |||||
Jan 10 | Linear Filters | Szeliski book, Ch 3.2 | lecture2.pdf | code: finding Waldo, smoothing, convolution | |
Jan 17 | Edge Detection | Szeliski book, Ch 4.2 | lecture3.pdf | code: edges with Gaussian derivatives | Assignment 1: due Jan 31, 2022, 11.59pm. Submit on MarkUs |
Jan 23 | Edge Detection | Szeliski book, Ch 4.2 | lecture4.pdf | ||
Jan 23 | Image Pyramids | Szeliski book, Ch 3.5 | lecture5.pdf | ||
Features and Matching | |||||
Jan 31 | Neural Networks (lecturer: Jun Gao) | | lecture6.pdf | | |
Feb 07 | Keypoint Detection: Harris Corner Detector | Szeliski book, Ch 4.1.1, pages 209-215 | lecture7.pdf | | Assignment 2: due Feb 26, 2022, 11.59pm |
Feb 14 | Keypoint Detection: Scale Invariant Keypoints | Szeliski book, Ch 4.1.1, pages 216-222 | lecture8.pdf | | |
Feb 28 | Local Descriptors: SIFT, Matching | Szeliski book, Ch 4.1.2; Lowe's SIFT paper | lecture9.pdf | | Assignment 3: due March 14, 2022, 11.59pm |
Mar 7 | Robust Matching, Homographies | Szeliski book, Ch 6.1 | lecture10.pdf | ||
Geometry | |||||
Mar 7 | Camera Models | Szeliski, Ch 2.1.5, pp. 46-54; Zisserman & Hartley, pp. 153-158 | lecture11.pdf | | |
Mar 14 | Camera Models contd. | ||||
Mar 21 | Stereo: Parallel Optics | | lecture12.pdf tutorial_depthmap.zip | | Assignment 4: due March 31, 2022, 11.59pm. Projects: due April 17, 2022, 11.59pm |
Mar 28 | Stereo: General Case | Szeliski book, Ch 11.1; Zisserman & Hartley, pp. 239-261 | lecture13.pdf epipolar_geometry.zip | | |
Mar 28 | Fast Retrieval | Sivic & Zisserman, Video Google | lecture14.pdf | | |