This class is a graduate course in visual perception for autonomous driving. The class will briefly cover topics in localization, ego-motion estimation, free-space estimation, and visual recognition (classification, detection, segmentation), among others.
Prerequisites: A good knowledge of statistics, linear algebra, and calculus is necessary, as well as good programming skills. A good knowledge of computer vision and machine learning is strongly recommended.
Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).
The final grade will consist of the following:

Component | Weight |
---|---|
Participation (attendance, participation in discussions, reviews) | 20% |
Presentation (presentation of papers in class) | 20% |
Project (proposal, final report, presentation) | 60% |
Every week (except for the first two) we will read 2 to 3 papers. The success of the class discussion will thus depend on how well prepared the students come to class. Each student is expected to read all the papers that will be discussed and write detailed reviews of two selected papers. Depending on enrollment, each student will also need to present a paper in class. When you present, you do not need to hand in the review.
Deadline: The reviews will be due one day before the class.
Structure of the review |
---|
Short summary of the paper |
Main contributions |
Positive and negative points |
How strong is the evaluation? |
Possible directions for future work |
Depending on enrollment, each student will need to present a few papers in class. The presentation should be clear and practiced and the student should read the assigned paper and related work in enough detail to be able to lead a discussion and answer questions. Extra credit will be given to students who also prepare a simple experimental demo highlighting how the method works in practice.
A presentation should be roughly 45 minutes long (please time it beforehand so that you do not go overtime). Typically this is about 30 slides. You are allowed to take some material from presentations on the web as long as you cite the source fairly. In the presentation, also provide the citation to the papers you present and to any other related work you reference.
Deadline: The presentation should be handed in one day before the class (or before if you want feedback).
Structure of presentation: |
---|
High-level overview with contributions |
Main motivation |
Clear statement of the problem |
Overview of the technical approach |
Strengths/weaknesses of the approach |
Overview of the experimental evaluation |
Strengths/weaknesses of evaluation |
Discussion: future direction, links to other work |
Each student will need to write a short project proposal at the beginning of the class (in January). The projects will be research oriented. In the middle of the semester you will need to hand in a progress report. The final project report will need to be handed in one week prior to the end of the class and presented in the last lecture (April). This will be a short presentation of roughly 15-20 minutes.
The students can work on projects individually or in pairs. The project can be on an interesting topic that the student comes up with independently or with the help of the instructor. The grade will depend on the ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are.
Coming soon
Date | Topic | Readings | Presenters | Slides |
---|---|---|---|---|
Jan 12, 19 | Introduction | | Raquel Urtasun | intro |
Feb 2 | Stereo | Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches, CVPR 2015 [PDF] [code] J. Zbontar and Y. LeCun; Stereo Processing by Semi-Global Matching and Mutual Information, PAMI 2008 [PDF] H. Hirschmueller; Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation, ECCV 2014 [PDF] [code] K. Yamaguchi, D. McAllester and R. Urtasun | Wenjie Luo | stereo |
Feb 2, 9 | Optical Flow | Non-Local Total Generalized Variation for Optical Flow Estimation, ECCV 2014 [PDF] R. Ranftl, K. Bredies and T. Pock; Large displacement optical flow: Descriptor matching in variational motion estimation, PAMI 2011 [PDF] T. Brox and J. Malik; FlowNet: Learning Optical Flow with Convolutional Networks, ICCV 2015 [PDF] A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazırbaş, V. Golkov, P. Smagt, D. Cremers, T. Brox; A Quantitative Analysis of Current Practices in Optical Flow Estimation and The Principles Behind Them, IJCV 2011 [PDF] D. Sun, S. Roth and M. Black | Shenlong Wang | motion |
Feb 9, 16 | Optical Flow / Scene Flow | EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow, CVPR 2015 [PDF] [code] J. Revaud, P. Weinzaepfel, Z. Harchaoui and C. Schmid; DeepFlow: Large displacement optical flow with deep matching, ICCV 2013 [PDF] P. Weinzaepfel, J. Revaud, Z. Harchaoui and C. Schmid; Robust Monocular Epipolar Flow Estimation, CVPR 2013 [PDF] K. Yamaguchi, D. McAllester and R. Urtasun; 3D Scene Flow Estimation with a Piecewise Rigid Scene Model, CVPR 2015 [PDF] C. Vogel, K. Schindler and S. Roth | Min Bai | scene_flow |
Feb 16 | Visual Odometry | Visual-lidar Odometry and Mapping: Low-drift, Robust, and Fast, ICRA 2015 [PDF] J. Zhang and S. Singh; StereoScan: Dense 3d Reconstruction in Real-time, IV 2011 [PDF] A. Geiger, J. Ziegler and C. Stiller; Real-time stereo visual odometry for autonomous ground vehicles, IROS 2008 [PDF] A. Howard | Patric McGarey | visual odometry |
Feb 23 | SLAM | DTAM: Dense tracking and mapping in real-time, ICCV 2011 [PDF] R. Newcombe, S. Lovegrove and A. Davison; Large-scale direct SLAM with stereo cameras, IROS 2015 [PDF] J. Engel, J. Stuckler, and D. Cremers; LSD-SLAM: Large-scale direct monocular SLAM, ECCV 2014 [PDF] J. Engel, T. Schöps, and D. Cremers; Relative continuous-time SLAM, International Journal of Robotics Research 2015 [PDF] S. Anderson, K. MacTavish, and T. Barfoot; Full STEAM ahead: Exactly sparse Gaussian process regression for batch continuous-time trajectory estimation on SE(3), IROS 2015 [PDF] S. Anderson and T. Barfoot; Long-term 3D map maintenance in dynamic environments, ICRA 2014 [PDF] F. Pomerleau, P. Krusi, F. Colas, P. Furgale and R. Siegwart | Kirk MacTavish, Lingzhu Xiang | SLAM |
March 1 | Free-Space Estimation | Chapter 9 of Probabilistic Robotics Book [PDF] S. Thrun, W. Burgard, D. Fox; Free Space Computation Using Stochastic Occupancy Grids and Dynamic Programming, Workshop on Dynamical Vision, ICCV 2007 [PDF] H. Badino, U. Franke and R. Mester; The Stixel World - A Compact Medium Level Representation of the 3D-World, DAGM 2009 [PDF] H. Badino, U. Franke and D. Pfeiffer | Hao Wu | Free-Space |
March 8 | 2D Object Detection | Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014 [PDF] R. Girshick, J. Donahue, T. Darrell, J. Malik; Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015 [PDF] S. Ren, K. He, R. Girshick, J. Sun; Deep Residual Learning for Image Recognition, ArXiv Dec 2015 [PDF] K. He, X. Zhang, S. Ren, J. Sun | Renjie Liao | 2D Detection |
March 8 | 3D Object Detection | Data-Driven 3D Voxel Patterns for Object Category Recognition, CVPR 2015 [PDF] Y. Xiang, W. Choi, Y. Lin and S. Savarese; 3D Object Proposals for Accurate Object Class Detection, NIPS 2015 [PDF] X. Chen, K. Kundu, Y. Zhu, A. Berneshawi, H. Ma, S. Fidler and R. Urtasun | Zhen Li | 3D Detection |
March 15 | Semantic Segmentation | Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, ICLR 2015 [PDF] [Code] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille; SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, ArXiv 2015 [PDF] [Project] V. Badrinarayanan, A. Kendall, R. Cipolla; Joint Semantic Segmentation and 3D Reconstruction from Monocular Video, ECCV 2014 [PDF] [Project] A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg | Stefania Raimondo | Semantic Segmentation |
March 15 | Instance-level Segmentation | Instance Segmentation of Indoor Scenes using a Coverage Loss, ECCV 2014 [PDF] N. Silberman, D. Sontag, R. Fergus; Instance-Level Segmentation with Deep Densely Connected MRFs, ArXiv Dec 2015 [PDF] Z. Zhang, S. Fidler, and R. Urtasun | Mengye Ren | Instance-level Segmentation |
March 22 | Tracking | Global Data Association for Multi-Object Tracking Using Network Flows, CVPR 2008 [PDF] L. Zhang and R. Nevatia; Multiple Object Tracking using K-Shortest Paths Optimization, PAMI 2011 [PDF] J. Berclaz, F. Fleuret, E. Turetken, and P. Fua; Multi-target tracking by discrete-continuous energy minimization, PAMI 2016 [PDF] A. Milan, K. Schindler, S. Roth | Wenjie Luo | Tracking |
March 29 | Place Recognition | FAB-MAP: Probabilistic localization and mapping in the space of appearance, IJRR 2008 [PDF] M. Cummins and P. Newman; Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free, RSS 2015 [PDF] N. Sünderhauf, S. Shirazi, and A. Jacobson; Convolutional networks for real-time 6-DOF camera relocalization, ArXiv 2015 [PDF] A. Kendall, M. Grimes, and R. Cipolla | Valentin Peretroukhin | Place Recognition |