Course Projects (Worth: 35%)

Project due before midnight, Monday April 15.

Please email your report as a single file named yourname.pdf to csc2535ta1@cs.toronto.edu

General Guidelines

The idea of the project is to give you some experience in trying to do a small piece of original research in machine learning, or in studying a specific learning algorithm in depth. What we expect to see is an empirical investigation of a variation of a known learning algorithm, or an investigation of a known learning algorithm on a new dataset. You need to describe the algorithm clearly, relate it to existing work, implement it, and test it on a small-scale problem. To do this you will need to write code, run it on some data, make some figures, read a few background papers, collect some references, and write a few pages describing your task, the algorithm(s) you used, and the results you obtained. You are not expected to spend the time that would be required for a conference paper! The whole project should take about a week of full-time work to do and about two days to write up. Projects can be done individually or in pairs. Of course, the expectations will be higher for pair projects.

Specific Requirements

Your submission must include at least two figures that graphically illustrate quantitative aspects of your results, such as training/testing error curves, learned parameters, algorithm outputs, or input data sorted by the results in some way. Your submission must include at least three references to previously published papers or book sections. Your submission should follow the generally accepted style of paper writing: include an introduction section to motivate your problem and algorithm, a section describing your approach and how it compares to previous work, a section outlining the experiments you ran and the results you obtained, and a short conclusions section to sum up what you discovered. We expect the report for a single-person project to be 5 to 10 pages. You can write it up in whatever format you prefer, but the submitted version must be sent as a .pdf.
Note: If you choose to do a project that is not one of the suggested projects below, you should make an appointment with Geoffrey Hinton soon to discuss it.

Marking Scheme

The projects will be marked out of 35, with each point being worth 1% of your final grade. The following criteria will be taken into account when marking:
1. Clarity/Relevance of problem statement and description of approach.
2. Discussion of relationship to previous work and references.
3. Design and execution of experiments.
4. Figures/Tables/Writing: easily readable, properly labeled, informative.

Friendly Advice

  • Be selective! Don't choose a project that has nothing to do with machine learning. Don't investigate an algorithm that is clearly doomed to failure or un-implementable. Don't attack a problem that is irrelevant, ill-defined or unsolvable.
  • Be honest! You are not being marked on how good the results are. It doesn't matter if your method is worse than the ones you compare to. What matters is that you try something sensible, clearly describe the problem, your method, what you did, and what the results were.
  • Be modest! Don't pick a project that is way too hard. Usually, if you select the simplest thing you can think of to try, and do it carefully, it will take much longer than you think.
  • Have fun! If you pick something you think is cool, that will make getting it to work less painful and writing up your results less boring.

Suggested Projects

    1. Train a Deep Boltzmann Machine with two hidden layers on a set of binary vectors. Investigate the effect of pretraining on the speed of training and on the quality of the examples generated by the trained DBM. This project is mainly about implementing the rather complicated algorithm described in the reading for that lecture.
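    As a rough illustration (not part of the required algorithm), the CD-1 update used to pretrain a single binary RBM layer, the basic building block of layer-by-layer DBM pretraining, can be sketched in NumPy as follows. The layer sizes, learning rate, and random toy data are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One CD-1 update for a binary RBM with visible bias b, hidden bias c."""
    # Positive phase: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling to get the "reconstruction".
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy training set: 100 random binary vectors of length 8 (placeholder data).
data = (rng.random((100, 8)) < 0.5).astype(float)
W = 0.01 * rng.standard_normal((8, 4))
b = np.zeros(8)
c = np.zeros(4)
for epoch in range(20):
    W, b, c = cd1_step(data, W, b, c)
```

    The actual DBM training procedure in the reading is considerably more involved (halved weights during pretraining, mean-field inference, persistent Markov chains); this sketch only shows the shape of a single contrastive-divergence update.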

    2. Compare different energy functions for modeling image patches using contrastive backpropagation. First, replicate the model in the notes for lecture 3b that uses two layers of logistic hidden units to learn to model a two-dimensional density composed of four squares that contain the data. The hybrid Monte Carlo method that you will need to use is explained in “Probabilistic inference using Markov chain Monte Carlo methods”, Neal (1993). For this example, it is probably sufficient to use CD1 without any repetitions of the choice of random momentum in the trajectory that is used to get the “negative” data. Once your code works on this toy problem, try using contrastive backpropagation with multi-layer feedforward neural nets of various designs with various energy functions to learn a model of the 8x8 image patches that can be found at:
    http://www.cs.toronto.edu/~hinton/data/patches.mat
    You might need to use a relatively small training set with a relatively small feedforward net in order to get your experiments finished in time. Your report should discuss the effects of using different feedforward architectures, different energy functions, and different versions of the training procedure. You obviously do not have time to systematically explore all of these variations, so it would be sufficient to have one “standard” model and to report the effects of one sensible variation of the network architecture, one sensible variation of the training procedure, and one sensible variation of the energy function.
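    A minimal sketch of one hybrid Monte Carlo step (a leapfrog trajectory with a fresh random momentum, followed by a Metropolis accept/reject) might look like the following. The quadratic toy energy here stands in for the energy your feedforward net assigns to an input, and the step size and trajectory length are arbitrary choices you would need to tune:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(x):
    # Toy quadratic energy; in the project this would be the energy
    # the feedforward net assigns to input x.
    return 0.5 * np.sum(x ** 2)

def grad_energy(x):
    # Gradient of the toy energy; for a net, use backpropagation.
    return x

def hmc_trajectory(x0, eps=0.1, n_steps=20):
    """One HMC step: sample a momentum, simulate dynamics, accept/reject."""
    p0 = rng.standard_normal(x0.shape)  # fresh random momentum
    x, p = x0.copy(), p0.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p -= 0.5 * eps * grad_energy(x)
    for _ in range(n_steps - 1):
        x += eps * p
        p -= eps * grad_energy(x)
    x += eps * p
    p -= 0.5 * eps * grad_energy(x)
    # Metropolis accept/reject corrects the discretization error.
    h0 = energy(x0) + 0.5 * np.sum(p0 ** 2)
    h1 = energy(x) + 0.5 * np.sum(p ** 2)
    if rng.random() < np.exp(min(0.0, h0 - h1)):
        return x
    return x0

# Draw a short chain of samples from the toy energy's distribution.
x = rng.standard_normal(2)
samples = []
for _ in range(200):
    x = hmc_trajectory(x)
    samples.append(x)
samples = np.array(samples)
```

    For contrastive backpropagation, the end point of a trajectory started at a data point serves as the “negative” example; see Neal (1993) for the full treatment of trajectory lengths and momentum refreshment.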