Training Products of Experts by Minimizing Contrastive Divergence

Geoffrey E. Hinton
Gatsby Computational Neuroscience Unit
University College London

GCNU TR 2000-004

It is possible to combine multiple probabilistic models of the same data by multiplying the probabilities together and then renormalizing. This is a very efficient way to model high-dimensional data that simultaneously satisfies many different low-dimensional constraints, because each individual expert model can focus on giving high probability to data vectors that satisfy just one of the constraints. Data vectors that satisfy this one constraint but violate other constraints will be ruled out by their low probability under the other expert models. Training a product of experts appears difficult because, in addition to maximizing the probabilities that each individual expert assigns to the observed data, it is necessary to make the experts as different as possible. This ensures that the product of their distributions is small, which allows the renormalization to magnify the probability of the data under the product of experts model. Fortunately, if the individual experts are tractable, there is a fairly efficient way to train a product of experts.
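
As a rough illustration of the multiply-and-renormalize step described above, the following minimal Python sketch combines two toy experts over four discrete data vectors. The function name `product_of_experts` and the probability values are assumptions chosen for illustration; they do not come from the report, and the sketch does not cover the training procedure.

```python
def product_of_experts(experts):
    """Multiply the experts' probabilities state by state, then renormalize."""
    n_states = len(experts[0])
    unnormalized = [1.0] * n_states
    for expert in experts:
        for i, p in enumerate(expert):
            unnormalized[i] *= p
    z = sum(unnormalized)  # normalizing constant of the product
    return [u / z for u in unnormalized], z


# Four data vectors (a, b) with a, b in {0, 1}, listed as 00, 01, 10, 11.
# Hypothetical experts: A only checks that a == 1, B only checks that b == 1.
expert_a = [0.05, 0.05, 0.45, 0.45]
expert_b = [0.05, 0.45, 0.05, 0.45]

combined, z = product_of_experts([expert_a, expert_b])
print(combined)  # approximately [0.01, 0.09, 0.09, 0.81]
print(z)         # approximately 0.25
```

Each expert alone assigns only 0.45 to the vector that satisfies both constraints, but because the two experts disagree on the other vectors, the sum of their products is small (about 0.25), and renormalizing raises the probability of that vector to about 0.81.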

Download:  [ps.gz] or [pdf]
