Minimizing Description Length in an Unsupervised Neural Network
Geoffrey E. Hinton
Department of Computer Science
University of Toronto
Richard Zemel
Computational Neurobiology Laboratory
The Salk Institute
La Jolla
Abstract
An autoencoder network uses a set of recognition
weights to convert an input vector into a representation vector. It then uses a set
of generative weights to convert the representation vector into an approximate
reconstruction of the input vector. We derive an objective function for training
autoencoders based on the Minimum Description Length (MDL) principle. The aim is to
minimize the information required to describe both the representation vector and the
reconstruction error. This information is minimized by choosing representation
vectors stochastically according to a Boltzmann distribution. Unfortunately, if the
representation vectors use distributed representations, it is exponentially expensive to
compute this Boltzmann distribution because it involves all possible representation
vectors. We show that the recognition weights of an autoencoder can be used to
compute an approximation to the Boltzmann distribution. This approximation
corresponds to using a suboptimal encoding scheme and therefore gives an upper bound on
the minimal description length. Even when this bound is poor, it can be used as a
Lyapunov function for learning both the generative and the recognition weights. We
demonstrate that this approach can be used to learn distributed representations in
which many different hidden causes combine to produce each observed data vector.
Such representations can be exponentially more efficient in their use of hardware than
standard vector quantization or mixture models.
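To make the bound concrete: writing $E_j$ for the combined description length of representation vector $j$ (the cost of communicating the vector itself plus the cost of the resulting reconstruction error), the expected description length under any distribution $q$ over representation vectors, after crediting back the bits used to make the stochastic choice, takes the free-energy form below. This is a standard identity consistent with the abstract, not a formula quoted from it:

```latex
F(q) \;=\; \sum_j q_j E_j \;+\; \sum_j q_j \log q_j
      \;\ge\; -\log \sum_j e^{-E_j},
\qquad \text{with equality iff } q_j = \frac{e^{-E_j}}{\sum_k e^{-E_k}}.
```

The minimizing $q$ is exactly the Boltzmann distribution; a factorial distribution produced by recognition weights is generally not of that form, which is why it yields only an upper bound. The sketch below is illustrative rather than taken from the paper: the parameter names (`W_rec`, `W_gen`), the uniform prior over representation vectors, and the Gaussian error model are all assumptions. It enumerates every binary representation vector for a tiny model, computes the exact minimal description length from the Boltzmann distribution, and compares it with the recognition-weight bound:

```python
import numpy as np

rng = np.random.default_rng(0)

n_vis, n_hid = 6, 4            # small enough to enumerate all 2**n_hid codes
x = rng.normal(size=n_vis)     # one observed data vector

# Hypothetical parameters: recognition and generative weight matrices.
W_rec = rng.normal(scale=0.5, size=(n_hid, n_vis))
W_gen = rng.normal(scale=0.5, size=(n_vis, n_hid))
sigma = 1.0                    # assumed Gaussian noise level on the reconstruction

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# All 2**n_hid binary representation vectors.
codes = np.array([[(j >> i) & 1 for i in range(n_hid)]
                  for j in range(2 ** n_hid)], dtype=float)

# Energy of a code = cost of describing the code (uniform prior, a
# simplifying assumption) + cost of the reconstruction error (Gaussian
# error model), both measured in nats.
code_cost = n_hid * np.log(2.0)
recon = codes @ W_gen.T
recon_cost = 0.5 * np.sum((x - recon) ** 2, axis=1) / sigma ** 2
E = code_cost + recon_cost

# Exact minimal description length: choosing codes from the Boltzmann
# distribution p_j ∝ exp(-E_j) gives an expected cost of -log Σ_j exp(-E_j).
optimal_dl = -np.log(np.sum(np.exp(-E)))

# Approximation via recognition weights: a factorial distribution q over
# codes, which is generally suboptimal and therefore gives an upper bound.
p_on = sigmoid(W_rec @ x)
q = np.prod(codes * p_on + (1 - codes) * (1 - p_on), axis=1)
bound = np.sum(q * E) + np.sum(q * np.log(q + 1e-12))

print(f"optimal description length: {optimal_dl:.4f} nats")
print(f"recognition-weight bound:   {bound:.4f} nats")
assert bound >= optimal_dl   # the bound never undercuts the true MDL
```

Because the factorial distribution cannot in general match the Boltzmann distribution, the final assertion always holds; learning would proceed by descending the gradient of `bound` with respect to both the generative and the recognition weights, which is what allows it to serve as a Lyapunov function.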