Autoencoders, Minimum Description Length and Helmholtz Free Energy
Geoffrey E. Hinton
Department of Computer Science
University of Toronto
and
Richard S. Zemel
Computational Neuroscience Laboratory
The Salk Institute
Abstract
An autoencoder network uses a set of recognition
weights to convert an input vector into a code vector. It then uses a set of generative
weights to convert the code vector into an approximate reconstruction of the input vector.
We derive an objective function for training autoencoders based on the Minimum
Description Length (MDL) principle. The aim is to minimize the information required
to describe both the code vector and the reconstruction error. We show that this
information is minimized by choosing code vectors stochastically according to a Boltzmann
distribution, where the generative weights define the energy of each possible code vector
given the input vector. Unfortunately, if the code vectors use distributed
representations, it is exponentially expensive to compute this Boltzmann distribution
because it involves all possible code vectors. We show that the recognition weights
of an autoencoder can be used to compute an approximation to the Boltzmann distribution
and that this approximation gives an upper bound on the description length. Even
when this bound is poor, it can be used as a Lyapunov function for learning both the
generative and the recognition weights. We demonstrate that this approach can be
used to learn factorial codes.
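To make the argument concrete (in our own notation; the abstract does not fix symbols): for a given input vector, let E_i be the cost, in nats, of describing code vector i together with the reconstruction error it produces under the generative weights. If codes are chosen stochastically with probabilities q_i, the expected description length, after reclaiming the "bits back" from the stochastic choice, is the Helmholtz free energy

    F(q) = \sum_i q_i E_i + \sum_i q_i \log q_i ,

which is minimized by the Boltzmann distribution p_i = e^{-E_i} / \sum_j e^{-E_j}, at which point F(p) = -\log \sum_j e^{-E_j}. For any other distribution q, such as one computed by the recognition weights, F(q) = F(p) + KL(q \| p) \ge F(p); this is the upper bound on description length referred to above.

A minimal numerical check of the bound (a sketch only; the energies below are arbitrary illustrative values, not from the paper):

    import numpy as np

    E = np.array([1.0, 2.5, 0.3])            # hypothetical energies E_i of three candidate codes
    p = np.exp(-E) / np.exp(-E).sum()        # Boltzmann distribution: the optimal stochastic choice
    F_min = -np.log(np.exp(-E).sum())        # minimum free energy, -log sum_j exp(-E_j)

    q = np.array([0.2, 0.3, 0.5])            # some other distribution, e.g. from recognition weights
    F_q = (q * E).sum() + (q * np.log(q)).sum()  # Helmholtz free energy F(q)

    assert F_q >= F_min                      # F(q) upper-bounds the optimal description length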
In Advances in Neural Information Processing Systems 6, D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo (Eds.). MIT Press.