Here is an old email exchange from 2008.  It shows that if you have a
good idea you should publish it!
-----------------------------------------

ec@cs.brown.edu via cs.toronto.edu
7/9/08

to hinton
Geoff,

In your lecture at Brown you showed some results at the end where
you represented individual words as a set of independent
variables.  I think the idea is that "reading" and "read" might
be close according to one variable, while "reading" and "writing"
might be close according to another.  Would it be possible to
see this data?

Eugene


Geoffrey Hinton <geoffrey.hinton@gmail.com>
7/10/08, with attachments

to Andriy, ec
Each word is converted into a vector of 100 real values in such a way
that the vectors for the previous n words are good at predicting the
vector for the next word.
We spent a while looking at the vectors.  If you display them in 2-D
using one of our recent dimensionality reduction methods that keeps
very similar vectors very close, you get cute pictures.
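
A rough sketch, in Python, of the kind of objective described above: each word gets a learned vector, and the vectors of the previous n context words are combined to predict the vector of the next word. The per-position mixing matrices, the sizes, and all names below are illustrative guesses, not details of the actual model.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, dim, n_context = 17000, 100, 3      # 17000 words, 100-D vectors, as in the email

    R = rng.normal(scale=0.01, size=(vocab_size, dim))       # one learned vector per word
    C = rng.normal(scale=0.01, size=(n_context, dim, dim))   # one mixing matrix per context position (an assumption)

    def predict_next_vector(context_word_ids):
        # Combine the vectors of the previous n words into a predicted 100-D vector.
        return sum(C[i] @ R[w] for i, w in enumerate(context_word_ids))

    def next_word_log_probs(context_word_ids):
        # Score every word in the vocabulary by how well its vector matches the prediction.
        r_hat = predict_next_vector(context_word_ids)
        scores = R @ r_hat
        scores -= scores.max()
        return scores - np.log(np.exp(scores).sum())

    # Training would adjust R and C by gradient ascent on the log-probability
    # of each observed next word given its n predecessors.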

Here is a paper describing how the features are obtained and a paper
showing the vectors in 2-D. We now have even nicer pictures but I can't
find them!
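
The 2-D displays were presumably made with a method like t-SNE, which tries to keep very similar vectors very close on the page. A minimal sketch of producing such a picture, with stand-in data in place of the real 17000 x 100 word vectors:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(500, 100))        # stand-in for the real 17000 x 100 word vectors
    words = [f"word{i}" for i in range(500)]     # stand-in for the actual vocabulary

    # Map the 100-D vectors to 2-D so that very similar vectors end up very close.
    xy = TSNE(n_components=2, init="pca", perplexity=30, random_state=0).fit_transform(vectors)

    plt.figure(figsize=(10, 10))
    for (x, y), word in zip(xy, words):
        plt.text(x, y, word, fontsize=6)
    plt.xlim(xy[:, 0].min() - 1, xy[:, 0].max() + 1)
    plt.ylim(xy[:, 1].min() - 1, xy[:, 1].max() + 1)
    plt.show()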

We tried looking at the vectors but couldn't understand the individual
features. I could ask the grad student to send you the 17000 100-D
vectors if you want.
Let me know.

However, it can do analogies in a very dumb way.
To answer A:B = C:?  it takes the 100-D vector difference  B-A and
adds it to C. Then it finds the closest vector that isn't A or B or C.

So it can do is:was = are:? correctly (i.e. it says "were")
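
A small sketch of that analogy procedure: take the vector difference B - A, add it to C, and return the word whose vector is closest, excluding A, B, and C themselves. Euclidean distance is assumed here, since the email does not say which metric was used, and the word_to_vec lookup table is a placeholder.

    import numpy as np

    def analogy(a, b, c, word_to_vec):
        # To answer A:B = C:?, add the difference B - A to C ...
        target = word_to_vec[b] - word_to_vec[a] + word_to_vec[c]
        best_word, best_dist = None, np.inf
        for word, vec in word_to_vec.items():
            if word in (a, b, c):
                continue                              # ... and skip A, B, and C themselves
            dist = np.linalg.norm(vec - target)       # "closest" taken as Euclidean distance (an assumption)
            if dist < best_dist:
                best_word, best_dist = word, dist
        return best_word

    # e.g. analogy("is", "was", "are", word_to_vec) should return "were"
    # if the vectors behave as described above.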

Geoff