The MLib library for creating convolutional networks

Download

Download and extract the files into some directory, and add that directory to Matlab's path.
Eventually the source will become available for download.
This is a preliminary version, so there may be bugs.

A description of convolutional neural networks

The library provides the function mul for multiplication, mult for multiplication by the transpose, and outp for outer products. In addition, the function makeWB creates the weight matrices.

Note

The library is implemented in C; however, it does not use any sophisticated matrix-multiplication methods. In particular, when there is only one local field, the computation takes \Omega(n^3) time. Another source of inefficiency is that during the multiplication, the required array address is recomputed for every indexing operation, yielding large constant factors.

Examples

Suppose that we are given an image of size 30 X 30. We would like to use a matrix with local fields of size 6 X 6, with horizontal and vertical spacing of 2, but we do not want it to be convolutional: the local fields may differ. Two adjacent local fields therefore overlap in a region of size 6 X 4 or 4 X 6. Suppose that, in addition, we want 7 outputs per local field. To implement this, we type the following into Matlab:

I.xSize = [30 30];   % size of input vector
I.Size  = [6  6];    % size of local field
I.Step  = [2  2];    % the spacing
I.Conv  = [0  0];    % no convolution along any dimension
I.k     = 7;         % number of neurons per local field.
[W I] = makeWB(I, @randn);

We have now created the matrix W. We could have used @zeros instead to create a matrix whose entries are all zeros.

After creating W, it is safer not to modify the entries of I. A particularly useful field of I that is created by makeWB is I.ySize, the size of the output matrix.
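The spatial part of I.ySize can presumably be derived from the other fields of I by counting how many field placements fit along each dimension. A minimal sketch of that arithmetic in Python (the function name and formula are my reconstruction, not the library's actual code):

```python
# Reconstruction of how the output size could be derived from the
# input size, local-field size, and step size (non-convolutional case):
# the first field starts at offset 0, and each subsequent field starts
# `step` pixels further along.

def output_size(xSize, fieldSize, step):
    # number of field placements along each dimension
    return [(x - f) // s + 1 for x, f, s in zip(xSize, fieldSize, step)]

print(output_size([30, 30], [6, 6], [2, 2]))  # [13, 13]
```

This agrees with the 13 X 13 grid of field positions seen in size(W) above.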

We can now look inside the matrix W. Note that W is not a special Matlab object; it is just a high-dimensional array, so expressions like W+1 are valid in Matlab.

>> size(W)
ans =
    13    13     7     6     6
The first three dimensions index the local field: the first two represent the position of the local field, and the third represents the neuron to which the local field belongs. The last two dimensions represent the position within the local field.

If it is impossible to tile the entire image with the given field size and step size, a warning will be issued. Such a situation may occur if we use local fields of odd size (e.g., I.Size = [7 7]) with an even step size (e.g., I.Step = [2 2]) and even image sizes (e.g., I.xSize = [30 30]).
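The warning condition can be checked by hand: the fields tile the image exactly only when the step size divides the leftover length along every dimension. A small hypothetical check (my names, not the library's):

```python
# Hypothetical reconstruction of the tiling check: the last field must
# end exactly at the image border, i.e. (x - f) must be a multiple of s.

def tiles_exactly(xSize, fieldSize, step):
    return all((x - f) % s == 0 for x, f, s in zip(xSize, fieldSize, step))

print(tiles_exactly([30, 30], [6, 6], [2, 2]))  # True: no warning
print(tiles_exactly([30, 30], [7, 7], [2, 2]))  # False: the warning case above
```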
Now we can easily multiply, multiply by the transpose, and take the outer product.

X = rand(30,30);
Y = mul (X, W, I);
X1= mult(Y, W, I);
O = outp(X, Y, I);

The dimensions of Y are

>> size(Y)
ans =
    13    13     7
Thus, we have 7 elements for each possible placement of the local field in the image. In general, both X and the local field are n-dimensional. We specify an n-dimensional spacing vector that describes how the local fields are to be located. For each position there are several outputs, and thus several distinct local fields, which correspond to the last dimension of Y.
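To make the indexing concrete, here is a rough NumPy re-implementation of what mul appears to compute in the non-convolutional 2-D case. This is my reconstruction from the shapes above, not the library's actual (C) code:

```python
import numpy as np

def mul_sketch(X, W, step):
    # W has shape (p1, p2, k, f1, f2): field position, neuron, field offset
    p1, p2, k, f1, f2 = W.shape
    s1, s2 = step
    Y = np.zeros((p1, p2, k))
    for i in range(p1):
        for j in range(p2):
            patch = X[i*s1:i*s1 + f1, j*s2:j*s2 + f2]
            for n in range(k):
                # each output is the inner product of one local field's
                # weights with the image patch underneath it
                Y[i, j, n] = np.sum(W[i, j, n] * patch)
    return Y

X = np.random.randn(30, 30)
W = np.random.randn(13, 13, 7, 6, 6)
print(mul_sketch(X, W, (2, 2)).shape)  # (13, 13, 7)
```

mult and outp would presumably run the analogous loops, accumulating into X-shaped and W-shaped arrays respectively.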

Suppose that we want to use a convolutional network instead, with the same spacing of the local fields. Then all we need to do is set, in the initialization,

I.Conv = [1 1]
This means that we want convolution along all dimensions. We could have written
I.Conv = [1 0]
to get convolution only along the y coordinate.

For a concrete example, consider

I.xSize = [30 30];
I.Size  = [6  6];
I.Step  = [2  2];
I.Conv  = [1  1];
[W I] = makeWB(I, @randn);
This creates a random convolutional network. It is used exactly as before, except that we have
>> size(W)
ans =
     1     1     7     6     6
So we use the same weights for all the local fields. This is the only difference between convolutional and non-convolutional networks. If written carefully, the rest of the program should not notice any difference.
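One way a multiplication routine could handle both cases with the same code is to clamp the position index into W's leading dimensions, so that leading size 1 automatically means weight sharing. A hedged sketch of this idea (again my reconstruction, not the library's code):

```python
import numpy as np

def mul_generic(X, W, step):
    # handles both cases: when W's leading dims are 1, every field
    # position reuses the same weights (convolutional weight sharing)
    p1, p2, k, f1, f2 = W.shape
    s1, s2 = step
    n1 = (X.shape[0] - f1) // s1 + 1
    n2 = (X.shape[1] - f2) // s2 + 1
    Y = np.zeros((n1, n2, k))
    for i in range(n1):
        for j in range(n2):
            patch = X[i*s1:i*s1 + f1, j*s2:j*s2 + f2]
            # clamping the index to 0 when p1 = p2 = 1 shares the weights
            w = W[min(i, p1 - 1), min(j, p2 - 1)]
            Y[i, j] = np.tensordot(w, patch, axes=([1, 2], [0, 1]))
    return Y

X = np.random.randn(30, 30)
Wc = np.random.randn(1, 1, 7, 6, 6)      # convolutional: one shared field
print(mul_generic(X, Wc, (2, 2)).shape)  # (13, 13, 7)
```

Note that the output size is computed from X rather than from W, which is what lets a shared-weight W be applied to larger inputs, as in the next example.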



Suppose that the above matrix W was produced by a learning algorithm, and we would like to apply it to a 100 X 100 image. Since the network is convolutional, there is no conceptual problem in applying it to larger inputs.
The way to do it is to define a new I, namely

I_new = I;
I_new.xSize = [100 100];   % size of input vector
[junk_W I_new] = makeWB(I_new, @randn);

We then use the W that we already have together with I_new. So, for example, we could now have the following:
X = rand(100,100);
Y = mul(X, W, I_new); 
This code will work as expected. The reason that we cannot simply set I.xSize = [100 100] on the old I is that I contains more information, so it is safer to call makeWB again and throw away the junk_W it produces.

Example: The Spiking Boltzmann Machine

The Spiking Boltzmann Machine (*) is a big RBM whose weight matrix is convolutional in time, allowing it to capture temporal regularities.

If the SBM has a 30 by 30 image for each time step, and each hidden unit is allowed to observe the visible variables from 5 time steps, we write

I.xSize = [30 30 T];
I.Size  = [30 30 5];
I.Step  = [1  1  1];
I.Conv  = [0  0  1];
[W I] = makeWB(I, @randn);
where T is the number of time frames. This means that we convolve only along the 3rd dimension, which in our case happens to be the dimension of time.

It is also possible to modify the above example so that the hidden units are "sparse": we throw away every second unit, so that when a multilayered network is created, it can capture longer-range regularities.

I.xSize = [30 30 T];
I.Size  = [30 30 5];
I.Step  = [1  1  2]; % <-- This means that we have half as many time steps
% in the next layer, i.e. each time step in the next layer takes "more time"
I.Conv  = [0  0  1];
[W I] = makeWB(I, @randn);


