# Multiplying binary numbers

Two input images are created such that a 2x4 patch in the left image
and the 2x4 patch in the corresponding region of the right image
represent two numbers (x and y) in binary format.
Each number is ordered on a row-by-row basis within its 2x4 patch,
with bit 0 (lsb) in the top-left corner and bit 7 (msb) in the
bottom-right corner of the image patch.
The product of these two numbers, z = x*y, varies in a smooth
fashion across the surface of the image. z is initially chosen in the
range [0,1] and then scaled by 2**15. Given z, we choose x to be a
random value between 1 and 255 and then set y = z/x. (If y is
bigger than 255, we choose another random value for x until a value
for y in the range [1,255] is found.) To ensure that this selection
procedure introduces no bias into the values of x and y, the values
of x and y are swapped with probability 0.5.
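The encoding and sampling scheme above can be sketched as follows. This is a minimal illustration, not the original code: `to_patch` follows the stated bit ordering, `sample_factors` mirrors the rejection-and-swap procedure, and z is treated here as a single target value rather than a field varying smoothly over the image.

```python
import random

def to_patch(n):
    """Encode an 8-bit number as a 2x4 binary patch, row by row:
    bit 0 (lsb) top-left, bit 7 (msb) bottom-right."""
    bits = [(int(n) >> b) & 1 for b in range(8)]
    return [bits[0:4], bits[4:8]]

def sample_factors(z):
    """Draw x uniformly from [1, 255] until y = z/x also lies in
    [1, 255]; then swap x and y with probability 0.5 so the
    selection procedure introduces no bias between them."""
    while True:
        x = random.randint(1, 255)
        y = z / x
        if 1 <= y <= 255:
            break
    if random.random() < 0.5:
        x, y = y, x
    return x, y

z = 0.5 * 2 ** 15                 # z drawn in [0, 1], then scaled by 2**15
x, y = sample_factors(z)
left, right = to_patch(round(x)), to_patch(round(y))
```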
A three-layer network was used, with the 2x4x2 inputs going to a
hidden layer of 5 tanh units and then to 1 output unit. Half-lives
for averaging: U=5, V=500.
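A forward pass through this architecture can be sketched as below. The weight-initialisation scale is an arbitrary choice, and since these notes report runs with both five and three hidden units, the hidden-layer size is left as a variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 16, 5                         # 2x4x2 = 16 binary inputs
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))    # input -> hidden weights
b1 = np.zeros(n_hidden)                        # hidden biases
w2 = rng.normal(0.0, 0.1, n_hidden)            # hidden -> output weights
b2 = 0.0                                       # output bias

def forward(patches):
    """patches: the two flattened 2x4 patches as one length-16 vector."""
    h = np.tanh(W1 @ patches + b1)             # tanh hidden layer
    return w2 @ h + b2                         # single output unit

out = forward(rng.integers(0, 2, n_in).astype(float))
```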

The inputs to the network are the following two binary images (no
smoothing or normalisation of input patches):

The product of the numbers encoded by the two patches has an
"egg-box" profile across the image (shown on the left). Network
performance after 200 epochs is shown on the right. The final
correlation between the desired and actual output is -0.994.
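The egg-box surface and the correlation measure used throughout these notes can be sketched as follows. The sinusoid frequencies are illustrative assumptions (the original surface's parameters are not given), and the sign of the reported correlations is immaterial since the network output is only defined up to sign and scale.

```python
import numpy as np

# "Egg-box" profile: a product of sinusoids rescaled into [0, 1].
H, W = 64, 64
i, j = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
z = 0.5 * (1.0 + np.sin(2 * np.pi * i / 16) * np.sin(2 * np.pi * j / 16))

def correlation(desired, actual):
    """Pearson correlation between desired and actual output maps."""
    return np.corrcoef(desired.ravel(), actual.ravel())[0, 1]
```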

## Value of merit function and correlation during learning

[All images here are jpegs, which accounts for the poor quality of
some of the graphs.]

Note that this is a very low value of the merit function: we
typically saw values of around 1.8 for the disparity test and around
0.9 for the feature-orientation test. [It is negative since we are
taking logs and the ratio V/U has gone below 1.0.]
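As the bracketed note implies, the merit function is log(V/U). A sketch, assuming that U and V accumulate squared deviations of the output from exponentially weighted moving means with the stated half-lives (5 and 500 samples); the exact averaging convention in the original is not given, so the update-then-compare form here is a simplification:

```python
import numpy as np

def merit(outputs, hl_short=5, hl_long=500):
    """log(V/U): U sums squared deviations of the output from a
    short-half-life moving mean, V from a long-half-life one.
    A smoothly varying output gives V >> U, hence a large merit;
    when V/U drops below 1 the merit goes negative."""
    lam_s = 0.5 ** (1.0 / hl_short)    # per-step decay for half-life 5
    lam_l = 0.5 ** (1.0 / hl_long)     # per-step decay for half-life 500
    mean_s = mean_l = outputs[0]
    U = V = 0.0
    for zt in outputs[1:]:
        mean_s = lam_s * mean_s + (1.0 - lam_s) * zt
        mean_l = lam_l * mean_l + (1.0 - lam_l) * zt
        U += (zt - mean_s) ** 2
        V += (zt - mean_l) ** 2
    return np.log(V / U)

t = np.arange(2000)
smooth = np.sin(2.0 * np.pi * t / 500)               # slowly varying output
noise = np.random.default_rng(0).normal(size=t.size)  # unstructured output
```

A slowly varying output like `smooth` scores much higher than `noise`, which is what drives the network towards outputs that vary smoothly across the image.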

## Testing on unseen data

To test that the network is computing the product, a new pair of
images was created, this time with the product of the inputs varying
in a Gaussian fashion across the two images.
Inputs are just the binary values (no smoothing or normalisation of
input patches):

The product of the numbers encoded by the two patches has a Gaussian
profile across the image (shown on the left). Network performance
using the network trained on the egg-box data above is shown on the
right. The correlation between the two is -0.977, hence the network
has generalised from the egg-box data to the Gaussian data.

The network was also tested on another pair of inputs where z varied
randomly over the image (inputs not shown). Correlation between the
desired output (below left) and network output (below right) was
-0.995.

In this case, the merit function reached a stable value of -0.8
after only around 10 epochs. When tested on the unseen Gaussian
images, the correlation was again very high (-0.97).

## The weights

Here we show the weights for the three-layer network with 5 hidden
units.

For each of the five hidden units (h1...h5) you see the connections
from the eight inputs of the left patch, then from the eight inputs
of the right patch, and then from the bias unit. For the output unit
(o1) you see the weights from the five hidden units and then the bias
weight. The weights into the third hidden unit look quite strange,
although the strength of the connection from h3 to o1 is almost zero,
so that unit is probably not being used.

The network was also tested with three, rather than five, hidden
units. Although the final correlation at the end of learning was high
(0.972), it did not generalise to the test images very well: when
tested on the Gaussian data, r = 0.800, although on the random data r
maintained a high value of 0.976.

## Two layer nets

To see if the network could learn without a hidden layer, another
network was set up with the 2x4x2 inputs connected directly to one
tanh output unit. The short-range half-life was varied between 1 and
6, but the network failed to learn the egg-box data:
### Half-life 1. r = 0.375

### Half-life 3. r = -0.374

### Half-life 5. r = -0.462

### Half-life 6. r = -0.491

See also adding numbers.