Question

In my neural network I have combined all of the weight matrices into one large matrix: e.g. a 3-layer network usually has 3 weight matrices, W1, W2, W3, one for each layer. I have created one large weight matrix called W, where W2 and W3 are appended onto the end of W1. If W1 has 3 columns, W2 has 3 columns, and W3 has 2 columns, my matrix W will have 8 columns.
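
For illustration, the packing looks something like this in MATLAB (random values; the shared row count of 3 is an assumption here, since appending columns requires every matrix to have the same number of rows):

    W1 = rand(3, 3);     % weight matrix for layer 1 (3 columns)
    W2 = rand(3, 3);     % weight matrix for layer 2 (3 columns)
    W3 = rand(3, 2);     % weight matrix for layer 3 (2 columns)
    W  = [W1, W2, W3];   % combined matrix: 3 rows, 3 + 3 + 2 = 8 columns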

The number of layers and the number of inputs/outputs are stored as global variables.

This means I can call the feedforward code with only 2 input arguments, as the feedforward code splits W back up into W1, W2, W3, etc. inside the function.

Output_of_Neural_Net = feedforward(Input_to_Neural_Net,W)
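
Roughly, the feedforward code has this structure (a simplified sketch: the LAYER_COLS global, the (inputs x neurons) orientation of each block, and the sigmoid activation are placeholders rather than the exact code):

    function a = feedforward(x, W)
        % Split the combined matrix back into one weight matrix per layer.
        % LAYER_COLS is a global vector holding the number of columns that
        % belong to each layer, e.g. [3 3 2] for the example above.
        global LAYER_COLS
        Wsplit = mat2cell(W, size(W, 1), LAYER_COLS);   % {W1, W2, W3, ...}
        a = x(:);
        for i = 1:numel(Wsplit)
            a = 1 ./ (1 + exp(-(Wsplit{i}' * a)));      % sigmoid activation
        end
    end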

I also store the training data as a global variable. This means I can use the cost function with only one input argument.

cost = costfn(W)

The purpose of this is so that I can use built-in MATLAB functions to minimise the cost function, and therefore obtain the W that makes the network best approximate my training data.
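
For reference, the cost function is essentially of this form (again a simplified sketch; the TRAIN_IN / TRAIN_OUT global names and the squared-error cost are placeholders):

    function J = costfn(W)
        % Training data is stored globally: TRAIN_IN is (n_inputs x n_samples)
        % and TRAIN_OUT is (n_outputs x n_samples).
        global TRAIN_IN TRAIN_OUT
        n = size(TRAIN_IN, 2);
        J = 0;
        for k = 1:n
            y = feedforward(TRAIN_IN(:, k), W);
            J = J + sum((y - TRAIN_OUT(:, k)).^2);
        end
        J = J / n;   % mean squared error over the training set
    end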

I have tried fminsearch(@costfn,W) and fminunc(@costfn,W). Both give mediocre results for the function I am trying to approximate, although fminunc is slightly better.

I now want to try back-propagation to train this network, to see if it does a better job; however, most implementations of this are for networks with multiple weight matrices, which makes them more complicated.

My question is: will I be able to implement back-propagation with my single appended weight matrix, and how can I do this?

I feel like using a single weight matrix should make the code simpler, but I can't work out how to implement it, as all other examples I have seen are for multiple weight matrices.

Additional Information

The network will be a function approximator with between 8 and 30 inputs, and 3 outputs. The function it is approximating is quite complicated and involves the inverse of elliptic integrals (and so has no analytical solution). The inputs and outputs of the network will be normalised so that they are between 0 and 1.


Solution

There are several problems with the approach you are describing.

First, from what you have described, there really is no simplification of the feed-forward code or the backpropagation code. You are just combining three weight matrices into one, which allows the feedforward and costfn functions to take fewer arguments, but you still have to unpack W inside those functions to implement the forward and backpropagation logic. That logic requires evaluating the activation function and its derivative at each layer, so you cannot reduce it to a single matrix multiplication.
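
To make that concrete, here is a rough sketch of what backpropagation looks like once W has been unpacked. The mat2cell split, the (inputs x neurons) layout of each block, the sigmoid activation and the squared-error cost are assumptions for illustration only, but the per-layer loop, the stored activations and the activation derivative are unavoidable regardless of how the weights are packed:

    function gradW = backprop_sketch(x, t, W)
        % Unpack the combined matrix into one weight matrix per layer.
        global LAYER_COLS
        Wc = mat2cell(W, size(W, 1), LAYER_COLS);    % {W1, W2, W3, ...}
        L  = numel(Wc);

        % Forward pass, keeping every layer's activation for the backward pass.
        a = cell(L + 1, 1);
        a{1} = x(:);
        for i = 1:L
            a{i + 1} = 1 ./ (1 + exp(-(Wc{i}' * a{i})));
        end

        % Backward pass: per-layer deltas for a squared-error cost with
        % sigmoid activations.
        gradc = cell(1, L);
        delta = 2 * (a{L + 1} - t(:)) .* a{L + 1} .* (1 - a{L + 1});
        for i = L:-1:1
            gradc{i} = a{i} * delta';                % same shape as Wc{i}
            if i > 1
                delta = (Wc{i} * delta) .* a{i} .* (1 - a{i});
            end
        end

        % Re-append the per-layer gradients so the result matches W's layout.
        gradW = cell2mat(gradc);
    end

The only parts the single-matrix packing changes are the mat2cell split at the top and the cell2mat at the bottom; everything in between is the same as in a conventional multi-matrix implementation.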

The second issue is that you are constraining the structure of your neural network by packing three weight matrices into one by appending columns. The number of rows and columns in a weight matrix correspond to the number of neurons and inputs in the layer, respectively. Suppose you have M inputs to your network and N neurons in the first layer. Then W1 will have shape (N, M). In general, for a fully-connected network, the layer two weights (W2) will have shape (K, N), where N is the number of inputs to that layer (constrained by the number of outputs from the first layer) and K is the number of neurons in the second layer.

The problem is that, since you are creating one combined weight matrix by appending columns, K (the number of rows in the second weight matrix) has to equal the number of rows, i.e. neurons, in the first weight matrix, and the same holds for every subsequent layer. In other words, your network will have shape M x N x N x N (M inputs, then N neurons in every layer). This is a bad constraint to have on your network, since you typically don't want the hidden layers to have the same number of neurons as the output layer.
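
As a concrete illustration (sizes made up, using the (neurons x inputs) orientation from the previous paragraph):

    M = 8;  N = 5;          % M network inputs; N becomes every layer's width
    W1 = rand(N, M);        % layer 1: N neurons, M inputs
    W2 = rand(N, N);        % must have N rows to be appendable -> N neurons
    W3 = rand(N, N);        % likewise N neurons, including the output layer
    W  = [W1, W2, W3];      % N x (M + N + N), i.e. an M x N x N x N network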

Note that for simplicity I have ignored bias inputs, but the same issues exist even if they are included.
