There are several problems with the approach you are describing.
First, from what you have described, there really is no simplification of the feed-forward or backpropagation code. You are just combining three weight matrices into one, which lets the `feedforward` and `costfn` functions take fewer arguments, but you still have to unpack `W` inside those functions to implement the forward and backpropagation logic. That logic requires evaluating the activation function and its derivative in each layer, so you can't reduce it to a single matrix multiplication.
The second issue is that you are constraining the structure of your neural network by packing three weight matrices into one through appending columns. The number of rows and columns in a weight matrix corresponds to the number of neurons and inputs in that layer, respectively. Suppose you have `M` inputs to your network and `N` neurons in the first layer. Then `W1` will have shape `(N, M)`. In general, for a fully-connected network, the layer-two weights (`W2`) will have shape `(K, N)`, where `N` is the number of inputs to that layer (constrained to equal the number of outputs from the first layer) and `K` is the number of neurons in the second layer.
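A quick shape check (the sizes here are just illustrative, not from your question):

```python
import numpy as np

M, N, K = 4, 8, 3             # illustrative sizes
W1 = np.random.randn(N, M)    # first layer: N neurons, M inputs
W2 = np.random.randn(K, N)    # second layer: K neurons, N inputs
x = np.random.randn(M)

print((W1 @ x).shape)          # (8,) -> output of layer 1 feeds layer 2
print((W2 @ (W1 @ x)).shape)   # (3,) -> only works because W2 has N columns
```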
The problem is that because you are creating the combined weight matrix by appending columns, every matrix must have the same number of rows, so `K` (the number of rows in the second weight matrix) is forced to equal `N` (the number of rows/neurons in the first layer), and likewise for every successive layer. In other words, your network is locked into the shape `M x N x N x N` (`M` inputs, then `N` neurons in every layer). That is a bad constraint to impose, since you typically don't want the same number of neurons in the hidden layers as in the output layer.
Note that for simplicity I have ignored bias inputs, but the same issues exist even if they are included.