Question

First a theoretical question and then a practical one.

Is backpropagation in neural nets the computation of the weight derivatives, or the computation of the new weights (that is, the original weights minus the weight derivatives times the learning rate, simplified)?

It may well be a semantics issue, but it is important nevertheless.

Also, for anyone familiar with Torch's nn class:

gradInput = module:backward(input, gradOutput)

is gradInput the weight set for the next forward pass, or is it the derivatives of the weights from the previous forward pass?

Thanks!

Solution

I have only been using Torch for a few months, but I will give it a go (apologies if anything is incorrect).

Yes, a weight $w$ is updated as follows:

$$ w_{new} = w_{old} - \gamma \frac{\partial E}{\partial w_{old}} $$

where $\gamma$ is your learning rate and $E$ is the error, calculated with something like criterion:forward(output, target). The criterion could be, for example, nn.MSECriterion().
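
For concreteness, here is a minimal sketch (Torch7/Lua) of that error computation, assuming a toy nn.Linear model and random data rather than anything from the original question:

-- a tiny model and criterion, with random stand-in data
require 'nn'

local model = nn.Sequential()
model:add(nn.Linear(10, 1))

local criterion = nn.MSECriterion()

local input  = torch.randn(10)   -- your X, e.g. flattened image data
local target = torch.randn(1)

local output = model:forward(input)
local E = criterion:forward(output, target)   -- the scalar error E
print(E)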

To calculate $\partial E/\partial W$ you need $\partial E/\partial y$, i.e. gradOutput = criterion:backward(output, target) (the gradient with respect to the output), as well as the input to the net, input, i.e. your $X$ (e.g. image data), in order to generate the recursive set of equations that multiply with gradOutput.
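
Continuing the sketch above, that gradient of the error with respect to the output is just:

local gradOutput = criterion:backward(output, target)   -- dE/dy, same shape as output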

model:backward(input, gradOutput) therefore prepares the weight update that gets the net ready for the next model:forward(input): it generates the big derivative tensor $\partial E/\partial W_{old}$ (and returns gradInput, the derivative with respect to the input, which is passed back to earlier layers).

This is then combined with an optimiser such as optim.sgd (your optimMethod) and the old weights $W_{old}$ to generate the new weights via the first equation; a sketch of a full training step is below. Of course, you can just update the weights without an optimiser using model:updateParameters(learningRate), but you miss out on useful extras like momentum, weight decay, etc.
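
As a hypothetical illustration (the optimState values are made up, and the model, criterion, input and target are reused from the sketches above), one SGD step might look like:

require 'optim'

local params, gradParams = model:getParameters()
local optimState = { learningRate = 0.01, momentum = 0.9, weightDecay = 1e-4 }

local function feval(p)
   if p ~= params then params:copy(p) end
   model:zeroGradParameters()                       -- clear the accumulated dE/dW
   local output = model:forward(input)
   local E = criterion:forward(output, target)
   local gradOutput = criterion:backward(output, target)
   model:backward(input, gradOutput)                -- fills gradParams with dE/dW_old
   return E, gradParams
end

optim.sgd(feval, params, optimState)   -- W_new = W_old - gamma * dE/dW_old (plus momentum etc.)

-- the plain alternative, without an optimiser:
-- model:updateParameters(0.01)

Note that model:zeroGradParameters() matters because backward() accumulates into the stored $\partial E/\partial W$ rather than overwriting it.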

Got a bit sidetracked there, but I hope this helps.

OTHER TIPS

In the simplest view, maybe this will suffice: the backward() method is used for training a neural network with backpropagation. Compute the output y given input x using your network's forward() method, then find the error of the output with respect to your target using the criterion you defined (e.g. negative log likelihood). If there were only one layer in the network, you could simply use the error between the output layer and the target to update the weights in that single layer. When you have more than one layer (or a more complex structure), you update the layers one at a time, each time computing the error at that layer (no longer the output layer) and then using that error to update the weights of the previous layer. That is backpropagation.

For this, you obviously need some way to map the error at the output, $\Delta y$, onto $\Delta x$, using the same state (i.e. the model state and the input $x$). Thus, the backward() method is essentially of the form: $$f : (x, \Delta y) \rightarrow \Delta x$$

For the sake of completeness, a forward() method can be represented as: $$f : x \rightarrow y$$ Alternatively, if the previous state of $y$ persists, then all you need to calculate is $\Delta y$; equivalently, a forward() can also be represented as: $$ f : (x, y) \rightarrow \Delta y$$

This form is easy to compare with the backward() method, and it becomes clear why the method is named as it is.
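
To tie this back to the gradInput question: a tiny sketch (Torch7, with toy sizes chosen purely for illustration) showing that the tensor returned by backward() has the shape of the input $x$, while the weight derivatives are accumulated inside the module:

require 'nn'

local layer = nn.Linear(4, 2)
local x  = torch.randn(4)
local y  = layer:forward(x)
local dy = torch.randn(2)           -- stand-in for the Δy arriving from the layer above

local dx = layer:backward(x, dy)    -- Δx, to be passed down to the layer below
print(dx:size())                    -- size 4, matching x
print(layer.gradWeight:size())      -- 2x4: the dE/dW accumulated inside the module

So gradInput is neither the new weights nor the weight derivatives; it is $\Delta x$, the error signal handed to the layer below.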

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange