I'm researching MultiLayer Perceptrons, a kind of neural network. When I read about the back-propagation algorithm, I see that some authors suggest updating the weights immediately after the errors for a specific layer are computed, while other authors explain that the weights must be updated only after the errors for all the layers are computed. Which approach is correct?

1st Approach:

void BackPropagate() {
    ComputeErrorsForOutputLayer();
    UpdateWeightsOutputLayer();    // weights change before the hidden errors are computed
    ComputeErrorsForHiddenLayer();
    UpdateWeightsHiddenLayer();
}

2nd Approach:

void BackPropagate() {
    ComputeErrorsForOutputLayer();
    ComputeErrorsForHiddenLayer(); // all errors first...
    UpdateWeightsOutputLayer();    // ...then all weight updates
    UpdateWeightsHiddenLayer();
}

Thanks for everything.

Solution

I am pretty sure that you have misunderstood the concept here. Two possible strategies are:

  • update weights after all errors for one input vector are calculated
  • update weights after all errors for all the input vectors are calculated

which is completely different from what you have written. These two methods are the sample/batch strategies; both have their pros and cons, and due to its simplicity the first approach is much more common in implementations.
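To make the difference concrete, here is a minimal Python sketch of the two strategies on a toy linear neuron with squared error (illustrative names only, not the asker's MLP code):

import numpy as np

# Toy model: a single linear neuron with squared error, used only to
# show WHEN the weights are updated, not how a full MLP is implemented.
def grad(w, x, t):
    return (w @ x - t) * x            # gradient of 0.5 * (w.x - t)^2

def train_online(w, X, T, lr=0.1):
    # Sample strategy: one weight update per input vector.
    for x, t in zip(X, T):
        w = w - lr * grad(w, x, t)
    return w

def train_batch(w, X, T, lr=0.1):
    # Batch strategy: accumulate the gradient over ALL input vectors,
    # then perform a single update.
    g = sum(grad(w, x, t) for x, t in zip(X, T)) / len(X)
    return w - lr * g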

Regarding your "methods": the second one is the only correct one. The process of "propagating" the error is just a computational simplification of computing the derivative of the error function, and the (basic) learning process is a steepest-descent method. If you compute the derivative for only part of the dimensions (the output layer), perform a step in that direction, and then recalculate the error derivatives according to the new values, you are no longer performing gradient descent. The only scenario where the first method is acceptable is when your weight updates do not interfere with your error computation; then it does not matter which order is used, as the two are independent.
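To see why, here is a minimal numpy sketch (hypothetical names, sigmoid units) of one backward pass for a 1-hidden-layer MLP in the only correct order: every delta is computed from the current weights before any weight changes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backpropagate(W_hid, W_out, x, target, lr=0.5):
    # forward pass
    h = sigmoid(W_hid @ x)                            # hidden activations
    y = sigmoid(W_out @ h)                            # output activations
    # step 1: compute ALL deltas while the weights are still unchanged
    delta_out = (y - target) * y * (1 - y)
    delta_hid = (W_out.T @ delta_out) * h * (1 - h)   # reads the old W_out
    # step 2: only now touch the weights
    W_out -= lr * np.outer(delta_out, h)
    W_hid -= lr * np.outer(delta_hid, x)
    return W_hid, W_out

Moving the W_out update above the delta_hid line reproduces the 1st approach: delta_hid would then be computed from already-moved weights, so the combined step is no longer along the gradient.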

Other tips

@lejlot's answer is entirely correct.

Your question is actually referring to the two main approaches:

Batch backpropagation

Update weights after all the errors for all the input vectors are calculated.

Online backpropagation

Update weights after all the errors for one input vector are calculated.

There is a third method called Stochastic backpropagation, which is really just online backpropagation with a randomly selected training-pattern sequence.
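As a sketch, assuming some per-pattern update function step (hypothetical, e.g. one online backprop step):

import random

def train_stochastic(weights, samples, epochs, step):
    # Stochastic = online backprop with a randomized pattern sequence.
    for _ in range(epochs):
        random.shuffle(samples)               # new random order each epoch
        for sample in samples:
            weights = step(weights, sample)   # one online update per pattern
    return weights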

Time Complexity

On average, the batch backpropagation method is the fastest to converge, but the most difficult to implement. See a simple comparison here.

It is not possible to alter the weights of the output layer before computing the deltas for the layer below:

Here you can see the mathematical equation for calculating the derivative
of the Error with respect to the weights (using Sigmoid):

O_i = the layer below   # ex: input
O_k = the current layer # ex: hidden layer
O_o = the layer above   # ex: output layer

dE/dW_ik = O_i * delta_k,   where   delta_k = O_k * (1 - O_k) * sum_o(delta_o * W_ko)

As you can see, dE/dW depends on the weights of the layer above,
so you must not alter them before calculating the deltas for each layer.
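The formula above, transcribed into a numpy sketch (hypothetical shapes; W_ko holds the weights from the current layer to the layer above):

import numpy as np

def hidden_gradient(O_i, O_k, delta_o, W_ko):
    # delta_k depends on W_ko, the weights of the layer ABOVE, so
    # W_ko must still hold its pre-update values at this point.
    delta_k = O_k * (1 - O_k) * (W_ko.T @ delta_o)
    dE_dW_ik = np.outer(delta_k, O_i)         # dE/dW_ik = O_i * delta_k
    return dE_dW_ik, delta_k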

This question is separate from the choice between batch and online backpropagation.

Your question is a legitimate one, and I think that both approaches are fine. The two approaches behave almost identically over many epochs, but the 2nd looks just a little better, even if everyone uses the 1st.

PS: The 2nd approach works only with online backpropagation.
