Question

I was doing some research on how backpropagation works. I read that backpropagation uses partial derivatives to find the optimal weight of each neuron after every iteration and updates the neurons' weights.

On the other hand, we have a hyperparameter called 'learning rate' that is used to update each neuron's weight in every iteration based on the direction of the error.

These two seem to work independently. I mean, if the backpropagation algorithm itself finds the optimal weight, we should not need a learning rate parameter at all.

Is my understanding correct? Please correct me if I am wrong.


Solution

Using backpropagation is nothing other than performing (stochastic) gradient descent.

Backpropagation computes the gradient, but the gradient is not the "optimal" weight. The gradient is used to update the current weight (according to the gradient descent algorithm).

The gradient descent algorithm needs a step size (which is called the learning rate in the context of machine learning).

The step size defines how strongly the current weights are updated by the current gradient.
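
A minimal sketch, assuming a single linear neuron with squared-error loss and made-up values, showing how the gradient from backpropagation and the learning rate combine in one weight update:

```python
# Toy example (assumed values): one linear neuron fit to a single (x, y) pair,
# loss L = (w * x - y)**2. "Backpropagation" here is just the chain rule
# giving dL/dw; gradient descent then applies the update scaled by the learning rate.
x, y = 2.0, 6.0          # single training example (hypothetical)
w = 0.5                  # initial weight
learning_rate = 0.1      # step size: scales how far each update moves w

for step in range(20):
    prediction = w * x
    error = prediction - y
    grad = 2 * error * x            # backprop: dL/dw via the chain rule
    w = w - learning_rate * grad    # gradient descent: step against the gradient
    print(f"step {step:2d}  w = {w:.4f}  loss = {error**2:.4f}")
```

With these values the weight converges toward w = 3; a larger learning rate would take bigger steps (and can overshoot), a smaller one would take smaller steps.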

Licensed under: CC-BY-SA with attribution