Question about using gradient descent instead of calculus. I checked previous questions, but there are still points to clarify

datascience.stackexchange https://datascience.stackexchange.com/questions/57466

02-11-2019

Question

First of all, I checked http://stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent, http://stackoverflow.com/questions/26804656/why-do-we-use-gradient-descent-in-linear-regression, and https://stats.stackexchange.com/questions/212619/why-is-gradient-descent-required, but couldn't find my answer.

Gradient descent is: $w_{i} := w_{i} - \alpha \frac{\partial}{\partial w_{i}} j(w)$, where $w$ is a vector.
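For concreteness, here is a minimal sketch of that update applied to least-squares linear regression (the data `X`, `y`, the learning rate `alpha`, and the iteration count are my own illustrative choices, not part of the question):

```python
import numpy as np

# Minimal sketch of the update rule above, applied to least-squares linear
# regression. X, y, alpha, and the iteration count are illustrative choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                    # features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)  # targets

w = np.zeros(3)        # initial weights
alpha = 0.1            # learning rate
for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)  # d j(w) / d w for the mean squared error
    w = w - alpha * grad               # w_i := w_i - alpha * d j / d w_i

print(w)  # converges toward the least-squares solution
```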

In his book "Pattern Recognition and Machine Learning", Bishop says:

"Because the error function is a quadratic function of the coefficients w, its derivatives with respect to the coefficients will be linear in the elements of w, and so the minimization of the error function has a unique solution..."

So if we take the derivative of $j(w)$ with respect to each $w_{i}$ and set it equal to zero, in the end it will give us the $w$ that minimizes the error, which is actually the first exercise.
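To spell that step out, assuming the usual sum-of-squares error $j(w) = \frac{1}{2}\lVert Xw - y \rVert^{2}$ (my assumption for illustration, in line with Bishop's quadratic error function), setting the gradient to zero gives the normal equations:

$$\nabla_{w}\, j(w) = X^{\top}(Xw - y) = 0 \;\Longrightarrow\; X^{\top}X\,w = X^{\top}y \;\Longrightarrow\; w = (X^{\top}X)^{-1}X^{\top}y,$$

assuming $X^{\top}X$ is invertible, so the minimizer can indeed be written down in closed form.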

In gradient descent we take the derivative too, so the problem can't be with the derivative itself, e.g. a function whose derivative can't be found. If we can find the answer in one step (setting the derivative equal to zero for every feature), why do we iterate over and over again instead, as gradient descent does?
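To illustrate the contrast being asked about, here is a sketch of that "one step" alternative, solving the normal equations directly on the same kind of illustrative data as above (again, the data and names are my own assumptions):

```python
import numpy as np

# The "one step" alternative: solve the normal equations directly instead of
# iterating. Same kind of illustrative data as in the gradient descent sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Set the gradient to zero and solve X^T X w = X^T y in a single shot.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)
print(w_closed)  # the same minimizer gradient descent iterates toward
```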

No correct solution

Licensed under: CC-BY-SA with attribution