Problem

I am currently working on a machine learning algorithm, and I noticed that when I use Matlab's fminunc, the algorithm converges to the global minimum very quickly (in a few iterations) compared to when I manually update the parameters:

thetas[j] = thetas[j] - (alpha*gradient)/sampleNum;
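
For reference, the surrounding loop looks roughly like the sketch below (gradientOfCost is a placeholder for however the partial derivatives are computed, not actual code):

#include <stddef.h>

/* One pass of fixed-step batch gradient descent.
   gradientOfCost(j, thetas) is a hypothetical placeholder returning
   the partial derivative of the cost with respect to thetas[j],
   summed over all samples. grad[] is scratch space of size numParams. */
void gradientDescentStep(double *thetas, double *grad, size_t numParams,
                         double alpha, double sampleNum,
                         double (*gradientOfCost)(size_t, const double *))
{
    /* evaluate the whole gradient first, so every partial derivative
       is taken at the same point */
    for (size_t j = 0; j < numParams; ++j)
        grad[j] = gradientOfCost(j, thetas);

    for (size_t j = 0; j < numParams; ++j)
        thetas[j] = thetas[j] - (alpha * grad[j]) / sampleNum;
}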

I think this is because I naively assume the step size alpha to be constant.

So, how does one implement something like fminunc in C?

I tried starting with a large alpha and shrinking it whenever the current cost turns out to be larger than the previous one. The problem arises when the shape of the minimised function is not uniform: alpha can be driven to a very small value early on and never return to a larger one when the function later becomes 'flat' (where larger steps could safely be taken).
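
The usual remedy for that stuck-small problem is a backtracking line search that resets the trial step on every iteration: start each update from the same initial alpha, shrink it only until the cost actually decreases (the Armijo sufficient-decrease test below is the standard acceptance criterion), and throw the shrunken value away afterwards. A sketch, with cost() and the scratch buffers as hypothetical placeholders:

#include <stddef.h>

/* One update with backtracking line search in the direction -grad.
   cost(thetas) is a placeholder evaluating the cost function,
   grad[] holds the current gradient, trial[] is scratch space of size n.
   Returns the step size that was accepted (0.0 if none was found). */
double backtrackingStep(double *thetas, double *trial, const double *grad,
                        size_t n, double (*cost)(const double *),
                        double alpha0)
{
    const double shrink = 0.5;  /* halve alpha after each rejected trial */
    const double c = 1e-4;      /* Armijo sufficient-decrease constant */

    double f0 = cost(thetas);
    double g2 = 0.0;            /* squared norm of the gradient */
    for (size_t j = 0; j < n; ++j)
        g2 += grad[j] * grad[j];

    double alpha = alpha0;      /* crucial: reset on every call */
    for (int tries = 0; tries < 50; ++tries) {
        for (size_t j = 0; j < n; ++j)
            trial[j] = thetas[j] - alpha * grad[j];

        /* accept if the cost dropped by at least c * alpha * ||grad||^2 */
        if (cost(trial) <= f0 - c * alpha * g2) {
            for (size_t j = 0; j < n; ++j)
                thetas[j] = trial[j];
            return alpha;
        }
        alpha *= shrink;
    }
    return 0.0;
}

Since alpha restarts from alpha0 on every call, a steep region that forced tiny steps early on does not prevent large steps later when the function flattens out.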

Solution

Matlab's fminunc doesn't actually use gradient descent, but rather Newton-like methods (BFGS-based quasi-Newton or trust-region, depending on the problem size), which are generally significantly faster than gradient descent no matter how you choose the step size.

You may want to look into these kinds of methods if you want faster convergence.
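
To give a concrete feel for the difference, here is a self-contained sketch of a damped Newton iteration on the 2-D Rosenbrock function. This is not fminunc's actual algorithm (quasi-Newton methods like BFGS build up a Hessian approximation from gradient differences instead of computing it exactly, and trust-region methods bound the step rather than damping it), only the core idea those methods share: solve H d = -g for the step instead of walking along -g.

#include <stdio.h>
#include <math.h>

/* Rosenbrock function: f(x,y) = (1-x)^2 + 100*(y - x^2)^2, minimum at (1,1). */
static double f(double x, double y) {
    double a = 1.0 - x, b = y - x * x;
    return a * a + 100.0 * b * b;
}

static void gradient(double x, double y, double g[2]) {
    g[0] = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);
    g[1] = 200.0 * (y - x * x);
}

static void hessian(double x, double y, double h[2][2]) {
    h[0][0] = 2.0 - 400.0 * (y - x * x) + 800.0 * x * x;
    h[0][1] = h[1][0] = -400.0 * x;
    h[1][1] = 200.0;
}

int main(void) {
    double x = -1.2, y = 1.0;   /* classic hard starting point */

    for (int it = 0; it < 100; ++it) {
        double g[2], h[2][2];
        gradient(x, y, g);
        if (sqrt(g[0] * g[0] + g[1] * g[1]) < 1e-8)
            break;              /* gradient is (near) zero: done */

        hessian(x, y, h);

        /* Newton step: solve the 2x2 system H d = -g by Cramer's rule.
           (A real implementation must also handle det <= 0, where the
           Hessian is not positive definite.) */
        double det = h[0][0] * h[1][1] - h[0][1] * h[1][0];
        double dx = (-g[0] * h[1][1] + g[1] * h[0][1]) / det;
        double dy = (-g[1] * h[0][0] + g[0] * h[1][0]) / det;

        /* damping: halve the step until the cost actually decreases */
        double t = 1.0, f0 = f(x, y);
        while (t > 1e-10 && f(x + t * dx, y + t * dy) >= f0)
            t *= 0.5;

        x += t * dx;
        y += t * dy;
        printf("iter %2d: f = %.3e at (%f, %f)\n", it, f(x, y), x, y);
    }
    return 0;
}

Note there is no hand-tuned alpha at all: the Hessian scales the step automatically, which is where the fast (locally quadratic) convergence comes from.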
