Question

I came across this link. In the BFGS quasi-Newton method, a Hessian matrix is used in the weight update. Is there any resource where I can find how this Hessian matrix is obtained, along with a clear description of why the Hessian matrix is used at all? I could not understand the wiki article.


Solution

In optimization, you are looking for the minima and maxima of a certain function f. Equivalently, under suitable differentiability conditions, you are looking for the zeros of g, the derivative of f.

The first equation of your link describes one step of Newton's method - not BFGS. It involves the Hessian of f, which is usually not accessible in practice: computing and storing it is inefficient in both time and memory. For this reason, methods have been developed that avoid calculating the Hessian directly.
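To make the Newton step concrete before the quasi-Newton machinery: here is a minimal 1-D sketch (my own illustration, not from the linked article). Minimizing f(x) = x⁴ - 4x means finding the zero of g(x) = f'(x) = 4x³ - 4, and in one dimension the "Hessian" is just the scalar f''(x) = 12x².

```python
def newton_minimize(x, n_steps=10):
    """Newton's method in 1-D: find a zero of g = f' for f(x) = x^4 - 4x.
    In 1-D the Hessian reduces to the scalar second derivative f''."""
    for _ in range(n_steps):
        g = 4 * x**3 - 4        # g(x) = f'(x)
        h = 12 * x**2           # f''(x), the 1-D "Hessian"
        x = x - g / h           # Newton step: x(k+1) = x(k) - f'(x)/f''(x)
    return x

print(newton_minimize(2.0))     # converges to the minimizer x = 1
```

In n dimensions, `g / h` becomes "solve the linear system ∇²f(x) d = g(x)", which is exactly the expensive part BFGS avoids.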

Thanks to a Taylor expansion, you can write

g(x) = g(y) + g'(y)(x - y) + O(‖x - y‖²)

Replacing x by x(k+1), y by x(k), and g' by the Hessian of f gives you (with the convention g(k+1) = g(x(k+1))):

g(k+1) ≈ g(k) + ∇²f(x(k+1)) ( x(k+1) - x(k) )

Rearranged, this motivates the secant equation H(k+1) (x(k+1) - x(k)) ≈ g(k+1) - g(k), where H(k) aims to approximate ∇²f(x(k)). But instead of calculating the Hessian from scratch at each step, we try to build H(k+1) cheaply from the previous approximation H(k).
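For a quadratic f the Taylor remainder vanishes and the relation is exact, which is easy to check numerically. A small pure-Python sketch (my own example) using f(x) = 2x₁² + x₁x₂ + x₂², whose Hessian is the constant matrix [[4, 1], [1, 2]]:

```python
def grad(x):
    """Gradient of f(x) = 2*x1^2 + x1*x2 + x2^2."""
    return [4 * x[0] + x[1], x[0] + 2 * x[1]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

hessian = [[4.0, 1.0], [1.0, 2.0]]   # constant for a quadratic f

xk  = [1.0, -2.0]
xk1 = [0.5, 0.25]
step = [a - b for a, b in zip(xk1, xk)]

lhs = matvec(hessian, step)                         # H (x(k+1) - x(k))
rhs = [a - b for a, b in zip(grad(xk1), grad(xk))]  # g(k+1) - g(k)
print(lhs, rhs)   # both sides agree: the secant relation holds exactly here
```

For a non-quadratic f the two sides only agree up to the O(‖x(k+1) - x(k)‖²) remainder, which is why the secant equation is imposed on the approximation H(k+1) rather than derived from it.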

In 1970, Broyden, Fletcher, Goldfarb and Shanno proposed a rank-2 update of H(k), i.e.

H(k+1) = H(k) + Term1 + Term2

where, with s(k) = x(k+1) - x(k) and y(k) = g(k+1) - g(k), Term1 = y(k) y(k)ᵀ / (y(k)ᵀ s(k)) does not depend on H, and Term2 = - H(k) s(k) s(k)ᵀ H(k) / (s(k)ᵀ H(k) s(k)) depends on H(k).
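A pure-Python sketch of this rank-2 update (helper functions and names are my own): with s = x(k+1) - x(k) and y = g(k+1) - g(k), the two correction terms are y yᵀ / (yᵀ s) and -(H s)(H s)ᵀ / (sᵀ H s), and one can check that the updated matrix satisfies the secant equation H(k+1) s = y by construction.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def bfgs_update(H, s, y):
    """Rank-2 BFGS update of the Hessian approximation H.
    Term1 = y y^T / (y^T s) does not involve H;
    Term2 = -(H s)(H s)^T / (s^T H s) does."""
    Hs = matvec(H, s)
    c1 = dot(y, s)
    c2 = dot(s, Hs)
    n = len(s)
    return [[H[i][j] + y[i] * y[j] / c1 - Hs[i] * Hs[j] / c2
             for j in range(n)] for i in range(n)]

# Start from the identity; s, y would come from two successive iterates.
H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]
y = [4.5, 2.0]              # e.g. y = A s for the quadratic with A = [[4, 1], [1, 2]]
H1 = bfgs_update(H0, s, y)
print(matvec(H1, s))        # equals y: the secant equation holds after one update
```

Each update costs only a few matrix-vector products instead of recomputing the full Hessian, which is the "time" saving the answer refers to.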

But this solves only the time issue, not the memory one: H(k) is still a full n × n matrix. To address that, limited-memory BFGS (or L-BFGS) was developed to bound the amount of memory used.
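The idea behind L-BFGS is to never store H (or its inverse) at all: only the last m pairs (s(k), y(k)) are kept, and the product of the inverse-Hessian approximation with the gradient is rebuilt on the fly by the so-called two-loop recursion. A simplified pure-Python sketch of that recursion (my own, with a common choice of initial scaling):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: computes (approximate inverse Hessian) @ grad
    from the stored (s, y) pairs, newest last - no matrix is ever formed."""
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * dot(s, q)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    # Initial inverse Hessian taken as gamma * I, a standard scaling choice.
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    r = [gamma * qi for qi in q]
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return r

# With a single stored pair, the secant equation forces (inverse H) y = s:
s, y = [1.0, 0.5], [4.5, 2.0]
print(lbfgs_direction(y, [s], [y]))   # recovers s
```

Storing m pairs of n-vectors costs O(m n) memory instead of the O(n²) needed for a full Hessian approximation, which is the point of the "limited-memory" variant.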

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange