Why doesn't the solution always move in the right direction with the gradient descent algorithm?

StackOverflow https://stackoverflow.com/questions/15300693

Problem

I have a question about the classical gradient descent algorithm. Recently I wanted to implement a function-fitting program, where the observed data as well as the parametric form of the function are given. The method I used is gradient descent, since the derivatives of the function are available. Assuming the function parameters are given, I can create simulated data based on the function:

clear;
rng('default');
rng(54321);                                  % fixed seed for reproducible noise
low_value = 15;
high_value = 200;
dis_value = (high_value - low_value)/2;      % amplitude A
central_value = (low_value + high_value)/2;  % offset B
x = 1:55;
central_pixel = (1 + length(x))/2;           % true center position p
delta = 3;                                   % true standard deviation
len = length(x);
y_true = dis_value*erf((x - central_pixel)./delta) + central_value;
y = y_true + randn(1, len);                  % add unit Gaussian noise
figure; plot(x, y, 'b*');
hold on; plot(x, y_true, 'r');

The figure below shows the simulated data (blue asterisks) as well as the underlying function (in red):

[figure: simulated data points and the true curve]

As you can see from the figure, there are two parameters to estimate in this example: the center position p and the standard deviation delta. The function is written as

f(x) = A * erf((x - p) / delta) + B

where A and B can be regarded as known. To use the gradient descent algorithm, I then need to do two things: define the derivatives of the function with respect to the two unknown parameters (p and delta), and invoke the gradient descent algorithm.
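
For concreteness, a minimal sketch of such a setup might look like the following (it reuses x, y, dis_value, and central_value from the snippet above; the starting values and the fixed step size eta are illustrative assumptions):

A = dis_value; B = central_value;   % known amplitude and offset
p = 20; del = 1;                    % assumed starting guesses (del estimates delta)
eta = 1e-6;                         % assumed fixed step size; may need tuning
for it = 1:20000
    u = (x - p)./del;
    r = A*erf(u) + B - y;           % residuals f(x) - y
    g = A*(2/sqrt(pi))*exp(-u.^2);  % A * d(erf(u))/du
    grad_p   = sum(2*r .* g .* (-1/del));          % dS/dp
    grad_del = sum(2*r .* g .* (-(x - p)./del^2)); % dS/ddelta
    p   = p   - eta*grad_p;
    del = del - eta*grad_del;
end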

However, what I have found confusing is that during the iterative procedure one of the estimated parameters (delta) does not move in the right direction all the time:

[figure: estimated delta over the iterations]

I do notice, however, that the objective function (the sum of squared residuals between the data points and the fitted function, S(p, delta) = sum_i (A*erf((x_i - p)/delta) + B - y_i)^2) always decreases:

[figure: objective function value over the iterations]

My question is then: why do the estimates not always move in the right direction, even though in the end they reach the right place? Thanks!


Solution

A gradient descent minimizer just follows the negative gradient of the objective function at its current location. You (probably) gave the minimizer starting (p, delta) values at which the negative gradient had delta increasing. With different starting values, you will likely see different behavior.

One thought, though. Because erf((x - p)/delta) can be rewritten as erf(x/delta - p/delta), your parameters are somewhat interdependent through the ratio p/delta. That is, for small values of x, or large values of p/delta, the ratio term dominates, and minimizers can get stuck in a cycle of continually increasing the magnitudes of both the numerator and the denominator.

Try using modified parameters: erf(x/delta - pdratio), and fit delta and pdratio. You can then back out the original p parameter: p = pdratio*delta.
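
A minimal sketch of that reparameterization, under the same assumptions as the question's snippet (the starting values, the variable name pdratio, and the step size eta are illustrative):

A = dis_value; B = central_value;
del = 1; pdratio = 20;              % assumed starting guesses
eta = 1e-6;                         % assumed step size; may need tuning
for it = 1:20000
    u = x./del - pdratio;           % reparameterized argument of erf
    r = A*erf(u) + B - y;           % residuals
    g = A*(2/sqrt(pi))*exp(-u.^2);  % A * d(erf(u))/du
    grad_pdr = -sum(2*r .* g);                 % dS/dpdratio, since du/dpdratio = -1
    grad_del = sum(2*r .* g .* (-x./del^2));   % dS/ddelta,   since du/ddelta = -x/delta^2
    pdratio = pdratio - eta*grad_pdr;
    del     = del     - eta*grad_del;
end
p = pdratio*del;                    % recover the original center parameter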

License: CC-BY-SA with attribution
Not affiliated with StackOverflow