Question

I am having issues with the numerical accuracy of the scipy.optimize.curve_fit function in Python. It seems I can only get ~8 digits of accuracy when I need ~15. I have some data (artificially created at this point) generated as follows:

a_data(r) = G*M_Sun/r^2 + G*M_Jupiter*r/(r^2 + r_Jup^2)^(3/2) + A

where term 1 is ~10^-3, term 2 is ~10^-6, and term 3 (A) is ~10^-11. In the data, I vary A randomly (it is a Gaussian error). I then try to fit this to a model:

a_model(r) = G*M_Sun/r^2 * (1 + alpha*exp(-r/lambda)) + G*M_Jupiter*r/(r^2 + r_Jup^2)^(3/2)

where lambda is a constant and I fit only alpha (it is a parameter in the function). What I would expect is a linear relationship between alpha and A, because terms 1 and 2 in the data creation also appear in the model, so they should cancel exactly:

G*M_Sun/r^2 * alpha*exp(-r/lambda) = A

So:

alpha = A * r^2 * exp(r/lambda) / (G*M_Sun), i.e. alpha is proportional to A.

However, what happens is that for small A (~10^-11 and below), alpha does not scale with A: as A gets smaller and smaller, alpha levels off and remains constant.

For reference, I call the following: op, pcov = scipy.optimize.curve_fit(model, xdata, ydata, p0=None, sigma=sig)

My first thought was that I was not using double precision, but I am fairly sure that Python creates numbers in double precision automatically. Then I wondered whether the implementation itself cuts off digits somewhere. Anyway, I could put my code in here, but it is somewhat complicated. Is there a way to ensure that the curve-fitting function preserves my digits?
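A quick way to check both assumptions (a minimal sketch, not my actual code): Python floats and NumPy's default dtype are IEEE-754 doubles, and mixing magnitudes like term 1 and term 3 spends most of those digits:

```python
import numpy as np

# Python's float (and NumPy's default dtype) is an IEEE-754 double,
# giving roughly 15-16 significant decimal digits:
eps = np.finfo(np.float64).eps
print(eps)   # ~2.2e-16

# But mixing magnitudes spends those digits: adding a term ~1e-11 to a
# term ~1e-3 leaves only ~8 digits of headroom for the small term.
a, b = 1e-3, 1e-11
recovered = (a + b) - a
print(recovered - b)   # rounding error from the magnitude mismatch
```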

Thank you so much for your help!

EDIT: The below is my code:

# Import proper packages
import numpy as np
import numpy.random as npr
import scipy as sp
import scipy.constants as spc
import scipy.optimize as spo
from matplotlib import pyplot as plt

# Declare global variables
AU = 149597871000.0
test_lambda = 20*AU
M_Sun = (1.98855*(np.power(10.0,30.0)))
M_Jupiter = (M_Sun/1047.3486)
test_jupiter_mass = M_Jupiter
test_sun_mass = M_Sun
rad_jup = 5.2*AU
ran = np.linspace(AU, 100*AU, num=100)
delta_a = np.power(10.0, -11.0)
chi_limit = 118.498

# Model acceleration of the spacecraft from the sun (with Yukawa term)
def model1(distance, A):
    return spc.G*(M_Sun/(distance**2.0))*(1 + A*np.exp(-distance/test_lambda)) + spc.G*(M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0))

# Function that creates a data point for test 1
def data1(distance, dela):
    return (spc.G)*(M_Sun/(distance**2.0) + (M_Jupiter*distance)/((distance**2.0 + rad_jup**2.0)**(3.0/2.0))) + dela

# Generates a list of 100 data sets, each varying by ~delta_a, for test 1
def generate_data1():
    data_list = []
    for i in range(100):
        acc_lst = []
        for dist in ran:
            x = data1(dist, npr.normal(0, delta_a))
            acc_lst.append(x)
        data_list.append(acc_lst)
    return data_list

# Generates a list of standard deviations at each distance from the sun. Since delta_a is constant, the standard deviation of each point is constant
def generate_sig():
    sig = []
    for i in range(100):
        sig.append(delta_a)
    return sig

# Finds alpha for test 1. Since we vary delta_a in test 1, we need to generate new data each time we find alpha
def find_alpha1(data_list, sig):
    alphas = []
    for data in data_list:
        op, pcov = spo.curve_fit(model1, ran, data, p0=None, sigma=sig)
        alphas.append(op[0])
    return alphas

# Tests the dependence of alpha on delta_a and plots the dependence
def test1():
    global delta_a
    global test_lambda
    test_lambda = 20*AU
    delta_a = 10.0**-20.0
    alphas = []
    delta_as = []
    for i in range(20):
        print(i)
        data_list = generate_data1()
        print(np.array(data_list[0]))
        sig = generate_sig()
        alpha = find_alpha1(data_list, sig)
        for alp in alpha:
            if alp < 0:
                plt.loglog(delta_a, abs(alp), '.r')
            else:
                plt.loglog(delta_a, alp, '.b')
        delta_a *= 10
    plt.xlabel('Delta A')
    plt.ylabel('Alpha (at Lambda = 20 AU)')
    plt.show()

def main():
    test1()

if __name__ == '__main__':
    main()

Solution

I believe this has to do with the minimisation algorithm used here and the maximum obtainable precision.

I remember reading about it in Numerical Recipes a few years ago; I'll see if I can dig up a reference for you.

Edit:

There is a link to Numerical Recipes here; skip down to page 394 and read that chapter. Note the third paragraph on page 404:

"Indulge us a final reminder that tol should generally be no smaller than the square root of your machine’s floating-point precision."
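For doubles, that square root is about 1.5e-8, which lines up with the ~8 digits you are seeing. curve_fit forwards extra keyword arguments to the underlying least-squares solver, so you can at least set the tolerances explicitly (a sketch with a toy model, not the asker's model):

```python
import numpy as np
import scipy.optimize as spo

# sqrt(machine epsilon) for doubles -- the practical tolerance floor
# Numerical Recipes warns about; it is ~1.5e-8, i.e. roughly 8 digits:
tol_floor = np.sqrt(np.finfo(np.float64).eps)
print(tol_floor)

# curve_fit forwards extra keyword arguments to the underlying
# least-squares routine, so the tolerances can be tightened explicitly.
# Toy model, purely to show the knobs:
def model(x, a):
    return a * np.exp(-x)

xdata = np.linspace(0.0, 1.0, 50)
ydata = model(xdata, 3.0)

popt, _ = spo.curve_fit(model, xdata, ydata, p0=[1.0],
                        xtol=1e-12, ftol=1e-12, gtol=1e-12)
print(popt[0])
```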

And Mathematica's documentation mentions that if you want accuracy you need to go for a different method, and that they don't in fact use LMA unless the problem is recognised as a sum-of-squares problem.

Given that you're doing just a one-dimensional fit, it might be a good exercise to try implementing one of the fitting algorithms they mention in that chapter.
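One option specific to this problem: the model is linear in the fitted parameter, so an iterative minimiser isn't needed at all. A sketch under the question's constants (names assumed from the posted code):

```python
import numpy as np
from scipy import constants as spc

# Constants assumed from the posted code
AU = 149597871000.0
M_Sun = 1.98855e30
M_Jupiter = M_Sun / 1047.3486
rad_jup = 5.2 * AU
test_lambda = 20 * AU

r = np.linspace(AU, 100 * AU, num=100)

# model1(r, alpha) = base(r) + alpha * g(r): linear in alpha, so the
# least-squares estimate is a closed-form projection -- no iterative
# minimiser, and no sqrt(eps) tolerance floor on the parameter.
base = spc.G * (M_Sun / r**2 + M_Jupiter * r / (r**2 + rad_jup**2)**1.5)
g = spc.G * (M_Sun / r**2) * np.exp(-r / test_lambda)

rng = np.random.default_rng(0)
alpha_true = 1e-6
delta_a = 1e-12
ydata = base + alpha_true * g + rng.normal(0.0, delta_a, size=r.size)

# With constant sigma, weighted least squares reduces to this projection:
alpha_hat = np.dot(g, ydata - base) / np.dot(g, g)
print(alpha_hat)
```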

What are you actually trying to achieve, though? From what I understand, you're essentially trying to work out the amount of random noise you've added to the curve. But that's not really what you're doing - unless I've understood wrong...

Edit2:

So, after reading how you generate the data, there's an issue with the data and the model you're applying.

You're essentially fitting the two sides of this:

G*M_Sun/r^2 * alpha*exp(-r/lambda) = A

You're trying to fit the height of a Gaussian to random numbers, not fitting the Gaussian to the frequency of those numbers.
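To illustrate (a toy sketch, not the asker's setup): the least-squares fit of a single constant to pure Gaussian noise is just the sample mean, so its scatter is set by statistics, not by the optimiser's precision:

```python
import numpy as np

# Least-squares fit of a constant c to noisy samples minimises
# sum((y_i - c)^2), whose solution is the sample mean -- so fitting an
# amplitude to pure noise just averages the noise.
rng = np.random.default_rng(1)
sigma = 1e-11   # noise level, same order as the question's delta_a
n = 100         # points per data set

fits = np.array([rng.normal(0.0, sigma, size=n).mean()
                 for _ in range(200)])

# The fitted values scatter about zero with spread ~ sigma/sqrt(n):
print(fits.mean(), fits.std())
```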

Looking at your code, and judging from what you've said, this isn't your end goal - you're just wanting to get used to the optimise method?

It would make more sense to randomly adjust the distance from the sun, then fit to the data and see whether you can minimise to find the distance that generated the data set.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow