Optimization of unusual fitting algorithm

https://stackoverflow.com/questions/11304609

18-06-2021

Question

I have two sets of randomly distributed experimental data. I need to make one of the distributions as similar to the other as possible by applying some function to each of its values. An example of such a function: F(x) = x*(1+(x+p1)*p2), where p1 and p2 are arbitrary parameters. To find out whether this is possible and, if so, for which values of p1 and p2, I wrote a simple Python script:

#!/usr/bin/python
from scipy.stats import ks_2samp
from frange import frange  # float-step range helper, not part of the standard library

def load(path):
    # One value per line; decimal commas are converted to dots
    with open(path) as f:
        return [float(line.rstrip().replace(',', '.')) for line in f]

control = load('control.txt')
test = load('1460.txt')

def mean(x):
    return sum(x) / len(x)

def transform(p1, p2):
    # Apply F(x) = x*(1+(x+p1)*p2) to every control value
    return [i * (1 + (i + p1) * p2) for i in control]

def testargs(p1, p2):
    # Accept only pairs whose model mean matches the test mean to 4 decimals
    return round(mean(transform(p1, p2)), 4) == round(mean(test), 4)

# Brute-force grid search: map each KS p-value to the (p1, p2) pair that produced it
results = {}
for p1 in frange(0, 0.02, 0.001):
    for p2 in frange(5, 20, 0.01):
        if testargs(p1, p2):
            ks = ks_2samp(transform(p1, p2), test)[1]
            results[ks] = (p1, p2)

# The highest p-value marks the closest match
result = max(results)
print('Result: ', result, '\n', 'p1, p2: ', results[result], '\n')

I understand that, of all possible ways, this is the ugliest and the slowest one. Unfortunately, I have no programming background at all, and this is my first humble effort. Given that the mean value of the resulting distribution is a known constant, the number of suitable p1-p2 pairs is very limited, but I simply use brute force here. I think there should be some way to express p2 as a function of p1, but I have absolutely no idea how to do it. Maybe you can share some thoughts?
Sorry for my bad English...

Solution

scipy.optimize is your friend here.

What you would typically do is write a function that takes the two parameters (p1, p2) and returns a value indicating how far apart the two distributions (test and modified control) are; in your case, this can be (mean(model)-mean(test))**2. SciPy's minimization functions then give you the parameters (p1, p2) that minimize this distance.
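A minimal sketch of that approach, under a few assumptions: the data here is synthetic (the real control.txt and 1460.txt are not available), the names transform and distance are made up for illustration, and the KS statistic from ks_2samp is used as the distance because the question already relies on that test. The squared difference of means would slot into the same function. Nelder-Mead is picked only because it needs no derivatives, which suits a step-like objective such as the KS statistic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import ks_2samp


def transform(x, p1, p2):
    # F(x) = x*(1+(x+p1)*p2), applied element-wise
    return x * (1 + (x + p1) * p2)


def distance(params, control, test):
    # Smaller KS statistic means the two samples look more alike
    p1, p2 = params
    return ks_2samp(transform(control, p1, p2), test).statistic


# Synthetic stand-ins for the two data files (assumption, not the real data)
rng = np.random.default_rng(0)
control = rng.uniform(0.0, 0.1, size=500)
test = transform(rng.uniform(0.0, 0.1, size=500), 0.01, 10.0)

start = (0.01, 12.0)  # rough initial guess inside the question's grid
fit = minimize(distance, x0=start, args=(control, test), method="Nelder-Mead")

print("p1, p2:", fit.x)
print("KS statistic at the optimum:", fit.fun)
```

The result depends on the starting point, so it may be worth rerunning from a few different guesses and keeping the fit with the smallest statistic.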

You might want to try a few of the minimization functions that SciPy offers: some work better than others, depending on the problem.
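As for expressing p2 as a function of p1: if the mean of the model must equal the mean of the test data, then expanding F(x) gives mean(model) = mean(x) + p2*(mean(x**2) + p1*mean(x)), so p2 = (mean(test) - mean(x)) / (mean(x**2) + p1*mean(x)). That leaves a one-dimensional search over p1. The sketch below assumes synthetic data, an exact mean match instead of the question's rounding to 4 decimals, and the hypothetical helper names p2_from_p1 and ks_for_p1; the bounds on p1 are taken from the question's grid.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import ks_2samp


def p2_from_p1(p1, control, target_mean):
    # Solve mean(x) + p2*(mean(x**2) + p1*mean(x)) = target_mean for p2
    m1 = control.mean()
    m2 = (control ** 2).mean()
    return (target_mean - m1) / (m2 + p1 * m1)


def ks_for_p1(p1, control, test):
    # p2 is fixed by the mean constraint, so only p1 remains free
    p2 = p2_from_p1(p1, control, test.mean())
    model = control * (1 + (control + p1) * p2)
    return ks_2samp(model, test).statistic


# Synthetic stand-ins for the two data files (assumption, not the real data)
rng = np.random.default_rng(1)
control = rng.uniform(0.0, 0.1, size=500)
raw = rng.uniform(0.0, 0.1, size=500)
test = raw * (1 + (raw + 0.01) * 10.0)

best = minimize_scalar(ks_for_p1, bounds=(0.0, 0.02),
                       args=(control, test), method="bounded")
p1 = best.x
p2 = p2_from_p1(p1, control, test.mean())
model = control * (1 + (control + p1) * p2)

print("p1, p2:", p1, p2)
print("mean(model) - mean(test):", model.mean() - test.mean())
```

Because the KS statistic is not smooth in p1, the bounded search may stop at a local minimum; scanning p1 over a coarse grid first is a cheap cross-check.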

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow