Question

I have two sets of randomly distributed experimental data. I need to make one distribution as similar as possible to the other by applying a function to each of its values, for example F(x) = x*(1+(x+p1)*p2), where p1 and p2 are arbitrary parameters. To find out whether this is possible and, if so, for which values of p1 and p2, I wrote a simple Python script:

#!/usr/bin/python
from scipy.stats import ks_2samp
from frange import frange
def read_values(path):
    # One value per line, written with a decimal comma (e.g. "1,23").
    with open(path) as f:
        return [float(line.strip().replace(',', '.')) for line in f if line.strip()]
control = read_values('control.txt')
test = read_values('1460.txt')
def mean(x):
    return sum(x) / len(x)
def transform(p1, p2):
    # Apply F(x) = x*(1+(x+p1)*p2) to every control value.
    return [i * (1 + (i + p1) * p2) for i in control]
def testargs(p1, p2):
    # Accept only pairs whose transformed mean matches the test mean to 4 decimals.
    return round(mean(transform(p1, p2)), 4) == round(mean(test), 4)
results = {}
for p1 in frange(0, 0.02, 0.001):
    for p2 in frange(5, 20, 0.01):
        if testargs(p1, p2):
            # ks_2samp returns (statistic, p-value); index 1 is the p-value.
            ks = ks_2samp(transform(p1, p2), test)[1]
            results[ks] = (p1, p2)
# Keep the pair with the highest p-value.
result = max(results)
print('Result: ', result, '\n', 'p1, p2: ', results[result], '\n')

I understand that, of all possible approaches, this is the ugliest and the slowest. Unfortunately, I have no programming background at all and this is my first humble effort. Given that the mean of the resulting distribution is a known constant, the number of suitable p1-p2 pairs is very limited, yet I use simple brute force here. I think there should be a way to express p2 as a function of p1, but I have no idea how to do it. Can you give me some ideas?
Sorry for my bad English...

Solution

scipy.optimize is your friend here.

The typical approach is to write a function that takes the two parameters (p1, p2) and returns a value indicating how far apart the two distributions (test and modified control) are; in your case, this can be (mean(model)-mean(test))**2. The SciPy minimization functions then return the parameters (p1, p2) that minimize this distance.
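As a rough illustration of this idea, here is a minimal sketch using scipy.optimize.minimize with the squared difference of means as the distance. The data is synthetic and purely hypothetical (the real values would come from control.txt and 1460.txt), and the starting point and the Nelder-Mead method are assumptions chosen for the example, not a recommendation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in data; replace with the values read from the real files.
rng = np.random.default_rng(0)
control = rng.uniform(0.01, 0.1, size=500)
# Synthetic "test" sample built from control with p1=0.01, p2=12 (an assumption).
test = control * (1 + (control + 0.01) * 12.0)


def model(params, x):
    """Apply F(x) = x*(1+(x+p1)*p2) element-wise."""
    p1, p2 = params
    return x * (1 + (x + p1) * p2)


def distance(params):
    """Squared difference between the mean of the model and the mean of test."""
    return (model(params, control).mean() - test.mean()) ** 2


x0 = [0.01, 10.0]  # starting guess, taken from inside the ranges scanned in the question
res = minimize(distance, x0=x0, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-12})
p1_fit, p2_fit = res.x
print("p1, p2:", p1_fit, p2_fit, "distance:", res.fun)
```

Note that matching the means alone does not pin down both parameters: many (p1, p2) pairs give the same mean, so the pair returned depends on the starting guess. Returning a distribution-level distance from the same function, such as the statistic from ks_2samp, is one way to make the result less arbitrary.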

You might want to try a few of the minimization functions that SciPy offers: some work better than others, depending on the problem.
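On the question of expressing p2 as a function of p1: since F(x) = x + p2*(x**2 + p1*x), the mean of the transformed control is mean(x) + p2*(mean(x**2) + p1*mean(x)). Setting this equal to mean(test) gives p2 = (mean(test) - mean(x)) / (mean(x**2) + p1*mean(x)), which reduces the search to a single parameter. The sketch below tries this with scipy.optimize.minimize_scalar. The data is synthetic and hypothetical, the bounds 0 to 0.02 are taken from the loop in the question, and using the KS statistic as the distance (rather than the p-value used in the question) is an assumption.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import ks_2samp

# Hypothetical stand-in data; replace with the values read from the real files.
rng = np.random.default_rng(1)
control = rng.uniform(0.01, 0.1, size=500)
sample = rng.uniform(0.01, 0.1, size=400)
test = sample * (1 + (sample + 0.01) * 12.0)  # synthetic, built with p1=0.01, p2=12


def transform(x, p1, p2):
    """Apply F(x) = x*(1+(x+p1)*p2) element-wise."""
    return x * (1 + (x + p1) * p2)


def p2_from_p1(p1):
    """Solve mean(F(control)) == mean(test) for p2 at a given p1."""
    x = control
    return (test.mean() - x.mean()) / ((x ** 2).mean() + p1 * x.mean())


def ks_distance(p1):
    """KS statistic between the mean-matched model and the test sample."""
    model = transform(control, p1, p2_from_p1(p1))
    return ks_2samp(model, test).statistic


# One-dimensional bounded search over p1; p2 follows from the mean constraint.
res = minimize_scalar(ks_distance, bounds=(0.0, 0.02), method="bounded")
p1_best = res.x
p2_best = p2_from_p1(p1_best)
print("p1, p2:", p1_best, p2_best, "KS statistic:", res.fun)
```

The KS statistic changes in steps as p1 varies, so a smooth optimizer may stop at a plateau. Scanning a coarse grid of p1 values with p2_from_p1 is a simple fallback and is still far cheaper than the two-parameter brute force.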

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow