Question

I've been trying to solve this for a bit and really just haven't seen an example or anything that my brain is able to use to move forward.

The goal is to fit a model Gaussian curve by minimizing the total chi-squared between the real data and a model built from unknown parameters that need sensible initial estimates (the Gaussian has unknown position, amplitude and width). scipy.optimize.fmin has come up, but I've never used it before and I'm still very new to Python...

Ultimately, I'd like to plot the original data along with the model. I have used pyplot before; it's just generating the model and using fmin that has me completely bewildered. Essentially all I have is this:

def gaussian(a, b, c, x):
    return a*np.exp(-(x-b)**2/(2*c**2))

I've seen multiple ways to generate a model, and this has left me confused, so I have no code! I have imported my data file through np.loadtxt.

Thanks to anyone who can suggest a framework or help at all.


Solution

There are basically four (or five) main steps involved in model fitting problems like this:

  1. Define your forward model, yhat = F(P, x), that takes a set of parameters P and your independent variable x, and estimates your response variable y
  2. Define your loss function, loss = L(P, x, y) that you'd like to minimize over your parameters
  3. Optional: define a function that returns the Jacobian matrix, i.e. the partial derivatives of your loss function w.r.t. your model parameters.*
  4. Make an initial guess at your model parameters
  5. Plug all these into one of the optimizers and get the fitted parameters for your model

Here's a worked example to get you started:

import numpy as np
from scipy.optimize import minimize
from matplotlib import pyplot as pp

# function that defines the model we're fitting
def gaussian(P, x):
    a, b, c = P
    return a*np.exp(-(x-b)**2/(2*c**2))

# objective function to minimize
def loss(P, x, y):
    yhat = gaussian(P, x)
    return ((y - yhat)**2).sum()
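
# the question mentions chi-squared: with per-point measurement uncertainties
# this would be the weighted version below (sigma here is a hypothetical array
# of uncertainties, not something from the data above); the unweighted loss()
# is equivalent to assuming equal uncertainties for every point
def chi_squared(P, x, y, sigma):
    return (((y - gaussian(P, x))/sigma)**2).sum()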

# generate a gaussian curve with known parameters
amp = 1.3543    # amplitude (a)
pos = 64.546    # peak position (b)
wid = 12.234    # width, i.e. the standard deviation (c), not the variance
P_real = np.array([amp, pos, wid])

# we use the vector of real parameters to generate our fake data
x = np.arange(100)
y = gaussian(P_real, x)
# add some gaussian noise to make things harder
y_noisy = y + np.random.randn(y.size)*0.5

# minimize needs an initial guess at the model parameters
P_guess = np.array([1, 50, 25])

# minimize provides a unified interface to all of scipy's solvers. you
# can also access them individually in scipy.optimize, but the
# standalone versions have annoying differences in their syntax. for now
# we'll use the Nelder-Mead solver, which doesn't use the Jacobian. we
# also need to hand it x and y_noisy as additional args to loss()
res = minimize(loss, P_guess, method='Nelder-Mead', args=(x, y_noisy))

# res is an OptimizeResult object containing the results of the optimization.
# in particular we want the optimized model parameters:
P_fit = res.x

# we can pass these to gaussian() to evaluate our fitted model
y_fit = gaussian(P_fit, x)

# now let's plot the results:
fig, ax = pp.subplots(1, 1)
ax.plot(x, y, '-r', lw=2, label='Real')
ax.plot(x, y_noisy, '-k', alpha=0.5, label='Noisy')
ax.plot(x, y_fit, '--b', lw=5, label='Fit')
ax.legend(loc=0, fancybox=True)
pp.show()

(The resulting figure shows the real, noisy, and fitted Gaussian curves overlaid.)

*Some solvers, e.g. conjugate gradient methods, take the Jacobian as an additional argument, and by and large these solvers are faster and more robust. If you're feeling lazy and performance isn't all that critical, though, you can usually get away without providing the Jacobian, in which case the solver will use finite differences to estimate the gradients.
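
For example, here's a sketch of what that could look like for the model above. Since the loss is a scalar, the "Jacobian" is just the gradient vector of loss() with respect to (a, b, c), and it can be handed to a gradient-based solver such as BFGS through the jac keyword (loss_grad is my own name, not part of scipy):

# analytic gradient of loss() w.r.t. the parameters (a, b, c); a sketch
# worked out from the same gaussian model defined above
def loss_grad(P, x, y):
    a, b, c = P
    e = np.exp(-(x - b)**2/(2*c**2))   # shared exponential term
    yhat = a*e
    r = yhat - y                       # residuals
    dyda = e                           # d(yhat)/da
    dydb = a*e*(x - b)/c**2            # d(yhat)/db
    dydc = a*e*(x - b)**2/c**3         # d(yhat)/dc
    return 2*np.array([(r*dyda).sum(), (r*dydb).sum(), (r*dydc).sum()])

# gradient-based solvers can then use it via the jac keyword
res_bfgs = minimize(loss, P_guess, method='BFGS', jac=loss_grad,
                    args=(x, y_noisy))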

You can read more about the different solvers here
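
Since you mentioned reading your data in with np.loadtxt, the same recipe carries over directly. Here's a minimal sketch, assuming a hypothetical two-column text file data.txt with the x values in the first column and the y values in the second:

# hypothetical file layout: two whitespace-separated columns, x then y
x_data, y_data = np.loadtxt('data.txt', unpack=True)

# rough initial guesses pulled from the data itself
P_guess = np.array([y_data.max(),                         # amplitude ~ peak height
                    x_data[y_data.argmax()],              # position ~ x at the peak
                    (x_data.max() - x_data.min())/10.0])  # width: order-of-magnitude guess

res = minimize(loss, P_guess, method='Nelder-Mead', args=(x_data, y_data))
a_fit, b_fit, c_fit = res.x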

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow