Question

I've been trying to fit the amplitude, frequency and phase of a sine curve given some generated two dimensional toy data. (Code at the end)

To get estimates for the three parameters, I first perform an FFT. I use the values from the FFT as initial guesses for the actual frequency and phase and then fit for them (row by row). I wrote my code such that I input which bin of the FFT I want the frequency to be in, so I can check if the fitting is working well. But there's some pretty strange behaviour. If my input bin is say 3.1 (a non integral bin, so the FFT won't give me the right frequency) then the fit works wonderfully. But if the input bin is 3 (so the FFT outputs the exact frequency) then my fit fails, and I'm trying to understand why.

Here's the output when I give the input bins (in the X and Y direction) as 3.0 and 2.1 respectively:

(The plot on the right is data - fit) fig1

Here's the output when I give the input bins as 3.0 and 2.0: fig2

Question: Why does the non linear fit fail when I input the exact frequency of the curve?


Code:

#! /usr/bin/python

# For the purposes of this code, it's easier to think of the X-Y axes as transposed, 
# so the X axis is vertical and the Y axis is horizontal

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize
import itertools
import sys

PI = np.pi

# Function which accepts paramters to define a sin curve
# Used for the non linear fit    
def sineFit(t, a, f, p):
   return a * np.sin(2.0 * PI * f*t + p)

xSize    = 18
ySize    = 60
npt      = xSize * ySize

# Get frequency bin from user input
xFreq    = float(sys.argv[1])
yFreq    = float(sys.argv[2])

xPeriod  = xSize/xFreq
yPeriod  = ySize/yFreq

# arrays should be defined here

# Generate the 2D sine curve
for jj in range (0, xSize):
   for ii in range(0, ySize):
      sineGen[jj, ii] = np.cos(2.0*PI*(ii/xPeriod + jj/yPeriod))

# Compute 2dim FFT as well as freq bins along each axis
fftData  = np.fft.fft2(sineGen)
fftMean  = np.mean(fftData)
fftRMS   = np.std(fftData)
xFreqArr = np.fft.fftfreq(fftData.shape[1]) # Frequency bins along x
yFreqArr = np.fft.fftfreq(fftData.shape[0]) # Frequency bins along y

# Find peak of FFT, and position of peak
maxVal = np.amax(np.abs(fftData))
maxPos = np.where(np.abs(fftData) == maxVal)

# Iterate through peaks in the FFT 
# For this example, number of loops will always be only one

prevPhase = -1000
for col, row in itertools.izip(maxPos[0], maxPos[1]):

   # Initial guesses for fit parameters from FFT
   init_phase  = np.angle(fftData[col,row])
   init_amp    = 2.0 * maxVal/npt
   init_freqY  = yFreqArr[col]
   init_freqX  = xFreqArr[row]

   cntr  = 0
   if prevPhase == -1000:
      prevPhase = init_phase

   guess = [init_amp, init_freqX, prevPhase]
   # Fit each row of the 2D sine curve independently
   for rr in sineGen:   
      (amp, freq, phs), pcov = optimize.curve_fit(sineFit, xDat, rr, guess)
      # xDat is an linspace array, containing a list of numbers from 0 to xSize-1

      # Subtract fit from original data and plot
      fitData     = sineFit(xDat, amp, freq, phs)
      sub1        = rr - fitData

      # Plot
      fig1 = plt.figure()
      ax1  = fig1.add_subplot(121)
      p1,  = ax1.plot(rr, 'g')
      p2,  = ax1.plot(fitData, 'b')
      plt.legend([p1,p2], ["data", "fit"])

      ax2  = fig1.add_subplot(122)
      p3,  = ax2.plot(sub1)
      plt.legend([p3], ['residual1'])

      fig1.tight_layout()

      plt.show()
      cntr += 1
      prevPhase = phs # Update guess for phase of sine curve
Was it helpful?

Solution

I've tried to distill the important parts of your question into this answer.

  1. First of all, try fitting a single block of data, not an array. Once you are confident that your model is sufficient you can move on.
  2. Your fit is only going to be as good as your model, if you move on to something not "sine"-like you'll need to adjust accordingly.
  3. Fitting is an "art", in that the initial conditions can greatly change the convergence of the error function. In addition there may be more than one minima in your fits, so you often have to worry about the uniqueness of your proposed solution.

While you were on the right track with your FFT idea, I think your implementation wasn't quite correct. The code below should be a great toy system. It generates random data of the type f(x) = a0*sin(a1*x+a2). Sometimes a random initial guess will work, sometimes it will fail spectacularly. However, using the FFT guess for the frequency the convergence should always work for this system. An example output:

enter image description here

import numpy as np
import pylab as plt
import scipy.optimize as optimize

# This is your target function
def sineFit(t, (a, f, p)):
    return a * np.sin(2.0*np.pi*f*t + p)

# This is our "error" function
def err_func(p0, X, Y, target_function):
    err = ((Y - target_function(X, p0))**2).sum()
    return err


# Try out different parameters, sometimes the random guess works
# sometimes it fails. The FFT solution should always work for this problem
inital_args = np.random.random(3)

X = np.linspace(0, 10, 1000)
Y = sineFit(X, inital_args)

# Use a random inital guess
inital_guess = np.random.random(3)

# Fit
sol = optimize.fmin(err_func, inital_guess, args=(X,Y,sineFit))

# Plot the fit
Y2 = sineFit(X, sol)
plt.figure(figsize=(15,10))
plt.subplot(211)
plt.title("Random Inital Guess: Final Parameters: %s"%sol)
plt.plot(X,Y)
plt.plot(X,Y2,'r',alpha=.5,lw=10)

# Use an improved "fft" guess for the frequency
# this will be the max in k-space
timestep = X[1]-X[0]
guess_k = np.argmax( np.fft.rfft(Y) )
guess_f = np.fft.fftfreq(X.size, timestep)[guess_k]
inital_guess[1] = guess_f 

# Guess the amplitiude by taking the max of the absolute values
inital_guess[0] = np.abs(Y).max()

sol = optimize.fmin(err_func, inital_guess, args=(X,Y,sineFit))
Y2 = sineFit(X, sol)

plt.subplot(212)
plt.title("FFT Guess          : Final Parameters: %s"%sol)
plt.plot(X,Y)
plt.plot(X,Y2,'r',alpha=.5,lw=10)
plt.show()

OTHER TIPS

The problem is due to a bad initial guess of the phase, not the frequency. While cycling through the rows of genSine (inner loop) you use the fit result of the previous line as initial guess for the next row which does not work always. If you determine the phase from an fft of the current row and use that as initial guess the fit will succeed. You could change the inner loop as follows:

for n,rr in enumerate(sineGen):   
    fftx = np.fft.fft(rr)
    fftx = fftx[:len(fftx)/2]
    idx = np.argmax(np.abs(fftx))
    init_phase = np.angle(fftx[idx])
    print fftx[idx], init_phase
    ...

Also you need to change

def sineFit(t, a, f, p):
   return a * np.sin(2.0 * np.pi * f*t + p)

to

def sineFit(t, a, f, p):
   return a * np.cos(2.0 * np.pi * f*t + p)

since phase=0 means that the imaginary part of the fft is zero and thus the function is cosine like.

Btw. your sample above is still lacking definitions of sineGen and xDat.

Without understanding much of your code, according to http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html:

(amp2, freq2, phs2), pcov = optimize.curve_fit(sineFit, tDat, 
                                                     sub1, guess2)

should become:

(amp2, freq2, phs2), pcov = optimize.curve_fit(sineFit, tDat, 
                                                         sub1, p0=guess2)

Assuming that tDat and sub1 are x and y, that should do the trick. But, once again, it is quite difficult to understand such a complex code with so many interlinked variables and no comments at all. A code should always be build from bottom up, meaning that you don't do a loop of fits when a single one is not working, you don't add noise until the code works to fit the non-noisy examples... Good luck!

By "nothing fancy" I meant something like removing EVERYTHING that is not related with the fit, and doing a simplified mock example such as:

import numpy as np
import scipy.optimize as optimize

def sineFit(t, a, f, p):
       return a * np.sin(2.0 * np.pi * f*t + p)


# Create array of x and y with given parameters
x = np.asarray(range(100))
y = sineFit(x, 1, 0.05, 0)

# Give a guess and fit, printing result of the fitted values
guess = [1., 0.05, 0.]
print optimize.curve_fit(sineFit, x, y, guess)[0]

The result of this is exactly the answer:

[1.    0.05   0.]

But if you change guess not too much, just enough:

# Give a guess and fit, printing result of the fitted values
guess = [1., 0.06, 0.]
print optimize.curve_fit(sineFit, x, y, guess)[0]

the result gives absurdly wrong numbers:

[ 0.00823701  0.06391323 -1.20382787]

Can you explain this behavior?

You can use curve_fit with a series of trigonometric functions, usually very robust and ajustable to the precision that you need just by increasing the number of terms... here is an example:

from scipy import sin, cos, linspace
def f(x, a0,s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,
            c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12):
    return a0 + s1*sin(1*x) +  c1*cos(1*x) \
              + s2*sin(2*x) +  c2*cos(2*x) \
              + s3*sin(3*x) +  c3*cos(3*x) \
              + s4*sin(4*x) +  c4*cos(4*x) \
              + s5*sin(5*x) +  c5*cos(5*x) \
              + s6*sin(6*x) +  c6*cos(6*x) \
              + s7*sin(7*x) +  c7*cos(7*x) \
              + s8*sin(8*x) +  c8*cos(8*x) \
              + s9*sin(9*x) +  c9*cos(9*x) \
             + s10*sin(9*x) + c10*cos(9*x) \
             + s11*sin(9*x) + c11*cos(9*x) \
             + s12*sin(9*x) + c12*cos(9*x)

from scipy.optimize import curve_fit
pi/2. / (x.max() - x.min())
x_norm *= norm_factor
popt, pcov = curve_fit(f, x_norm, y)
x_fit = linspace(x_norm.min(), x_norm.max(), 1000)
y_fit = f(x_fit, *popt)
plt.plot( x_fit/x_norm, y_fit )
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top