Frage

I have a dataset (see below) of two variables, x and y. I want to find for which value of x, does a maximum in y occur. My current approach is simply to look up the x which gives me the maximum y. This is not ideal as my data is quite noisy, so I would like to perform some sort of smoothing first, and then find the max.

So far, I have tried to use R to smooth my data with npreg (kernel regression) from the np package to obtain this curve:

enter image description here

but I'm not sure how to find the max.

I would like a solution to the following in Python:

1) Smooth the data (doesn't to be kernel regression)

2) Find the value of x where the max in y occurs using the smoothed data

x   y
-20 0.006561733
-19 -4.48E-08
-18 -4.48E-08
-17 -4.48E-08
-16 0.003281305
-15 0.00164063
-14 0.003280565
-13 0.003282537
-12 -4.48E-08
-11 0.003281286
-10 0.004921239
-9  0.00491897
-8  -1.52E-06
-7  0.004925867
-6  -1.27E-06
-5  0.009839438
-4  0.001643726
-3  -4.48E-08
-2  2.09E-06
-1  -0.001640027
0   0.006559627
1   0.001636958
2   2.36E-06
3   0.003281469
4   0.011481469
5   0.004922279
6   0.018044207
7   0.011483134
8   0.014765087
9   0.008201379
10  0.00492497
11  0.006560482
12  0.009844796
13  0.011483199
14  0.008202129
15  0.001641621
16  0.004921645
17  0.006563377
18  0.006561068
19  0.008201004
War es hilfreich?

Lösung

I'd run a Gaussian filter over the data to smooth:

# first, make a function to linearly interpolate the data
f = scipy.interpolate.interp1d(x,y)

# resample with 1000 samples
xx = np.linspace(-20,19, 1000)

# compute the function on this finer interval
yy = f(xx)

# make a gaussian window
window = scipy.signal.gaussian(200, 60)

# convolve the arrays
smoothed = scipy.signal.convolve(yy, window/window.sum(), mode='same')

# get the maximum
xx[np.argmax(smoothed)]

Here's the smoothed result:

The smoothed result.

The max occurs at 6.93.

There are a whole bunch of other window functions and filtering options in scipy.signal. See the documentation for more.

Andere Tipps

You might be able to use the smooth spline functions:

import numpy as np
from scipy import interpolate
x = range(-20,20)
y = [0.006561733, -4.48e-08, -4.48e-08, -4.48e-08, 0.003281305, 0.00164063, 0.003280565, 0.003282537, -4.48e-08, 0.003281286, 0.004921239, 0.00491897, -1.52e-06, 0.004925867, -1.27e-06, 0.009839438, 0.001643726, -4.48e-08, 2.09e-06, -0.001640027, 0.006559627, 0.001636958, 2.36e-06, 0.003281469, 0.011481469, 0.004922279, 0.018044207, 0.011483134, 0.014765087, 0.008201379, 0.00492497, 0.006560482, 0.009844796, 0.011483199, 0.008202129, 0.001641621, 0.004921645, 0.006563377, 0.006561068, 0.008201004]

tck = interpolate.splrep(x,y) # pass in s= some value to change smoothing: 
                              # higher = smoother, s=0 for no smoothing

xnew = np.arange(-20, 20, 0.1)
ynew = interpolate.splev(xnew,tck,der=0)

now xnew and ynew contain a finely sampled version of the fit, and you get the max with

max_index = np.argmax(ynew)
max_value = ynew[max_index]
max_x = xnew[max_index]

Sorry I was not able to test this; computer I am using right now doesn't have scipy etc. loaded... Should give you some ideas though.

Reference: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

I am not completely sure what is the main problem to solve? Better smoothing, finding the minimum or doing it all in Python? Why are you changing to Python if you have promising progress in R? I have found that in R the built in supsmu function usually does very good non-parametric smoothing. This is how I would do this in R.

smooth <- do.call(supsmu, data)
min.idx <- which.min(smooth$y)
min.point <- c(smooth$x[min.idx], smooth$y[min.idx])
Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top