Question

I am using approx() to interpolate values.

x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)
df <- cbind.data.frame(x,y)

> df
    x y
1   1 3
2   2 8
3   3 2
4   4 6
5   5 8
6   6 2
7   7 4
8   8 7
9   9 9
10 10 9
11 11 1
12 12 3
13 13 1
14 14 9
15 15 6
16 16 2
17 17 8
18 18 7
19 19 6
20 20 2

interpolated <- approx(x=df$x, y=df$y, method="linear", n=5)

gets me this:

interpolated
$x
[1]  1.00  5.75 10.50 15.25 20.00

$y
[1] 3.0 3.5 5.0 5.0 2.0

Now, the first and last values are duplicates of my real data. Is there any way to prevent this, or is there something I don't understand properly about approx()?


Solution

You may want to specify xout to avoid this. For instance, if you want to always exclude the first and the last points, here's how you can do that:

specify_xout <- function(x, n) {
  seq(from=min(x), to=max(x), length.out=n+2)[-c(1, n+2)]
}

plot(df$x, df$y)
points(approx(df$x, df$y, xout=specify_xout(df$x, 5)), pch = "*", col = "red")

Note that this does not prevent an interpolated point from landing on an existing data point somewhere in the middle (exactly what happens in the picture below).

[Image: plot of df with the interpolated points drawn as red asterisks]
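To see which x positions this helper actually produces, a quick check along these lines may help. It is only a sketch: it redefines the data and specify_xout from the post so that it runs on its own, and the names xout and res are just illustrative.

```r
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)

# Same helper as in the answer: n evenly spaced points strictly
# between min(x) and max(x), with both endpoints dropped.
specify_xout <- function(x, n) {
  seq(from = min(x), to = max(x), length.out = n + 2)[-c(1, n + 2)]
}

xout <- specify_xout(x, 5)
round(xout, 2)             # 4.17 7.33 10.50 13.67 16.83

res <- approx(x, y, xout = xout)
length(res$x)              # 5
any(res$x %in% range(x))   # FALSE: neither endpoint is included
```

Because the endpoints are dropped after building a sequence of length n+2, you still get exactly n interpolated points back.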

OTHER TIPS

approx will pass through all of your original data points if you give it the chance (change n=5 to xout=df$x to see this). Interpolation generates values of y for unobserved values of x, but it should agree with the data wherever x has already been observed.
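A minimal way to check this for yourself, using the data from the question (the name at_knots is only illustrative):

```r
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)

# Ask for interpolated values at the observed x positions themselves.
at_knots <- approx(x, y, xout = x)

# The interpolated y values match the observed ones.
isTRUE(all.equal(at_knots$y, y))   # TRUE
```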

With method="linear", approx 'draws' straight line segments that join your original coordinates exactly, so for integer x it returns the y values you supplied. You only see 'new' y values because n=5 places every output point other than the first and last at a non-integer x (not one of your input values), and those points get interpolated.
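As a worked example, here is the second output point from the question (x = 5.75) computed by hand and compared with approx. This is a sketch that relies on x being 1:20, so that floor(x0) happens to be the index of the left neighbour; x0, i and manual are illustrative names.

```r
x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)

x0 <- 5.75
i  <- floor(x0)   # left neighbour; valid as an index only because x is 1:20

# Linear interpolation between (x[i], y[i]) and (x[i+1], y[i+1]):
# 8 + 0.75 * (2 - 8) = 3.5
manual <- y[i] + (x0 - x[i]) / (x[i + 1] - x[i]) * (y[i + 1] - y[i])
manual                      # 3.5

approx(x, y, xout = x0)$y   # 3.5
```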

If you do not want the observed values to be reproduced exactly, you could add some noise, for example via rnorm.
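Something like the following would do that. Treat it as a rough sketch: the standard deviation of 0.1 is an arbitrary choice on my part, and set.seed is only there so the result is repeatable.

```r
set.seed(1)   # reproducibility only

x <- 1:20
y <- c(3,8,2,6,8,2,4,7,9,9,1,3,1,9,6,2,8,7,6,2)

fit <- approx(x, y, xout = x)

# Perturb the interpolated values with Gaussian noise.
# sd = 0.1 is an assumed value; pick one that suits your data.
noisy_y <- fit$y + rnorm(length(fit$y), mean = 0, sd = 0.1)

isTRUE(all.equal(noisy_y, y))   # FALSE: values no longer match exactly
```

Keep in mind that the result is then no longer an interpolation in the strict sense, since it does not pass through the observed points.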

Licensed under: CC-BY-SA with attribution