Question

Below is a sample of my data (about 8000 rows in total). How can I replace all NAs with values from a smoothing spline fit to the rest of the data?

Date        Max   Min   Rain  RHM  RHE
4/24/1981   35.9  24.7  0.0   71   37
4/25/1981   36.8  22.8  0.0   62   40
4/26/1981   36.0  22.6  0.0   47   37
4/27/1981   35.1  24.2  0.0   51   39
4/28/1981   35.4  23.8  0.0   61   47
4/29/1981   35.4  25.1  0.0   67   43
4/30/1981   37.4  24.8  0.0   72   34
5/1/1981    NA    NA    NA    NA   NA
5/2/1981    39.0  25.3  NA    NA   55
5/3/1981    35.9  23.0  0.0   68   66
5/4/1981    28.4  22.4  0.7   70   30
5/5/1981    35.5  24.6  0.0   47   31
5/6/1981    37.4  25.5  0.0   51   31

Solution 2

I'm using some simplified data for the purposes of answering this query. Take this dataset:

dat <- structure(list(x = c(1.6, 1.6, 4.4, 4.5, 6.1, 6.7, 7.3, 8, 9.5, 
9.5, 10.7), y = c(2.2, 4.5, 1.6, 4.3, NA, NA, 4.8, 7.3, 8.7, 6.3, 12.3)),
.Names = c("x", "y"), row.names = c(NA, -11L), class = "data.frame")

Plotting it with plot(dat,type="o",pch=19) shows a gap in the line where the two y values are NA.

Now fit a smoothing spline to the rows where y is not NA:

smoo <- with(dat[!is.na(dat$y),],smooth.spline(x,y))

Then predict y at the x values where y is currently NA, and add the predictions to the plot in red:

result <- with(dat,predict(smoo,x[is.na(y)]))
points(result,pch=19,col="red")

To fill the values back into the original data you can then do:

# predict() returns a list with components x and y; only y needs filling in
dat$y[is.na(dat$y)] <- result$y
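
The same idea can be carried over to the weather data in the question. What follows is only a sketch under a few assumptions: the sample rows are re-typed into a data frame called wx, the date is converted to a number of days so it can serve as the x variable, and fill_with_spline is a hypothetical helper name, not something from the original answer. Each column gets its own smooth.spline fit with the default smoothing settings.

```r
# Sample rows from the question, re-typed here for illustration
wx <- read.table(header = TRUE, text = "
Date Max Min Rain RHM RHE
4/24/1981 35.9 24.7 0.0 71 37
4/25/1981 36.8 22.8 0.0 62 40
4/26/1981 36.0 22.6 0.0 47 37
4/27/1981 35.1 24.2 0.0 51 39
4/28/1981 35.4 23.8 0.0 61 47
4/29/1981 35.4 25.1 0.0 67 43
4/30/1981 37.4 24.8 0.0 72 34
5/1/1981 NA NA NA NA NA
5/2/1981 39.0 25.3 NA NA 55
5/3/1981 35.9 23.0 0.0 68 66
5/4/1981 28.4 22.4 0.7 70 30
5/5/1981 35.5 24.6 0.0 47 31
5/6/1981 37.4 25.5 0.0 51 31
")
wx$Date <- as.Date(as.character(wx$Date), format = "%m/%d/%Y")

# Hypothetical helper: fit a smoothing spline to the observed values of y
# against x, then replace only the NA entries with the fitted predictions.
fill_with_spline <- function(y, x) {
  ok <- !is.na(y)
  fit <- smooth.spline(x[ok], y[ok])
  y[!ok] <- predict(fit, x[!ok])$y
  y
}

t_num <- as.numeric(wx$Date)               # days since 1970-01-01
vars  <- c("Max", "Min", "Rain", "RHM", "RHE")
wx[vars] <- lapply(wx[vars], fill_with_spline, x = t_num)
```

Observed values are left untouched; only the NA cells change. A spline knows nothing about physical limits, so filled values for a variable such as Rain may need to be checked (for example, for negative numbers) before they are used.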

OTHER TIPS

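One thing to check out might be the na.spline function in the zoo package, which appears to be designed for exactly this purpose. Its documentation, which also covers na.approx, says:

Missing values (NAs) are replaced by linear interpolation via approx or cubic spline interpolation via spline, respectively.

Note that this is an interpolating spline, which passes through every observed point, rather than a smoothing spline as fitted by smooth.spline.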
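
As a rough illustration, here is how na.spline might be applied to the Max column from the question. This is a sketch, not a tested recipe for the full 8000-row data set: max_temp is a hypothetical vector name, the observations are assumed to be equally spaced (one per day) so the default index can be used, and the fallback branch simply calls base R's spline in case zoo is not installed.

```r
# Max temperatures from the question's sample, with one missing value
max_temp <- c(35.9, 36.8, 36.0, 35.1, 35.4, 35.4, 37.4,
              NA, 39.0, 35.9, 28.4, 35.5, 37.4)
ok <- !is.na(max_temp)

if (requireNamespace("zoo", quietly = TRUE)) {
  # na.spline replaces each NA by cubic spline interpolation
  filled <- zoo::na.spline(max_temp)
} else {
  # Fallback without zoo: interpolate at the missing positions with spline()
  idx <- seq_along(max_temp)
  filled <- max_temp
  filled[!ok] <- spline(idx[ok], max_temp[ok], xout = idx[!ok])$y
}
```

Because the spline interpolates, the observed temperatures are reproduced and only the missing day is filled in.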

Licensed under: CC-BY-SA with attribution