I have a data frame (df) with missing values and want to impute interpolated values with restriction. My data frame is:
X<-c(100,NA,NA,70,NA,NA,NA,NA,NA,NA,35)
Y<-c(10,NA,NA,40,NA,NA,NA,NA,NA,NA,5)
Z<-c(50,NA,NA,20,NA,NA,NA,NA,NA,NA,90)
df<-as.data.frame(cbind(X,Y,Z))
df
X Y Z
1 100 10 50
2 NA NA NA
3 NA NA NA
4 70 40 20
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 35 5 90
I was able to impute missing values from linear interpolation of the known values using:
data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
X Y Z
1 100 10 50
2 90 20 40
3 80 30 30
4 70 40 20
5 65 35 30
6 60 30 40
7 55 25 50
8 50 20 60
9 45 15 70
10 40 10 80
11 35 5 90
My question is how can I put constraint to the interpolation? Say NAs more than 5 consecutive entries should remain as NAs and not be imputed by linear interpolation so that my new data frame would look like:
X Y Z
1 100 10 50
2 90 20 40
3 80 30 30
4 70 40 20
5 NA NA NA
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
11 35 5 90