문제

I have a data frame (df) with missing values and want to impute interpolated values with restriction. My data frame is:

X<-c(100,NA,NA,70,NA,NA,NA,NA,NA,NA,35)
Y<-c(10,NA,NA,40,NA,NA,NA,NA,NA,NA,5)
Z<-c(50,NA,NA,20,NA,NA,NA,NA,NA,NA,90)
df<-as.data.frame(cbind(X,Y,Z))
df
     X  Y  Z
1  100 10 50
2   NA NA NA
3   NA NA NA
4   70 40 20
5   NA NA NA
6   NA NA NA
7   NA NA NA
8   NA NA NA
9   NA NA NA
10  NA NA NA
11  35  5 90

I was able to impute missing values from linear interpolation of the known values using:

 data.frame(lapply(df, function(X) approxfun(seq_along(X), X)(seq_along(X))))
     X  Y  Z
1  100 10 50
2   90 20 40
3   80 30 30
4   70 40 20
5   65 35 30
6   60 30 40
7   55 25 50
8   50 20 60
9   45 15 70
10  40 10 80
11  35  5 90

My question is how can I put constraint to the interpolation? Say NAs more than 5 consecutive entries should remain as NAs and not be imputed by linear interpolation so that my new data frame would look like:

    X  Y  Z
1  100 10 50
2   90 20 40
3   80 30 30
4   70 40 20
5   NA NA NA
6   NA NA NA
7   NA NA NA
8   NA NA NA
9   NA NA NA
10  NA NA NA
11  35  5 90
도움이 되었습니까?

해결책

Here's something that works. It uses na.rm to identify NAs, rle to identify runs of NAs, and then cumsum to translate those runs into positions in the vector.

data.frame(lapply(df, function(X) {
    af = approxfun(seq_along(X), X)
    rl = rle(is.na(X))
    cu = cumsum(rl$length)
    L=5
    unlist(sapply(1:length(cu), function(x) {
        if (rl$values[x] & rl$length[x]>L) rep(NA, rl$lengths[x])
        else af(seq(cu[x]-rl$lengths[x]+1,cu[x]))
    }))
}))
라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top