Question

I have a large data set consisting of a column of IDs followed by a monthly time series for each ID. There are frequent missing values in this set. What I would like to do is replace every NA after the first non-NA value with a zero, while leaving all the NAs before that first value as NA.

eg.

[NA NA NA 1 2 3 NA 4 5 NA] would be changed to [NA NA NA 1 2 3 0 4 5 0]

Any help or advice you guys could offer would be much appreciated!

Solution

Easy to do using match() and numeric indices:

  • use match() to find the first occurrence of a non-NA value
  • use which() to convert the logical vector from is.na() to a numeric index
  • keep only the NA positions that come after the first non-NA value, and set those elements of x to 0

Hence:

x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
isna <- is.na(x)
nonna <- match(FALSE,isna)
id <- which(isna)
x[id[id>nonna]] <- 0

gives:

> x
 [1] NA NA NA  1  2  3  0  0  4  5  0
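
Since the question is about a data set with one series per ID, the same steps can be wrapped in a function and applied row by row. This is only a minimal sketch: the helper name fill_after_first and the data frame df (an id column followed by month columns m1 to m4) are made up for illustration and are not from the original post.

```r
# Hypothetical helper: zero-fill the NAs that come after the first
# non-NA value, leaving the leading NAs untouched.
fill_after_first <- function(x) {
  isna  <- is.na(x)
  nonna <- match(FALSE, isna)      # position of the first non-NA value
  if (is.na(nonna)) return(x)      # series is all NA: nothing to fill
  id <- which(isna)
  x[id[id > nonna]] <- 0
  x
}

# Made-up example data: one row per ID, one column per month.
df <- data.frame(
  id = c("a", "b"),
  m1 = c(NA, 2),
  m2 = c(1, NA),
  m3 = c(NA, NA),
  m4 = c(4, 5)
)

# apply() over rows returns one column per input row, so transpose the
# result before writing it back into the month columns.
df[-1] <- t(apply(df[-1], 1, fill_after_first))
df
```

For a very large data set, apply() converts the month columns to a matrix first, so it may be worth keeping the series in a numeric matrix from the start.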

OTHER TIPS

Here's another method. Convert all NAs to zeros first, then convert the leading zeros back to NA.

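> x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
> x[is.na(x)] <- 0
### index from 1 up to the element before the first element > 0
### (assumes the first real value is positive; seq_len() gives an
### empty index when the very first element is already > 0)
> x[seq_len(min(which(x>0))-1)] <- NA
> x
 [1] NA NA NA  1  2  3  0  0  4  5  0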

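Alternatively, starting again from the original x:

### from the first element > 0 to the end of the vector
> x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
> endOfVec <- min(which(x>0)):length(x)
> x[endOfVec][is.na(x[endOfVec])] <- 0
> x
 [1] NA NA NA  1  2  3  0  0  4  5  0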
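
Both variants in this section test x>0, so they assume the first real observation is positive. If the series can start with a zero or a negative value, one way around that is to flag positions with cumsum() on the non-NA indicator. This is my own sketch rather than anything from the answers above, and the example vector is made up to include a negative and a zero:

```r
x <- c(NA, NA, NA, -1, 0, 3, NA, NA, 4, 5, NA)

# TRUE from the first non-NA value onwards, FALSE for the leading NAs
seen <- cumsum(!is.na(x)) > 0

# zero-fill only the NAs that come after a value has been seen
x[is.na(x) & seen] <- 0
x
# [1] NA NA NA -1  0  3  0  0  4  5  0
```

Because it relies only on is.na(), this version leaves the leading NAs alone regardless of the sign of the first value, and it does nothing at all to a series that is entirely NA.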
Licensed under: CC-BY-SA with attribution