Problem

I have a data frame containing observations for various individuals.

The first column contains the name of the individual, and the following columns contain the observed states, where each column represents one month.

During the observation period, individuals are born, which results in NA observations before their birth. They also leave the population for a reason recorded in their last observation, which results in NAs after that observation. I would like to change the NAs before the first observation to a certain value, and change the NAs after an individual leaves the population to that individual's last observation.

Since the data frame comprises more than 30,000 rows and about 400 columns, I am looking for an efficient way, other than a basic ifelse() approach.

Solution

na.locf() in the zoo package replaces NAs by carrying the last non-NA value forward. (This applies not only to trailing NAs but also to NAs in the middle of a vector; I assume you don't have those.) By default (na.rm = TRUE), it drops leading NAs. You can replace those with a specified value like this:

> library(zoo)
> xx <- c(NA, NA, 1, NA, 2, 3, NA, NA)
> replacement.for.initial.NAs <- -1
> foo <- min(which(!is.na(xx)))
> c(rep(replacement.for.initial.NAs,foo-1),na.locf(xx))
[1] -1 -1  1  1  2  3  3  3

You can loop this over your individuals. There is probably a smarter way involving apply() and friends to do this process per row or column of your data structure.
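
As a rough sketch of the apply() idea, assuming the states are numeric and sit in every column after the first: calling na.locf() with na.rm = FALSE leaves the leading NAs in place, so each row keeps its length and whatever is still NA afterwards can be overwritten in one step. The data frame `obs`, its column names, and `fill_value` are made up for illustration.

```r
library(zoo)

# Hypothetical toy data: one row per individual, one column per month
obs <- data.frame(
  name = c("a", "b", "c"),
  m1 = c(NA, 1, NA),
  m2 = c(1, 2, NA),
  m3 = c(NA, NA, 3),
  m4 = c(2, NA, NA)
)

fill_value <- -1  # assumed placeholder for the months before birth

# Work on the state columns as a matrix (everything except the name column)
states <- as.matrix(obs[-1])

# Carry the last observation forward within each row; na.rm = FALSE keeps
# the leading NAs, so every row returns a vector of the same length.
# apply() returns one column per row of the input, hence the t().
filled <- t(apply(states, 1, na.locf, na.rm = FALSE))

# Only the NAs before the first observation are left at this point
filled[is.na(filled)] <- fill_value

obs[-1] <- filled
```

With this toy data, row "a" becomes -1, 1, 1, 2 and row "c" becomes -1, -1, 3, 3. A row with no observations at all ends up filled with `fill_value` instead of raising an error, which may or may not be what you want.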

License: CC-BY-SA with attribution
Not affiliated with StackOverflow