Question

I have a data frame with time points nested within persons (unequal numbers of rows per person, and missing values). For each individual I want to add a new time point with NA on all variables.

Here is an example of my data:

set.seed(1) # seed that reproduces the values shown in the answers below
data_long           <- data.frame(id = factor(rep(1:3, each = 4)), DV1 = c(1, 2, NA, 2), DV2 = c(2, 1, 2, 1), time = c(1989, 1995, 2003, 2010))
data_long$DV1       <- rnorm(12, 0, 1)
data_long$DV2       <- rnorm(12, 0, 1)
data_long$DV1[4]    <- NA
data_long$DV2[8]    <- NA
data_long[5, 2:3]   <- NA
data_long[12, 2:3]  <- NA
data_long           <- data_long[-9, ] # drop one row so persons have unequal numbers of rows

T0 <- 1980 # new time point

This is what I want:

for (i in min(as.numeric(data_long$id)):max(as.numeric(data_long$id))) {
  person <- data_long[data_long$id == i, ]
  # prepend one row: the id, NA for every other variable, then the new time point
  temp <- rbind(c(person$id[1], rep(NA, ncol(person) - 2), T0), person)
  write.table(temp, "test.dat", sep = "\t", append = TRUE, row.names = FALSE, col.names = FALSE)
}

data_long2 <- read.table("test.dat")

However, there must be a simpler way that does not require writing the data to disk just to append a different number of rows for each person. I apologize for this simple question and would be happy to be enlightened.


Solution

One more approach

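# one all-NA row per person at the new time point
newrows <- data.frame(id = unique(data_long$id), DV1 = NA, DV2 = NA, time = T0)
# full outer join on all shared columns, i.e. the union of both sets of rows
res <- merge(newrows, data_long, all = TRUE)
res <- res[with(res, order(id, time)), ]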

The result is:

> res
   id        DV1         DV2 time
5   1         NA          NA 1980
2   1 -0.6264538 -0.62124058 1989
3   1  0.1836433 -2.21469989 1995
1   1 -0.8356286  1.12493092 2003
4   1         NA -0.04493361 2010
9   2         NA          NA 1980
10  2         NA          NA 1989
6   2 -0.8204684  0.94383621 1995
7   2  0.4874291  0.82122120 2003
8   2  0.7383247          NA 2010
13  3         NA          NA 1980
11  3 -0.3053884  0.78213630 1995
12  3  1.5117812  0.07456498 2003
14  3         NA          NA 2010

Hope it helps,

alex
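
For comparison, here is a minimal sketch of the same idea without merge: since the new rows never share an (id, time) pair with the existing rows, they can simply be stacked with rbind and then sorted. The data setup repeats the question's example; set.seed(1) is taken from the comment in the expand.grid answer, and the name res2 is only illustrative.

```r
# Rebuild the example data from the question (seed assumed to be 1).
set.seed(1)
data_long <- data.frame(id = factor(rep(1:3, each = 4)),
                        DV1 = NA_real_, DV2 = NA_real_,
                        time = c(1989, 1995, 2003, 2010))
data_long$DV1 <- rnorm(12)
data_long$DV2 <- rnorm(12)
data_long$DV1[4] <- NA
data_long$DV2[8] <- NA
data_long[5, c("DV1", "DV2")]  <- NA
data_long[12, c("DV1", "DV2")] <- NA
data_long <- data_long[-9, ]
T0 <- 1980

# One all-NA row per person at the new time point.
newrows <- data.frame(id = unique(data_long$id),
                      DV1 = NA_real_, DV2 = NA_real_, time = T0)

# Stack the new rows on top of the data, then sort by person and time.
res2 <- rbind(newrows, data_long)
res2 <- res2[order(res2$id, res2$time), ]
rownames(res2) <- NULL
```

This should give one extra row per person (14 rows in total here), with each person's T0 row sorted first because 1980 precedes all observed times.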

Other tips

This doesn't exactly match what you share as your desired output, but it does seem to better match what you describe:

Use expand.grid to build a data.frame containing every combination of "id" and "time", then merge it with your original data.frame. The "id" values are the unique "id" values already in your source data.frame, and the "time" values are the existing unique times plus the new time point.

## set.seed(1) was used for this
X <- expand.grid(id = unique(data_long$id), 
                 time = c(1980, unique(data_long$time)))
merge(data_long, X, all.y = TRUE)
#    id time        DV1         DV2
# 1   1 1980         NA          NA
# 2   1 1989 -0.6264538 -0.62124058
# 3   1 1995  0.1836433 -2.21469989
# 4   1 2003 -0.8356286  1.12493092
# 5   1 2010         NA -0.04493361
# 6   2 1980         NA          NA
# 7   2 1989         NA          NA
# 8   2 1995 -0.8204684  0.94383621
# 9   2 2003  0.4874291  0.82122120
# 10  2 2010  0.7383247          NA
# 11  3 1980         NA          NA
# 12  3 1989         NA          NA  <---- This row is not there in your approach
# 13  3 1995 -0.3053884  0.78213630
# 14  3 2003  1.5117812  0.07456498
# 15  3 2010         NA          NA
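
To see exactly which rows the expand.grid approach introduces, one option is to compare the (id, time) pairs before and after the merge. This is only a sketch: the data setup repeats the question's example with set.seed(1), and the names full and added are illustrative.

```r
# Rebuild the example data from the question (seed assumed to be 1).
set.seed(1)
data_long <- data.frame(id = factor(rep(1:3, each = 4)),
                        DV1 = NA_real_, DV2 = NA_real_,
                        time = c(1989, 1995, 2003, 2010))
data_long$DV1 <- rnorm(12)
data_long$DV2 <- rnorm(12)
data_long$DV1[4] <- NA
data_long$DV2[8] <- NA
data_long[5, c("DV1", "DV2")]  <- NA
data_long[12, c("DV1", "DV2")] <- NA
data_long <- data_long[-9, ]
T0 <- 1980

# Every id crossed with every time point, including the new one.
X <- expand.grid(id = unique(data_long$id),
                 time = c(T0, unique(data_long$time)))
full <- merge(data_long, X, all.y = TRUE)

# (id, time) pairs that exist only because of the grid.
old_keys <- paste(data_long$id, data_long$time)
new_keys <- paste(full$id, full$time)
added <- full[!new_keys %in% old_keys, c("id", "time")]
print(added)
```

Besides the T0 row for each person, added should also list id 3 at time 1989, the row that was dropped from the original data. Whether filling such gaps is desirable depends on whether every person is meant to have every time point.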
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow