Question

I'm sure this is a simple question, but I'm not sure what to search for or how to solve it.

My data looks something like this:

Date     UserID   Var1   Var2
2013-01  1        2      3
2013-02  1        2      1
etc.

Where the dates that are recorded are different for each user.

I want to find statistics about one of the variables (let's say Var2), and in order to do that I want to reshape the data, so that the output will be something like:

ID   Time1   Time2   Time3   ...
1    3       1       NA
etc.

where Time1 is the first date that a user appears in the data, Time2 is the next, etc.

Was it helpful?

Solution

(edited to handle the re-alignment of dates on a relative scale)

What about using reshape?

This is to input the data:

> zz <- "Date     UserID   Var1   Var2
+ 2013-01  1        2      3
+ 2013-02  1        2      1
+ 2013-02  2        2      1"
> Data <- read.table(text=zz, header = TRUE)

This is to reshape:

Data2 = reshape(Data[c(1,2,4)],direction='wide',timevar='Date',idvar='UserID')

Resulting in:

> Data2
  UserID Var2.2013-01 Var2.2013-02
1      1            3            1
3      2           NA            1

(note that c(1,2,4) refers to the colum numbers corresponding to Date, UserID and Var2)

Finally, to obtain the desired output, you can simply move the NA to the end of each line:

t(sapply(1:nrow(Data2), function(x) c(Data2[x,!is.na(Data2[x,])],rep(NA,sum(is.na(Data2[x,]))))))

Resulting in:

     UserID Var2.2013-01 Var2.2013-02
[1,] 1      3            1           
[2,] 2      1            NA    

Note that the column names here are not current anymore. Also, you may possibly have columns containing only NA on the right of your matrix... I will let you handle these small details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top