Question

I'm trying to transform a data frame from long to wide format using the dcast function.

Here is the starting data frame:

convID     var      value
aa         in       1
ab         in       1
aa         id       4/29/2014
ab         id       4/20/2014
aa         it       Impr
ab         it       Impr
aa         ic       Display
ab         ic       Display
aa         in       2
ab         in       2
aa         id       4/25/2014
ab         id       4/24/2014
aa         it       Impr
ab         it       Click
aa         ic       Display
ab         ic       SEM

This is the data frame I want, where the id, it, and ic values from the top half of the starting data correspond to in=1 and those from the bottom half correspond to in=2:

convID     in     id           it       ic
aa         1      4/29/2014    Impr     Display
ab         1      4/20/2014    Impr     Display
aa         2      4/25/2014    Impr     Display
ab         2      4/24/2014    Click    SEM

However I'm not able to get the desired data frame using the dcast function. I tried many times and the closest I got was the following:

dcast(df,convID~var, value.var="value", fun.aggregate=max)

convID     in     id          it       ic
aa         2      4/29/2014   Impr     Display
ab         2      4/24/2014   Impr     SEM

This is obviously not right: it returns the max values of in, id, it, and ic, and the proper assignments to in=1 and in=2 are disregarded. Additionally, I'm missing half my data. Any advice would be greatly appreciated!

#Here is code to produce the starting data frame:
convID<-c("aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab", "aa", "ab")  
var<-c("in", "in", "id", "id", "it", "it", "ic", "ic","in", "in", "id", "id", "it", "it", "ic", "ic")
value<-c("1", "1", "4/29/14", "4/20/14", "Impr", "Impr", "Display", "Display", "2", "2", "4/25/14", "4/24/14", "Impr", "Click", "Display", "SEM")
df<-data.frame(convID, var, value)
df$value<-as.character(df$value) 

Solution

Your problem is that in is not yet a variable (column) of its own in your data frame; it only appears as one of the values of var. (I named the new column inval because in is a reserved word in R, so there are a few weirdnesses associated with trying to use a variable called in inside within.)

I generated inval by using zoo::na.locf to set the value for each row to the last previously specified value:

library(zoo)
df <- within(df,{
    inval <- ifelse(var=="in",value,NA)
    inval <- na.locf(inval)
})

This results in:

str(df)
## 'data.frame':    16 obs. of  4 variables:
##  $ convID: Factor w/ 2 levels "aa","ab": 1 2 1 2 1 2 1 2 1 2 ...
##  $ var   : Factor w/ 4 levels "ic","id","in",..: 3 3 2 2 4 4 1 1 3 3 ...
##  $ value : chr  "1" "1" "4/29/14" "4/20/14" ...
##  $ inval : chr  "1" "1" "1" "1" ...
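
For readers without zoo, the "carry the last specified value forward" step can be sketched in base R. This is only an illustration of the idea, not how zoo implements it: locf_sketch is a hypothetical helper name, and unlike na.locf with its defaults it keeps leading NAs instead of dropping them.

```r
# Base-R sketch of "last observation carried forward" (LOCF).
# locf_sketch is a hypothetical helper, not part of zoo.
locf_sketch <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) counts the non-NA values seen so far; adding 1 and
  # prepending NA makes positions before the first non-NA value stay NA.
  c(NA, x[seen])[cumsum(seen) + 1]
}

# A cut-down version of the question's data (one convID only)
var   <- c("in", "id", "it", "ic", "in", "id", "it", "ic")
value <- c("1", "4/29/14", "Impr", "Display", "2", "4/25/14", "Impr", "Display")

# Keep the value only on "in" rows, then fill the gaps downwards
inval <- locf_sketch(ifelse(var == "in", value, NA))
print(inval)
```

Every row ends up labelled with the most recent in value above it, which is the grouping key the reshape step needs.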

Then it's easy to dcast:

library(reshape2)
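# Drop the "in" rows (their information now lives in inval) and spread var into columns
dcast(subset(df, var != "in"), convID + inval ~ var, value.var = "value")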
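
As a cross-check that needs no extra packages, the same two steps can be sketched end to end in base R. Note the swapped-in technique: this uses stats::reshape instead of dcast, and a cumsum index instead of na.locf. It assumes the first row of the data is an "in" row, as it is in the question's data; the names is_in, long, and wide are just illustrative.

```r
# Rebuild the question's starting data (same values, written more compactly)
convID <- rep(c("aa", "ab"), times = 8)
var    <- rep(rep(c("in", "id", "it", "ic"), each = 2), times = 2)
value  <- c("1", "1", "4/29/14", "4/20/14", "Impr", "Impr", "Display", "Display",
            "2", "2", "4/25/14", "4/24/14", "Impr", "Click", "Display", "SEM")
df <- data.frame(convID, var, value, stringsAsFactors = FALSE)

# Step 1: carry the most recent "in" value forward onto every row.
# Assumption: the first row is an "in" row, so cumsum(is_in) is never 0.
is_in    <- df$var == "in"
df$inval <- df$value[is_in][cumsum(is_in)]

# Step 2: drop the "in" rows and go from long to wide,
# one row per convID/inval pair and one column per remaining var.
long <- df[!is_in, c("convID", "inval", "var", "value")]
wide <- reshape(long, idvar = c("convID", "inval"),
                timevar = "var", direction = "wide")

# reshape() names the new columns value.id, value.it, value.ic; strip the prefix
names(wide)    <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

This should give one row for each of aa/1, ab/1, aa/2, and ab/2, with id, it, and ic as separate columns. dcast sorts its rows and columns, so its output may be ordered differently from this one even though the contents match.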
Licensed under: CC-BY-SA with attribution