Question
I have a data structure that looks like the following:
groupA1 groupA2 groupB1 groupB2 date       text
0       1       1       1       2013-01-01 the dog
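To make the example reproducible, the one-row input above could be built as follows. This is a sketch: the column types (integer 0/1 indicators, a Date column, a character text column) are assumptions, since the question only shows printed values.

```r
# Hypothetical reconstruction of the example row; column types are assumed
x <- data.frame(
  groupA1 = 0L, groupA2 = 1L,        # groupA indicator columns
  groupB1 = 1L, groupB2 = 1L,        # groupB indicator columns
  date = as.Date("2013-01-01"),
  text = "the dog",
  stringsAsFactors = FALSE
)
str(x)
```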
For each groupB column that has a value of 1, I want one output row per groupA column that also has a value of 1. In other words, I need to list every combination of a groupA column and a groupB column where both contain 1, one combination per row, and also add the date and text to each of those rows as columns.
The transformed data would look like this:
var_groupB var_groupA date       text
groupB1    groupA2    2013-01-01 the dog
groupB2    groupA2    2013-01-01 the dog
I've tried combinations of melt and ddply, but I always end up missing one of the variables I need.
One thing I tried was melt(x, id.vars=c("text", "date")), but then I lose all information about the relationships between groupA and groupB.
I could accomplish this with a messy loop, but I wasn't sure whether there is a reshape utility I'm unaware of that could do the job.
Solution
You could melt twice, once for each group:
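library(reshape2)  # provides melt() with variable.name / value.name
# First melt: one row per groupA column; groupB1, groupB2, date, text stay as ids
y <- melt(x, measure.vars = c("groupA1", "groupA2"),
          variable.name = "var_groupA", value.name = "val_groupA")
# Second melt: one row per groupB column for every row produced above
y <- melt(y, measure.vars = c("groupB1", "groupB2"),
          variable.name = "var_groupB", value.name = "val_groupB")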
That would give you one row for each combination of A and B:
        date    text var_groupA val_groupA var_groupB val_groupB
1 2013-01-01 the dog    groupA1          0    groupB1          1
2 2013-01-01 the dog    groupA2          1    groupB1          1
3 2013-01-01 the dog    groupA1          0    groupB2          1
4 2013-01-01 the dog    groupA2          1    groupB2          1
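If reshape2 is not available, a table with the same var_/val_ columns can be sketched in base R by stacking each group separately and joining the two long tables on a row key. This swaps in base R's stack() and merge() for melt(); the id column and the long_group() helper are hypothetical names introduced here, and the row order may differ from melt's.

```r
# Example input (column types assumed), as in the question
x <- data.frame(groupA1 = 0L, groupA2 = 1L, groupB1 = 1L, groupB2 = 1L,
                date = as.Date("2013-01-01"), text = "the dog",
                stringsAsFactors = FALSE)
x$id <- seq_len(nrow(x))  # hypothetical row key: rows pair only with themselves

# Long format for one group: one row per (input row, group column)
long_group <- function(df, cols, prefix) {
  s <- stack(df[cols])  # 'values' = indicators, 'ind' = column name (factor)
  out <- data.frame(id  = rep(df$id, length(cols)),  # stack() goes column by column
                    var = as.character(s$ind),
                    val = s$values,
                    stringsAsFactors = FALSE)
  names(out)[2:3] <- paste0(c("var_", "val_"), prefix)
  out
}

a <- long_group(x, c("groupA1", "groupA2"), "groupA")
b <- long_group(x, c("groupB1", "groupB2"), "groupB")

# Joining on id crosses every A column with every B column within a row,
# then date and text are attached from the original row
y <- merge(merge(a, b, by = "id"), x[c("id", "date", "text")], by = "id")
y
```

Because the column names match the melt() result, the subsetting step that follows applies to this y unchanged.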
Then you could subset this and remove the value columns:
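# Keep only the combinations where both indicators are 1
y <- y[y$val_groupA == 1 & y$val_groupB == 1, ]
# Drop the value columns and put the remaining columns in order
y <- y[, c("var_groupA", "var_groupB", "date", "text")]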
This gives you what you want:
  var_groupA var_groupB       date    text
2    groupA2    groupB1 2013-01-01 the dog
4    groupA2    groupB2 2013-01-01 the dog
Of course, if your dataset is more complex than your example, you can make this solution more elegant by automating the melting and subsetting, for example by detecting the group columns and filling in measure.vars, variable.name, and value.name automatically, perhaps for any number of groups.
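A minimal sketch of that idea is below, written in base R rather than with repeated melt() calls. It assumes the group columns follow a <prefix><number> naming pattern (the default regular expression matches groupA1, groupB2, and so on) and that every other column should simply be copied into each output row; active_combos() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: every combination of "active" (== 1) group columns per row.
# Assumption: group columns are named <prefix><digits> (groupA1, groupB2, ...);
# all other columns (here date and text) are copied into each output row.
active_combos <- function(df, pattern = "^(group[A-Z]+)[0-9]+$") {
  grp_cols <- grep(pattern, names(df), value = TRUE)
  grp_of   <- sub(pattern, "\\1", grp_cols)   # prefix of each group column
  prefixes <- unique(grp_of)
  other    <- setdiff(names(df), grp_cols)

  rows <- lapply(seq_len(nrow(df)), function(i) {
    # For each prefix, the names of its columns that equal 1 in row i
    active <- lapply(prefixes, function(p) {
      cols <- grp_cols[grp_of == p]
      cols[unlist(df[i, cols]) == 1]
    })
    names(active) <- paste0("var_", prefixes)
    combos <- expand.grid(active, stringsAsFactors = FALSE)
    if (nrow(combos) == 0) return(NULL)  # some group has no 1s in this row
    out <- cbind(combos, df[rep(i, nrow(combos)), other, drop = FALSE])
    rownames(out) <- NULL
    out
  })
  do.call(rbind, rows)  # NULL entries (rows with no valid pair) are skipped
}

# Usage on the question's example row (column types assumed)
x <- data.frame(groupA1 = 0L, groupA2 = 1L, groupB1 = 1L, groupB2 = 1L,
                date = as.Date("2013-01-01"), text = "the dog",
                stringsAsFactors = FALSE)
active_combos(x)
```

Using expand.grid() per row means a third group (say groupC1, groupC2) is handled without extra code, at the cost of looping over rows in R.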
OTHER TIPS
The first two statements replace each 0 in the first four columns with "" and each 1 with the column name, giving dd2 (the code calls the input data frame dd). The next two statements use expand.grid to generate all combinations of groupA and groupB for each row, giving dd3. Finally, subset dd3 to the rows with no "" entries:
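# Replace 0 with "" and 1 with the column name in the first four columns
newvals <- function(nm) ifelse(dd[[nm]] == 0, "", nm)
dd2 <- replace(dd, 1:4, lapply(names(dd)[1:4], newvals))
# For one row, pair every groupA entry with every groupB entry
combo <- function(x) data.frame(expand.grid(groupA = unlist(x[1:2]), groupB = unlist(x[3:4])),
                                x$date, x$text)
dd3 <- do.call("rbind", by(dd2, seq_len(nrow(dd2)), combo))
# Keep only pairs where both entries are non-empty
subset(dd3, groupA != "" & groupB != "")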
This gives:
    groupA  groupB     x.date  x.text
1.2 groupA2 groupB1 2013-01-01 the dog
1.4 groupA2 groupB2 2013-01-01 the dog
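To see what the intermediate objects look like, the first step and the expand.grid call can be run by hand on the example row. This sketch assumes dd holds the question's single row with integer indicators and a character date; grid is a name introduced here only for illustration.

```r
# Hypothetical one-row dd matching the question's example (types assumed)
dd <- data.frame(groupA1 = 0L, groupA2 = 1L, groupB1 = 1L, groupB2 = 1L,
                 date = "2013-01-01", text = "the dog",
                 stringsAsFactors = FALSE)

# Step 1: 0 -> "", 1 -> column name, for the first four columns
newvals <- function(nm) ifelse(dd[[nm]] == 0, "", nm)
dd2 <- replace(dd, 1:4, lapply(names(dd)[1:4], newvals))
dd2[1, 1:4]

# Step 2 for row 1: every groupA entry paired with every groupB entry;
# pairs containing "" are the ones the final subset() drops
grid <- expand.grid(groupA = unlist(dd2[1, 1:2]), groupB = unlist(dd2[1, 3:4]),
                    stringsAsFactors = FALSE)
grid
```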