Question

I have a data structure that looks like the following:

 groupA1    groupA2    groupB1    groupB2    date        text
     0         1          1          1      2013-01-01   the dog

For each row, I want one output row for every pair of a groupA column and a groupB column that both have a value of 1. Each output row should name the groupA and groupB columns in that pair and carry over the date and text from the original row as additional columns.

Transformed data would appear as:

var_groupB  var_groupA  date         text
 groupB1    groupA2     2013-01-01    the dog
 groupB2    groupA2     2013-01-01    the dog

I've tried combinations of melt and ddply but am always left without one of the variables I need.

One thing I tried was melt(x, id.vars=c("text", "date")) but then I lose all information about the relationships between groupA and groupB.

I could accomplish this with a messy loop, but I'm not sure whether a reshape utility I'm unaware of could do the job.
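
For anyone who wants to reproduce this, the sample row above could be built roughly as follows. This is only a sketch: the column types (numeric flags, a Date for date, character for text) are assumptions, and the name x simply matches the melt(x, ...) call mentioned above.

```r
# Sample data matching the single row shown in the question.
# Column types are assumed; the original post does not state them.
x <- data.frame(
  groupA1 = 0, groupA2 = 1,
  groupB1 = 1, groupB2 = 1,
  date = as.Date("2013-01-01"),
  text = "the dog",
  stringsAsFactors = FALSE
)

# Inspect the structure: one row, four 0/1 flag columns plus date and text.
str(x)
```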

Solution

You could melt twice, once for each group:

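library(reshape2)  # provides melt() with variable.name and value.name
# First melt: stack the groupA columns
y <- melt(x, measure.vars=c("groupA1", "groupA2"),
          variable.name="var_groupA", value.name="val_groupA")
# Second melt: stack the groupB columns
y <- melt(y, measure.vars=c("groupB1", "groupB2"),
          variable.name="var_groupB", value.name="val_groupB")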

That would give you one row for each combination of A and B:

        date    text var_groupA val_groupA var_groupB val_groupB
1 2013-01-01 the dog    groupA1          0    groupB1          1
2 2013-01-01 the dog    groupA2          1    groupB1          1
3 2013-01-01 the dog    groupA1          0    groupB2          1
4 2013-01-01 the dog    groupA2          1    groupB2          1

Then you could subset this and remove the value columns:

# Keep only the combinations where both values are 1, then drop the value columns
y <- y[y$val_groupA == 1 & y$val_groupB == 1, ]
y <- y[, c("var_groupA", "var_groupB", "date", "text")]

Which gives you what you want:

  var_groupA var_groupB       date    text
2    groupA2    groupB1 2013-01-01 the dog
4    groupA2    groupB2 2013-01-01 the dog

Of course, if your dataset is more complex than your example, you can make this solution more general by automating the melting and subsetting: for example, detect the group columns and fill in measure.vars, variable.name, and value.name automatically, perhaps for any number of groups.
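
As a rough sketch of that automation, the helper below melts once per group prefix and then keeps the rows where every group value is 1. The function name melt_groups, the prefix-based column detection, and the rebuilt sample data frame x are all assumptions made for illustration; they are not part of reshape2 or of the original question.

```r
library(reshape2)

# Sample row from the question (column types are assumed).
x <- data.frame(groupA1 = 0, groupA2 = 1, groupB1 = 1, groupB2 = 1,
                date = as.Date("2013-01-01"), text = "the dog",
                stringsAsFactors = FALSE)

# Hypothetical helper: melt once per group prefix, then keep only the
# rows where every group's value column equals 1.
melt_groups <- function(df, prefixes) {
  # Find each group's columns up front, before melting adds new columns.
  group_cols <- lapply(prefixes, function(p) {
    grep(paste0("^", p), names(df), value = TRUE)
  })

  for (i in seq_along(prefixes)) {
    df <- melt(df,
               measure.vars  = group_cols[[i]],
               variable.name = paste0("var_", prefixes[i]),
               value.name    = paste0("val_", prefixes[i]))
  }

  # A row is kept only if all of its val_* columns are 1.
  val_cols <- paste0("val_", prefixes)
  keep <- rowSums(df[val_cols] == 1) == length(val_cols)

  # Drop the val_* columns, which are all 1 in the kept rows.
  df[keep, setdiff(names(df), val_cols), drop = FALSE]
}

result <- melt_groups(x, c("groupA", "groupB"))
print(result)
```

Because the group columns are found by prefix, adding a third group (say, columns starting with groupC) should only require passing one more prefix, provided no other column names start with the same text. Rows containing NA in a group column would need extra handling that this sketch leaves out.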

OTHER TIPS

Assuming the data frame is called dd, the first two statements replace each 0 in the first four columns with "" and each 1 with the column name, giving dd2. The next two statements generate all combinations of groupA and groupB for each row using expand.grid, giving dd3. Finally, subset dd3 to the rows with no "" entries:

newvals <- function(nm) ifelse(dd[[nm]] == 0, "", nm)
dd2 <- replace(dd, 1:4, lapply(names(dd)[1:4], newvals))

combo <- function(x) data.frame(expand.grid(groupA=c(x[1:2]), groupB=c(x[3:4])), 
             x$date, x$text)
dd3 <- do.call("rbind", by(dd2, 1:nrow(dd2), combo)) 

subset(dd3, groupA != "" & groupB != "")

This gives:

     groupA  groupB     x.date  x.text
1.2 groupA2 groupB1 2013-01-01 the dog
1.4 groupA2 groupB2 2013-01-01 the dog
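
A variation on the same idea, sketched in base R only: instead of substituting "" placeholders and filtering afterwards, expand just the columns that equal 1 in each row. The helper name expand_row, the prefix-based column detection, and the rebuilt sample data frame x are assumptions made for illustration.

```r
# Sample row from the question (column types are assumed).
x <- data.frame(groupA1 = 0, groupA2 = 1, groupB1 = 1, groupB2 = 1,
                date = as.Date("2013-01-01"), text = "the dog",
                stringsAsFactors = FALSE)

# Detect the group columns by prefix (assumes no other columns share it).
a_cols <- grep("^groupA", names(x), value = TRUE)
b_cols <- grep("^groupB", names(x), value = TRUE)

# Hypothetical helper: build all A/B pairs for row i of df, using only
# the columns whose value is 1 in that row.
expand_row <- function(df, i, a_cols, b_cols) {
  a_on <- a_cols[which(unlist(df[i, a_cols]) == 1)]
  b_on <- b_cols[which(unlist(df[i, b_cols]) == 1)]

  # No pairs are possible if either group has no 1s in this row.
  if (length(a_on) == 0 || length(b_on) == 0) return(NULL)

  combos <- expand.grid(var_groupB = b_on, var_groupA = a_on,
                        stringsAsFactors = FALSE)

  # date and text are recycled across every pair from this row.
  data.frame(combos, date = df$date[i], text = df$text[i],
             stringsAsFactors = FALSE)
}

rows <- lapply(seq_len(nrow(x)), function(i) expand_row(x, i, a_cols, b_cols))
result <- do.call(rbind, rows)
print(result)
```

This keeps the columns as plain character vectors and names the output columns var_groupB, var_groupA, date, and text, in the order the question asked for.
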
Licensed under: CC-BY-SA with attribution