Question

I've got myself in a little jam, and there is probably a better way to describe what I want to do (will edit if needed).

What I have is a data frame representing some observations, x. I would like to create a different data frame, y, that holds all distinct combinations of some variables from x, and where one of the columns is a list column whose elements collect the values of other variables from x.

I've simplified this into an example, here is x:

x <- data.frame( c(1,1,1,1,1,1,1,2,2,2), c(11:12,11:12,11:12,11:12,16,17), c(101:110))
names(x) <- c("a","b","c")

   a  b   c
1  1 11 101
2  1 12 102
3  1 11 103
4  1 12 104
5  1 11 105
6  1 12 106
7  1 11 107
8  2 12 108
9  2 16 109
10 2 17 110

And here is y (distinct combos of a,b in x):

y <- unique(data.frame(x$a,x$b))
names(y) <- c("a","b")
row.names(y) <- NULL

  a  b
1 1 11
2 1 12
3 2 12
4 2 16
5 2 17

What I want to do is to transform y into this:

  a  b                  c
1 1 11 101, 103, 105, 107
2 1 12      102, 104, 106
3 2 12                108
4 2 16                109 
5 2 17                110

Here "c" in each row holds the values of c from x for that combination of a and b, collected into a list.
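A minimal, brute-force sketch of filling y by hand, assuming x as defined above (rebuilt here with named columns so the snippet runs on its own). For each row of y, lapply collects the matching values of c from x, and assigning the resulting list to y$c gives a list column:

```r
x <- data.frame(a = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                b = c(11:12, 11:12, 11:12, 11:12, 16, 17),
                c = 101:110)

# Distinct (a, b) combinations, as in the question
y <- unique(x[c("a", "b")])
row.names(y) <- NULL

# For each row of y, pick out the c values in x with the same a and b;
# lapply returns a list, so y$c becomes a list column
y$c <- lapply(seq_len(nrow(y)), function(i) {
  x$c[x$a == y$a[i] & x$b == y$b[i]]
})

y
```

This rescans x once per row of y, so it is only meant to show what the target structure looks like, not to be efficient.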

I'd like to find a nice succinct and idiomatic way of doing this, but will settle for anything that does the job.


Solution

This is going to look pretty cryptic:

aggregate(c ~ a + b, x, I)
#   a  b                  c
# 1 1 11 101, 103, 105, 107
# 2 1 12      102, 104, 106
# 3 2 12                108
# 4 2 16                109
# 5 2 17                110

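Using I as the aggregating function (c works too) returns each group's values unchanged. Because the groups have different sizes, aggregate cannot simplify the results, so the third column ends up as a list. You don't need to create a separate data.frame for the unique combinations of "a" and "b"; just use them as the grouping variables in aggregate.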
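A small sketch for inspecting the result, assuming x as in the question (rebuilt with named columns so the snippet is self-contained). It uses c instead of I, which gives plain vectors rather than "AsIs" ones:

```r
x <- data.frame(a = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                b = c(11:12, 11:12, 11:12, 11:12, 16, 17),
                c = 101:110)

# Group by a and b; FUN = c returns each group's values as-is,
# and the unequal group sizes make aggregate keep them as a list column
res <- aggregate(c ~ a + b, data = x, FUN = c)

str(res$c)       # a list with one vector per (a, b) group
res$c[[1]]       # the c values for a == 1, b == 11
lengths(res$c)   # number of observations in each group
```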


Of course, there are many other ways to do this.

Here's a data.table version:

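library(data.table)
X <- as.data.table(x)
# list(c) wraps each group's c values into one list element, giving a
# list column; .() is data.table shorthand for list()
X[, .(c = list(c)), by = .(a, b)]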
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow