Question

I have the following sample data.table:

dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))

I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

 dtb[,colSums, by="id"]

This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name

Was it helpful?

Solution

this is actually what i was looking for and is mentioned in the FAQ:

dtb[,lapply(.SD,mean),by="id"]

OTHER TIPS

I guess in this case is it fastest to bring your data first into the long format and do your aggregation next (see Matthew's comments in this SO post):

library(data.table)
dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
library(reshape2)
dt_long <- as.data.table(melt(dtb, id.var="id"))
dt_long[, sum(value), by=c("id","variable")]
    id variable  V1
 1:  1        a 601
 2:  2        a 440
 3:  3        a 496
 4:  4        a 553
 5:  5        a 444
 6:  6        a 466
 7:  7        a 525
 8:  8        a 553
 9:  9        a 541
...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top