Question

I want the same result as in the question "R summarizing multiple columns with data.table", but for several summary functions.

Here is an example:

library(data.table)

# group is drawn at random for each of the 200 rows
data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2], 200, replace = TRUE))))

res <- data[, rbindlist(lapply(.SD, function(x) {
              return(list(name = "varname", mean = mean(x), sd = sd(x)))
           }))
          , by = group, .SDcols = c("x1", "x2")
          ]

This gives the following result:

   group    name      mean        sd
1:     b varname 0.5755798 0.2723767
2:     b varname 5.5108886 2.7649262
3:     a varname 0.4906111 0.3060961
4:     a varname 4.7780189 2.9740149

How can I get the column names ('x1', 'x2') into the second column? I guess I need to replace rbindlist with something else, but with what? Is there a simple solution?
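The column names are not lost by lapply itself: lapply(.SD, ...) returns a list named x1 and x2, and it is rbindlist() that discards those outer names by default. A minimal sketch with a hand-built list (the object names parts and stacked and the numbers are illustrative only):

```r
library(data.table)

# A named list of one-row lists, shaped like what lapply(.SD, ...) returns
# for two columns; the values are made up for illustration.
parts <- list(x1 = list(mean = 0.5, sd = 0.3),
              x2 = list(mean = 5.0, sd = 2.9))

names(parts)      # "x1" "x2": the column names are still present here
stacked <- rbindlist(parts)
names(stacked)    # "mean" "sd": rbindlist() dropped the outer names
```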


Solution

An alternative is to write your own function. This avoids the rbindlist wrapper (which I find unnecessary) and gives you the freedom to build the summary however you want:

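# Summarise every column of x: one row per column with its name, mean and sd
tmp <- function(x) {
    mm <- colMeans(x)      # column means
    ss <- sapply(x, sd)    # column standard deviations
    list(names = names(x), mean = mm, sd = ss)
}

data[, tmp(.SD), by = group]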
   group names      mean        sd
1:     a    x1 0.4988514 0.2770122
2:     b    x1 0.5246786 0.3014248
3:     a    x2 4.8031253 2.7978401
4:     b    x2 4.9104108 2.9135656
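The same idea extends to any number of summary functions. The sketch below is a hypothetical generalisation of tmp (the helper name summarise_cols and its funs argument are illustrative, not part of the answer); it assumes each function in funs returns a single number per column.

```r
library(data.table)

# Hypothetical helper: apply every function in the named list `funs` to each
# column of `x`, returning one row per column.
summarise_cols <- function(x, funs) {
  out <- list(names = names(x))
  for (f in names(funs)) {
    # vapply enforces exactly one numeric value per column
    out[[f]] <- vapply(x, funs[[f]], numeric(1), USE.NAMES = FALSE)
  }
  out
}

data <- data.table(x1 = runif(200), x2 = 10 * runif(200),
                   group = factor(sample(letters[1:2], 200, replace = TRUE)))

res <- data[, summarise_cols(.SD, list(mean = mean, sd = sd, median = median)),
            by = group, .SDcols = c("x1", "x2")]
res
```

Adding a statistic then only means adding an entry to the list of functions; the helper itself does not change.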

Other tips

You can run your lapply over names(.SD) instead of .SD, something like this:

data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2], 200, replace = TRUE))))
res <- data[, rbindlist(lapply(names(.SD), function(name) {
              return(list(name = name, mean = mean(.SD[[name]]), sd = sd(.SD[[name]])))
           }))
          , by = group, .SDcols = c("x1", "x2")]

Which gives:

   group name      mean        sd
1:     b   x1 0.5344272 0.2697610
2:     b   x2 4.7628178 2.8313825
3:     a   x1 0.5008916 0.2686017
4:     a   x2 4.6175027 2.8942875
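In newer data.table releases, rbindlist() also has an idcol argument, which writes the names of the input list into a column. Assuming your installed version has it, the original lapply(.SD, ...) code only needs that one argument; a minimal sketch:

```r
library(data.table)

data <- data.table(x1 = runif(200), x2 = 10 * runif(200),
                   group = factor(sample(letters[1:2], 200, replace = TRUE)))

# lapply(.SD, ...) returns a list named x1, x2; idcol = "name" keeps
# those names as a column instead of discarding them.
res <- data[, rbindlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))),
                        idcol = "name"),
            by = group, .SDcols = c("x1", "x2")]
res
```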
License: CC-BY-SA (attribution)