Question

I want the same results as in "R summarizing multiple columns with data.table", but for several summary functions at once.

Here is an example:

data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2]))))

res <- data[, rbindlist(lapply(.SD, function(x) {
              return(list(name = "varname", mean = mean(x), sd = sd(x)))
           }))
          , by = group, .SDcols = c("x1", "x2")
          ]

I get the following result:

   group    name      mean        sd
1:     b varname 0.5755798 0.2723767
2:     b varname 5.5108886 2.7649262
3:     a varname 0.4906111 0.3060961
4:     a varname 4.7780189 2.9740149

How can I get the column names ('x1', 'x2') in the second column? I guess I need to replace rbindlist with something else, but what? Is there a simple solution?


Solution

An alternative is to write your own function. That avoids the rbindlist wrapper (which I find unnecessary here) and leaves you free to build the result however you want:

tmp <- function(x) {
    mm <- colMeans(x)     # per-column means
    ss <- sapply(x, sd)   # per-column standard deviations
    list(names = names(x), mean = mm, sd = ss)
}

data[, tmp(.SD), by=group]
   group names      mean        sd
1:     a    x1 0.4988514 0.2770122
2:     b    x1 0.5246786 0.3014248
3:     a    x2 4.8031253 2.7978401
4:     b    x2 4.9104108 2.9135656
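
A different route to the same long-format result, offered only as a sketch: reshape the table with melt first, then aggregate by group and variable name. The table name dat, the fixed seed, and the alternating group column are assumptions made here for reproducibility; they are not part of the original example.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.table(
  x1 = runif(200),
  x2 = 10 * runif(200),
  group = factor(rep(c("a", "b"), times = 100))  # assumption: alternating groups
)

# Reshape to long format: one row per (group, name, value)
long <- melt(dat, id.vars = "group", measure.vars = c("x1", "x2"),
             variable.name = "name", value.name = "value")

# Any number of summary functions can be listed in j
res <- long[, .(mean = mean(value), sd = sd(value)), by = .(group, name)]
print(res)
```

With this layout, adding another summary (say median) only means adding one more entry inside .( ).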

Other tips

You can have lapply iterate over names(.SD) instead of .SD itself, so the column name is available inside the function. Something like this:

data <- as.data.table(list(x1 = runif(200), x2 = 10*runif(200), group = factor(sample(letters[1:2]))))
res <- data[, rbindlist(lapply(names(.SD), function(name) {
              return(list(name = name, mean = mean(.SD[[name]]), sd = sd(.SD[[name]])))
           }))
          , by = group, .SDcols = c("x1", "x2")]

Which gives:

   group name      mean        sd
1:     b   x1 0.5344272 0.2697610
2:     b   x2 4.7628178 2.8313825
3:     a   x1 0.5008916 0.2686017
4:     a   x2 4.6175027 2.8942875
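
If rbindlist is to stay, a possibly simpler variant is to keep lapply(.SD, ...) and let rbindlist fill in the names through its idcol argument, which takes them from the names of the list being bound. This is a sketch that assumes a data.table version where rbindlist supports idcol; the table name dat, the seed, and the alternating group column are assumptions, not part of the original example.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat <- data.table(
  x1 = runif(200),
  x2 = 10 * runif(200),
  group = factor(rep(c("a", "b"), times = 100))  # assumption: alternating groups
)

# lapply(.SD, ...) returns a list named after the columns (x1, x2);
# idcol = "name" turns those list names into the "name" column
res <- dat[, rbindlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))),
                       idcol = "name"),
           by = group, .SDcols = c("x1", "x2")]
print(res)
```
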
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow