Question

When returning items to data.table, it would be nice if they automatically took on the names of the variables. How does one do this? This is what I mean:

require(data.table)
x = data.table(a=1:10, id=1:2)
x[,{s = sum(a); p=prod(a); y = sqrt(abs(s*p)); z = y+1; list(y, z)},by=id]

#   id       V1       V2
#1:  1 153.7043 154.7043
#2:  2 339.4113 340.4113

Instead of V1 and V2, it would be nice if the columns were labeled with the names of the returned variables. It's no big thing to do this by hand here, but if you have 20 columns it becomes a real pain. Any ideas on how to do this?

EDIT: I changed the question to make clear why I don't just do list(name = value)

Solution

Forgive me if I'm missing something... but isn't the standard list syntax for data.table what you're looking for? It is more concise and clearer IMHO.

x[, 
  list(s = sum(a),
       p = prod(a)),
  by=id] 

#    id  s    p
# 1:  1 25  945
# 2:  2 30 3840

You can also build up this list as an expression and eval it.

foo <- expression(list(s=sum(a), p=prod(a)))

x[, eval(foo), by=id]

This can then be extended to a function (using as.quoted from plyr instead, because it's handy):

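expression_maker <- function(funs, cols, names) {
   require(plyr)
   # Build the "name=fun(col)" pieces, e.g. "s=sum(a),p=prod(a)"
   list_contents <- paste0(names, '=', funs, '(', cols, ')', collapse=',')
   # Parse the string "list( ... )" into an unevaluated call
   as.quoted(paste('list(', list_contents, ')'))[[1]]
}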

output <- expression_maker(funs=c('sum', 'prod'), cols=c('a', 'a'), names=c('s', 'p'))
x[, eval(output), by=id]

... But there be dragons!
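
One way to avoid pasting and parsing strings is to assemble the call object directly with base R. This is only a sketch: make_j_call and its argument names are made up for illustration, and it assumes each function takes a single column as its only argument.

```r
library(data.table)

# Hypothetical helper: builds the call list(name1 = fun1(col1), ...)
# from character vectors, without going through paste() and a parser.
make_j_call <- function(funs, cols, out_names) {
  # One call per (function, column) pair, e.g. sum(a)
  calls <- Map(function(f, col) call(f, as.name(col)), funs, cols)
  # Wrap the named calls in list(...)
  as.call(c(as.name("list"), setNames(calls, out_names)))
}

x <- data.table(a = 1:10, id = 1:2)
j_call <- make_j_call(funs = c("sum", "prod"), cols = c("a", "a"),
                      out_names = c("s", "p"))
res <- x[, eval(j_call), by = id]
```

Printing j_call shows the call that was built, which makes it easy to check before evaluating it inside the data.table.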


Per the OP's edit:

x[,{s = sum(a); p=prod(a); y = sqrt(abs(s*p)); z = y+1; list(y, z)},by=id]

I would do this in a function and return a data.table directly:

yourfun <- function(a) {
  s <- sum(a)
  p <- prod(a)
  y <- sqrt(abs(s*p))
  z <- y+1
  data.table(y, z)
}

x[, yourfun(a), by=id]

Other tips

A remix of the other two answers - name them in the list:

x[,{s = sum(a); p=prod(a); y = sqrt(abs(s*p)); z = y+1;
    list(s = y, p = z)}, by=id]

Or construct a data.table:

x[,{s = sum(a); p=prod(a); y = sqrt(abs(s*p)); z = y+1;
    data.table(y, z)}, by=id]

And here's another option using llist from Hmisc (this is slower than naming manually, but probably faster than constructing a data.table):

library(Hmisc)
x[,{s = sum(a); p=prod(a); y = sqrt(abs(s*p)); z = y+1;
    llist(y, z)}, by=id]
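
If you'd rather not pull in Hmisc just for this, the same idea can be sketched in a few lines of base R. named_list below is a hypothetical helper, not part of data.table or Hmisc. It assumes every argument is a plain variable name (or a short expression) and uses the deparsed argument as the column name.

```r
library(data.table)

# Hypothetical helper: returns list(...) with each element named after
# the expression that was passed in, so named_list(y, z) has names "y", "z".
named_list <- function(...) {
  arg_names <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  setNames(list(...), arg_names)
}

x <- data.table(a = 1:10, id = 1:2)
res <- x[, {
  s <- sum(a); p <- prod(a)
  y <- sqrt(abs(s * p)); z <- y + 1
  named_list(y, z)
}, by = id]
```

Because the names come from the arguments themselves, adding a twentieth variable only means adding it to the named_list() call.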

If you have a large number of variables and are looking for a programmatic way to approach this, you can put the column names in a vector and then use sapply with .SDcols, e.g.:

## sample data
set.seed(7)
DT <- as.data.table(matrix(round(runif(130, 1, 100)), ncol=26))
setnames(DT, LETTERS)


## These are the columns we will compute on
Cols <- c("A", "G", "M", "W", "Z")

DT[,sapply(.SD,mean),.SDcols=Cols]

#    A    G    M    W    Z 
# 25.0 41.2 55.6 43.0 56.0     
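
Note that sapply returns a named vector here rather than a data.table. If you want the result as named columns (for example together with by), a common variant is lapply over .SD. A small sketch, reusing the sample data from above; the grouping column grp is made up for illustration:

```r
library(data.table)

set.seed(7)
DT <- as.data.table(matrix(round(runif(130, 1, 100)), ncol = 26))
setnames(DT, LETTERS)
Cols <- c("A", "G", "M", "W", "Z")

# lapply returns a named list, so the result columns keep the names in Cols
res <- DT[, lapply(.SD, mean), .SDcols = Cols]

# Hypothetical grouping column, added only to show the same call with by
DT[, grp := rep(1:2, length.out = .N)]
res_by <- DT[, lapply(.SD, mean), by = grp, .SDcols = Cols]
```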

If you want to compute a different function on each variable, then use the standard list(name = fun(column)) form.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow