Here are two artificial, but I hope instructive, examples of my problem.
1) When running this code:
> library(data.table)
> dat0 <- data.frame(A=c("a","a","b"), B="")
> data.table(dat0)[, lapply(.SD, function(x) length(A)) , by = "A"]
A B
1: a 1
2: b 1
I expected the output:
A B
1: a 2
2: b 1
(similar to the result of plyr::ddply(dat0, .(A), nrow)).
Update to question 1)
Let me give a less artificial example. Consider the following data frame:
dat0 <- data.frame(A=c("a","a","b"), x=c(1,2,3), y=c(9,8,7))
> dat0
A x y
1 a 1 9
2 a 2 8
3 b 3 7
Using the plyr package, I get the means of x and y for each value of A as follows:
> ddply(dat0, .(A), summarise, x=mean(x), y=mean(y))
A x y
1 a 1.5 8.5
2 b 3.0 7.0
Very nice. Now imagine another variable H and the following calculations:
dat0 <- data.frame(A=c("a","a","b"), H=c(0,1,-1), x=c(1,2,3), y=c(9,8,7))
> ddply(dat0, .(A), summarise, x=mean(x)^mean(H), y=mean(y)^mean(H))
A x y
1 a 1.2247449 2.9154759
2 b 0.3333333 0.1428571
Very nice too. But now imagine there is a huge number of variables x for which I want to calculate mean(x)^mean(H). Then I don't want to type:
ddply(dat0, .(A), summarise, a=mean(a)^mean(H), b=mean(b)^mean(H), c=mean(c)^mean(H), d=mean(d)^mean(H), ...........)
So my idea was to try:
flipcols <- my_selected_columns # c("a", "b", "c", "d", ....)
data.table(dat0)[, lapply(.SD, function(x) mean(x)^mean(H)), by = "A", .SDcols = flipcols]
But that doesn't work, because H inside function(x) mean(x)^mean(H) is not handled as I expected! I have not been able to make it work with plyr::colwise either.
2) When running this code:
> dat0 <- data.frame(A=c("a","a","b"), B=1:3, c=0)
> data.table(dat0)[, lapply(.SD, function(x) B), .SDcols="c"]
Error in ..FUN(c) : object 'B' not found
I expected it to work and to produce:
c
1: 1
2: 2
3: 3
So is there a way to use the columns of the original data.table inside a function applied to .SD?