R bootstrap statistics by group for big data

Question 1

This should do it:

library(boot); library(data.table)
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1
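The double list() is what makes this work. In data.table, j normally returns a list of columns, and a vector result is spread over several rows. Wrapping the result in list(list(...)) stores the whole object in a single list-column cell, so each group contributes exactly one row, and $V1 pulls those objects back out. A small sketch with a made-up table (toy, g and v are illustrative names, not from the question):

```r
library(data.table)

toy <- data.table(g = c("a", "a", "b", "b", "b"), v = c(1, 2, 3, 4, 5))

# A j expression returning a length-2 vector is spread over two rows per group
flat <- toy[, range(v), by = g]

# Wrapping it in list(list(...)) keeps the whole object in one list-column cell,
# so each group contributes exactly one row
nested <- toy[, list(list(range(v))), by = g]

# Name the list column's elements by group for easier lookup
per_group <- setNames(nested$V1, nested$g)
per_group$b
# [1] 3 5
```

The same mechanism is what lets a boot object (itself a list) survive the by = group call intact.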

Question 2

Using these package versions:

 boot       * 1.3-18  2016-02-23 CRAN (R 3.2.3)
 data.table * 1.9.7   2015-10-05 Github (Rdatatable/data.table@d607425)

I received an error using the OP's code with the answer supplied by @eddi:

library(boot)
library(data.table)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

Produces the error message:

Error in eval(expr, envir, enclos) : object 'group' not found

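The error occurs because, inside data[, ..., by = group], .SD contains only the non-grouping columns (x1 and x2), so the by = "group" inside stat refers to a column that isn't there. Each call to stat already works on a single group, so the fix is to remove by = "group" from stat: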

set.seed(1000)
data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

This produces the following bootstrap statistics, one boot object per group:

[[1]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = .SD, statistic = stat, R = 10)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.5158232  0.004930451  0.01576641
t2* 0.5240713 -0.001851889  0.02851483

[[2]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = .SD, statistic = stat, R = 10)


Bootstrap Statistics :
     original        bias    std. error
t1* 0.5142383 -0.0072475030  0.02568692
t2* 0.5291694 -0.0001509404  0.02378447
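To see why group was missing in the first place, a toy table (toy is an illustrative name) shows what .SD holds inside a grouped call: only the non-grouping columns, with the current group's value exposed separately through .BY. This is a sketch for inspection, not part of the fix:

```r
library(data.table)

toy <- data.table(x1 = 1:4, x2 = 5:8, group = c(TRUE, TRUE, FALSE, FALSE))

# Inside j, .SD holds the non-grouping columns only;
# the current group's value is available through .BY instead
seen <- toy[, list(sd_cols = paste(names(.SD), collapse = ","),
                   by_value = .BY$group),
            by = group]
seen
```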

Below, I modify the sample dataset to highlight which Bootstrap Statistic goes with which group-column combination:

Consider group 1, with a mean of 10 for x1 and 10000 for x2, and group 2, with a mean of 2000 for x1 and 8000 for x2:

data2 <- as.data.table(list(x1 = c(runif(100, 9, 11), runif(100, 1999, 2001)),
                            x2 = c(runif(100, 9999, 10001), runif(100, 7999, 8001)),
                            group = rep(c(1, 2), each = 100)))
stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
data2[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

Which gives:

[[1]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = .SD, statistic = stat, R = 10)


Bootstrap Statistics :
      original       bias    std. error
t1*   10.00907  0.007115938  0.04349184
t2* 9999.90176 -0.019569568  0.06160653

[[2]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = .SD, statistic = stat, R = 10)


Bootstrap Statistics :
    original       bias    std. error
t1* 1999.965  0.031694179  0.06561209
t2* 8000.110 -0.006569872  0.03992401
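If you want the printed numbers as a single table labelled by group, one option is to flatten each boot object yourself. This sketch assumes the usual boot object fields ($t0 for the original statistics, $t for the R x k matrix of replicates) and recomputes bias and standard error the way they are conventionally defined (mean of replicates minus original, and standard deviation of replicates); the t1/t2 labels follow the t1*/t2* rows in the printout, and summ is an illustrative name:

```r
library(boot)
library(data.table)

set.seed(1)
data2 <- data.table(
  x1 = c(runif(100, 9, 11), runif(100, 1999, 2001)),
  x2 = c(runif(100, 9999, 10001), runif(100, 7999, 8001)),
  group = rep(c(1, 2), each = 100)
)
stat <- function(x, i) x[i, c(m1 = mean(x1), m2 = mean(x2))]

boots <- data2[, list(list(boot(.SD, stat, R = 10))), by = group]

# One row per (group, statistic): t1 is the mean of x1, t2 the mean of x2
summ <- rbindlist(Map(function(g, b) data.table(
  group     = g,
  stat      = paste0("t", seq_along(b$t0)),
  original  = b$t0,
  bias      = colMeans(b$t) - b$t0,
  std.error = apply(b$t, 2, sd)
), boots$group, boots$V1))
summ
```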

Question 3

There are several problems in your code before you even get to the by-group part.

Did you mean something like this?

data <- as.data.frame(list(x1 = runif(200), x2 = runif(200), group = factor(sample(letters[1:2], 200, replace = TRUE))))
stat <- function(x, i) c(m1 = mean(x$x1[i]), m2 = mean(x$x2[i]))

> stat(data, 1:10)
       m1        m2 
0.4465738 0.5522221
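The statistic(data, indices) contract is the key point: boot() calls stat once with all row indices (reported as "original") and then R times with row indices drawn with replacement. A hand-rolled sketch of that loop, for intuition only (it is not boot()'s internal code, and n_reps, reps and std_error are illustrative names):

```r
# Sketch only: a hand-rolled ordinary bootstrap showing how stat is used
set.seed(42)
data <- data.frame(x1 = runif(200), x2 = runif(200),
                   group = factor(sample(letters[1:2], 200, replace = TRUE)))
stat <- function(x, i) c(m1 = mean(x$x1[i]), m2 = mean(x$x2[i]))

n <- nrow(data)
n_reps <- 10

# The statistic on the full data (all row indices in order)
original <- stat(data, seq_len(n))

# Each replicate calls stat with n row indices drawn with replacement;
# replicate() returns a 2 x n_reps matrix, so transpose to n_reps x 2
reps <- t(replicate(n_reps, stat(data, sample.int(n, replace = TRUE))))

bias <- colMeans(reps) - original
std_error <- apply(reps, 2, sd)
rbind(original, bias, std_error)
```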

Then from there you can worry about doing it by group however you choose to.

For instance:

library(boot)
library(plyr)
dlply(data, .(group), function(dat) boot(dat, stat, R = 10))
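If you would rather not depend on plyr, base R's split() plus lapply() gives the same shape of result, a list of boot objects named by group level. A sketch reusing the stat above (boots is an illustrative name):

```r
library(boot)

set.seed(7)
data <- data.frame(x1 = runif(200), x2 = runif(200),
                   group = factor(sample(letters[1:2], 200, replace = TRUE)))
stat <- function(x, i) c(m1 = mean(x$x1[i]), m2 = mean(x$x2[i]))

# split() gives one data frame per group level;
# lapply() keeps the boot objects as a list named by level
boots <- lapply(split(data, data$group), function(dat) boot(dat, stat, R = 10))
boots$a
```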

For bigger datasets, try data.table:

data <- data.table::as.data.table(data)
by(seq_len(nrow(data)), data$group, function(idx) boot(data[idx[[1]]], stat, R = 10))

I went with by() rather than data.table's by= argument because you want the output to be a list. There may be some functionality I don't know about for doing that, but I couldn't find it. Note that by() passes each group's row numbers to the function as a one-column data frame rather than a plain vector, which is why the code uses idx[[1]].

The subsetting is still done via data.table's [] method, so it should be plenty fast.