Question

novice here. I am fitting a negative binomial model on count data where Y is the count of events, D is the treatment, and X is a logarithmic offset:

out <- glm.nb(y ~ d + offset(log(x)),data=d1)

I would like to bootstrap the confidence intervals of the first difference between D=1 and D=0. I've gotten this far, but not sure if it is the correct approach:

holder <- matrix(NA,1200,1)
out <- out <- glm.nb(y ~ d + offset(log(x)),data=d1)

for (i in 1:1200){
q <- sample(1:nrow(d1), 1)
d2 <- d1[q,]
d1_1 <- d1_2 <- d2
d1_1$d <- 1
d1_2$d <- 0
d1pred <- predict(out,d1_1,type="response")
d2pred <- predict(out,d1_2,type="response")
holder[i,1] <- (d1pred[1] - d2pred[1])
}
mean(holder)

Is this the correct way to bootstrap the first difference?

Was it helpful?

Solution

Generally, your approach is ok, but you can do it in more R-ish way. Firstly, if you are serious about bootstrapping you can employ boot library and benefit from more compact code, no loops and many other advanced options.

In your case it can look like:

## Data generation
N <- 100
set.seed(1)
d1 <- data.frame(y=rbinom(N, N, 0.5),
                 d=rbinom(N, 1, 0.5),
                 x=rnorm(N, 10, 3))
## Model
out <- glm.nb(y ~ d + offset(log(x)), data=d1)

## Statistic function (what we are bootstrapping)
## Returns difference between D=1 and D=0
diff <- function(x,i,model){
  v1 <- v2 <- x[i,]
  v1$d <- 1
  v2$d <- 0
  predict(model,v1,type="response") - predict(model,v2,type="response")
}

## Bootstrapping itself
b <- boot(d1, diff, R=5e3, model=out)
mean(b$t)

Now b$t holds bootstrapped values. See names(b) and/or ?boot for extra information.

Bootstrapping is time consuming operation, and one of the obvious advantage of boot library is support for parallel operations. It's as easy as:

 b <- boot(d1, diff, R=5e3, model=out, parallel="multicore", ncpus=2)

If you are on Windows use parallel="snow" instead.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top