Question

boot() is failing with one dataset and succeeding with another, so it must be a data issue? I just can't figure out the difference, but at least now I think I've got it reproducible. In both cases, the numeric dependent variable is regressed (lm) on the interaction between an integer variable and a factor. boot() fails with the error:

    Error in boot(data = data, statistic = bs_p, R = 1000) :
      number of items to replace is not a multiple of replacement length
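According to ?boot, the replicates end up in the matrix `t`, one row per resample, with one column per element of the statistic. If boot() fills that matrix one row at a time (my reading of the message, not something the documentation states), then a resample whose statistic comes back shorter triggers exactly this error. A minimal sketch of that assignment, with `t_star` as a hypothetical stand-in for boot()'s matrix:

```r
# Hypothetical stand-in for boot()'s replicate matrix: one row per resample and
# one column per element of the statistic on the original data (6 p-values,
# as with y ~ x * fac and a three-level factor)
t_star <- matrix(NA_real_, nrow = 2, ncol = 6)

# A replicate of the same length fills its row without complaint
t_star[1, ] <- runif(6)

# A resample that lost coefficients returns fewer p-values (here 4); the row
# length 6 is not a multiple of 4, so the assignment fails
msg <- tryCatch({
  t_star[2, ] <- runif(4)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```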

My statistic function to return p-values is:

    bs_p <- function(data, i) {
      d <- data[i, ]
      fit <- lm(y ~ x * fac, data = d)
      return(summary(fit)$coefficients[, 4])
    }
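summary(fit)$coefficients only has rows for the coefficients lm() could actually estimate, so the length of this statistic depends on the resample. A small sketch with made-up data (the data frame `dat` and the index vectors are hypothetical) shows two ways it can shrink: a level missing from the resample, which lm() drops altogether, and a level whose rows all share one x value, which makes its interaction coefficient NA.

```r
# Made-up data: three levels with ten rows each and x = 1..10 in every level
set.seed(1)
dat <- data.frame(x = rep(1:10, 3),
                  fac = factor(rep(c("Ds", "E", "F"), each = 10)))
dat$y <- 20 + dat$x + rnorm(30)

# The statistic from the question
bs_p <- function(data, i) {
  d <- data[i, ]
  fit <- lm(y ~ x * fac, data = d)
  summary(fit)$coefficients[, 4]
}

# All rows: (Intercept), x, facE, facF, x:facE, x:facF
length(bs_p(dat, seq_len(nrow(dat))))  # 6

# A "resample" with no "E" rows: lm() drops the unused level,
# so facE and x:facE are not in the model at all
no_E <- which(dat$fac != "E")
length(bs_p(dat, no_E))  # 4

# A "resample" whose "E" rows all have x = 3: x:facE cannot be estimated,
# coef() reports it as NA and summary() leaves it out of its table
one_x_E <- c(no_E, rep(which(dat$fac == "E" & dat$x == 3), 3))
fit_b <- lm(y ~ x * fac, data = dat[one_x_E, ])
is.na(coef(fit_b)["x:facE"])  # TRUE
length(bs_p(dat, one_x_E))    # 5
```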

When I generate random data in order to reproduce and post the question here, like this:

    L3 <- LETTERS[1:3]
    data <- data.frame(x=1:50, y=rnorm(50), fac=as.factor(sample(L3, 50, replace = TRUE)))

and then bootstrap:

    results <- boot(data=data, statistic=bs_p, R=1000)

bootstrapping works: no error, and the statistics are generated. But with my own data (below), which has the same variable types, boot() returns the error.

    y <- c(17.820, 13.764, 18.880, 25.830, 26.576, 29.832, 22.610, 24.180, 26.572, 26.030, 29.200, 28.560, 28.600, 16.614, 16.302, 18.080, 22.704, 28.101, 38.280, 17.100, 19.292, 33.165, 18.395, 19.434, 27.544, 17.010, 21.560, 28.120, 17.513, 21.646,24.060, 27.984, 20.830, 21.588, 26.280, 29.640, 17.313, 16.344, 16.362, 34.496, 22.785, 20.203, 29.040, 19.092, 20.890,20.739, 17.700, 17.424, 28.737, 18.318, 39.470, 28.072, 17.176, 28.098)
    x <- as.integer(c(9,  5,  0,  8,  3,  4,  9,  6,  9,  2, 15, 10,  5,  1, 11, 11,  4, 8, 13,  1,  2,  4,  7,  7, 12,  1,  6,  6,  4,  3,  5,  5,  7,  9,  8, 3, 3, 14,  6,  4,  3,  6, 17,  3,  6,  6,  7,  1,  6, 10 , 2, 14 , 5,  8))
    fac <- as.factor(c("F", "F", "F", "F", "F", "Ds", "F", "Ds","F","F","F","E", "Ds","F", "F", "E", "Ds","F", "Ds", "F", "Ds","E", "F", "E", "F", "Ds", "E", "Ds","F", "F", "F",  "Ds","Ds", "F", "Ds","F", "F", "E", "F","F","F", "F", "F", "Ds","F", "F", "F", "F", "Ds", "E", "F", "F", "F", "E"))
    data <- data.frame(x=x, y=y, fac=fac)

The linear model runs fine on these data by itself, and traceback() shows nothing but the boot() call. Any thoughts are most welcome. I'm on R 3.0.1 on Mac OS X. Thank you!
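Since traceback() only points at boot() itself, one way to find the culprit is to call the statistic on random resamples directly and compare the length of each result with its length on the full data. A minimal sketch; find_bad_resamples() is a hypothetical helper (not part of boot), and the demo data, with a deliberately rare level "E", are made up:

```r
# Hypothetical helper: apply `statistic` to R ordinary bootstrap resamples and
# keep those whose result length differs from the one on the full data
# (a resample on which the statistic errors is recorded with length NA)
find_bad_resamples <- function(data, statistic, R = 1000) {
  n <- nrow(data)
  ref_len <- length(statistic(data, seq_len(n)))
  bad <- list()
  for (r in seq_len(R)) {
    i <- sample.int(n, n, replace = TRUE)
    len <- tryCatch(length(statistic(data, i)), error = function(e) NA_integer_)
    if (is.na(len) || len != ref_len) {
      bad[[length(bad) + 1]] <- list(replicate = r, length = len, indices = i)
    }
  }
  bad
}

# Made-up data: level "E" has only 2 of the 30 rows
set.seed(42)
demo <- data.frame(x = rep(0:9, 3),
                   fac = factor(rep(c("F", "Ds", "E"), times = c(16, 12, 2))))
demo$y <- 20 + demo$x + rnorm(30)

# Same statistic as in the question
bs_p <- function(data, i) {
  fit <- lm(y ~ x * fac, data = data[i, ])
  summary(fit)$coefficients[, 4]
}

bad <- find_bad_resamples(demo, bs_p, R = 200)
length(bad)                        # number of offending resamples
table(demo$fac[bad[[1]]$indices])  # level counts in the first of them
```

With the question's own objects, `find_bad_resamples(data, bs_p)` would list the resamples that break boot().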


Solution

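Some (or at least one) bootstrap resamples don't contain all factor levels. lm() drops a level that is absent from the data it is given, so such a resample yields fewer coefficients (and corresponding p-values), and boot() fails when it combines the replicates. The same happens if a level is present but all of its resampled rows share one x value: its interaction coefficient is then NA, and summary() leaves it out of the coefficient table. I guess you need a stratified bootstrap or a bootstrap of the residuals (assuming that bootstrapping p-values is sensible, which I doubt).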
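Both suggestions can be tried with boot() itself. A sketch on made-up data (`dat`, `bs_p_resid` and the `fit0`/`res0` columns are hypothetical names): the `strata` argument of boot() resamples rows within each level of fac, so no level can disappear; the residual bootstrap keeps x and fac fixed and resamples the residuals of the full-data fit, so the design matrix never changes. The residuals are used as they are, without any leverage adjustment, and a stratified resample can still, rarely, leave a level with only one distinct x value.

```r
library(boot)

# Made-up data: 24 "F", 8 "Ds" and 8 "E" rows, each level with distinct x values
set.seed(123)
dat <- data.frame(x = rep(0:7, 5),
                  fac = factor(rep(c("F", "Ds", "E"), times = c(24, 8, 8))))
dat$y <- 20 + dat$x + rnorm(nrow(dat))

# The question's statistic: p-values of all coefficients
bs_p <- function(data, i) {
  d <- data[i, ]
  fit <- lm(y ~ x * fac, data = d)
  summary(fit)$coefficients[, 4]
}

# 1. Stratified bootstrap: rows are resampled within each level of fac,
#    so every resample keeps 24 "F", 8 "Ds" and 8 "E" rows
b_strat <- boot(data = dat, statistic = bs_p, R = 200, strata = dat$fac)

# 2. Residual bootstrap: keep x and fac fixed and resample the residuals
#    of the full-data fit, so the design matrix (and its rank) never changes
fit0 <- lm(y ~ x * fac, data = dat)
dat$fit0 <- fitted(fit0)
dat$res0 <- residuals(fit0)

bs_p_resid <- function(data, i) {
  d <- data
  d$y <- d$fit0 + d$res0[i]  # new response built from resampled residuals
  fit <- lm(y ~ x * fac, data = d)
  summary(fit)$coefficients[, 4]
}
b_resid <- boot(data = dat, statistic = bs_p_resid, R = 200)

dim(b_strat$t)  # 200 6
dim(b_resid$t)  # 200 6
```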

Other tips

I had a similar error and solved it with this hand-made code; I hope it helps someone.

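    bs_p <- function(data, i) {
      d <- data[i, ]
      fit <- lm(y ~ x * fac, data = d)

      cf <- coef(fit)

      # identify coefficients missing from this resample (lm() drops factor
      # levels that do not occur in it) and create dummy ones
      all_names <- colnames(model.matrix(y ~ x * fac, data = data))
      df <- setdiff(all_names, names(cf))
      ad <- rep(0, length(df))
      names(ad) <- df

      # return the coefficients in a fixed order so that each column of
      # boot()'s output always refers to the same coefficient
      return(c(cf, ad)[all_names])
    }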
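The same padding idea also works for the p-values the question asks for. A sketch, with bs_p_padded() and the made-up data as hypothetical names; it takes the full list of coefficient names from model.matrix() on the complete data and fills NA rather than 0 for coefficients a resample cannot estimate, because a dummy p-value of 0 would look like a highly significant result:

```r
library(boot)

# Hypothetical p-value version of the padding approach
bs_p_padded <- function(data, i) {
  d <- data[i, ]
  fit <- lm(y ~ x * fac, data = d)

  # every coefficient the full data supports, in a fixed order
  all_names <- colnames(model.matrix(y ~ x * fac, data = data))

  # p-values for the coefficients this resample could estimate
  p <- summary(fit)$coefficients[, 4]

  # NA for the others, so every resample returns the same named vector
  out <- setNames(rep(NA_real_, length(all_names)), all_names)
  out[names(p)] <- p
  out
}

# Made-up data: level "E" has only 2 of the 30 rows
set.seed(7)
dat <- data.frame(x = rep(0:9, 3),
                  fac = factor(rep(c("F", "Ds", "E"), times = c(16, 12, 2))))
dat$y <- 20 + dat$x + rnorm(30)

# A resample without any "E" row: facE and x:facE come back as NA
bs_p_padded(dat, which(dat$fac != "E"))

# The ordinary bootstrap now runs, with NA where a coefficient was lost
b <- boot(data = dat, statistic = bs_p_padded, R = 200)
ncol(b$t)            # 6
colSums(is.na(b$t))  # how often each coefficient was missing
```

Downstream summaries then have to skip the NA entries, for example with `colMeans(b$t, na.rm = TRUE)`.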
Licensed under: CC-BY-SA with attribution