Avoid error in R boot.ci function when all values in sampled set are equal

https://stackoverflow.com/questions/16652852

30-05-2022

Question

I have many data sets that are inputs to a function. The data are stored in a data.table, and I'm calculating bootstrap confidence intervals for the function's output. In some cases, however, all of the input values are the same, which produces the message "All values of t are equal to 100 \n Cannot calculate confidence intervals". How can I avoid this (e.g., by setting the confidence interval to an arbitrary value such as 0 or NA when all values are equal)? For example:

library(boot)
library(data.table)

problem=1

data<-data.table(column1=c(1:100),column2=c(rep(100,99),problem))
resample.number=1000
confidence=0.95

sample.mean<-function(indata,x){mean(indata[x])}

boot_obj<-lapply(data,boot,statistic = sample.mean,R = resample.number)

boot.mean.f<-function(x,column){
    x[column][1]
}

means<-data.table(sapply(boot_obj,boot.mean.f))
bootci_obj<-lapply(boot_obj,boot.ci, conf = confidence, type = "perc")
bootci.f<-function(x,column){
    x<-x[column][4]
    x<-unlist(strsplit(as.character(x[1]),","))
    x<-sub("[:punct:].*","",x)
    x<-sub("lis.*","",x)
    x<-sub(").?","",x)
    x<-na.omit(as.numeric(x))
}

cis<-data.table(t(sapply(bootci_obj,bootci.f)))
setnames(means,"V1","stat")

cis[,V1:=NULL]
cis[,V2:=NULL]
setnames(cis,c("V3","V4"),c("lci","uci"))

cbind(means,cis)

returns:

    stat      lci       uci
1:  50.5 44.96025  56.26797
2: 99.01 97.03000 100.00000

Changing

problem=100

returns: "All values of t are equal to 100 \n Cannot calculate confidence intervals", which leads to other errors.

I would like the result to be:

    stat      lci      uci
1:  50.5 44.96025 56.26797
2: 100.0  0.00000  0.00000

Solution

I stacked the data.table, because it's much more efficient to work with a data.table in long format. I also prefer to set the confidence limits to the same value as the mean, if all values are equal. Adjust as you like.

library(boot)
library(data.table)

DT <- data.table(column1=1:100,column2=rep(100,100))
DT <- data.table(stack(DT))

resample.number=1000
confidence=0.95

sample.mean <- function(indata,x){mean(indata[x])}
ci.mean <- function(x, resample.number,confidence) {
  if(length(unique(x)) > 1) {
    temp <- boot.ci(boot(x,statistic = sample.mean,R = resample.number), conf = confidence, type = "perc")$percent
    list(mean=mean(x),lwr=temp[,4],upr=temp[,5])
  } else {
    list(mean=mean(x),lwr=mean(x),upr=mean(x))
  }
}

set.seed(42)
DT[,ci.mean(values,resample.number,confidence),by=ind]

#       ind  mean       lwr       upr
#1: column1  50.5  44.92305  55.93949
#2: column2 100.0 100.00000 100.00000
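The column names values and ind used in the call above are the ones that stack() creates. As a small illustration of that long format, here is a sketch using a plain data.frame with three rows per column; the names wide and long are made up for this example.

```r
# Wide layout: one column per data set (a tiny stand-in for the real data).
wide <- data.frame(column1 = 1:3, column2 = rep(100, 3))

# stack() returns a long data frame with two columns:
#   values - all the numbers, one column's worth after the other
#   ind    - a factor recording which column each value came from
long <- stack(wide)
print(long)

# Grouping by ind is then a single call, here with base R's tapply().
group.means <- tapply(long$values, long$ind, mean)
print(group.means)
```

Grouping on ind is what by=ind does in the data.table call, so every original column is processed as its own group.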

Note that boot.ci does not actually stop with an error when all values are equal: it only prints that message and returns no interval. If your code can cope with the missing result, you do not need the if condition.
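If you would rather report NA than repeat the mean, one option is to inspect what boot.ci gives back instead of testing the data beforehand. This is only a sketch: safe.perc.ci is a hypothetical helper, not part of boot, and it assumes that boot.ci returns NULL after printing its message when every bootstrap replicate is identical. Check that assumption against your installed version of boot.

```r
library(boot)

# Statistic in the form boot() expects: the data first, the resampled indices second.
sample.mean <- function(indata, idx) mean(indata[idx])

# Hypothetical helper: percentile interval that falls back to NA limits.
# Assumption: boot.ci() returns NULL when all bootstrap replicates are equal.
safe.perc.ci <- function(x, R = 1000, conf = 0.95) {
  b  <- boot(x, statistic = sample.mean, R = R)
  ci <- boot.ci(b, conf = conf, type = "perc")
  if (is.null(ci)) {
    # No interval could be computed: keep the mean, mark the limits as missing.
    return(c(mean = mean(x), lwr = NA_real_, upr = NA_real_))
  }
  # Columns 4 and 5 of $percent hold the lower and upper limits.
  c(mean = mean(x),
    lwr  = as.numeric(ci$percent[1, 4]),
    upr  = as.numeric(ci$percent[1, 5]))
}

set.seed(42)
res.const <- safe.perc.ci(rep(100, 100))  # all values equal
res.var   <- safe.perc.ci(1:100)          # ordinary case
print(res.const)
print(res.var)
```

Each call returns a named numeric vector, so as.list(safe.perc.ci(values)) could stand in for ci.mean in the data.table call if you want NA limits there.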

License: CC-BY-SA with attribution

Not affiliated with StackOverflow