Question

I have two problems using the pROC package to plot ROC curves.

A. The Significance level or P-value is the probability that the observed sample Area under the ROC curve is found when in fact, the true (population) Area under the ROC curve is 0.5 (null hypothesis: Area = 0.5). If P is small (P<0.05) then it can be concluded that the Area under the ROC curve is significantly different from 0.5 and that therefore there is evidence that the laboratory test does have an ability to distinguish between the two groups.

Therefore, I would like to test whether the area under a given ROC curve differs significantly from 0.5. I found the following code, which uses the pROC package to compare TWO ROC curves, but I am not sure how to test a single AUC against 0.5.

    library(pROC)
    data(aSAH)

    # First curve: S100B (the column is named s100b)
    rocobj1 <- plot.roc(aSAH$outcome, aSAH$s100b,
                        main="Statistical comparison",
                        percent=TRUE, col="#1c61b6")

    # Second curve: NDKA, added to the same plot
    rocobj2 <- lines.roc(aSAH$outcome, aSAH$ndka,
                         percent=TRUE, col="#008600")

    # Test for a difference between the two AUCs
    testobj <- roc.test(rocobj1, rocobj2)
    text(50, 50,
         labels=paste("p-value =", format.pval(testobj$p.value)),
         adj=c(0, .5))

    legend("bottomright", legend=c("S100B", "NDKA"),
           col=c("#1c61b6", "#008600"), lwd=2)

B. I have done k-fold cross-validation for my classification problem. For example, 5-fold cross-validation produces 5 ROC curves. How can I plot the average of these 5 ROC curves using the pROC package? (What I want to do is shown in the scikit-learn example listed in the Refs below, but it is done in Python.) Also, can we get the confidence interval and the best threshold for this average ROC curve, something like the code below?

    rocobj <- plot.roc(aSAH$outcome, aSAH$s100b,  
                       main="Confidence intervals", 
                       percent=TRUE, ci=TRUE, # compute the CI (of the AUC by default)
                       print.auc=TRUE) # print the AUC (will contain the CI)  

    ciobj <- ci.se(rocobj, # CI of sensitivity  
                   specificities=seq(0, 100, 5)) # over a select set of specificities  
    plot(ciobj, type="shape", col="#1c61b6AA") # plot as a blue shape  
    plot(ci(rocobj, of="thresholds", thresholds="best")) # add one threshold

Refs:

http://web.expasy.org/pROC/screenshots.html

http://scikit-learn.org/0.13/auto_examples/plot_roc_crossval.html

http://www.talkstats.com/showthread.php/14487-ROC-significance

http://www.medcalc.org/manual/roc-curves.php


Solution

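A. Use wilcox.test, which does exactly that: the Wilcoxon rank-sum (Mann-Whitney) test on the marker values of the two groups tests the null hypothesis that the AUC equals 0.5.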
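
As a minimal sketch of point A, assuming the aSAH data from the question and treating "Poor" as the case group (the variable names `cases`, `controls`, `wt` and `auc_from_w` are mine, not part of pROC), the test could look like this. It also checks that the W statistic, divided by the number of case/control pairs, reproduces the AUC that pROC reports.

```r
library(pROC)
data(aSAH)

# Split the marker values by outcome group (assumption: "Poor" = cases)
cases    <- aSAH$s100b[aSAH$outcome == "Poor"]
controls <- aSAH$s100b[aSAH$outcome == "Good"]

# Wilcoxon rank-sum test; its null hypothesis corresponds to AUC = 0.5.
# exact = FALSE uses the normal approximation, since s100b contains ties.
wt <- wilcox.test(cases, controls, exact = FALSE)
print(wt$p.value)

# W counts the case/control pairs where the case is higher (ties count 0.5),
# so W / (n_cases * n_controls) is the empirical AUC.
auc_from_w <- unname(wt$statistic) / (length(cases) * length(controls))

# Same AUC from pROC, with levels and direction fixed explicitly
roc_obj <- roc(aSAH$outcome, aSAH$s100b,
               levels = c("Good", "Poor"), direction = "<")
print(c(wilcoxon = auc_from_w, pROC = as.numeric(auc(roc_obj))))
```

A small p-value here is evidence that the AUC differs from 0.5. The test is two-sided by default; the `alternative` argument of wilcox.test makes it one-sided if needed.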

B. See my answer to this question: "Feature selection + cross-validation, but how to make ROC-curves in R". Simply concatenate the data from all folds of the cross-validation and build a single ROC curve from them. Don't do that with bootstrap or LOO, when you repeat the whole cross-validation multiple times, or when the predictions can't be compared between runs.
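
A rough sketch of the concatenation idea in point B, under my own assumptions: a logistic regression on s100b and ndka stands in for your classifier, the fold assignment is random with a fixed seed, and the names `fold`, `pred` and `roc_cv` are illustrative. Each observation is predicted exactly once, by the model that did not see it, and one ROC curve is built from the pooled predictions.

```r
library(pROC)
data(aSAH)

set.seed(42)
k <- 5
# Randomly assign each observation to one of k folds
fold <- sample(rep(seq_len(k), length.out = nrow(aSAH)))

# Out-of-fold predictions, filled in one fold at a time
pred <- numeric(nrow(aSAH))
for (i in seq_len(k)) {
  test <- fold == i
  # Stand-in model (assumption): logistic regression on two markers
  fit <- glm(outcome ~ s100b + ndka, data = aSAH[!test, ],
             family = binomial)
  pred[test] <- predict(fit, newdata = aSAH[test, ], type = "response")
}

# One ROC curve from the concatenated out-of-fold predictions
roc_cv <- roc(aSAH$outcome, pred,
              levels = c("Good", "Poor"), direction = "<")
plot(roc_cv, print.auc = TRUE)

# Confidence interval of the AUC and the "best" (Youden) threshold
print(ci.auc(roc_cv))
print(coords(roc_cv, "best",
             ret = c("threshold", "sensitivity", "specificity")))
```

This only makes sense when the predictions are on a comparable scale across folds, as predicted probabilities from the same kind of model roughly are; that is the caveat in the answer above.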

Licensed under: CC-BY-SA with attribution