You can indeed store the predictions, provided they are all on the same scale (be especially careful about this if you perform feature selection: some methods produce scores that depend on the number of features), and use them to build a ROC curve. Here is the code I used for a recent paper:
library(pROC)
data(aSAH)

k <- 10                                            # number of cross-validation folds
n <- nrow(aSAH)
indices <- sample(rep(1:k, ceiling(n/k))[1:n])     # random fold assignment

all.response <- all.predictor <- aucs <- c()
for (i in 1:k) {
  test  <- aSAH[indices == i, ]
  learn <- aSAH[indices != i, ]
  model <- glm(as.numeric(outcome) - 1 ~ s100b + ndka + as.numeric(wfns),
               data = learn, family = binomial(link = "logit"))
  model.pred <- predict(model, newdata = test)         # linear predictor: same scale in every fold
  aucs <- c(aucs, roc(test$outcome, model.pred)$auc)   # AUC of this fold alone
  all.response  <- c(all.response, test$outcome)       # pool the observed outcomes
  all.predictor <- c(all.predictor, model.pred)        # pool the predicted scores
}

roc(all.response, all.predictor)   # ROC curve / AUC from the pooled predictions
mean(aucs)                         # average of the per-fold AUCs
The ROC curve is built from all.response and all.predictor, which are updated at each iteration. The code also stores the AUC of each fold in aucs for comparison. Both results should be quite similar when the sample size is large enough; with small samples, however, the per-fold AUCs may be underestimated, because the ROC curve built on all the pooled data tends to be smoother and is therefore less underestimated by the trapezoidal rule.