Question

How do I generate a ROC curve for cross-validation?

For a single test set, I think I should threshold the SVM classification scores to generate the ROC curve.

But I am unclear about how to generate it for cross-validation.

Solution

As a follow-up to Backlin's answer:

The variation in the results across different runs of k-fold or leave-n-out cross-validation shows the instability of the models. This is valuable information.

  • Of course you can pool the results and just generate one ROC curve.
  • But you can also plot the set of curves; see, e.g., the R package ROCR.
  • Or calculate, e.g., the median and IQR at different thresholds and construct a band depicting this variation (the code sketch following this list illustrates both of the latter approaches).
    Here's an example: the shaded areas are the interquartile ranges observed over 125 iterations of 8-fold cross-validation. The thin black areas contain half of the observed specificity-sensitivity pairs for one particular threshold, with the median marked by x (ignore the + marks).
    [Figure: ROC of iterated cross validation]
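The answer points to the R package ROCR for this; below is a rough equivalent sketched in Python with scikit-learn, purely to illustrate the idea. The synthetic data, the SVC settings, and the interpolation grid are assumptions made for the example, and the shaded band here is the interquartile range of the true positive rate at fixed false positive rates (vertical averaging), a simpler variant of the threshold-wise bands described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_curve

# Synthetic binary classification data (a stand-in for your own data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
fpr_grid = np.linspace(0, 1, 101)   # common grid so fold curves can be compared
tprs = []

for train_idx, test_idx in cv.split(X, y):
    clf = SVC(probability=True, random_state=0).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    fpr, tpr, _ = roc_curve(y[test_idx], scores)
    plt.plot(fpr, tpr, color="grey", alpha=0.4)     # one ROC curve per fold
    tprs.append(np.interp(fpr_grid, fpr, tpr))      # interpolate onto the grid

# Summarise the spread across folds: median and interquartile range.
tprs = np.array(tprs)
q25, med, q75 = np.percentile(tprs, [25, 50, 75], axis=0)
plt.fill_between(fpr_grid, q25, q75, alpha=0.3, label="IQR across folds")
plt.plot(fpr_grid, med, color="black", label="median ROC")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```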

OTHER TIPS

After a complete round of cross-validation, all observations have been classified exactly once (although by different models) and have been given an estimated probability of belonging to the class of interest, or a similar statistic. These probabilities can be used to generate a ROC curve in exactly the same way as probabilities obtained on an external test set: just calculate the class-wise error rates as you vary the classification threshold from 0 to 1 and you are all set.
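As a minimal sketch of this pooling approach, scikit-learn's cross_val_predict collects exactly one out-of-fold probability per observation, which can then be fed to roc_curve like ordinary test-set scores. The synthetic data and the SVC settings are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic data as a stand-in for your own.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# One complete round of cross-validation: every observation receives exactly
# one out-of-fold probability estimate.
probs = cross_val_predict(
    SVC(probability=True, random_state=0), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    method="predict_proba",
)[:, 1]

# Treat the pooled out-of-fold probabilities like scores on a test set;
# fpr and tpr now trace the single pooled ROC curve.
fpr, tpr, thresholds = roc_curve(y, probs)
print("Pooled cross-validated AUC:", roc_auc_score(y, probs))
```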

However, you would typically want to perform more than one round of cross-validation, as the performance varies depending on how the folds are divided. It is not obvious to me how to calculate the mean ROC curve of all rounds; I suggest plotting them all and calculating the mean AUC.
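One way to do that in practice, sketched below under the same assumptions as above (synthetic data, scikit-learn SVC), is to repeat the cross-validation with different fold assignments and average the resulting AUCs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

aucs = []
for seed in range(10):  # 10 rounds, each with a different fold split
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    probs = cross_val_predict(
        SVC(probability=True, random_state=0), X, y, cv=cv,
        method="predict_proba",
    )[:, 1]
    aucs.append(roc_auc_score(y, probs))

print("AUC per round:", np.round(aucs, 3))
print("Mean AUC: %.3f  (SD %.3f)" % (np.mean(aucs), np.std(aucs)))
```

The spread of the per-round AUCs also gives a direct sense of the instability mentioned in the accepted answer.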

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow