Question

I am building receiver operating characteristic (ROC) curves to evaluate classifiers using the area under the curve (AUC) (more details on that at the end of the post). Unfortunately, points on the curve often fall below the diagonal. For example, I end up with graphs that look like the one here (ROC curve in blue, identity line in grey):

[Figure: Fixing the ROC]

The third point, (0.3, 0.2), falls below the diagonal. To calculate the AUC, I want to fix such recalcitrant points.

The standard way to do this, for a point (fp, tp) on the curve, is to replace it with the point (1 - fp, 1 - tp), which is equivalent to swapping the predictions of the classifier. For instance, in our example, the troublesome point A (0.3, 0.2) becomes point B (0.7, 0.8), which I have indicated in red in the image above.
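
To make that concrete, here is a minimal MATLAB sketch of the swap, using illustrative values (point A from the example) and a hypothetical scores vector standing in for the real classifier output:

    % Swapping a single ROC point (illustrative values).
    fp = 0.3;  tp = 0.2;        % troublesome point A
    fpSwapped = 1 - fp;         % 0.7
    tpSwapped = 1 - tp;         % 0.8 -> point B

    % Equivalent view at the classifier level: inverting the probability
    % estimates reverses every thresholded prediction.
    scores        = [0.10 0.40 0.35 0.80];   % hypothetical classifier outputs
    scoresSwapped = 1 - scores;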

This is about as far as my references go in treating this issue. The problem is that if you add the new point to the ROC (and remove the bad point), you end up with a non-monotonic ROC curve, as shown here (red is the new ROC curve, and the dotted blue line is the old one):

[Figure: New ROC]

And here I am stuck. How can I fix this ROC curve?

Do I need to re-run my classifier with the data or classes somehow transformed to take into account this weird behavior? I have looked over a relevant paper, but if I am not mistaken, it seems to be addressing a slightly different problem than this.

In terms of details: I still have all the original threshold values, fp values, and tp values, as well as the output of the original classifier for each data point (a scalar from 0 to 1 that is a probability estimate of class membership). I am doing this in MATLAB, starting with the perfcurve function.
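
For reference, this is roughly how the curve above is generated; the labels and scores here are placeholders for the real data:

    % Placeholder data: labels are the true classes, scores are the
    % classifier's probability estimates for class 1.
    labels = [1 0 1 1 0 0 1 0];
    scores = [0.9 0.4 0.35 0.8 0.3 0.2 0.6 0.55];

    % perfcurve returns false positive rates, true positive rates,
    % thresholds, and the AUC for the chosen positive class (here, 1).
    [fpRate, tpRate, thresholds, auc] = perfcurve(labels, scores, 1);

    plot(fpRate, tpRate, 'b'); hold on;
    plot([0 1], [0 1], '--', 'Color', [0.5 0.5 0.5]);   % identity line
    xlabel('False positive rate'); ylabel('True positive rate');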



Solution

Note: based on some very helpful emails about this from the people who wrote the articles cited above, and on the discussion above, the right answer seems to be: do not try to "fix" individual points in an ROC curve unless you build an entirely new classifier, and even then be sure to hold out some test data to check that doing so was a reasonable thing to do.

Getting points below the identity line is something that simply happens. It is like getting an individual classifier that scores 45% correct when chance performance is 50%. That is just part of the variability with real data sets, and unless the result is significantly worse than would be expected by chance, it is not something to worry too much about. If, for example, your classifier gets only 20% correct, then clearly something is amiss, and you should look into the specific reasons and fix your classifier.
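
If you do want to check whether a classifier is significantly worse than chance, one simple option is a permutation test on the AUC. The sketch below is only a suggestion (with the same kind of placeholder labels and scores as in the question), not something prescribed by the articles:

    % Permutation check: compare the observed AUC with AUCs obtained
    % after shuffling the labels (placeholder data).
    labels = [1 0 1 1 0 0 1 0];
    scores = [0.9 0.4 0.35 0.8 0.3 0.2 0.6 0.55];

    [~, ~, ~, aucObs] = perfcurve(labels, scores, 1);

    nPerm   = 1000;
    aucNull = zeros(nPerm, 1);
    for k = 1:nPerm
        shuffled = labels(randperm(numel(labels)));
        [~, ~, ~, aucNull(k)] = perfcurve(shuffled, scores, 1);
    end

    % One-sided p-value: fraction of shuffles that do as badly or worse.
    pBelowChance = mean(aucNull <= aucObs);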

OTHER TIPS

Yes, swapping a point for (1 - fp, 1 - tp) is theoretically effective, but increasing the sample size is a safe bet too.

It does seem that your system has a non-monotonic response characteristic, so be careful not to bend the rules of the ROC too much, or you will undermine the robustness of the AUC.

That said, you could try using a Pareto frontier curve (Pareto front). If that fits the requirements of "Repairing Concavities", then you would essentially sort the points and discard the dominated ones so that the ROC curve becomes monotonic, as in the sketch below.
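
As a rough illustration of that idea (my reading of the Pareto-front repair, not the exact algorithm from the paper): sort the points by false positive rate and flatten any point that is dominated by an earlier point with a higher true positive rate:

    % Illustrative ROC points, including a dip below an earlier point.
    fpRate = [0 0.1 0.3 0.5 0.7 1];
    tpRate = [0 0.4 0.2 0.6 0.8 1];

    % Sort by false positive rate, then take the running maximum of the
    % true positive rate; dominated dips are flattened, so the curve
    % becomes monotonic. (cummax requires MATLAB R2014b or later.)
    [fpSorted, order] = sort(fpRate);
    tpMonotone = cummax(tpRate(order));

    plot(fpSorted, tpMonotone, 'r', fpRate, tpRate, 'b:');
    aucMonotone = trapz(fpSorted, tpMonotone);   % AUC of the repaired curve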
