Question

I'm trying to calculate the AUC for a large-ish data set and am having trouble finding an implementation that both handles values that aren't just 0's or 1's and works reasonably quickly.

So far I've tried the ROCR package, but it only handles 0's and 1's, and the pROC package will give me an answer but could take 5-10 minutes to compute 1 million rows.

As a note, all of my values fall between 0 and 1, but are not necessarily 1 or 0.

EDIT: both the answers and predictions fall between 0 and 1.

Any suggestions?

EDIT2:

ROCR can deal with situations like this:

Ex.1
actual   prediction
  1         0
  1         1
  0         1
  0         1 
  1         0

or like this:

Ex.2
actual   prediction
  1         .25
  1         .1
  0         .9
  0         .01
  1         .88

but NOT situations like this:

Ex.3
actual   prediction
  .2         .25
  .6         .1
  .98        .9
  .05        .01
  .72        .88

pROC can deal with Ex.3 but it takes a very long time to compute. I'm hoping that there's a faster implementation for a situation like Ex.3.

Solution

So far I've tried the ROCR package, but it only handles 0's and 1's

Are you talking about the reference class memberships or the predicted class memberships? The latter can be between 0 and 1 in ROCR; have a look at its example data set ROCR.simple.
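
For example, a minimal sketch using ROCR's bundled ROCR.simple data (continuous predictions, 0/1 labels, i.e. the Ex.2 situation):

    library(ROCR)
    data(ROCR.simple)                    # predictions in [0, 1], labels are 0/1
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    performance(pred, measure = "auc")@y.values[[1]]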

If your reference is in [0, 1], you could have a look at (disclaimer: my) package softclassval. You'd have to construct the ROC/AUC from sensitivity and specificity calculations, though. So unless you come up with an optimized algorithm (as the ROCR developers did), it will probably take a long time, too. In that case you'll also have to think about what exactly sensitivity and specificity should mean, as this is ambiguous with reference memberships in (0, 1).
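
As a rough illustration of the underlying idea (a hand-rolled base-R sketch, not softclassval's actual interface), a "soft" sensitivity and specificity can be written as membership-weighted means, here using the product as the "and" operator and the Ex.3 data from the question:

    # illustrative sketch only - the product is just one possible "and" operator
    actual     <- c(0.2, 0.6, 0.98, 0.05, 0.72)   # soft reference memberships
    prediction <- c(0.25, 0.1, 0.9, 0.01, 0.88)   # soft predicted memberships
    soft_sens <- sum(actual * prediction) / sum(actual)
    soft_spec <- sum((1 - actual) * (1 - prediction)) / sum(1 - actual)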

Update after clarification of the question

You need to be aware that grouping the reference (actual) values together loses information. E.g., if you have actual = 0.5 and prediction = 0.8, what is that supposed to mean? Suppose these values were really actual = 5/10 and prediction = 8/10. By summarizing the 10 tests into two numbers, you lose the information of whether the 5 actually positive cases are among the 8 predicted positive ones. Without this, actual = 5/10 and prediction = 8/10 is consistent with anything between 30 % and 70 % correct recognition!
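
To spell out the arithmetic behind that range:

    # 10 cases, 5 actually positive, 8 predicted positive: the overlap
    # (true positives) can be anywhere between 3 and 5
    tp <- 3:5
    tn <- tp - 3          # only 2 predicted negatives, so TN = TP - 3
    (tp + tn) / 10        # 0.3 0.5 0.7, i.e. 30 % to 70 % correct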

Here's an illustration where the sensitivity is discussed (i.e. correct recognition, e.g. of click-through events):

[Figure from the softclassval poster: sensitivity calculated with soft "and" operators.]

You can find the whole poster and two presentations discussing such issues at softclassval.r-forge.r-project.org, section "About softclassval".

Following these thoughts, weighted versions of the mean absolute, mean squared, root mean squared etc. errors can be used as well.
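
A sketch of such (weighted) regression-type error measures on the Ex.3 data; weighting by the reference membership is only one possible choice and is assumed here for illustration:

    actual     <- c(0.2, 0.6, 0.98, 0.05, 0.72)
    prediction <- c(0.25, 0.1, 0.9, 0.01, 0.88)
    mae   <- mean(abs(prediction - actual))
    rmse  <- sqrt(mean((prediction - actual)^2))
    w     <- actual                                         # assumed weights
    wmae  <- sum(w * abs(prediction - actual)) / sum(w)
    wrmse <- sqrt(sum(w * (prediction - actual)^2) / sum(w))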

However, all those different ways of expressing the same performance characteristic of the model (e.g. sensitivity = % correct recognitions of actual click-through events) have different meanings, and while they coincide with the usual calculation in unambiguous reference and prediction situations, they will react differently to ambiguous references / partial reference class memberships.

Note also that, as you use continuous values in [0, 1] for both reference/actual and prediction, the whole test will be condensed into a single point (not a curve!) in the ROC or specificity-sensitivity plot.

Bottom line: the grouping of the data gets you into trouble here. So if you can somehow get the information on the individual clicks, go and get it!

OTHER TIPS

Can you use other error measures for assessing method performance (e.g. mean absolute error, root mean squared error)?

This post might also help you out, but if you have different numbers of classes for observed and predicted values, then you might run into some issues.

https://stat.ethz.ch/pipermail/r-help/2008-September/172537.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow