Question

I'm currently trying to implement the calculation of a ROC curve in Ruby. I tried to transform the pseudocode from http://people.inf.elte.hu/kiss/13dwhdm/roc.pdf (see page 6, section 5, Algorithm 1 "Efficient method for generating ROC points") into Ruby code.

I worked out a simple example, but I'm always getting values over 1.0 for recall. I think I misunderstood something, or made a mistake in the programming. Here is what I got so far:

# results from a classifier
# index 0: users voting
# index 1: estimate from the system
results = [[5.0,4.8],[4.6,4.2],[4.3,2.2],[3.1,4.9],[1.3,2.6],[3.9,4.3],[1.9,2.4],[2.6,2.3]]
# over a score of 2.5 an item is a positive one
threshold = 2.5
# sort by index 1, the estimate
l_sorted = results.sort { |a,b| b[1] <=> a[1] }

# count the real positives and negatives
positives, negatives = 0, 0
l_sorted.each do |item|
  if item[0] >= threshold
    positives += 1
  else
    negatives += 1
  end
end

fp, tp = 0, 0
# the array that holds the points
r = []
f_prev = -Float::INFINITY

# iterate over all items
l_sorted.each do |item|
  # if the score of the former iteration is different,
  # add another point to r
  if item[1] != f_prev
    r.push [fp/negatives.to_f,tp/positives.to_f]
    f_prev = item[1]
  end
  # if the current item is a real positive
  # (user likes the item indeed, and estimator was also correct)
  # add a true positive, otherwise, add a false positive
  if item[0] >= threshold && item[1] >= threshold
    tp += 1
  else
    fp += 1
  end
end

# push the last point (1,1) to the array
r.push [fp/negatives.to_f,tp/positives.to_f]

r.each do |point|
  puts "(#{point[0].round(3)},#{point[1].round(3)})"
end

Based on a results array of arrays, the code tries to calculate the points. I'm not sure what f_prev is all about. Does f_prev store the score of the classifier, or only whether it's true or false?

It would be awesome if someone could have a quick look at my code and help me find my mistake. thx!


Solution

My second answer is an analysis of your code, pointing out where I think you have made some mistakes or are confused. I am assuming that you want to reproduce a graph similar to the one on page 864 of your linked PDF.

An ROC plot like the one on p864 is a graph showing the available compromises in your predictive model between false positive and true positive rates. To see all possible compromises, you need to visit every data point where the threshold would make a difference, and plot its false positive rate (fp/negatives) against its true positive rate (tp/positives).

Your first point of confusion seems to be that you have a "users voting" float score instead of a true/false category. The example in the PDF has p/n cases already determined for plotting the ROC.

# results from a classifier
# index 0: users voting
# index 1: estimate from the system
results = [[5.0,4.8],[4.6,4.2],[4.3,2.2],[3.1,4.9],[1.3,2.6],[3.9,4.3],[1.9,2.4],[2.6,2.3]]

So I think you would be better off having

results = [[true,4.8],[true,4.2],[true,2.2],[true,4.9],[false,2.6],[true,4.3],[false,2.4],[true,2.3]]

before you start to plot the ROC. It would be fine to do this conversion inline, but you need to separate the concern of generating your test data from the concern of plotting the ROC - for instance, the fact that your user scores and machine estimates are on the same scale is irrelevant.
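A minimal sketch of that inline conversion, using the 2.5 cutoff from your example (user_threshold and labelled are names I've made up for illustration):

# Convert the float user votes into true/false labels,
# keeping the machine estimate untouched
user_threshold = 2.5
labelled = results.map { |user_score, estimate| [user_score >= user_threshold, estimate] }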

Which leads to the threshold variable. You can use e.g. 2.5 to convert your user data, but it has no bearing on your ROC plot. In fact, to get the full ROC plot you need to test multiple threshold values and see how each affects the true and false positive rates.

# over a score of 2.5 an item is a positive one
threshold = 2.5

This sorts the values into reverse order, with the highest-scoring items first. You could do it either way, but to me that means you want to start at a high threshold (where all your scores predict false) and at position [0.0,0.0] on the graph:

# sort by index 1, the estimate
l_sorted = results.sort { |a,b| b[1] <=> a[1] }
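As a side note, a more idiomatic Ruby spelling of the same descending sort (my suggestion, not part of the original code) would be:

# Sort by the estimate (index 1), highest first
l_sorted = results.sort_by { |item| -item[1] }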

The following code looks accurate enough, but really it is just counting the actual positives and negatives in the test data, so it shouldn't need to involve the threshold at all:

# count the real positives and negatives
positives, negatives = 0, 0
l_sorted.each do |item|
  if item[0] >= threshold
    positives += 1
  else
    negatives += 1
  end
end

A nicer Ruby way of putting the same logic, assuming you replace the user scores with true/false values somewhere else, might be

positives = l_sorted.select { |item| item[0] }.count
negatives = l_sorted.count - positives
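Or, if you prefer a single pass (again my suggestion), Enumerable#partition splits the list into matching and non-matching halves at once:

# partition returns [items where block is true, the rest]
positives, negatives = l_sorted.partition { |item| item[0] }.map(&:count)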

This looks OK; you do indeed start at [0.0,0.0] with

fp, tp = 0, 0
# the array that holds the points
r = []

However, this looks like the starting threshold

f_prev = -Float::INFINITY

so it would logically be positive Float::INFINITY in my opinion, such that all your predictions are initially false (hence fp and tp logically have to be 0, because no positives are allowed at all). It doesn't matter in practice though, since the value is only ever compared for inequality.
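To illustrate - any sentinel that can never equal a real score behaves identically here, since the value is only ever tested with !=:

# Either of these works as the initial value; the first item's score
# will always differ from it, triggering the first point at [0.0,0.0]
f_prev = Float::INFINITY
# f_prev = nil would behave the same way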


Inside the loop, the code is tracking what the total false positives and true positives would be if the threshold were set just above the current item. As you lower this bar past groups of items with the same score, they start to predict positive (no need to test this against the threshold variable, which was confusing you). All you have to do is sort those newly-positive items into the tp or fp counts. The check against f_prev just groups items with equal scores: if, say, 3 predictions share the same score, you plot only one point for the group.

# iterate over all items
l_sorted.each do |item|
  if item[1] != f_prev
    # Plot a point, assuming all predictions with a score equal or lower than current
    # item are thresholded out as negative.
    r.push [fp/negatives.to_f,tp/positives.to_f]
    f_prev = item[1]
  end
  # Assume the current prediction is now positive, and calculate how that affects the curve
  # if the current test item is a real positive
  # add to true positives, otherwise, it has become a false positive
  if item[0]
    tp += 1
  else
    fp += 1
  end
end

# push the last point (1,1) to the array
r.push [fp/negatives.to_f,tp/positives.to_f]

As well as altering the test, I removed an inaccurate comment ("the estimator was also correct") - we are not judging in this code whether the estimator is "correct" or not for a single value, we are just seeing how well it scores fp vs tp at a particular cutoff point. The single-pass process over the sorted list relies on the fact that each new point is a small incremental change from the last point plotted, based on changes to the fp and tp counts.

This should now go from [0.0,0.0] to [1.0,1.0]:

r.each do |point|
  puts "(#{point[0].round(3)},#{point[1].round(3)})"
end
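Putting the corrected pieces together, here is a self-contained sketch (with the true/false conversion done up front, as suggested above). The expected output is my own trace of the example data, so treat it as a sanity check:

# Example data: [user liked it (true/false), estimate from the system]
results = [[true,4.8],[true,4.2],[true,2.2],[true,4.9],[false,2.6],
           [true,4.3],[false,2.4],[true,2.3]]

# Sort by estimate, highest first
l_sorted = results.sort_by { |item| -item[1] }

# Count the actual positives and negatives once, up front
positives, negatives = l_sorted.partition { |item| item[0] }.map(&:count)

fp, tp = 0, 0
r = []
f_prev = Float::INFINITY

l_sorted.each do |label, score|
  if score != f_prev
    # Plot a point before this score group becomes positive
    r.push [fp / negatives.to_f, tp / positives.to_f]
    f_prev = score
  end
  if label
    tp += 1
  else
    fp += 1
  end
end

# Final point: everything predicted positive
r.push [fp / negatives.to_f, tp / positives.to_f]

r.each { |x, y| puts "(#{x.round(3)},#{y.round(3)})" }

# With 6 positives and 2 negatives, this should print:
# (0.0,0.0) (0.0,0.167) (0.0,0.333) (0.0,0.5) (0.0,0.667)
# (0.5,0.667) (1.0,0.667) (1.0,0.833) (1.0,1.0)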

OTHER TIPS

This answer is incorrect, as it assumed from the OP's comment that the algorithm required a per-item assessment of false positive and true positive assignment. In fact, the variables tp and fp track running totals for the whole data set, and are just adjusted assuming that the current prediction in the loop has become positive. See my other answer.


In this code block:

  if item[0] >= threshold && item[1] >= threshold
    tp += 1
  else
    fp += 1
  end

You appear to be counting anything other than a "true positive" as a "false positive".

This is not correct; you are ignoring the possibility that the result is a true or false negative classification. Try this:

  if item[0] >= threshold && item[1] >= threshold
    tp += 1
  elsif item[0] < threshold && item[1] >= threshold
    fp += 1
  end

or, slightly DRY-er

  if item[1] >= threshold
    if item[0] >= threshold
      tp += 1
    else
      fp += 1
    end
  end