area under the precision-recall curve in R or other summary quantities

https://stackoverflow.com/questions/19696626

02-07-2022

Question

I plan to use the precision-recall plot (PR plot) to compare models. See the attached figure (partial screenshot, sorry!) below. I have the true positives, true negatives, false positives and false negatives at hand, and I need a single summary quantity for each model. Here are my questions:

1. Area under the PR curve (AUC) is the first quantity, but I don't know how to calculate it in R. I do NOT want to use a package like ROCR, because I wrote all of the code myself and would like to compute this from the quantities I already have. There seem to be many ways to do it; which one is the easiest to implement?
2. Another quantity is the F-measure (the traditional F-measure or balanced F-score), which is the harmonic mean of precision and recall. Is it better than the AUC in #1, or do the two describe different things? Moreover, since I have a whole set of recall and precision values, how can I calculate a single F-measure in this case (see the figure below)?

Thank you!

[Figure: partial screenshot of the precision-recall plot]

Solution

To calculate the area under a curve, you can use a numerical integration function such as trapz() in the caTools package, which applies the trapezoidal rule:

library(caTools)  # provides trapz(x, y)
auc <- trapz(recall, precision)
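Since the question asks for a package-free approach, the same trapezoidal rule can be written in a few lines of base R. This is only a minimal sketch: `pr_auc` is a hypothetical helper name, the example values are made up, and it assumes `recall` and `precision` are numeric vectors of equal length with one entry per cutoff.

```r
# Trapezoidal-rule area under a precision-recall curve, base R only.
# pr_auc is a hypothetical helper name, not a function from any package.
pr_auc <- function(recall, precision) {
  stopifnot(length(recall) == length(precision), length(recall) >= 2)
  ord <- order(recall)              # integrate along increasing recall
  r <- recall[ord]
  p <- precision[ord]
  n <- length(r)
  # Sum of trapezoid areas: width times the mean height of adjacent points
  sum(diff(r) * (p[-n] + p[-1]) / 2)
}

# Made-up values, for illustration only
recall    <- c(0.0, 0.25, 0.5, 0.75, 1.0)
precision <- c(1.0, 0.9, 0.8, 0.6, 0.5)
auc <- pr_auc(recall, precision)
```

Note that the result depends on how far the curve extends along the recall axis, so curves should cover the same recall range before their areas are compared.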

The F-score is the harmonic mean of precision and recall at a given cutoff value. In your case, each curve yields one F-score per cutoff, so the F-score alone would not summarize the curve the way you want.

The AUC describes the performance of the model across the possible cutoffs on the model's continuous output. The F-score describes the model at one particular cutoff; it is a way to combine recall and precision into a single statistic.
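To make the per-cutoff nature of the F-score concrete, here is a small base R sketch that computes one F1 value per (precision, recall) pair. The name `f_score` and the example values are hypothetical, and setting F1 to 0 when precision and recall are both 0 is a convention assumed here. Picking the cutoff with the largest F1 is one possible way to reduce the vector to a single number, but that is an extra choice on top of the answer above, not something it recommends.

```r
# F1 at each cutoff: harmonic mean of precision and recall.
# f_score is a hypothetical helper name, not a function from any package.
f_score <- function(precision, recall) {
  stopifnot(length(precision) == length(recall))
  f <- 2 * precision * recall / (precision + recall)
  f[precision + recall == 0] <- 0   # assumed convention when both are 0
  f
}

# Made-up values, one entry per cutoff, for illustration only
precision <- c(1.0, 0.9, 0.8, 0.6, 0.4)
recall    <- c(0.0, 0.25, 0.5, 0.75, 1.0)

f1   <- f_score(precision, recall)  # one F1 value per cutoff
best <- which.max(f1)               # index of the cutoff with the largest F1
```

Here `f1` has as many entries as there are cutoffs, which is why a single F-score describes one operating point rather than the whole curve.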

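Be careful when explaining it, though. "AUC" usually refers to the area under the ROC curve, which is built from sensitivity and specificity, so state explicitly that you mean the area under the precision-recall curve.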

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow