Question

I have a machine learning model that tries to fingerprint the functions in a binary file against a corpus. The final output, upon inputting a binary file, is a table with a one-to-one mapping between the binary functions and the corpus functions, as follows:

[image: table mapping each binary function to a corpus function]

As you can see from the names, some of the functions are matched correctly while others are not. Is there a way to calculate precision and recall for the above result? I understand that precision and recall make sense for other ML tasks such as image classification, where a confusion matrix helps to calculate both metrics easily. However, I am confused and feel that I cannot use such measures here, as it is just a one-to-one mapping that is either true or false. If precision and recall do not make sense, are there any other metrics I could use to evaluate the model? Thank you!

Solution

First of all, precision and recall are not specific to image classification; they are relevant wherever there are two distinct "positive" and "negative" classes (for example, when you test an e-mail for "spam/not-spam", or a blood sample for "has virus/does not have virus").
You can read more on this question on Cross Validated, but to sum it up - precision is the probability that a sample is positive if a test said it is, and recall is the probability that a positive sample will be reported as positive by the test.
False positives mess up your precision, and false negatives mess up your recall.
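To make those definitions concrete, here is a minimal sketch with hypothetical counts for a binary spam test (the numbers are illustrative, not from the question):

```python
# Hypothetical counts for a binary "spam / not-spam" test
tp = 90   # true positives: spam correctly flagged
fp = 10   # false positives: legitimate mail wrongly flagged (hurts precision)
fn = 30   # false negatives: spam that slipped through (hurts recall)

# precision: P(actually positive | test said positive)
precision = tp / (tp + fp)

# recall: P(test says positive | actually positive)
recall = tp / (tp + fn)

print(precision)  # 0.9
print(recall)     # 0.75
```

With these numbers, a positive test result is right 90% of the time, but only 75% of the actual spam gets caught.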

Now, your task appears to be one of multi-class classification - with at least 17 classes, from your example. I wouldn't go with precision/recall for this - for multiple classes you can only compute them per class (one-vs-rest) or pair-wise. You can, however, plot a CxC confusion matrix (where C is the number of classes), and investigate where your model tends to miss. There's an implementation in scikit-learn (link).
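A small sketch of that confusion matrix with scikit-learn, using made-up function names as the classes (three classes here instead of your 17+):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth: each binary function's true corpus function,
# versus the model's predicted match
y_true = ["memcpy", "memcpy", "strlen", "strlen", "printf", "printf"]
y_pred = ["memcpy", "strlen", "strlen", "strlen", "printf", "memcpy"]

labels = ["memcpy", "printf", "strlen"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are true classes, columns are predicted classes;
# off-diagonal cells show where the model tends to confuse functions.
print(cm)
```

Each row sums to the number of true samples of that class, so a bright off-diagonal cell immediately tells you which corpus function the model keeps mistaking for another.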

If you need a single-number metric, I'd start with just accuracy (and develop from there). Following Nuclear Wang's comment, I'd also suggest looking at Cohen's Kappa (see explanation on Cross Validated) to better account for class imbalance.
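Both single-number metrics are one call each in scikit-learn; continuing with the same hypothetical labels:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = ["memcpy", "memcpy", "strlen", "strlen", "printf", "printf"]
y_pred = ["memcpy", "strlen", "strlen", "strlen", "printf", "memcpy"]

# Accuracy: fraction of binary functions mapped to the correct corpus function
acc = accuracy_score(y_true, y_pred)

# Cohen's kappa: agreement between prediction and truth,
# corrected for the agreement expected by chance (helps with class imbalance)
kappa = cohen_kappa_score(y_true, y_pred)

print(acc, kappa)
```

If one corpus function dominates your binaries, accuracy can look good while the model is just guessing the majority class; kappa would drop toward zero in that case, which is why it's a useful companion metric.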

To read more on multi-class classification, see this question. I'd also recommend this blog post on Towards Data Science.

Licensed under: CC-BY-SA with attribution