Question

Does it make sense to calculate the recall for each sample in a multilabel classification problem?

Suppose I have 3 data samples, each having its own true set of labels and predicted set of labels.

Sample output

I want to measure the match between the true set of labels and the predicted set of labels. I do not care about true negatives or false positives in each prediction, so this translates to a recall score for me. Programmatically, I would take the element-wise AND of y_predicted and y_true to get the number of true positives for each sample, then divide by the total number of true labels for that sample (in other words, true positives / (true positives + false negatives)).
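As a sketch of this idea, using a small hypothetical multilabel indicator matrix (3 samples, 4 labels, invented here for illustration):

```python
import numpy as np

# Hypothetical data: 3 samples, 4 possible labels (multilabel indicator format)
y_true = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 1],
                   [0, 1, 1, 0],
                   [0, 1, 0, 1]])

# True positives per sample: logical AND of prediction and truth, summed per row
tp = np.logical_and(y_true, y_pred).sum(axis=1)

# Per-sample recall: TP / (TP + FN) = TP / number of true labels in that sample
per_sample_recall = tp / y_true.sum(axis=1)
print(per_sample_recall)         # [0.66666667 1.         0.66666667]
print(per_sample_recall.mean())  # 0.7777...
```

Note that this assumes every sample has at least one true label; a sample with an empty true set would divide by zero.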

My questions are:

Is calculating recall per sample (not per label) usually done?

Is my thought process correct?

I've seen articles where a single recall is calculated for the whole matrix of y_true and y_predicted or recall is calculated for a single label.


Solution

This metric is usually referred to as a sample-based or example-based score, and it can be applied to recall (and other scores) in the multi-label case. You can find a brief explanation here.

Scikit-learn has an implementation of it (see here for recall): set average='samples' in recall_score:

'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
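A minimal sketch of that setting, reusing a hypothetical 3-sample, 4-label indicator matrix (invented for illustration):

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical multilabel data in indicator format
y_true = np.array([[1, 0, 1, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 1],
                   [0, 1, 1, 0],
                   [0, 1, 0, 1]])

# average='samples': recall is computed per row (per sample),
# then averaged across rows
print(recall_score(y_true, y_pred, average='samples'))  # 0.7777...
```

This should agree with the manual per-sample computation (AND, divide by the number of true labels, then average): here each sample's recall is 2/3, 1, and 2/3, giving a mean of 7/9.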

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange