What is the formula to calculate the precision, recall, f-measure with macro, micro, none for multi-label classification in sklearn metrics?
20-10-2020
Question
I am working on a multi-label classification task, but I am not able to understand the formulas used to calculate precision, recall, and F-measure with the macro, micro, and None averaging options. I do understand how these metrics are calculated for samples, and I am also familiar with example-based, label-based, and rank-based metrics.
For instance,
import numpy as np
from sklearn.metrics import hamming_loss, accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import multilabel_confusion_matrix
y_true = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 0, 0],
                   [1, 1, 1]])
y_pred = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1]])
conf_mat = multilabel_confusion_matrix(y_true, y_pred)
print("Confusion_matrix_Train\n", conf_mat)
Confusion matrix output:
[[[1 0]
  [1 2]]
 [[1 1]
  [0 2]]
 [[1 0]
  [1 2]]]
Macro score
print("precision_score:", precision_score(y_true, y_pred, average='macro'))
print("recall_score:", recall_score(y_true, y_pred, average='macro'))
print("f1_score:", f1_score(y_true, y_pred, average='macro'))
Macro score output:
precision_score: 0.8888888888888888
recall_score: 0.7777777777777777
f1_score: 0.8000000000000002
Micro score
print("precision_score:", precision_score(y_true, y_pred, average='micro'))
print("recall_score:", recall_score(y_true, y_pred, average='micro'))
print("f1_score:", f1_score(y_true, y_pred, average='micro'))
Micro score output:
precision_score: 0.8571428571428571
recall_score: 0.75
f1_score: 0.7999999999999999
Weighted score
print("precision_score:", precision_score(y_true, y_pred, average='weighted'))
print("recall_score:", recall_score(y_true, y_pred, average='weighted'))
print("f1_score:", f1_score(y_true, y_pred, average='weighted'))
Weighted score output:
precision_score: 0.9166666666666666
recall_score: 0.75
f1_score: 0.8
Samples score
print("precision_score:", precision_score(y_true, y_pred, average='samples'))
print("recall_score:", recall_score(y_true, y_pred, average='samples'))
print("f1_score:", f1_score(y_true, y_pred, average='samples'))
Samples score output:
precision_score: 0.75
recall_score: 0.75
f1_score: 0.75
None score
print("precision_score:", precision_score(y_true, y_pred, average=None))
print("recall_score:", recall_score(y_true, y_pred, average=None))
print("f1_score:", f1_score(y_true, y_pred, average=None))
None score output:
precision_score: [1. 0.66666667 1. ]
recall_score: [0.66666667 1. 0.66666667]
f1_score: [0.8 0.8 0.8]
Thanks in advance for your help.
Answer
Generally, the scoring metrics you are looking at are defined as follows (see for example Wikipedia):
$$precision = \frac{TP}{TP+FP}$$ $$recall= \frac{TP}{TP+FN}$$ $$F1 = \frac{2 \times precision \times recall}{precision + recall}$$
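As a minimal sketch of these definitions, the snippet below counts TP, FP, and FN for each label column of the arrays from the question and applies the three formulas. The helper names `label_counts` and `prf` are made up for illustration, and returning 0.0 for a zero denominator is an assumed convention.

```python
# Data from the question: rows are samples, columns are labels.
y_true = [[0, 1, 1], [1, 0, 1], [1, 0, 0], [1, 1, 1]]
y_pred = [[0, 1, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]

def label_counts(y_true, y_pred, label):
    """Return (TP, FP, FN) for one label column (hypothetical helper)."""
    pairs = [(t[label], p[label]) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn

def prf(tp, fp, fn):
    """Precision, recall, F1 from counts; 0.0 on a zero denominator (assumption)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

counts = [label_counts(y_true, y_pred, j) for j in range(3)]
per_label = [prf(*c) for c in counts]
print(counts)  # (TP, FP, FN) for each of the three labels
for j, (p, r, f) in enumerate(per_label, start=1):
    print(f"label {j}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The per-label values computed this way are the building blocks that the different averaging options combine.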
For the multi-class and multi-label cases, scikit-learn offers the following averaging options (see the scikit-learn documentation, for example):
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).
And None does the following:
If None, the scores for each class are returned.
TLDR: "micro" calculates the overall metric, "macro" derives an average assigning each class an equal weight, and "weighted" calculates an average assigning each class a weight based on its number of occurrences (its support).
Accordingly, the calculations in your example go like this:
Macro
$$precision_{macro} = \sum_{classes} \frac{precision\text{ }of \text{ }class}{number\text{ }of\text{ }classes} = \frac{(2/2) + (2/3) + (2/2)}{3} \approx 0.89$$
$$recall_{macro} = \sum_{classes} \frac{recall\text{ }of \text{ }class}{number\text{ }of\text{ }classes} = \frac{(2/3) + (2/2) + (2/3)}{3} \approx 0.78$$
$$F1_{macro} = \sum_{classes} \frac{F1\text{ }of \text{ }class}{number\text{ }of\text{ }classes} = \frac{1}{3} \times \frac{2 \times (2/2) \times (2/3)}{(2/2) + (2/3)} + \frac{1}{3} \times \frac{2 \times (2/3) \times (2/2)}{(2/3) + (2/2)} + \frac{1}{3} \times \frac{2 \times (2/2) \times (2/3)}{(2/2) + (2/3)} \approx 0.80$$
Note that macro means that all classes have the same weight, i.e. $\frac{1}{3}$ in your example. That is where the $\times \frac{1}{3}$ to calculate the F1 score comes from.
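The macro calculation can be checked by hand with a few lines of plain Python. This is only a sketch: the `counts` list is typed in from the confusion matrices in the question as (TP, FP, FN) per class, and the variable names are illustrative.

```python
# (TP, FP, FN) per class, read off the confusion matrices in the question.
counts = [(2, 0, 1), (2, 1, 0), (2, 0, 1)]

precisions = [tp / (tp + fp) for tp, fp, fn in counts]
recalls = [tp / (tp + fn) for tp, fp, fn in counts]
f1s = [2 * p * r / (p + r) for p, r in zip(precisions, recalls)]

# Macro averaging: every class gets the same weight, 1 / number of classes.
n_classes = len(counts)
macro_precision = sum(precisions) / n_classes
macro_recall = sum(recalls) / n_classes
macro_f1 = sum(f1s) / n_classes

print(round(macro_precision, 4), round(macro_recall, 4), round(macro_f1, 4))
```

The three printed values should agree with the output of `average='macro'` shown in the question, up to rounding.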
Micro
$$precision_{micro} = \frac{\sum_{classes} TP\text{ }of \text{ }class}{\sum_{classes} (TP\text{ }of\text{ }class + FP\text{ }of\text{ }class)} = \frac{2+2+2}{2+3+2} \approx 0.86$$
$$recall_{micro} = \frac{\sum_{classes} TP\text{ }of \text{ }class}{\sum_{classes} (TP\text{ }of\text{ }class+FN\text{ }of\text{ }class)} = \frac{2+2+2}{3+2+3} = 0.75$$
$$F1_{micro}= 2\times \frac{recall_{micro} \times precision_{micro}}{recall_{micro} + precision_{micro}} \approx 0.8$$
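A short sketch of the micro calculation, again assuming the (TP, FP, FN) counts per class are typed in from the confusion matrices in the question. The counts are pooled over all classes first, and the metrics are computed once on the totals.

```python
# (TP, FP, FN) per class, read off the confusion matrices in the question.
counts = [(2, 0, 1), (2, 1, 0), (2, 0, 1)]

# Micro averaging: pool the counts over all classes before dividing.
tp_total = sum(tp for tp, fp, fn in counts)
fp_total = sum(fp for tp, fp, fn in counts)
fn_total = sum(fn for tp, fp, fn in counts)

micro_precision = tp_total / (tp_total + fp_total)
micro_recall = tp_total / (tp_total + fn_total)
micro_f1 = (2 * micro_precision * micro_recall
            / (micro_precision + micro_recall))

print(round(micro_precision, 4), round(micro_recall, 4), round(micro_f1, 4))
```

Because the counts are pooled, classes with more positive predictions or more true instances influence the result more than rare ones.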
Weighted
$$precision_{weighted} = \sum_{classes}{weight\text{ }of \text{ }class \times precision\text{ }of\text{ }class} = \frac{3}{8}\times\frac{2}{2} + \frac{2}{8}\times\frac{2}{3} + \frac{3}{8} \times \frac{2}{2} \approx 0.92$$
$$recall_{weighted} = \sum_{classes}{weight\text{ }of \text{ }class \times recall\text{ }of\text{ }class} = \frac{3}{8} \times \frac{2}{3} + \frac{2}{8}\times\frac{2}{2} + \frac{3}{8} \times \frac{2}{3} = 0.75$$
$$F1_{weighted} = \sum_{classes}{weight\text{ }of \text{ }class \times F1\text{ }of\text{ }class} = \frac{3}{8} \times \frac{2 \times (2/2) \times (2/3)}{(2/2) + (2/3)} + \frac{2}{8} \times \frac{2 \times (2/3) \times (2/2)}{(2/3) + (2/2)} + \frac{3}{8} \times \frac{2 \times (2/2) \times (2/3)}{(2/2) + (2/3)} = 0.8$$
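The weighted variant can be sketched the same way. The support of a class is its number of true instances, TP + FN, which gives the weights 3/8, 2/8, and 3/8 used above. The `counts` list is typed in from the question's confusion matrices and the variable names are illustrative.

```python
# (TP, FP, FN) per class, read off the confusion matrices in the question.
counts = [(2, 0, 1), (2, 1, 0), (2, 0, 1)]

precisions = [tp / (tp + fp) for tp, fp, fn in counts]
recalls = [tp / (tp + fn) for tp, fp, fn in counts]
f1s = [2 * p * r / (p + r) for p, r in zip(precisions, recalls)]

# Support = number of true instances of each class = TP + FN.
support = [tp + fn for tp, fp, fn in counts]
weights = [s / sum(support) for s in support]

weighted_precision = sum(w * p for w, p in zip(weights, precisions))
weighted_recall = sum(w * r for w, r in zip(weights, recalls))
weighted_f1 = sum(w * f for w, f in zip(weights, f1s))

print(support)
print(round(weighted_precision, 4), round(weighted_recall, 4),
      round(weighted_f1, 4))
```

Note that the weighted recall equals the micro recall here, since weighting each class recall by its support pools the true instances in the same way.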
None
$precision_{class 1} = \frac{2}{2} = 1.0$
$precision_{class 2} = \frac{2}{2+1} \approx 0.67$
$precision_{class 3} = \frac{2}{2} = 1.0$
$recall_{class 1} = \frac{2}{2+1} \approx 0.67$
$recall_{class 2} = \frac{2}{2} = 1.0$
$recall_{class 3} = \frac{2}{2+1} \approx 0.67$
$F1_{class 1} = \frac{2 \times 1 \times \frac{2}{3}}{1 + \frac{2}{3}} = 0.8$
$F1_{class 2} = \frac{2 \times \frac{2}{3}\times 1}{\frac{2}{3} + 1} = 0.8$
$F1_{class 3} = \frac{2 \times 1 \times \frac{2}{3}}{1 + \frac{2}{3}} = 0.8$
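The per-class values can also be read directly off the confusion matrices printed in the question. The sketch below assumes the layout that `multilabel_confusion_matrix` documents for each 2×2 block, namely `[[TN, FP], [FN, TP]]`, and copies the printed matrices into a nested list by hand.

```python
# Output of multilabel_confusion_matrix from the question, copied by hand.
# Each 2x2 block is assumed to be laid out as [[TN, FP], [FN, TP]].
conf_mat = [[[1, 0], [1, 2]],
            [[1, 1], [0, 2]],
            [[1, 0], [1, 2]]]

per_class = []
for (tn, fp), (fn, tp) in conf_mat:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R)
    per_class.append((precision, recall, f1))

for i, (p, r, f) in enumerate(per_class, start=1):
    print(f"class {i}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reading the blocks in this order matters: the true positives sit in the bottom-right cell, not the top-left one.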
Samples
$$Precision_{samples}= \frac{1}{Number\, of\, examples} \sum_{examples} \frac{TP\,of\,example}{TP\,of\,example + FP\,of\,example} = \frac{1}{4}[\frac{2}{2}+\frac{0}{1}+\frac{1}{1}+\frac{3}{3}] = 0.75$$
$$Recall_{samples}= \frac{1}{Number\, of\, examples} \sum_{examples} \frac{TP\,of \,example}{TP\,of\,example + FN\,of\,example} = \frac{1}{4}[\frac{2}{2}+\frac{0}{2}+\frac{1}{1}+\frac{3}{3}] = 0.75$$
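$$F1_{samples}= \frac{1}{Number\, of\, examples} \sum_{examples} \frac{2 \times TP\,of\,example}{2 \times TP\,of\,example + FP\,of\,example + FN\,of\,example} = \frac{1}{4}[\frac{4}{4}+\frac{0}{3}+\frac{2}{2}+\frac{6}{6}] = 0.75$$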
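For the samples average, the counts are taken per row (per example) instead of per column. The sketch below computes precision, recall, and F1 for each example and then averages each metric over the examples, which is how scikit-learn's `average='samples'` behaves as far as I understand it. The helper `sample_scores` is made up for illustration, and scoring 0.0 on a zero denominator is an assumed convention.

```python
# Data from the question: rows are samples, columns are labels.
y_true = [[0, 1, 1], [1, 0, 1], [1, 0, 0], [1, 1, 1]]
y_pred = [[0, 1, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]

def sample_scores(true_row, pred_row):
    """Precision, recall, F1 for a single example (hypothetical helper)."""
    tp = sum(t * p for t, p in zip(true_row, pred_row))
    n_pred = sum(pred_row)  # TP + FP
    n_true = sum(true_row)  # TP + FN
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * tp / (n_pred + n_true) if n_pred + n_true else 0.0
    return precision, recall, f1

scores = [sample_scores(t, p) for t, p in zip(y_true, y_pred)]
n = len(scores)
samples_precision = sum(s[0] for s in scores) / n
samples_recall = sum(s[1] for s in scores) / n
samples_f1 = sum(s[2] for s in scores) / n  # mean of per-example F1 scores

print(samples_precision, samples_recall, samples_f1)
```

In this example the second row contributes 0 to all three metrics and the other rows contribute 1, so each average comes out at 0.75.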
Other tips
A macro-average will compute the metric independently for each class and then take the average (hence treating all classes equally), whereas a micro-average will aggregate the contributions of all classes to compute the average metric.
In your case, each 2×2 block of the confusion matrix returned by multilabel_confusion_matrix is laid out as [[TN, FP], [FN, TP]], so:
Class 1 TP = 2 FP = 0
Class 2 TP = 2 FP = 1
Class 3 TP = 2 FP = 0
and the precision formula is given as TP/(TP + FP)
So precision
pa = 2 / (2 + 0) = 1
pb = 2 / (2 + 1) = 0.6667
pc = 2 / (2 + 0) = 1
Precision with Macro is
Pma = (pa + pb + pc) / 3 = (1 + 0.6667 + 1) / 3 = 0.8889
Precision with Micro is
Pmi = (TPa + TPb + TPc) / (TPa + FPa + TPb + FPb + TPc + FPc) = (2 + 2 + 2) / (2 + 0 + 2 + 1 + 2 + 0) = 0.8571
Please refer to the link below, which describes the difference between macro and micro averaging very well.
Micro Average vs Macro average Performance in a Multiclass classification setting
https://towardsdatascience.com/multi-class-metrics-made-simple-part-ii-the-f1-score-ebe8b2c2ca1