Different definitions of Macro F1 score, which one is used in Scikit-learn?

https://datascience.stackexchange.com/questions/69444

09-12-2020

Question

The article "Macro F1 and Macro F1" describes two different definitions of the macro F1 score that are used in the literature. The first is computed as follows:

F1 scores are computed for each class and then averaged via arithmetic mean

The second is computed as follows:

The harmonic mean is computed over the arithmetic means of precision and recall
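
To make the difference concrete, here is a minimal sketch in plain Python that computes both definitions on a small, made-up three-class example. The labels, the helper name per_class_prf, and the convention of scoring an undefined ratio as 0 are assumptions for illustration only; they do not come from the article.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the true label was t
            fn[t] += 1  # the true label t was missed
    scores = {}
    for c in labels:
        # Assumption: an undefined ratio (zero denominator) is scored as 0.
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


# Made-up labels, chosen so that precision and recall differ per class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 2, 1, 1, 2, 0]

scores = per_class_prf(y_true, y_pred)
n = len(scores)

# Definition 1: arithmetic mean of the per-class F1 scores.
mean_of_f1 = sum(f1 for _, _, f1 in scores.values()) / n

# Definition 2: harmonic mean of the macro-averaged precision and recall.
macro_p = sum(p for p, _, _ in scores.values()) / n
macro_r = sum(r for _, r, _ in scores.values()) / n
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"mean of per-class F1: {mean_of_f1:.4f}")
print(f"F1 of mean P and R:   {f1_of_means:.4f}")
```

On this data the first definition gives about 0.6758 and the second about 0.6873, so the two are not interchangeable as soon as precision and recall differ within a class.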

I was wondering which definition is actually implemented in scikit-learn. The docs do not make clear which one is used:

Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

Solution

The first variant is implemented: $$F1_{\text{macro}} = \frac{1}{\text{number of classes}} \sum_{\text{classes}} F1_{\text{class}}$$

You can find an example calculation in this answer.
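
One way to check this numerically is to compare f1_score(..., average="macro") against both definitions on a small example. The sketch below assumes scikit-learn and NumPy are installed; the labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels, chosen so that precision and recall differ per class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 2, 1, 1, 2, 0]

# What scikit-learn reports as the macro F1 score.
sk_macro = f1_score(y_true, y_pred, average="macro")

# Variant 1: arithmetic mean of the per-class F1 scores.
per_class_f1 = f1_score(y_true, y_pred, average=None)
mean_of_f1 = per_class_f1.mean()

# Variant 2: harmonic mean of macro-averaged precision and recall.
macro_p = precision_score(y_true, y_pred, average="macro")
macro_r = recall_score(y_true, y_pred, average="macro")
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print("sklearn macro F1:   ", round(sk_macro, 4))
print("mean of per-class F1:", round(mean_of_f1, 4))
print("F1 of mean P and R:  ", round(f1_of_means, 4))
print("matches variant 1:", np.isclose(sk_macro, mean_of_f1))
print("matches variant 2:", np.isclose(sk_macro, f1_of_means))
```

The value returned with average="macro" agrees with the mean of the per-class F1 scores and differs from the harmonic mean of the averaged precision and recall.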

Sometimes the scikit-learn documentation does not include all the details. In these cases it is often helpful to look at the source code, which is linked from each function's documentation page. Here you can find some more details on the F1 score calculation.

Licensed under: CC-BY-SA with attribution
