Question

I have a dataset consisting of around 30'000 data points and 3 classes. The classes are imbalanced (around 5'000 in class 1, 10'000 in class 2 and 15'000 in class 3). I'm building a convolutional neural network model for classification of the data. For evaluation I'm looking at the AUC and ROC curves. Because I have three classes, I have to use either micro- or macro-averaging.

To calculate the micro- and macro-averaged AUC and ROC curves, I use the approach described here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html The micro-averaged AUC / ROC is calculated by considering each element of the label indicator matrix as a binary prediction, and the macro-averaged AUC / ROC is calculated by computing the metric for each label and taking their unweighted mean. In my case the micro-averaged AUC is usually higher than the macro-averaged AUC.
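Both averages can be computed with scikit-learn's `roc_auc_score` on the label indicator matrix. The following is a minimal sketch with synthetic labels and scores (any real model's predicted class probabilities would take their place):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)

# Synthetic imbalanced 3-class labels, roughly matching the 1:2:3 ratio
y_true = rng.choice([0, 1, 2], size=300, p=[1 / 6, 2 / 6, 3 / 6])

# Hypothetical class-probability scores: noisy one-hot, row-normalized
y_score = rng.random((300, 3)) + 2.0 * np.eye(3)[y_true]
y_score /= y_score.sum(axis=1, keepdims=True)

# Label indicator matrix, as in the linked scikit-learn example
y_bin = label_binarize(y_true, classes=[0, 1, 2])

# Micro: flatten the indicator matrix into one big binary problem
micro = roc_auc_score(y_bin, y_score, average="micro")
# Macro: per-class AUC, then an unweighted mean over the classes
macro = roc_auc_score(y_bin, y_score, average="macro")
print(f"micro={micro:.3f}  macro={macro:.3f}")
```

(`average="weighted"` also exists and takes the unweighted-mean idea of macro but weights each class by its support.)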

If we look at the sklearn.metrics.roc_auc_score method, its documentation says for average='macro' that

This does not take label imbalance into account.

I'm not sure if for micro-average, they use the same approach as it is described in the link above.

For a dataset with class imbalance, is it better to use micro-average or macro-average? In other words, which metric is not affected by class imbalance? In my case the micro-averaged AUC (0.85) is higher than the macro-averaged AUC (0.79). When I look at the confusion matrix, the majority class is predicted very well (probably because the network learns to predict the majority class), but the minority classes are predicted poorly (almost as many false negatives as true positives). So overall I think the AUC should not be that high.

Solution

The question is actually about understanding what it means to "take imbalance into account":

  • Micro-average "takes imbalance into account" in the sense that the resulting performance is based on the proportion of every class, i.e. the performance on a large class has more impact on the result than that on a small class.
  • Macro-average "doesn't take imbalance into account" in the sense that the resulting performance is a simple average over the classes, so every class is given equal weight regardless of its proportion.

Is it actually a good idea to "take imbalance into account"? It depends:

  • With micro-average, a classifier is encouraged to focus on the largest classes, possibly at the expense of the smallest ones. This can be considered a positive because it means that more instances will be predicted correctly.
  • With macro-average, a classifier is encouraged to try to recognize every class correctly. Since it is usually harder for the classifier to identify the small classes, this often makes it sacrifice some performance on the large classes. This can be considered a positive in the sense that it forces the classifier to properly distinguish the classes instead of lazily relying on the distribution of classes.

One could say that it's a kind of quantity vs. quality dilemma: micro-average gives more correct predictions, macro-average gives attention to actually distinguishing the classes.

Very often one uses macro-average with strongly imbalanced data, because otherwise (with micro) it is too easy for the classifier to obtain good performance by relying only on the majority class. Your data is not strongly imbalanced, so it's unlikely this would happen, but I would still opt for macro here.
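This "lazy majority" effect can be made concrete with a toy sketch (entirely synthetic labels and scores): a model that ranks the majority class well but guesses randomly on the two minority classes still gets a flattering micro-average, while the macro-average exposes the random per-class behavior.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(1)

# Imbalanced labels, roughly the question's 1:2:3 ratio; class 2 is the majority
y_true = rng.choice([0, 1, 2], size=3000, p=[1 / 6, 2 / 6, 3 / 6])
y_bin = label_binarize(y_true, classes=[0, 1, 2])

# Random scores everywhere, except the majority-class column is boosted
# whenever the true label is 2: the model only "knows" the majority class.
y_score = rng.random((3000, 3))
y_score[:, 2] += 2.0 * (y_true == 2)

micro = roc_auc_score(y_bin, y_score, average="micro")
macro = roc_auc_score(y_bin, y_score, average="macro")
# Micro comes out clearly higher: the many well-ranked majority cells
# dominate the flattened binary problem, while macro averages in the
# ~0.5 AUCs of the two minority classes.
print(f"micro={micro:.3f}  macro={macro:.3f}")
```

Swapping the boost onto a minority class instead would shrink the micro/macro gap, which is exactly the sense in which micro "takes imbalance into account".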

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange