What is the appropriate statistical significance test for multi-class classification?

https://datascience.stackexchange.com/questions/85435

statistics
multiclass-classification

16-12-2020

Question

I have a multi-class classification problem and I am primarily using the macro-averaged F1 measure to evaluate the performance of models.

I want to verify if the results are statistically significant.

I have the results of two classifiers on the same train/test set (paired observations).

Some sources suggest using McNemar’s test for binary classification tasks. Is there a generalization of McNemar’s test for multi-class classification problems? If so, what would be the appropriate procedure for carrying out these tests?

Solution

A generalisation of McNemar’s test is the Cochran–Mantel–Haenszel (CMH) test. McNemar’s test is the special case in which every matched pair forms its own stratum.

There is an implementation in R, but I suppose porting it to Python should not be too hard. You can find the R version here
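To give a rough idea of what a Python version could look like, here is a minimal sketch of the Cochran–Mantel–Haenszel statistic for 2x2 strata, using only the standard library. It is not a port of the R implementation mentioned above. The function names `cmh_statistic` and `paired_strata` and the toy labels are hypothetical, chosen only for illustration. The sketch assumes each test example is treated as one stratum, with rows for the two classifiers and columns for correct/incorrect. Correctness is binary even when the labels are multi-class, so this setup compares per-example accuracy and does not test macro F1 directly.

```python
import math


def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (no continuity correction).

    Each stratum is a tuple (a, b, c, d) describing the 2x2 table
    [[a, b], [c, d]]. Returns (chi2, p_value) with 1 degree of freedom.
    """
    diff_sum = 0.0  # sum over strata of (a - E[a])
    var_sum = 0.0   # sum over strata of Var(a)
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # variance is undefined for fewer than two observations
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        return 0.0, 1.0  # no informative strata, nothing to test
    chi2 = diff_sum ** 2 / var_sum
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def paired_strata(y_true, pred_a, pred_b):
    """Build one 2x2 stratum per test example (hypothetical helper).

    Row 1 is classifier A, row 2 is classifier B;
    column 1 is "correct", column 2 is "incorrect".
    """
    strata = []
    for t, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok = int(pa == t)
        b_ok = int(pb == t)
        strata.append((a_ok, 1 - a_ok, b_ok, 1 - b_ok))
    return strata


# Toy three-class labels and predictions, made up for illustration only.
y_true = [0, 1, 2, 2, 1, 0, 2, 1, 0, 2]
pred_a = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
pred_b = [0, 2, 2, 1, 0, 0, 1, 0, 1, 0]

chi2, p_value = cmh_statistic(paired_strata(y_true, pred_a, pred_b))
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

With one stratum per pair, examples where both classifiers agree contribute nothing, and the statistic reduces to the uncorrected McNemar statistic computed from the discordant pairs. With as few discordant pairs as in this toy data the chi-square approximation is rough, so an exact binomial test on the discordant counts would be the safer choice in practice.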

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange