Question

I have a multi-class classification problem and I am primarily using the macro-averaged F1 measure to evaluate the performance of the models.

I want to verify if the results are statistically significant.

I have the results of two classifiers on the same train/test set (paired observations).

Some sources suggest using McNemar's test for binary classification tasks. Is there a generalization of McNemar's test for multi-class classification problems? If so, what would be the appropriate procedure for carrying out such a test?


Solution

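A generalization of McNemar's test is the Cochran–Mantel–Haenszel (CMH) test. McNemar's test corresponds to the special case in which each stratum is a single matched pair.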

There is an implementation in R, but I suppose porting it to Python should not be too hard. You can find the R version here.
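To give a feel for the paired-test idea in Python, here is a minimal sketch. It is not a port of the R implementation and it is not the CMH test itself: it assumes you are willing to reduce every prediction to correct/incorrect, which works for any number of classes, and then applies the exact (binomial) McNemar test to the discordant pairs. The function name `mcnemar_exact` and the toy labels are made up for illustration, and only the standard library is used.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on correct/incorrect outcomes.

    Returns (b, c, p_value), where
      b = number of samples classifier A got right and B got wrong,
      c = number of samples classifier A got wrong and B got right.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, a, b_pred in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = b_pred == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    n = b + c
    if n == 0:
        # No discordant pairs: the data give no evidence of a difference.
        return b, c, 1.0

    # Under H0 each discordant pair is equally likely to favour A or B,
    # so min(b, c) follows Binomial(n, 0.5); double the tail, cap at 1.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy three-class example (made-up labels, same test set for both models).
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
pred_a = [0, 1, 2, 1, 0, 2, 0, 0]
pred_b = [0, 2, 2, 0, 1, 2, 1, 1]

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"A right/B wrong: {b}, A wrong/B right: {c}, p = {p_value:.3f}")
```

Note that this compares the classifiers on accuracy over paired samples, not on macro-averaged F1, so it answers a slightly different question from the one the F1 score measures.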

Licensed under: CC-BY-SA with attribution