Question

I have a multi-class classification problem and I am primarily using macro-average F1 measure to evaluate the performance of models.

I want to verify whether the difference in results between models is statistically significant.

I have the results of two classifiers on the same train/test set (paired observations).

Some sources suggest using McNemar's test for binary classification tasks. Is there a generalization of McNemar's test for multi-class classification problems? If so, what would be the appropriate procedure to carry out these tests?

Solution

A generalisation of McNemar's test is the Cochran–Mantel–Haenszel test.

There is an implementation in R, but I suppose porting it to Python should not be too hard. You can find the R version here
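
For a quick check in Python, here is a minimal sketch. Note that it is not a port of the R implementation and not the Cochran–Mantel–Haenszel test: it is the plain exact (binomial) McNemar test applied to per-example correctness, which works for any number of classes because it only looks at whether each prediction is right or wrong. It therefore compares the two classifiers' error patterns rather than their macro F1 directly. The function name `mcnemar_exact` and the toy labels are hypothetical, chosen only for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on per-example correctness.

    Returns (b, c, p_value), where
      b = examples classifier A gets right and classifier B gets wrong,
      c = examples classifier A gets wrong and classifier B gets right.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)
    c = sum(1 for t, a, p in triples if a != t and p == t)
    n = b + c
    if n == 0:
        # The classifiers never disagree on correctness: no evidence of a difference.
        return b, c, 1.0
    # Under H0 the discordant examples split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (made up for illustration): three classes, six test examples.
y_true = [0, 1, 2, 0, 1, 2]
pred_a = [0, 1, 2, 0, 1, 0]
pred_b = [0, 1, 1, 2, 0, 0]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={p_value:.3f}")
```

Only the discordant examples (where exactly one classifier is correct) enter the test, so a small p-value needs a reasonably large number of them.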

Licensed under: CC-BY-SA with attribution