Question

I am trying to learn data modeling by working on a dataset from a Kaggle competition. As the competition closed 2 years ago, I am asking my question here. The competition uses AUC-ROC as the evaluation metric. This is a classification problem with 5 labels, and I am modeling it as 5 independent binary classification problems. The data is highly imbalanced across labels; in one case the imbalance is 333:1. I did some research into interpreting the AUC-ROC metric and found this and this. Both articles basically say that AUC-ROC is not a good metric for an imbalanced dataset. So I am wondering: why would they use this metric to evaluate models in the competition? Is it even a reasonable metric in such a context? If yes, why?
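To make the setup concrete, here is a minimal sketch of what I mean by scoring 5 independent binary classifiers with ROC AUC. The data is synthetic and only stands in for the real features and labels; the feature dimensions and imbalance are made up for illustration, and I am assuming scikit-learn is available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 20))                        # stand-in features
# Five rare, partly learnable labels (~1% positives each) -- purely synthetic.
Y = (X[:, :5] + rng.normal(scale=1.0, size=(20_000, 5))) > 3.3

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# One independent binary classifier per label, each scored with ROC AUC.
for k in range(Y.shape[1]):
    clf = LogisticRegression(max_iter=1000).fit(X_tr, Y_tr[:, k])
    scores = clf.predict_proba(X_te)[:, 1]
    print(f"label {k}: ROC AUC = {roc_auc_score(Y_te[:, k], scores):.3f}")
```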


Solution

As you will have seen in your research, AUC-ROC prioritizes getting the order of the predictions correct rather than approximating the true frequencies.
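To see this concretely, here is a small sketch using scikit-learn's roc_auc_score: applying any strictly monotone transform to the scores changes their values but not their ordering, so the AUC is unchanged. The labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.20, 0.70])

# All three calls print 0.75: cubing or squashing the scores changes the
# values but not their order, so the ROC AUC is identical.
print(roc_auc_score(y_true, scores))
print(roc_auc_score(y_true, scores ** 3))
print(roc_auc_score(y_true, 1.0 / (1.0 + np.exp(-10.0 * scores))))
```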

Usually, as in the credit card fraud problem you link to, the impact of one or two false negatives is more devastating than that of many false positives. When the classes are that imbalanced, as they are in the fraud case, AUC-ROC is a bad idea: the false positive rate is computed over the huge pool of negatives, so it barely moves even when the model raises many false alarms relative to the rare positives, and the score can look good while the model is of little practical use.
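A quick synthetic illustration of that point (made-up numbers, not taken from either article): with roughly 1000:1 imbalance, ROC AUC can look reassuring while a precision-based summary such as average precision stays low.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_neg, n_pos = 99_900, 100                     # roughly 1000:1 imbalance
y_true = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
# Positives score higher on average, but the tails overlap.
scores = np.concatenate([rng.normal(0.0, 1.0, n_neg),
                         rng.normal(2.0, 1.0, n_pos)])

print("ROC AUC          :", roc_auc_score(y_true, scores))            # ~0.92
print("Average precision:", average_precision_score(y_true, scores))  # far lower
```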

It appears that in the competition you are referring to, the hosts are more interested in ranking which comments are more toxic than others than in estimating exactly how toxic each one is. This makes sense, since in reality the labels are subjective.
