XGBoost result interpretation for unbalanced datasets (Accuracy & AUC-ROC)
12-12-2020
Question
My dataset has shape 5621 × 8 (binary classification).
Label/target: Success (4324, 77%) and Not success (1297, 23%).
(Success and Not success are equally important for my prediction, i.e. both TP and TN matter.)
I split my data into three parts (train, validation, test).
On train and validation I perform 10-fold CV.
The test set is separate data, which I evaluate for each fold.
I tuned scale_pos_weight over the range 5 to 80, and:
- Finally I fixed the value at 75, since it gave the highest average accuracy on my test set (79%) across the 10 folds.
- But when I check my average auc_roc metric, it is very poor, i.e. only 50% for all 10 folds.
If I do not tune scale_pos_weight, my average accuracy drops to 50% and my average auc_roc increases to 70%.
How should I interpret the gap between AUC-ROC and accuracy in these results?
What might be the problem in my case?
Solution
With Success already being the larger class, you probably shouldn't be using a scale_pos_weight larger than one: you want to scale the positive class's contribution to the loss function down rather than up.
I suspect that's what's happening in the first case. With scale_pos_weight=75, the model ends up basically only caring about the positive class, predicts everyone is in the positive class, and so your accuracy is just a little better than the 77% baseline you'd expect with that strategy. With that motivation, it's not too surprising the AUC is poor, although I wouldn't have expected a drop all the way to the 50% baseline...
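To make the two baselines concrete, here is a small sketch using only the class counts from the question. It assumes Success is encoded as the positive class (1) and models the degenerate case where every row gets the same score; the roc_auc helper is a hypothetical stand-in written for this illustration, not a library function.

```python
from bisect import bisect_left, bisect_right

# Class counts from the question; Success is assumed to be the positive class.
n_pos, n_neg = 4324, 1297
y_true = [1] * n_pos + [0] * n_neg

# Degenerate model: identical score for every row, so every row is predicted positive.
scores = [0.99] * len(y_true)
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = sorted(s for s, t in zip(scores, y_true) if t == 0)
    wins = 0.0
    for p in pos:
        below = bisect_left(neg, p)           # negatives scored strictly below p
        ties = bisect_right(neg, p) - below   # negatives scored exactly p
        wins += below + 0.5 * ties
    return wins / (len(pos) * len(neg))

auc = roc_auc(y_true, scores)
print(f"accuracy = {accuracy:.3f}, AUC = {auc:.2f}")
```

A model that cannot rank any Success above any Not success still gets about 77% accuracy here, while its AUC sits at 0.5. Accuracy rewards guessing the majority class; AUC only rewards ordering.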
If you don't use scale_pos_weight (you said "if I did not tune", but does that mean you left it at the default 1?), then the model performs better in rank-ordering (AUC=70%), but not so well in the hard classification. You might want to tweak the prediction threshold here; there's probably a different threshold that will perform better for accuracy score. You could also try scale_pos_weight=0.25 or so; that should make the default cutoff better, hopefully with little effect on AUC?
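A rough sketch of the threshold tweak follows. The scores are synthetic, drawn from two overlapping Gaussians purely for illustration; they are not the asker's predictions, and in practice you would use predicted probabilities on a validation split, not on the test set. The last line computes the negative-to-positive ratio, which the XGBoost parameter documentation mentions as a typical starting value for scale_pos_weight; for these counts it comes out near 0.3, close to the 0.25 suggested above.

```python
import random

random.seed(0)

# Synthetic stand-in for validation labels and predicted probabilities (assumption).
n_pos, n_neg = 4324, 1297
y_true = [1] * n_pos + [0] * n_neg

def clipped_gauss(mu, sigma):
    """Gaussian draw clipped to [0, 1] so it looks like a probability."""
    return min(max(random.gauss(mu, sigma), 0.0), 1.0)

scores = ([clipped_gauss(0.45, 0.15) for _ in range(n_pos)]
          + [clipped_gauss(0.35, 0.15) for _ in range(n_neg)])

def accuracy_at(threshold):
    """Plain accuracy when rows with score >= threshold are predicted positive."""
    hits = sum((s >= threshold) == bool(t) for s, t in zip(scores, y_true))
    return hits / len(y_true)

def balanced_accuracy_at(threshold):
    """Mean of true-positive rate and true-negative rate; weights both classes equally."""
    tpr = sum(s >= threshold for s, t in zip(scores, y_true) if t == 1) / n_pos
    tnr = sum(s < threshold for s, t in zip(scores, y_true) if t == 0) / n_neg
    return (tpr + tnr) / 2

# Sweep candidate cutoffs instead of assuming the default 0.5.
thresholds = [i / 100 for i in range(1, 100)]
best_acc_t = max(thresholds, key=accuracy_at)
best_bal_t = max(thresholds, key=balanced_accuracy_at)

print(f"accuracy at 0.50: {accuracy_at(0.5):.3f}")
print(f"best accuracy cutoff: {best_acc_t:.2f} -> {accuracy_at(best_acc_t):.3f}")
print(f"best balanced-accuracy cutoff: {best_bal_t:.2f} -> {balanced_accuracy_at(best_bal_t):.3f}")

# Heuristic starting point for scale_pos_weight: negatives / positives.
suggested_spw = n_neg / n_pos
print(f"suggested scale_pos_weight ~ {suggested_spw:.2f}")
```

Moving the cutoff changes accuracy but leaves AUC untouched, since AUC depends only on the ordering of the scores. If both classes matter equally, as the question says, balanced accuracy may be a better target for the sweep than plain accuracy.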