Question

I have an imbalanced dataset, and I used SMOTE to oversample the minority class and undersample the majority class. Now I want to check the test AUC using the model's predict_proba.

I have two questions: 1. Do I have to correct the probabilities if I am only comparing AUCs? 2. How can I correct them, given a combination of undersampling and oversampling?


Solution

  1. No. Any adjustment to the probabilities will presumably be monotonic, so the rank ordering of the predictions stays the same. AUC depends only on that ordering, so the AUC stays the same too.

  2. See, e.g., https://datascience.stackexchange.com/a/58899/55122
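
To illustrate both points, here is a minimal sketch, not necessarily the exact method in the linked answer. It assumes the resampling only changed the class prior, so the correction is a rescaling of the odds by the ratio of the original to the resampled prior odds. SMOTE creates synthetic points rather than exact duplicates, so treat this as an approximation. The function names, the 5% and 50% positive rates, and the simulated scores are all hypothetical.

```python
import random

def correct_for_resampling(p_resampled, train_pos_rate, true_pos_rate):
    """Map a probability from the resampled class prior back to the original prior.

    Assumes resampling changed only the class prior (an approximation for SMOTE).
    """
    # Factor that converts odds under the resampled prior into odds under the original prior.
    odds_ratio = (true_pos_rate / (1 - true_pos_rate)) / (train_pos_rate / (1 - train_pos_rate))
    num = p_resampled * odds_ratio
    return num / (num + (1 - p_resampled))

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
# Hypothetical test set with about 5% positives, scored by a model
# trained on data resampled to a 50/50 class balance.
y_test = [1 if random.random() < 0.05 else 0 for _ in range(2000)]
raw = [min(max(random.gauss(0.7 if y else 0.4, 0.15), 0.001), 0.999) for y in y_test]
corrected = [correct_for_resampling(p, train_pos_rate=0.5, true_pos_rate=0.05) for p in raw]

auc_raw = auc(y_test, raw)
auc_corrected = auc(y_test, corrected)
print(auc_raw, auc_corrected)  # the two values should agree
```

The correction changes the probability values (a raw 0.5 maps to 0.05 here) but not their order, which is why the AUC comparison in point 1 does not need it. It only matters when the probabilities themselves are used, for example for thresholding or expected-cost calculations.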

See also the more complex "probability calibration" techniques.

Also, if you see better results after SMOTE plus undersampling and can share your data and work, I'd be very interested. I haven't yet seen an example where training on the original dataset doesn't do just as well (with appropriate thresholding).

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange