Is the Gini coefficient a good metric for measuring predictive model performance on highly imbalanced data?

https://datascience.stackexchange.com/questions/19755

22-10-2019

Question

I am evaluating a credit risk model that predicts the likelihood of customers defaulting on their mortgage accounts. The model is a logistic regression estimator and was built by another team. They use the Gini metric to measure the performance of the model, and they achieved 87%. Upon evaluation, I found that the recall was 51%, whilst the error rate of the non-rare-event class (do not default) was 0.9%. Am I correct in thinking that the Gini is actually a misleading metric in this case, because it doesn't really show the extremely poor predictive performance on the rare-event class? I have questioned them about this and recommended that they also use precision/recall metrics, confusion matrices, and a precision-recall trade-off graph, but they quickly dismissed me.

Any advice would be much appreciated.

Solution

The Gini coefficient can also be expressed in terms of the area under the ROC curve (AUC): G = 2*AUC - 1. The ROC curve, on the other hand, is influenced by class imbalance through the false positive rate FP/(FP+TN). If the number of negatives is a lot larger than the number of positives, the false positive rate stays small even when the false positives are numerous compared with the true positives, so this could be a potential issue.

In short, the Gini coefficient has much the same pros and cons as the ROC AUC metric.
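To make the relationship concrete, here is a minimal Python sketch. The toy portfolio (1000 non-defaulters, 10 defaulters, three score levels) and all function names are made up for illustration; they are not the asker's data or model. It computes AUC as the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter, derives the Gini from it, and then looks at recall, false positive rate, and precision at a single cutoff.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient via G = 2*AUC - 1."""
    return 2.0 * roc_auc(labels, scores) - 1.0


def rates_at_cutoff(labels, scores, cutoff):
    """Recall, false positive rate, and precision when score >= cutoff is flagged as default."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, fpr, precision


# Hypothetical, heavily imbalanced portfolio: label 1 = default, 0 = no default.
labels = [0] * 900 + [0] * 90 + [0] * 10 + [1] * 5 + [1] * 5
scores = [0.1] * 900 + [0.5] * 90 + [0.9] * 10 + [0.9] * 5 + [0.5] * 5

recall, fpr, precision = rates_at_cutoff(labels, scores, 0.9)
print("Gini:", round(gini(labels, scores), 4))
print("recall:", recall, "FPR:", fpr, "precision:", round(precision, 4))
```

On this made-up data the Gini comes out at about 0.94, yet at a cutoff of 0.9 the recall is 50%, the false positive rate is 1%, and only one in three flagged accounts actually defaults. A high Gini and a modest recall can coexist, which is roughly the pattern described in the question.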

Other tips

To my understanding, the Gini coefficient shouldn't be a bad metric for imbalanced classification, because it is directly related to AUC, which works just fine. Maybe they meant Gini impurity rather than the Gini coefficient? Check the AUC of the predictions to be sure. Also, the area under the precision-recall (PR) curve is a better metric than AUC for imbalanced classification, so you may want to look at that too.
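As a rough illustration of the PR-curve suggestion, the sketch below computes average precision (a common step-wise summary of the area under the PR curve) in plain Python. The data are invented for the example (1000 non-defaulters, 10 defaulters, three score levels), and `average_precision` is a hypothetical helper, not a library function.

```python
def average_precision(labels, scores):
    """Step-wise area under the PR curve: sum of (recall gain) * precision per cutoff."""
    total_pos = sum(labels)
    ap = 0.0
    prev_recall = 0.0
    # Walk the distinct score values from strictest to most lenient cutoff.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical imbalanced data: label 1 = default, 0 = no default.
labels = [0] * 900 + [0] * 90 + [0] * 10 + [1] * 5 + [1] * 5
scores = [0.1] * 900 + [0.5] * 90 + [0.9] * 10 + [0.9] * 5 + [0.5] * 5

ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # what a random ranking would score
print("average precision:", round(ap, 4))
print("prevalence baseline:", round(prevalence, 4))
```

Here the average precision is about 0.21 against a baseline of about 0.01. That is far better than random, but the number makes it plain that most flagged accounts are false alarms, which is exactly the information the PR view adds for the rare class.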

Credit models do not do a great job of predicting individual defaults, and the error rates are usually high. That is, a fairly high proportion of dubious borrowers do not default. One can always reduce this proportion by making the cutoff more generous, so that only the worst borrowers are left in the "bad" pool; but the necessary tradeoff is that more borrowers must be put into the "good" pool, so more defaults occur in the "good" pool.

The Gini (or the roughly equivalent AUC) is a reasonable tool for assessing the performance of the model across the whole range of credit cutoffs, but in practice this is not usually what we want. We really want to make our lending business profitable, which means we have to consider how much profit we make from good mortgages and how much we lose from defaults. The best model is the one that gives the best tradeoff between these. This has nothing to do with our success at predicting individual defaults, which is why the Gini is not really useful.

Because the costs and profit numbers are specific to each lender, it is quite possible that Model A will work better than Model B for one lender, while Model B will work better than Model A for another lender. There is no model that is best for every lender.
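A small sketch of the profit argument, under loudly stated assumptions: the score buckets, the profit per good mortgage, and the loss per default are all invented, and the two "lenders" are hypothetical. It picks the cutoff that maximises profit for each lender. It compares cutoffs rather than two models, but the same calculation could be run per model to see which one a given lender should prefer.

```python
import math

# Hypothetical score buckets: (predicted default score, good borrowers, defaulters).
BUCKETS = [(0.1, 900, 0), (0.5, 90, 5), (0.9, 10, 5)]


def profit_at_cutoff(buckets, cutoff, profit_per_good, loss_per_default):
    """Total profit when every borrower with score < cutoff is approved."""
    profit = 0
    for score, n_good, n_default in buckets:
        if score < cutoff:
            profit += n_good * profit_per_good - n_default * loss_per_default
    return profit


def best_cutoff(buckets, profit_per_good, loss_per_default):
    """Return (cutoff, profit) for the most profitable cutoff; math.inf approves everyone."""
    candidates = sorted({score for score, _, _ in buckets}) + [math.inf]
    results = [
        (c, profit_at_cutoff(buckets, c, profit_per_good, loss_per_default))
        for c in candidates
    ]
    return max(results, key=lambda pair: pair[1])


# Two hypothetical lenders with the same margin but different default losses.
lender_x = best_cutoff(BUCKETS, profit_per_good=1, loss_per_default=10)
lender_y = best_cutoff(BUCKETS, profit_per_good=1, loss_per_default=30)
print("lender X (cutoff, profit):", lender_x)
print("lender Y (cutoff, profit):", lender_y)
```

With these invented numbers, lender X does best rejecting only the 0.9 bucket (cutoff 0.9, profit 940), while lender Y, whose defaults cost three times as much, does best rejecting the 0.5 bucket as well (cutoff 0.5, profit 900). The scores are identical in both cases; only the economics differ.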

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange