When is an event too rare for predictive modelling to be worthwhile?

https://stackoverflow.com/questions/22984603

01-07-2023

Question

Background

I built a complaints management system for my company. It works fine. I'm interested in using the data it contains to do predictive modelling on complaints. We have ~40,000 customers of whom ~400 have complained.

Problem

I want to use our complaints data to model the probability that any given customer will complain. My concern is that a model giving each customer a probability of 0.000 for complaining would already be 99% accurate and thus hard to improve upon. Is it even possible to build a useful predictive model of this kind when the event is so rare and there is so little data?

Solution

That is why there are measures other than accuracy.

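Here, recall (the share of actual complainers that the model flags) is probably what you are interested in. And in order to balance precision (the share of flagged customers who really complain) against recall, F1, the harmonic mean of the two, is a popular measure that takes both into account.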
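To make the difference between these measures concrete, here is a minimal sketch using the numbers from the question (40,000 customers, 400 complainers). The "always predict no complaint" baseline is the one described in the question; the second set of predictions is purely hypothetical (a model that flags 800 customers and catches 200 of the complainers) and is only there for comparison. The function names are illustrative, not from any library.

```python
def accuracy(y_true, y_pred):
    """Share of customers whose label was predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = complained)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# 400 complainers followed by 39,600 non-complainers, as in the question.
y_true = [1] * 400 + [0] * 39_600

# Baseline from the question: nobody is ever predicted to complain.
always_no = [0] * 40_000

# Hypothetical model (assumed numbers): catches 200 of the 400 complainers
# and raises 600 false alarms among the non-complainers.
hypothetical = [1] * 200 + [0] * 200 + [1] * 600 + [0] * 39_000

baseline_acc = accuracy(y_true, always_no)
baseline_prf = precision_recall_f1(y_true, always_no)
model_acc = accuracy(y_true, hypothetical)
model_prf = precision_recall_f1(y_true, hypothetical)

print("baseline:", baseline_acc, baseline_prf)
print("model:   ", model_acc, model_prf)
```

Under these assumed numbers the baseline scores 99% accuracy with zero recall, while the hypothetical model scores slightly lower accuracy (98%) yet finds half of the complainers. Accuracy alone would rank the useless model higher.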

But in general, avoid trying to break things down into a single number.

It's a one-dimensional result, and too much of a simplification. In practice, you will want to study the errors in detail, so that a systematic error does not go unnoticed.
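One simple way to study errors in detail is to break them down by some customer attribute and see whether the misses cluster anywhere. The sketch below is only an illustration: the `segment` field, its values, and the records themselves are invented, since the question does not say which attributes the complaints system stores.

```python
from collections import Counter

# Hypothetical records: (segment, actual, predicted), where 1 = complained.
# The segments and values are made up for illustration only.
records = [
    ("online", 1, 1),
    ("online", 1, 0),
    ("phone", 1, 0),
    ("phone", 1, 0),
    ("phone", 0, 0),
    ("online", 0, 1),
    ("store", 1, 1),
    ("store", 0, 0),
]


def errors_by_segment(records):
    """Count missed complaints and false alarms separately per segment."""
    missed = Counter()        # actual complaint, model predicted none
    false_alarms = Counter()  # no complaint, model predicted one
    for segment, actual, predicted in records:
        if actual == 1 and predicted == 0:
            missed[segment] += 1
        elif actual == 0 and predicted == 1:
            false_alarms[segment] += 1
    return missed, false_alarms


missed, false_alarms = errors_by_segment(records)
print("missed complaints:", dict(missed))
print("false alarms:     ", dict(false_alarms))
```

If one segment accounts for most of the missed complaints, as "phone" does in this toy data, that points to a systematic error that a single summary score such as F1 would hide.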

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow