Question

Background

I built a complaints management system for my company. It works fine. I'm interested in using the data it contains to do predictive modelling on complaints. We have ~40,000 customers of whom ~400 have complained.

Problem

I want to use our complaints data to model the probability that any given customer will complain. My concern is that a model giving each customer a probability of 0.000 for complaining would already be 99% accurate and thus hard to improve upon. Is it even possible to build a useful predictive model of the kind I describe trying to predict such a rare event with so little data?

Solution

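That is why there are measures other than accuracy. When only about 1% of customers complain, accuracy says almost nothing about whether a model finds the complainers.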

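Here, recall is probably what you are interested in: the fraction of customers who actually complained that the model flags. To balance recall against precision (the fraction of flagged customers who actually complained), the F1 score is a popular combination that takes both into account.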
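To make the difference between these measures concrete, here is a minimal sketch in plain Python using the counts from the question (40,000 customers, 400 complainers). The `binary_metrics` helper and the second, "hypothetical" set of predictions are invented for illustration only; they are not results from any real model. Returning 0.0 when a ratio is undefined is a convention chosen here, not a rule.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Convention assumed here: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts from the question: 400 complainers among 40,000 customers.
y_true = [1] * 400 + [0] * 39600

# Baseline from the question: predict "no complaint" for everyone.
always_no = [0] * 40000
baseline = binary_metrics(y_true, always_no)

# Hypothetical model (made-up numbers): flags 1,000 customers and
# catches 200 of the 400 real complainers.
hypothetical_pred = [1] * 200 + [0] * 200 + [1] * 800 + [0] * 38800
model = binary_metrics(y_true, hypothetical_pred)

print(baseline)
print(model)
```

Under these assumed numbers the baseline reaches 99% accuracy with zero recall, while the hypothetical model has lower accuracy (97.5%) but finds half of the complainers. That is the sense in which accuracy alone is the wrong yardstick here.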

In general, though, avoid reducing performance to a single number.

Any single score is a one-dimensional result and too much of a simplification. In practice, you will want to study the errors in detail so that you can spot and prevent systematic errors.
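One simple way to study errors in detail is to split them by type and by some customer attribute, rather than looking only at an aggregate score. The sketch below assumes each customer record carries a `segment` label; that field, the segment names, and the toy records are all hypothetical and stand in for whatever attributes your complaints system actually stores.

```python
from collections import Counter

# Hypothetical toy records: (segment, actually_complained, predicted_complaint)
records = [
    ("online", 1, 1),
    ("online", 1, 0),
    ("online", 0, 0),
    ("phone", 1, 0),
    ("phone", 1, 0),
    ("phone", 0, 1),
    ("phone", 0, 0),
]


def error_breakdown(records):
    """Count missed complaints and false alarms per segment."""
    missed = Counter()        # complained, but the model said "no"
    false_alarms = Counter()  # did not complain, but the model said "yes"
    for segment, actual, predicted in records:
        if actual == 1 and predicted == 0:
            missed[segment] += 1
        elif actual == 0 and predicted == 1:
            false_alarms[segment] += 1
    return missed, false_alarms


missed, false_alarms = error_breakdown(records)
print("missed complaints:", dict(missed))
print("false alarms:", dict(false_alarms))
```

If one segment accounts for most of the missed complaints, as "phone" does in this toy data, that points to a systematic error that a single score such as F1 would hide.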

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow