Question

To build a predictive model with two classes (buy or not buy), I want to use a random forest and predict with type='prob', so that I get each customer's probability of buying. With this output I can then split customers into groups, like this:

group A: customers with an 80% to 100% probability of buying. group B: customers with a 60% to 80% probability of buying. ...
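Here is a minimal sketch of the grouping step. It is written in Python for illustration. The question's `type='prob'` suggests R's randomForest, where `predict(model, newdata, type = "prob")` returns class probabilities, and scikit-learn's `predict_proba` is the Python analogue. The sketch assumes you already have each customer's probability of buying as a float. The band edges, labels, and the `band_of` / `group_customers` helpers are hypothetical.

```python
from bisect import bisect_right

# Hypothetical band lower bounds and labels: A = [0.8, 1.0], B = [0.6, 0.8), ...
BAND_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8]
BAND_LABELS = ["E", "D", "C", "B", "A"]


def band_of(prob: float) -> str:
    """Map a predicted probability of buying to a band label.

    Bands are half-open [lower, upper), except that 1.0 falls in the top band,
    so every probability belongs to exactly one group.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    # bisect_right counts the edges <= prob; subtract 1 to get the band index.
    return BAND_LABELS[bisect_right(BAND_EDGES, prob) - 1]


def group_customers(probs: dict[str, float]) -> dict[str, list[str]]:
    """Group customer IDs by the band of their predicted probability."""
    groups: dict[str, list[str]] = {label: [] for label in BAND_LABELS}
    for customer_id, prob in probs.items():
        groups[band_of(prob)].append(customer_id)
    return groups


# Example: three customers with hypothetical scores.
print(group_customers({"c1": 0.91, "c2": 0.65, "c3": 0.12}))
```

Using half-open bands avoids the overlap you get with inclusive ranges such as "100 to 80" and "81 to 60". With half-open bands, a customer scored exactly 0.8 lands in A only.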

But I don't know the appropriate metric to evaluate this model. I assume I can't use a confusion matrix, since the model outputs probabilities rather than class labels.

Maybe I can use a ROC curve, and/or measure the Kolmogorov-Smirnov (KS) statistic between the buy group and the not-buy group. But I'm not sure about these metrics.
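Both ROC and KS work directly on the probabilities, so no threshold is needed. The following is a dependency-free sketch that assumes labels are coded 1 = buy and 0 = not buy. It computes the area under the ROC curve (AUC) as the probability that a random buyer scores above a random non-buyer. It computes the KS statistic as the largest gap between the empirical score distributions of buyers and non-buyers. The function names are illustrative. In practice, scikit-learn's `roc_auc_score` or SciPy's `ks_2samp` would do the same job.

```python
from bisect import bisect_right


def split_scores(y_true, scores):
    """Split scores into buyers (label 1) and non-buyers (label 0)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one buyer and one non-buyer")
    return pos, neg


def roc_auc(y_true, scores):
    """Area under the ROC curve: the probability that a randomly chosen buyer
    gets a higher score than a randomly chosen non-buyer (ties count 1/2)."""
    pos, neg = split_scores(y_true, scores)
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            wins += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """Two-sample KS statistic: the largest vertical gap between the empirical
    CDFs of buyer and non-buyer scores."""
    pos, neg = split_scores(y_true, scores)
    pos.sort()
    neg.sort()
    # Both empirical CDFs are step functions that only change at observed
    # scores, so checking every distinct score finds the maximum gap.
    return max(
        abs(bisect_right(neg, t) / len(neg) - bisect_right(pos, t) / len(pos))
        for t in set(scores)
    )


labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, probs), ks_statistic(labels, probs))
```

The pairwise AUC loop is quadratic in the number of customers, which is fine for illustration but slow on large data. A library implementation sorts the scores instead.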


Solution

You should select a threshold, say 0.5, and treat customers whose predicted probability is below the threshold as not buy and those above it as buy. Based on this you can build a confusion matrix and compute the accuracy of your model. You can also check the ROC curve of the model, which summarizes performance across all possible thresholds rather than just one. You can check various metrics for binary classifiers here.
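Here is a minimal sketch of the thresholding step, assuming labels are coded 1 = buy and 0 = not buy. The answer leaves open what happens to a probability exactly equal to the threshold. This sketch counts such a probability as buy. `confusion_counts` and `accuracy_at` are hypothetical helper names.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Return (tp, fp, tn, fn), predicting 'buy' (1) when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, prob in zip(y_true, probs):
        predicted_buy = prob >= threshold
        if predicted_buy and y == 1:
            tp += 1  # predicted buy, actually bought
        elif predicted_buy:
            fp += 1  # predicted buy, did not buy
        elif y == 0:
            tn += 1  # predicted not buy, did not buy
        else:
            fn += 1  # predicted not buy, actually bought
    return tp, fp, tn, fn


def accuracy_at(y_true, probs, threshold=0.5):
    """Share of customers whose thresholded prediction matches the label."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, threshold)
    return (tp + tn) / (tp + fp + tn + fn)


labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.1]
for t in (0.5, 0.75, 0.85):
    print(t, confusion_counts(labels, probs, t), accuracy_at(labels, probs, t))
```

Both the confusion matrix and the accuracy change with the threshold. It is therefore worth looking at a few thresholds, or at the threshold-free ROC curve, before settling on one.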

OTHER TIPS

Group the entries by the probability score you assigned them (e.g. 100-95%, 95-90%, ...). In each group, compare the expected number of buyers, which is the sum of the predicted probabilities, with the observed number of buyers from the ground truth. You can then perform a chi-squared test between your predicted and observed outcomes.
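Here is a sketch of this calibration check. It assumes labels are coded 1 = buy and 0 = not buy, and it uses equal-width bins; 20 bins gives the 5% bands mentioned above. For each bin, the sketch compares the expected number of buyers with the observed number. It then sums the Pearson chi-squared terms over the buy and not-buy cells. Algebraically, this per-bin sum equals the Hosmer-Lemeshow statistic. The helper names are hypothetical.

```python
def calibration_table(y_true, probs, n_bins=20):
    """Group customers into equal-width probability bins.

    Returns one row per non-empty bin: (lower, upper, n, observed, expected),
    where observed is the number of actual buyers and expected is the sum of
    the predicted probabilities in that bin.
    """
    bins = {}
    for y, prob in zip(y_true, probs):
        b = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the top bin
        n, observed, expected = bins.get(b, (0, 0, 0.0))
        bins[b] = (n + 1, observed + y, expected + prob)
    return [
        (b / n_bins, (b + 1) / n_bins, n, obs, exp)
        for b, (n, obs, exp) in sorted(bins.items())
    ]


def chi_squared(table):
    """Pearson chi-squared summed over the buy and not-buy cells of each bin."""
    stat = 0.0
    for _, _, n, obs, exp in table:
        exp_not = n - exp  # expected number of non-buyers in the bin
        if exp <= 0 or exp_not <= 0:
            continue  # a zero expected count makes the term undefined; skip it
        stat += (obs - exp) ** 2 / exp + ((n - obs) - exp_not) ** 2 / exp_not
    return stat


labels = [1, 0, 1, 0]
probs = [0.92, 0.97, 0.35, 0.15]
table = calibration_table(labels, probs, n_bins=10)
for row in table:
    print(row)
print(chi_squared(table))
```

To turn the statistic into a p-value, compare it with a chi-squared distribution, e.g. `scipy.stats.chi2.sf(stat, df)`. The usual Hosmer-Lemeshow choice is df = number of bins - 2 when the bins are formed on the data the model was fitted to. The approximation is unreliable for bins with small expected counts; a common rule of thumb is at least 5 per cell.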

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange