Question

I'm following a tutorial to build a project that detects spam accounts. Two labels, "Spam" and "Not spam", are used for training and testing. Classification is finished and I'm moving on to evaluation.

The results are:

*Spam* precision: 0.962917933131
*Spam* recall: 0.6336

*Not spam* precision: 0.72697466468
*Not spam* recall: 0.9756

I've read the wiki page on precision and recall, but I'm still confused and don't know how to use these figures to measure my classifier.

My goal is to reduce the number of normal accounts that are labelled as "Spam". It doesn't matter if some "Spam" accounts escape detection. Which of the results above should I focus on improving? Thanks.


Solution

Precision for a class is the fraction of items the classifier assigned to that class that really belong to it.

Recall for a class is the fraction of items that really belong to that class that the classifier assigned to it.
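As a compact restatement of the two definitions, taking one class as "positive": the symbols TP (true positives), FP (false positives) and FN (false negatives) are the usual confusion-matrix counts and are introduced here for illustration; they are not named in the question.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

With "Spam" as the positive class, FP counts normal accounts labelled "Spam" and FN counts spam accounts labelled "Not spam". With "Not spam" as the positive class, the two roles swap.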

You wrote: "My purpose is to reduce the number of normal accounts that are labelled as 'Spam'."

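This means you want to maximize the precision of Spam and the recall of Not spam. A normal account labelled "Spam" is a false positive for the Spam class, which lowers Spam precision, and at the same time a false negative for the Not spam class, which lowers Not spam recall. So reducing those errors improves both figures, and it is the only kind of error that affects them. The wiki page you link to explains the rest.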
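A small worked example may make this concrete. The counts below are hypothetical: they assume 2500 spam and 2500 normal accounts in the test set (the question does not state the test set size), and were chosen because they reproduce the four reported figures. The variable names are illustrative only.

```python
# Hypothetical confusion-matrix counts, assuming 2500 test accounts per class.
# These are NOT given in the question; they are picked to match its results.
spam_as_spam = 1584        # spam accounts labelled "Spam"
spam_as_not_spam = 916     # spam accounts labelled "Not spam" (spam that escapes)
normal_as_spam = 61        # normal accounts labelled "Spam" (the error to reduce)
normal_as_not_spam = 2439  # normal accounts labelled "Not spam"

# "Spam" as the positive class.
spam_precision = spam_as_spam / (spam_as_spam + normal_as_spam)
spam_recall = spam_as_spam / (spam_as_spam + spam_as_not_spam)

# "Not spam" as the positive class.
not_spam_precision = normal_as_not_spam / (normal_as_not_spam + spam_as_not_spam)
not_spam_recall = normal_as_not_spam / (normal_as_not_spam + normal_as_spam)

print("Spam precision:    ", spam_precision)
print("Spam recall:       ", spam_recall)
print("Not spam precision:", not_spam_precision)
print("Not spam recall:   ", not_spam_recall)
```

Only `normal_as_spam` appears in both the Spam precision and the Not spam recall, so driving that count down raises exactly the two figures named above. The spam that escapes (`spam_as_not_spam`) affects only Spam recall and Not spam precision, which the question says matter less.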

Suggested keyphrase: Confusion Matrix.

Licensed under: CC-BY-SA with attribution