Question

(Screenshot of the classification report from the question, not reproduced here.)

As you can see, it is about binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this? Two other questions: what does "support" stand for? And why are the precision and recall scores in the classification report different from my results from sklearn.metrics.precision_score or recall_score?

Solution

The classification report is about key metrics in a classification problem.

You'll have precision, recall, f1-score and support for each class you're trying to find.

  • The recall means "of all the elements that actually belong to this class, how many did the classifier find?"

  • The precision means "of all the elements predicted as this class, how many actually belong to it?"

  • The f1-score is the harmonic mean of precision and recall.

  • The support is the number of occurrences of the given class in your dataset (so you have 37.5K examples of class 0 and 37.5K of class 1, which is a very well-balanced dataset).
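As a sketch of those four definitions in pure Python (with a made-up toy labelling, not the data from the question):

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Compute precision, recall, f1-score and support for each class."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # elements predicted as c
        actual = sum(1 for t in y_true if t == c)     # elements truly c = support
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1-score": f1, "support": actual}
    return report

# Toy labels, not the question's data:
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]
rep = per_class_report(y_true, y_pred)
print(rep[0])  # precision 0.6, recall 0.75, support 4
print(rep[1])  # recall 0.5, support 4
```

sklearn.metrics.classification_report computes exactly these quantities per class.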

The thing is, precision and recall are mostly used for imbalanced datasets, because on a highly imbalanced dataset a 99% accuracy can be meaningless.
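A quick sketch of why accuracy misleads on imbalance (made-up numbers, and a degenerate "classifier" that always predicts the majority class 0):

```python
# 990 negatives, 10 positives; the model always predicts 0.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_pos = tp / sum(t == 1 for t in y_true)

print(accuracy)    # 0.99 -- looks great
print(recall_pos)  # 0.0  -- yet it never finds a single positive
```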

I would say that you don't really need to focus on these metrics for this problem, unless a given class absolutely must be correctly identified.

To answer your other question: you cannot directly compare precision and recall across the two classes. These numbers only mean that your classifier is better at finding class 0 than class 1.

The precision and recall from sklearn.metrics.precision_score or recall_score should not differ from the classification report. But without your code, it is impossible to determine the root cause.
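One common cause (an assumption, since the question's code isn't shown): sklearn's precision_score and recall_score default to average='binary', so they return the score for the positive class (label 1) only, while the classification report lists one value per class. A pure-Python sketch of that difference:

```python
def precision(y_true, y_pred, pos_label):
    """Precision for one class, mirroring precision_score's average='binary'."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    return sum(t == pos_label for t in predicted) / len(predicted)

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

# What precision_score(y_true, y_pred) reports by default: class 1 only.
print(precision(y_true, y_pred, pos_label=1))  # 0.75

# What the classification report shows: one value per class.
print([precision(y_true, y_pred, pos_label=c) for c in (0, 1)])  # [1.0, 0.75]
```

Passing average=None (or reading the report) gives the per-class values; comparing a per-class value against the default binary score can make them look inconsistent.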

OTHER TIPS

We can picture precision and recall as netting a school of fish.

Imagine we are boating over the sea and lay down our net.

  • If the school of fish is huge while the net is small, the net comes up full of fish everywhere, meaning precision is high. But we only catch a small fraction of the school, meaning recall is low.

  • If instead the school is tiny but the net is huge, only a small part of the net contains fish, meaning precision is low. But fortunately we catch every fish in the school, meaning recall is high.
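Putting made-up numbers on the analogy (the school is the set of actual positives, the net is the set of predictions):

```python
# Huge school, small net: the net is nearly full (high precision),
# but most of the school escapes (low recall).
school, caught, net_size = 1000, 48, 50
precision_small_net = caught / net_size  # 0.96
recall_small_net = caught / school       # 0.048

# Tiny school, huge net: the net is mostly empty (low precision),
# but every fish is caught (high recall).
school, caught, net_size = 20, 20, 500
precision_big_net = caught / net_size    # 0.04
recall_big_net = caught / school         # 1.0

print(precision_small_net, recall_small_net)
print(precision_big_net, recall_big_net)
```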

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange