Question

I am confused about how to represent the confusion matrix -- where to put the FP and FN. Link1 and Link2 show different confusion matrices for binary classification. The rows represent the actual values and the columns represent the predicted values. Based on my understanding, the correct confusion matrix should be:

                          | Pred Neg | Pred Pos
Actual Negative (class 0) |    TN    |    FP
Actual Positive (class 1) |    FN    |    TP

where TN (class 0) indicates the number of correctly identified normal patterns, TP (class 1) indicates the number of correctly identified malignant patterns, and FP indicates that the classifier predicted the signature to be malignant when in fact it was normal.

$\text{Precision} = \frac{TP}{TP+FP}$, $\text{Recall (or TPR)} = \frac{TP}{TP+FN}$.
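To make the layout and the two formulas concrete, here is a minimal sketch in plain Python with made-up labels (purely illustrative) that counts the four cells with actual classes in rows and predicted classes in columns, then applies the definitions above:

```python
# Toy ground-truth and predicted labels (1 = positive/malignant, 0 = negative/normal).
y_true = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Count each cell of the confusion matrix (actual in rows, predicted in columns).
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

precision = tp / (tp + fp)  # of all positive predictions, how many were right?
recall = tp / (tp + fn)     # of all actual positives, how many did we find?

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")            # TN=3 FP=2 FN=1 TP=4
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.80
```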

Question 1: Is my understanding and construction of the confusion matrix correct?

Question 2: What is the intuitive difference between Precision and recall? What happens if precision < recall?


Solution

Question 1: Is my understanding and construction of the confusion matrix correct?

Yes, your definitions and the way you construct the confusion matrix are correct. The two links you provided also agree with each other; they just swap rows and columns, since there is no hard rule about the presentation as long as the correct relations are maintained.

Link 1 shows this matrix:

          | Pos Class | Neg Class
Pos Pred  |    TP     |    FP
Neg Pred  |    FN     |    TN

Link 2 shows the same matrix, but transposed:

          | Pos Pred  | Neg Pred
Pos Class |    TP     |    FN
Neg Class |    FP     |    TN
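Libraries pick one convention and stick to it, too. As one concrete reference (assuming scikit-learn is available; it is not mentioned in your question), its `confusion_matrix` puts actual classes in rows and predicted classes in columns, with labels in sorted order, which matches the layout you wrote down:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Rows are actual classes, columns are predicted classes, labels sorted ascending:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 2]
                                         #  [1 4]]
```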

Question 2: What is the intuitive difference between Precision and recall?

Precision is the rate at which you are correct when you predict a positive class. It takes into account all of your positive predictions and figures out which proportion of those is actually correct. When your precision is high, this means that once you make a positive prediction, you are likely to be correct about it. This says nothing about how correct your negative predictions are -- you might make 1 positive and 99 negative predictions on 100 actual positives and still get 100% precision, since your only positive prediction just happened to be correct.

Recall is the rate at which you are able to predict the positive class correctly. It takes into account all of the actual positive classes and figures out which proportion of those you have predicted correctly. When your recall is high, this means that very few actual positives slip by your model without being detected as such. This says nothing about how good you are at being actually correct with your positive predictions -- a model that always predicts a positive class easily achieves 100% recall.
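Here is a small sketch of the two degenerate cases described above, using its own toy numbers and a hypothetical `precision_recall` helper: a single cautious positive prediction gives perfect precision but poor recall, while predicting positive for everything gives perfect recall but poor precision.

```python
def precision_recall(y_true, y_pred):
    """Hypothetical helper: precision and recall from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1] * 10 + [0] * 90         # 10 actual positives, 90 actual negatives

cautious = [1] + [0] * 99            # one positive prediction, which happens to be right
print(precision_recall(y_true, cautious))       # (1.0, 0.1): perfect precision, poor recall

trigger_happy = [1] * 100            # predict positive for everything
print(precision_recall(y_true, trigger_happy))  # (0.1, 1.0): perfect recall, poor precision
```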

One usually strives to optimize both precision and recall by finding the most acceptable balance between the two. You might want to read this article about the Precision-Recall curve to get a fuller understanding of the relationship between these metrics.
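The trade-off comes from the classification threshold: raising it tends to increase precision and lower recall, and vice versa. A sketch of how the curve is traced, assuming scikit-learn and NumPy are available and using synthetic scores purely for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)           # synthetic 0/1 labels
scores = y_true * 0.3 + rng.random(200) * 0.7   # noisy scores correlated with the labels

# Each threshold on the score yields one (precision, recall) pair;
# sweeping the threshold traces out the precision-recall curve.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in list(zip(precision, recall, thresholds))[::40]:
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```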

What happens if precision < recall?

As you have highlighted in your post, the two formulas differ only in the denominator. It follows that when precision is smaller than recall, the number of false positives in your predictions is larger than the number of false negatives.
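Spelled out (assuming $TP > 0$ so both fractions are positive):

$$\frac{TP}{TP+FP} < \frac{TP}{TP+FN} \iff TP + FN < TP + FP \iff FN < FP.$$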
