Question

In my CNN, I have 200 'negative' images and 50 'positive' images in my test set, and I want to make a confusion matrix. My doubt is whether I have to equalize the number of samples per class, because if I keep this 200/50 split my precision drops, since I get a lot of false positives.

So, should I rescale the negative counts to match the 50 positives, or keep the 200/50 split?

These are my results without balancing the samples:

                    predicted positive         predicted negative
actual pos.              41                             9
actual neg.              31                            169

recall = 41 / 50 = 82%
precision = 41 / 72 ≈ 57%
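The two numbers above can be reproduced directly from the confusion-matrix counts; a minimal sketch in plain Python:

```python
# Counts taken from the confusion matrix in the question.
# Rows: actual (pos, neg); columns: predicted (pos, neg).
tp, fn = 41, 9     # actual positives: correctly / incorrectly classified
fp, tn = 31, 169   # actual negatives: incorrectly / correctly classified

recall = tp / (tp + fn)     # 41 / 50 = 0.82
precision = tp / (tp + fp)  # 41 / 72 ≈ 0.569

print(f"recall    = {recall:.2%}")
print(f"precision = {precision:.2%}")
```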

Solution

I don't think there is any reason to modify the matrix, so keep it as it is. Even if you rescale it, what purpose does that serve? At the end of the day, your model does not change just because you modify your confusion matrix.

In my opinion you can use other metrics, e.g. the F1 score (or, more generally, the F-beta score), the AUC score, etc., to judge your model. The confusion matrix only provides a visualization of where your model is "confused", and I would say it is less informative for binary classification (since you only have false positives and false negatives). The metrics above serve as better judges when evaluating your model.
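As a sketch of the suggestion above, the F-beta score can be computed straight from the confusion-matrix counts in the question; beta = 1 gives the plain F1 score, while a larger beta weights recall more heavily (the helper function below is my own illustration, not a library API):

```python
# F-beta score computed directly from confusion-matrix counts,
# so no predictions or external libraries are needed.
def fbeta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Counts from the matrix in the question: tp=41, fp=31, fn=9.
f1 = fbeta(41, 31, 9)            # beta=1: harmonic mean of precision and recall
f2 = fbeta(41, 31, 9, beta=2.0)  # beta=2: recall counts more than precision

print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

Because recall (82%) is higher than precision (~57%) here, the F2 score comes out above the F1 score.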

Here is a related question which you can also check.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange