Question

I am studying machine learning. After some research, I understand that a typical workflow for a classification problem (after having prepared the data) is the following:

  1. Split the data into training, validation, and test sets
  2. Train the model
  3. Generate the confusion matrix
  4. Analyze the metrics: accuracy, precision, recall, and F1 score
  5. Tune hyper-parameters based on the metric I have decided to optimize.

My question is: why do we ever need the confusion matrix? Shouldn't we already know what metric we need to optimize given the type of problem we are trying to solve?

I am asking this because, as far as I understand, if we have enough computational power we could basically merge steps 2 and 5 by running a grid search (which essentially performs cross-validation for every combination of hyperparameter values) that takes the metric to be optimized as an input. This implies that you need to know the metric beforehand, and moreover you never get to look at the confusion matrix.
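For concreteness, here is a minimal sketch of what I have in mind, using scikit-learn's GridSearchCV (the dataset, model, and parameter grid are made up for illustration):

```python
# A made-up example: the data, model, and grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The metric to optimize ("f1" here) has to be chosen up front.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```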

Thanks in advance for the replies.

Solution

Ok, so let me answer some of these questions for you:

  • What is the purpose of a confusion matrix?

A confusion matrix is merely a visual aid to help you better interpret the performance of your model. It's a way to visualize, at a glance, the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Confusion matrices become increasingly useful as the number of classes grows, and they can give you great insight into how your model is doing. Let's say you are training an image classifier: it might be good to know that your model confuses dogs with wolves but doesn't confuse cats with snakes.
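As a minimal sketch (the labels and predictions below are invented for illustration), scikit-learn's confusion_matrix makes this kind of inspection straightforward:

```python
from sklearn.metrics import confusion_matrix

labels = ["cat", "dog", "snake", "wolf"]
# Invented ground truth and predictions, for illustration only.
y_true = ["dog", "dog", "dog", "wolf", "wolf", "cat", "snake", "snake"]
y_pred = ["dog", "wolf", "dog", "dog",  "wolf", "cat", "snake", "snake"]

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred, labels=labels))
# The off-diagonal counts in the dog and wolf rows expose the dog/wolf
# confusion, while the cat and snake rows stay clean.
```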

Another purpose of the confusion matrix is to define a related cost matrix. In my example, confusing dogs with wolves might be understandable and shouldn't mean that your model is bad at what it does. However, if it mixes up classes it shouldn't be confused about, that should be properly reflected in your performance metric.
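Here is a small sketch of that idea (the cost values are arbitrary assumptions): multiply each cell of the confusion matrix by the cost of making that particular mistake, and sum.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted),
# ordered as cat, dog, snake, wolf.
cm = np.array([
    [10, 0, 0, 0],
    [ 0, 8, 0, 2],
    [ 0, 0, 9, 1],
    [ 0, 3, 0, 7],
])
# Assumed cost matrix: dog/wolf mix-ups are cheap (0.5),
# while cat/snake mix-ups are expensive (5.0).
cost = np.array([
    [0.0, 1.0, 5.0, 1.0],
    [1.0, 0.0, 1.0, 0.5],
    [5.0, 1.0, 0.0, 1.0],
    [1.0, 0.5, 1.0, 0.0],
])
# Total cost of all mistakes: element-wise product, summed.
print((cm * cost).sum())
```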

Here is a good blog detailing these concepts: https://medium.com/@inivikrant/confusion-cost-matrix-helps-in-calculating-the-accuracy-cost-and-various-other-measurable-a725fb6b54e1

  • Shouldn't we already know what metric we need to optimize given the type of problem we are trying to solve?

Here you are conflating two things. On the one hand, yes, you should know in advance which metric you want to optimize (e.g. accuracy, precision, recall, etc.), but that doesn't mean you know the value of that metric in advance. If you dumb down hyperparameter tuning, it's roughly this:

  1. train model $M$ with hyperparameters $H$
  2. evaluate the performance $P$ of model $M$
  3. choose new hyperparameters $H$ and repeat steps 1 and 2
  4. pick model $M$ with hyperparameters $H$ such that $P$ is optimized
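Spelled out in code, that loop could look like the following sketch (the model, data, and hyperparameter grid are placeholders chosen for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

best_P, best_H = -1.0, None
for C in [0.1, 1.0, 10.0]:   # step 3: choose new hyperparameters H
    model = SVC(C=C)         # step 1: model M with hyperparameters H
    # step 2: evaluate the performance P of M (cross-validated F1)
    P = cross_val_score(model, X, y, scoring="f1", cv=5).mean()
    if P > best_P:           # step 4: keep the H that optimizes P
        best_P, best_H = P, {"C": C}

print(best_H, best_P)
```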

Computing the accuracy, precision, recall, or F1 score can be done if you know the TP, FP, TN, and FN (see this link for more info). So technically you don't have to create the confusion matrix per se, but you definitely need to compute the TP, FP, TN, and FN to evaluate the performance of your model.
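Concretely, with those four counts the metrics follow directly:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP},$$

$$\text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$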

  • If we have enough computational power, we could basically group steps 2 and 5

You can only optimize your hyperparameters if you compute the performance of your model for each set of hyperparameters. You could skip step 3, as it doesn't technically affect your training process; it only helps you understand better what's going on. But you definitely cannot skip step 4.

OTHER TIPS

Adding two points to the above answer:

  • A confusion matrix gives a much more concrete picture to a business stakeholder. Telling the business that the F1 score is 0.9 is of little use to them,
    but they will love it when you say: the model will miss 9 out of 100 cancer cases and will flag 50 out of 10,000 non-cancer cases as cancer (see the first sketch below).

  • When you have more than 2 classes, the confusion matrix gives an idea of the learning mistakes the model is making. E.g. on the Fashion-MNIST data, we can observe that the model is confused between Shirt and Coat, and you can adapt accordingly. See the image below and the second sketch after it.

Image credit: https://www.kaggle.com/fuzzywizard/fashion-mnist-cnn-keras-accuracy-93 (Fashion-MNIST confusion matrix)
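To connect those plain-language numbers back to the metrics, here is a quick sketch using the assumed counts from the cancer example above (100 actual cancer cases and 10,000 actual non-cancer cases):

```python
# Assumed counts from the cancer example above.
FN = 9              # cancer cases the model misses
TP = 100 - FN       # cancer cases it catches
FP = 50             # non-cancer cases flagged as cancer
TN = 10_000 - FP    # non-cancer cases correctly cleared

recall = TP / (TP + FN)   # 0.91: fraction of cancers caught
fpr = FP / (FP + TN)      # 0.005: fraction of healthy patients alarmed
print(recall, fpr)
```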
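And as a second sketch, once you have a multi-class confusion matrix you can locate the most-confused pair of classes programmatically (the matrix below is invented; with Fashion-MNIST you would compute it from your model's predictions):

```python
import numpy as np

# Invented 4-class confusion matrix (rows = true, columns = predicted);
# imagine classes 0..3 as T-shirt, Coat, Shirt, Bag.
cm = np.array([
    [95,  1,  4,  0],
    [ 2, 80, 17,  1],
    [ 6, 20, 73,  1],
    [ 0,  1,  1, 98],
])

off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)   # ignore correct predictions
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"Most confused: true class {i} predicted as class {j} "
      f"({off_diag[i, j]} times)")
```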

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange