Question

I have been evaluating models on a fire systems dataset with a severe class imbalance. Most models failed to predict any true positives. Naive Bayes did manage to, but with a very high false positive rate. The confusion matrix and classification report for two of the models can be seen below. The same dataset and train/test split were used for both models.

Naive Bayes Confusion Matrix and Classification Report

     [[TN=732  FP=448]
      [FN=2    TP=15]]


              precision    recall  f1-score   support

           0       1.00      0.62      0.76      1180
           1       0.03      0.88      0.06        17

    accuracy                           0.62      1197
   macro avg       0.51      0.75      0.41      1197
weighted avg       0.98      0.62      0.75      1197


Logistic Regression Confusion Matrix and Classification Report


     [[TN=1180  FP=0]
      [FN=17    TP=0]]


              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1180
           1       0.00      0.00      0.00        17

    accuracy                           0.99      1197
   macro avg       0.49      0.50      0.50      1197
weighted avg       0.97      0.99      0.98      1197

However, I also computed Cohen's kappa coefficient for these models, and I am confused about how to interpret the values. Please find the values below:

Logistic Regression=0.0
Naive Bayes=0.03

These values indicate only very slight agreement. But why is the value for Naive Bayes slightly better than the one for Logistic Regression?

Solution

Logistic Regression is predicting only one class (in this case the negative class). Because the data is so imbalanced, this still gives a high accuracy score, but accuracy isn't reliable for imbalanced datasets. Cohen's kappa is a more suitable metric here: it compares the observed agreement with the agreement expected by chance given the class frequencies, so a model that always predicts the majority class scores exactly 0.

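Naive Bayes, on the other hand, predicts both classes. It makes many more mistakes overall (448 false positives), but it also finds 15 of the 17 positives, so its agreement with the true labels is slightly better than chance and its kappa is slightly above zero.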
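To make this concrete, here is a minimal sketch that recomputes kappa directly from the two confusion matrices in the question. The helper name `cohen_kappa_binary` is made up for illustration and is not a library function. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals.

```python
def cohen_kappa_binary(tn, fp, fn, tp):
    """Cohen's kappa from the four cells of a 2x2 confusion matrix.

    Hypothetical helper for illustration only.
    """
    n = tn + fp + fn + tp

    # Observed agreement: fraction of samples where prediction == true label.
    p_o = (tn + tp) / n

    # Marginal totals: how often each class occurs and how often it is predicted.
    actual_neg, actual_pos = tn + fp, fn + tp
    pred_neg, pred_pos = tn + fn, fp + tp

    # Agreement expected by chance if predictions were independent of labels.
    p_e = (actual_neg * pred_neg + actual_pos * pred_pos) / (n * n)

    return (p_o - p_e) / (1 - p_e)


# Cell counts taken from the confusion matrices in the question.
kappa_nb = cohen_kappa_binary(tn=732, fp=448, fn=2, tp=15)
kappa_lr = cohen_kappa_binary(tn=1180, fp=0, fn=17, tp=0)

print(f"Naive Bayes kappa:         {kappa_nb:.3f}")
print(f"Logistic Regression kappa: {kappa_lr:.3f}")
```

For Logistic Regression, p_o and p_e are both 1180/1197, so the numerator is zero: its high accuracy is exactly what you would get by chance from always predicting the majority class. For Naive Bayes, p_o is about 0.624 and p_e is about 0.610, which gives a kappa of roughly 0.036, consistent with the small positive value reported in the question. If you have the label arrays rather than the matrix, scikit-learn's `cohen_kappa_score` in `sklearn.metrics` computes the same statistic.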

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange