How can the Cohen's Kappa coefficient of Naive Bayes with 62% overall accuracy be better than that of Logistic Regression with 98% accuracy?
Question
I have been trying to evaluate my models on a fire-systems dataset with a huge class imbalance. Most models failed to predict any true positives; Naive Bayes managed to, but with a very high false-positive rate. The confusion matrix and classification report for each model are shown below. The same dataset and train/test split were used for both models.
Naive Bayes Confusion Matrix and Classification Report
[[TN=732  FP=448]
 [FN=2    TP=15]]

              precision    recall  f1-score   support

           0       1.00      0.62      0.76      1180
           1       0.03      0.88      0.06        17

    accuracy                           0.62      1197
   macro avg       0.51      0.75      0.41      1197
weighted avg       0.98      0.62      0.75      1197
Logistic Regression Confusion Matrix and Classification Report
[[TN=1180  FP=0]
 [FN=17    TP=0]]

              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1180
           1       0.00      0.00      0.00        17

    accuracy                           0.98      1197
   macro avg       0.49      0.50      0.50      1197
weighted avg       0.97      0.99      0.98      1197
However, I also computed the Cohen's Kappa coefficient for these models, and I am quite confused about how to interpret the values. Please find the values below:

Logistic Regression = 0.0
Naive Bayes = 0.03

These values indicate very slight agreement. But why is the value for Naive Bayes slightly better than that for Logistic Regression?
Solution
Logistic Regression is predicting only one class (here, the negative class). Because of the high imbalance in the data, this still yields a high accuracy score, but accuracy isn't a reliable metric for imbalanced datasets. Cohen's Kappa penalizes this behavior: when a model always predicts the majority class, the observed agreement exactly equals the agreement expected by chance, so Kappa is exactly 0.
Naive Bayes, on the other hand, tries to predict both classes. It makes many more wrong predictions this way, but its Kappa is higher because its agreement with the true labels is slightly better than chance.
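You can verify this by computing Kappa directly from the two confusion matrices above. Below is a minimal Python sketch; the kappa_from_confusion helper is my own name for illustration, and it implements the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance:

import numpy as np

def kappa_from_confusion(cm):
    # Cohen's Kappa from a confusion matrix: (p_o - p_e) / (1 - p_e)
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    # Observed agreement: fraction of predictions on the diagonal (accuracy)
    p_o = np.trace(cm) / n
    # Chance agreement: sum over classes of (predicted marginal * actual marginal)
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (p_o - p_e) / (1 - p_e)

# Matrices from the question: rows = actual class, columns = predicted class
nb = [[732, 448], [2, 15]]
lr = [[1180, 0], [17, 0]]

print(f"Naive Bayes kappa:         {kappa_from_confusion(nb):.3f}")  # ~0.036
print(f"Logistic Regression kappa: {kappa_from_confusion(lr):.3f}")  # 0.000

Running this reproduces the reported values: 0 for Logistic Regression (p_o equals p_e, since predicting all negatives agrees with the truth exactly as often as chance would) and about 0.036 for Naive Bayes. Applying sklearn.metrics.cohen_kappa_score to the raw label vectors gives the same result.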