Is my Bayes classification right or meaningful?
11-12-2020
Question
I have this dataset and I am learning about the Naive Bayes classifier. After cleaning the data, I tried to apply a Naive Bayes classifier to it in R with this code:
library(klaR)

# Split into training (first 707 rows) and test (remaining 177 rows)
trainingChoco <- chocolateApriori[1:707,]
testChoco <- chocolateApriori[708:884,]

# Fit a Naive Bayes model predicting RatFactor from all other columns,
# then predict on the held-out test set
naivebayesChocolate <- NaiveBayes(RatFactor ~ ., data=trainingChoco)
predictionChoco <- predict(naivebayesChocolate, testChoco)
predictionChoco$posterior

# Cross-tabulate predicted vs. true classes and summarise with caret
library(caret)
tableChocolate <- table(predictionChoco$class, testChoco$RatFactor)
confusionMatrix(tableChocolate)
Here RatFactor takes only the values 1, 2, 3, 4, and 5 (I rounded the ratings). This is my result:
predictionChoco$posterior
                1            2           3           4            5
1434 5.987980e-14 2.619121e-05 0.080559640 0.919414168 5.987980e-11
1435 4.205489e-10 8.759363e-03 0.022926453 0.968314183 4.205489e-10
1436 4.205489e-10 8.759363e-03 0.022926453 0.968314183 4.205489e-10
1439 1.004950e-10 4.709587e-03 0.006445339 0.988845074 1.004950e-10
1442 9.257687e-10 3.133371e-04 0.004947920 0.994738741 9.257687e-10
1443 3.260598e-13 8.227926e-03 0.171718690 0.820053058 3.260598e-07
[...]
confusionMatrix(tableChocolate)
Confusion Matrix and Statistics
    1  2  3  4  5
  1 0  0  0  0  0
  2 0  2  5  7  0
  3 0  7 42 28  0
  4 0  5 45 36  0
  5 0  0  0  0  0
Overall Statistics
Accuracy : 0.452
95% CI : (0.3772, 0.5284)
No Information Rate : 0.5198
P-Value [Acc > NIR] : 0.97
Kappa : 0.0431
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
Sensitivity NA 0.1429 0.4565 0.5070 NA
Specificity 1 0.9264 0.5882 0.5283 1
Pos Pred Value NA 0.1429 0.5455 0.4186 NA
Neg Pred Value NA 0.9264 0.5000 0.6154 NA
Prevalence 0 0.0791 0.5198 0.4011 0
Detection Rate 0 0.0113 0.2373 0.2034 0
Detection Prevalence 0 0.0791 0.4350 0.4859 0
Balanced Accuracy NA 0.5346 0.5224 0.5177 NA
Do you think this result is right, or am I missing something? Can you explain what you would understand by looking at this result? Or can you post a similar example done step by step? Thanks.
Answer
Do you think this result is right?
It depends what you mean by "right": the results seem reasonable, and I don't see any obvious sign of a mistake in the process.
Can you explain what you would understand by looking at this result?
I observe that classes 1 and 5 never occur in your test data (and are never predicted), so effectively this is a three-class problem.
- First, with three classes the random-guessing baseline accuracy would be about 0.33; 0.45 is higher, so your model does better than that (the bare minimum).
- However, according to the confusion matrix, class 3 has 92 instances out of a total of 177, which means that a trivial majority-class learner always predicting class 3 would get about 52% accuracy (this is exactly the "No Information Rate" of 0.5198 that caret reports). So 45% is not very good.
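The two baselines above can be checked with quick arithmetic. The question's code is R, but as a neutral illustration here is the same check sketched in Python, using only the column totals read off the confusion matrix in the question (14, 92 and 71 true instances of classes 2, 3 and 4 in the test set):

```python
# Baselines implied by the confusion matrix in the question.
# Column totals (true labels) for classes 2, 3, 4: 14, 92, 71.
class_counts = {2: 14, 3: 92, 4: 71}
total = sum(class_counts.values())               # 177 test instances

# Random guessing among the 3 observed classes
random_baseline = 1 / len(class_counts)          # ~0.333

# Always predicting the majority class (class 3) -- caret's "No Information Rate"
majority_baseline = max(class_counts.values()) / total  # 92/177 ~ 0.520

# The model's accuracy: sum of the matrix diagonal over the total
model_accuracy = (2 + 42 + 36) / total           # 80/177 ~ 0.452

print(total, round(random_baseline, 3), round(majority_baseline, 3), round(model_accuracy, 3))
```

The model's 0.452 sits between the two baselines, which is the quantitative version of the point above: better than chance, but worse than simply always predicting class 3.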