Is my Bayes classification right or meaningful?
11-12-2020
Question
I have this dataset and I am learning about the Naive Bayes classifier. After cleaning the data, I tried to apply a Naive Bayes classifier to it in R with this code:
library(klaR)

# Split into training (first 707 rows) and test (remaining 177 rows)
trainingChoco <- chocolateApriori[1:707,]
testChoco <- chocolateApriori[708:884,]

# Fit a Naive Bayes model predicting RatFactor from all other columns,
# then predict on the held-out test set
naivebayesChocolate <- NaiveBayes(RatFactor ~ ., data=trainingChoco)
predictionChoco <- predict(naivebayesChocolate, testChoco)
predictionChoco$posterior

# Cross-tabulate predicted vs. true classes and summarise with caret
library(caret)
tableChocolate <- table(predictionChoco$class, testChoco$RatFactor)
confusionMatrix(tableChocolate)
Here RatFactor takes only the values 1, 2, 3, 4, and 5 (I rounded the ratings). This is my result:
predictionChoco$posterior
                1            2           3           4            5
1434 5.987980e-14 2.619121e-05 0.080559640 0.919414168 5.987980e-11
1435 4.205489e-10 8.759363e-03 0.022926453 0.968314183 4.205489e-10
1436 4.205489e-10 8.759363e-03 0.022926453 0.968314183 4.205489e-10
1439 1.004950e-10 4.709587e-03 0.006445339 0.988845074 1.004950e-10
1442 9.257687e-10 3.133371e-04 0.004947920 0.994738741 9.257687e-10
1443 3.260598e-13 8.227926e-03 0.171718690 0.820053058 3.260598e-07
[...]
confusionMatrix(tableChocolate)
Confusion Matrix and Statistics
    1  2  3  4  5
  1 0  0  0  0  0
  2 0  2  5  7  0
  3 0  7 42 28  0
  4 0  5 45 36  0
  5 0  0  0  0  0
Overall Statistics
Accuracy : 0.452
95% CI : (0.3772, 0.5284)
No Information Rate : 0.5198
P-Value [Acc > NIR] : 0.97
Kappa : 0.0431
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
Sensitivity NA 0.1429 0.4565 0.5070 NA
Specificity 1 0.9264 0.5882 0.5283 1
Pos Pred Value NA 0.1429 0.5455 0.4186 NA
Neg Pred Value NA 0.9264 0.5000 0.6154 NA
Prevalence 0 0.0791 0.5198 0.4011 0
Detection Rate 0 0.0113 0.2373 0.2034 0
Detection Prevalence 0 0.0791 0.4350 0.4859 0
Balanced Accuracy NA 0.5346 0.5224 0.5177 NA
Do you think this result is right, or am I missing something? Can you explain what you would understand by looking at this result? Or can you post a similar example done step by step? Thanks.
Answer
Do you think this result is right?
It depends what you mean by "right": the results seem reasonable, and I don't see any obvious sign of a mistake in the process.
Can you explain what you would understand by looking at this result?
I observe that classes 1 and 5 never occur in your test data (and are never predicted), so effectively this is a three-class problem.
- First, with three classes the random-guessing baseline accuracy would be about 0.33; 0.45 is higher, so your model does better than that (the bare minimum).
- However, according to the confusion matrix, class 3 has 92 instances out of a total of 177, which means that a trivial majority-class learner always predicting class 3 would get about 52% accuracy (this is exactly the "No Information Rate" of 0.5198 that caret reports). So 45% is not very good.
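The two baselines above can be checked with quick arithmetic. The question's code is R, but as a neutral illustration here is the same check sketched in Python, using only the column totals read off the confusion matrix in the question (14, 92 and 71 true instances of classes 2, 3 and 4 in the test set):

```python
# Baselines implied by the confusion matrix in the question.
# Column totals (true labels) for classes 2, 3, 4: 14, 92, 71.
class_counts = {2: 14, 3: 92, 4: 71}
total = sum(class_counts.values())               # 177 test instances

# Random guessing among the 3 observed classes
random_baseline = 1 / len(class_counts)          # ~0.333

# Always predicting the majority class (class 3) -- caret's "No Information Rate"
majority_baseline = max(class_counts.values()) / total  # 92/177 ~ 0.520

# The model's accuracy: sum of the matrix diagonal over the total
model_accuracy = (2 + 42 + 36) / total           # 80/177 ~ 0.452

print(total, round(random_baseline, 3), round(majority_baseline, 3), round(model_accuracy, 3))
```

The model's 0.452 sits between the two baselines, which is the quantitative version of the point above: better than chance, but worse than simply always predicting class 3.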