Question

As mentioned before, I have a classification problem and unbalanced data set. The majority class contains 88% of all samples. I have trained a Generalized Boosted Regression model using gbm() from the gbm package in R and get the following output:

  interaction.depth  n.trees  Accuracy  Kappa  Accuracy SD  Kappa SD
  1                  50       0.906     0.523  0.00978      0.0512  
  1                  100      0.91      0.561  0.0108       0.0517  
  1                  150      0.91      0.572  0.0104       0.0492  
  2                  50       0.908     0.569  0.0106       0.0484  
  2                  100      0.91      0.582  0.00965      0.0443  
  2                  150      0.91      0.584  0.00976      0.0437  
  3                  50       0.909     0.578  0.00996      0.0469  
  3                  100      0.91      0.583  0.00975      0.0447  
  3                  150      0.911     0.586  0.00962      0.0443  

Looking at the ~90% accuracy, I assume the model has simply labeled all the samples as the majority class. That much is clear. What is not transparent to me is how Kappa is calculated.

  • What do these Kappa values (close to 0.6) really mean? Is that enough to say that the model is not classifying samples just by chance?
  • What do Accuracy SD and Kappa SD mean?

Solution

The Kappa here is Cohen's Kappa, a score of inter-rater agreement. It's a commonly used metric for evaluating the performance of machine learning algorithms and human annotators, particularly when dealing with text/linguistics.

What it does is compare the level of agreement between the output of the (human or algorithmic) annotator and the ground-truth labels to the level of agreement that would occur through random chance. There is a very good overview of how to calculate Kappa and use it to evaluate a classifier in a stats.stackexchange.com answer, and a more in-depth explanation of Kappa and how to interpret it in the paper "Understanding Interobserver Agreement: The Kappa Statistic" by Viera & Garrett (2005).
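For reference, since the question is how Kappa is calculated: writing p_o for the observed agreement (plain accuracy) and p_e for the agreement expected by chance, computed from the class proportions assigned by the two "raters" (here, the classifier and the ground truth), Cohen's Kappa is

    \kappa = \frac{p_o - p_e}{1 - p_e},
    \qquad
    p_e = \sum_{k} p_k^{\text{classifier}} \; p_k^{\text{truth}}

where the sum runs over the classes k and p_k is the fraction of samples each rater assigns to class k.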

The benefit of using Kappa, particularly with an unbalanced data set like yours, is that under a roughly 90/10 split between the classes you can achieve about 90% accuracy simply by labeling every data point with the more common class. The Kappa statistic describes how much better the classifier performs than that baseline.
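To make that concrete, here is a minimal sketch in base R. The confusion-matrix counts are made up for a 1000-sample, roughly 90/10 problem (not taken from your model); they show that always predicting the majority class scores 90% accuracy but a Kappa of exactly 0, while a classifier that also recovers most of the minority class gets a substantially higher Kappa:

    # Cohen's Kappa from a confusion matrix: compare observed agreement (p_o)
    # with the agreement expected by chance (p_e) from the marginal totals.
    kappa_from_table <- function(tab) {
      n   <- sum(tab)
      p_o <- sum(diag(tab)) / n                      # observed agreement = accuracy
      p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
      (p_o - p_e) / (1 - p_e)
    }

    # 1) Label all 1000 samples (900 majority, 100 minority) as the majority class:
    all_majority <- matrix(c(900, 100,    # predicted "maj"
                               0,   0),   # predicted "min"
                           nrow = 2, byrow = TRUE,
                           dimnames = list(pred = c("maj", "min"),
                                           truth = c("maj", "min")))
    kappa_from_table(all_majority)   # accuracy 0.90, but Kappa = 0

    # 2) A classifier that also catches most of the minority class:
    better <- matrix(c(880, 30,
                        20, 70),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(pred = c("maj", "min"),
                                     truth = c("maj", "min")))
    kappa_from_table(better)         # accuracy 0.95, Kappa ~ 0.71

If you would rather not compute it by hand, caret's confusionMatrix() reports the same statistic.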

Kappa ranges from -1 to 1, with 0 indicating agreement no better than chance, 1 indicating perfect agreement, and negative values indicating systematic disagreement. While the interpretation is somewhat arbitrary (and very task-dependent), Landis & Koch (1977) proposed the following scale, which can serve as a general rule of thumb:

  Kappa        Agreement
  < 0          Less than chance agreement
  0.01–0.20    Slight agreement
  0.21–0.40    Fair agreement
  0.41–0.60    Moderate agreement
  0.61–0.80    Substantial agreement
  0.81–0.99    Almost perfect agreement

That scale would suggest your algorithm (Kappa of roughly 0.52–0.59 across the grid) is performing moderately well. Accuracy SD and Kappa SD are the respective standard deviations of the Accuracy and Kappa scores across the resampling iterations (e.g., cross-validation folds) used to estimate them.
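As an aside: the layout of your table (Accuracy/Kappa plus their SDs per tuning-parameter combination) looks like the output of caret's train(), in which case the SDs come from the resampling scheme set in trainControl(). If so, you can also ask train() to pick the tuning parameters by Kappa instead of Accuracy, which often makes more sense on imbalanced data. A rough sketch, assuming a data frame my_data with the outcome in a factor column Class (both names are placeholders for your own data):

    library(caret)
    library(gbm)

    # 10-fold cross-validation, repeated 5 times; the per-parameter SDs in the
    # results table are the spread of Accuracy/Kappa over these resamples.
    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

    set.seed(42)
    gbm_fit <- train(Class ~ ., data = my_data,
                     method    = "gbm",
                     trControl = ctrl,
                     metric    = "Kappa",   # tune on Kappa rather than raw Accuracy
                     verbose   = FALSE)

    gbm_fit$results   # Accuracy, Kappa and their SDs for each candidate model

I hope this is helpful!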

OTHER TIPS

This may provide some answers: http://cran.r-project.org/web/packages/caret/vignettes/caret.pdf

You may also check out Max Kuhn's book "Applied Predictive Modeling". He talks about the caret package at length there, including the Kappa statistic and how to use it. It may be of some help to you.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange