What is the Cohen kappa metric, and how is it implemented in Python?

https://datascience.stackexchange.com//questions/64314

06-12-2019

Question

Can somebody explain in detail the Quadratic Kappa Metric / Cohen kappa metric, with an implementation in Python?

Solution

The Quadratic Kappa Metric is the same as the Cohen kappa metric in scikit-learn, sklearn.metrics.cohen_kappa_score, when weights is set to 'quadratic'.
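As a quick illustration of the scikit-learn call, here is a minimal sketch. The two rating lists are made-up sample data on a 0 to 4 scale, not values from the question.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings on a 0-4 scale, made up for illustration.
actuals = [0, 1, 2, 3, 4, 4, 2, 1]
preds = [0, 2, 2, 3, 3, 4, 1, 1]

# weights="quadratic" selects the quadratic weighted variant of kappa.
qwk = cohen_kappa_score(actuals, preds, weights="quadratic")
print(f"quadratic weighted kappa: {qwk:.4f}")
```

Passing the same list for both arguments should give a score of 1, since the two sets of ratings then agree completely.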

Quadratic weighted kappa measures the agreement between two ratings. This metric typically varies from 0 (random agreement between raters) to 1 (complete agreement between raters). In the event that there is less agreement between the raters than expected by chance, the metric may go below 0. Here, the quadratic weighted kappa is calculated between the scores which are expected/known and the predicted scores.

Results have 5 possible ratings: 0, 1, 2, 3, 4 (so N = 5). The quadratic weighted kappa is calculated as follows. First, an N x N histogram matrix O is constructed, such that $O_{i,j}$ corresponds to the number of adoption records that have a rating of i (actual) and received a predicted rating of j. An N-by-N matrix of weights, w, is calculated based on the difference between actual and predicted rating scores: $w_{i,j} = \frac{(i-j)^2}{(N-1)^2}$.
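The construction of O and w can be sketched in plain Python as below. The rating lists are made-up sample data, and the weights use the usual quadratic definition w[i][j] = (i - j)^2 / (N - 1)^2.

```python
N = 5  # number of possible ratings: 0, 1, 2, 3, 4

# Hypothetical actual and predicted ratings, made up for illustration.
actuals = [0, 1, 2, 3, 4, 4, 2, 1]
preds = [0, 2, 2, 3, 3, 4, 1, 1]

# O[i][j] counts the records with actual rating i and predicted rating j.
O = [[0] * N for _ in range(N)]
for a, p in zip(actuals, preds):
    O[a][p] += 1

# Quadratic weights: 0 on the diagonal, growing with the squared distance
# between the actual rating i and the predicted rating j.
w = [[(i - j) ** 2 / (N - 1) ** 2 for j in range(N)] for i in range(N)]

for row in w:
    print(row)
```

A prediction that is off by one rating gets weight 1/16, while one that is off by the full range (0 versus 4) gets weight 1.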

An N-by-N histogram matrix of expected ratings, E, is calculated, assuming that there is no correlation between rating scores. This is calculated as the outer product between the actual rating's histogram vector of ratings and the predicted rating's histogram vector of ratings, normalized such that E and O have the same sum.
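A small sketch of how E might be built, again in plain Python with made-up sample ratings. The names act_hist and pred_hist are illustrative only.

```python
N = 5  # number of possible ratings: 0, 1, 2, 3, 4

# Hypothetical actual and predicted ratings, made up for illustration.
actuals = [0, 1, 2, 3, 4, 4, 2, 1]
preds = [0, 2, 2, 3, 3, 4, 1, 1]

# Histogram vectors: how often each rating occurs in each list.
act_hist = [actuals.count(r) for r in range(N)]
pred_hist = [preds.count(r) for r in range(N)]

# Outer product of the two histograms.
E = [[a * p for p in pred_hist] for a in act_hist]

# Rescale E so that it sums to the number of rated records,
# which is also the sum of the observed matrix O.
scale = len(actuals) / sum(map(sum, E))
E = [[e * scale for e in row] for row in E]
```

Each entry of E is the count that would be expected in that cell if actual and predicted ratings were independent of each other.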

From these three matrices, the quadratic weighted kappa is calculated as $\kappa = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}}$.

Code implementation in Python

Breaking down the formula into parts

6 step breakdown for Weighted Kappa Metric

First, create a multi-class confusion matrix O between predicted and actual ratings.
Second, construct a weight matrix w which calculates the weight between the actual and predicted ratings.
Third, calculate value_counts() for each rating in preds and actuals.
Fourth, calculate E, which is the outer product of the two value_count vectors.
Fifth, normalize the E and O matrices.
Sixth, calculate the weighted kappa as per the formula.

Each Step Explained

Step-1: Calculate a confusion_matrix between the predicted and actual values. Here is a great resource to know more about confusion_matrix.

Step-2: Each element is weighted. Predictions that are further away from the actuals are penalized more harshly than predictions that are closer to the actuals. We will get a lower score if our prediction is 5 and the actual is 3 than if our prediction is 4 in the same case.

Step-3: We create two vectors, one for preds and one for actuals, which tell us how many values of each rating exist in each of them.

Step-4: E is the Expected Matrix which is the outer product of the two vectors calculated in step-3.

Step-5: Normalise both matrices to have the same sum. Since it is easiest to get the sum to be '1', we will simply divide each matrix by its sum to normalize the data.

Step-6: Calculate the numerator and denominator of the Weighted Kappa and return the Weighted Kappa metric as 1-(num/den).
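The six steps above can be put together roughly as follows. This is a minimal sketch using NumPy, not the original answer's code: the function name quadratic_weighted_kappa and the parameter n_ratings are illustrative, the sample ratings are made up, and the ratings are assumed to be integers from 0 to n_ratings - 1.

```python
import numpy as np


def quadratic_weighted_kappa(actuals, preds, n_ratings=5):
    """Quadratic weighted kappa for integer ratings in 0..n_ratings-1."""
    actuals = np.asarray(actuals, dtype=int)
    preds = np.asarray(preds, dtype=int)

    # Step 1: confusion matrix O (rows: actual, columns: predicted).
    O = np.zeros((n_ratings, n_ratings), dtype=float)
    np.add.at(O, (actuals, preds), 1)

    # Step 2: quadratic weights, (i - j)^2 / (N - 1)^2.
    idx = np.arange(n_ratings)
    w = (idx[:, None] - idx[None, :]) ** 2 / (n_ratings - 1) ** 2

    # Step 3: how often each rating occurs in actuals and in preds.
    act_hist = np.bincount(actuals, minlength=n_ratings)
    pred_hist = np.bincount(preds, minlength=n_ratings)

    # Step 4: expected matrix E as the outer product of the histograms.
    E = np.outer(act_hist, pred_hist).astype(float)

    # Step 5: normalise both matrices so that each sums to 1.
    O /= O.sum()
    E /= E.sum()

    # Step 6: weighted kappa is 1 - (num / den).
    num = (w * O).sum()
    den = (w * E).sum()
    return 1.0 - num / den


# Hypothetical ratings on a 0-4 scale, made up for illustration.
actuals = [0, 1, 2, 3, 4, 4, 2, 1]
preds = [0, 2, 2, 3, 3, 4, 1, 1]
print(quadratic_weighted_kappa(actuals, preds))
```

The result should match sklearn.metrics.cohen_kappa_score with weights='quadratic' up to floating-point error. Note that the denominator is zero when both inputs contain only one and the same rating, so this sketch does not handle that edge case.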

More Info

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange