Computation of Error rate in nearest neighbor classification algorithm

https://stackoverflow.com/questions/19202034

30-06-2022

Question

I am trying to find the optimal value of K for the K-nearest-neighbor algorithm. I have been running this classification method in MATLAB for different numbers of class members, but I need to calculate the error rate for different values of K. I am trying to use the following as an example:

I have the following data set:

1 3 1
2 3 2
2 1 2
3 3 2
3 4 1
3 3 2
2 2 2

The first column is the x coordinate, the second is the y coordinate, and the third is the class label. I need to classify a point (x,y) using the K-NN algorithm, and I am using different values of K. My question is: suppose the point (4,1) is not included in the source data set, but I know that it belongs to class 1. How can I compute the error rate for a given value of K using leave-one-out cross-validation?

Thank you a lot in advance

Regards

Rinadi

Solution

Leave-one-out cross-validation simply means that, given your model m, a training set T of size n, and some evaluation metric (error measure) E, you proceed as follows:

1. For each point (x,y) from T:
   1. Train your model m on T\{(x,y)} (all points except the one taken in step 1).
   2. Compute E( m , (x,y) ); for example, check whether m determines y correctly given x (then E=0) or not (then E=1).
2. Compute the mean of all E values across all the points analyzed.

As a result, you have an estimate of the mean generalization error: you have checked how well your model, trained on the rest of the training set, can predict the label of each held-out point.
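
The procedure above can be sketched in a few lines. This is a minimal illustration in Python rather than MATLAB, using the seven points from the question, a 0/1 error for E, and Euclidean distance. The names DATA, knn_predict and loocv_error_rate are hypothetical, and the tie-breaking rules are assumptions made only to keep the result deterministic.

```python
from collections import Counter

# Data set from the question: (x, y, class label).
DATA = [
    (1, 3, 1),
    (2, 3, 2),
    (2, 1, 2),
    (3, 3, 2),
    (3, 4, 1),
    (3, 3, 2),
    (2, 2, 2),
]


def knn_predict(train, point, k):
    """Predict the label of `point` by majority vote among its k nearest training points."""
    px, py = point
    # Squared Euclidean distance gives the same neighbour ordering as the distance itself.
    # Assumption: distance ties are broken by position in `train`.
    ranked = sorted(
        ((x - px) ** 2 + (y - py) ** 2, i, label)
        for i, (x, y, label) in enumerate(train)
    )
    votes = Counter(label for _, _, label in ranked[:k])
    # Assumption: vote ties go to the smaller label
    # (they cannot occur for odd k with two classes).
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def loocv_error_rate(data, k):
    """Leave-one-out error rate: hold out each point once and train on the rest."""
    errors = 0
    for i, (x, y, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # T without the held-out point
        if knn_predict(train, (x, y), k) != label:
            errors += 1  # E = 1 for a misclassified point, E = 0 otherwise
    return errors / len(data)  # mean of E over all points


for k in (1, 3, 5):
    print(f"K={k}: LOOCV error rate = {loocv_error_rate(DATA, k):.3f}")
```

With these tie-breaking assumptions, the sketch gives an error rate of 3/7 for K=1 and 2/7 for K=3 and K=5 on this data set. The point (4,1) is not part of the leave-one-out loop, because it is not in T. It can be used as a separate one-point test instead, for example knn_predict(DATA, (4, 1), 3), which returns class 2 here and so counts as an error against the known label 1.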

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow