Question

so I am trying to code up the k nearest neighbor algorithm. The input to my function would be a set of data and a sample to classify. I am just trying to understand the workings of the algorithm. Can you guys tell me if this "pseudocode" of what I am trying to do is correct?

kNN (dataset, sample){

   1. Go through each item in my dataset, and calculate the "distance" from that data item to my specific sample.
   2. Out of those samples I pick the "k" ones that are most close to my sample, maybe in a premade array of "k" items?

}

The part I get confused with is when I say "go through each item in my dataset". Should I be going through each CLASS in my dataset and finding the k-nearest neighbors? Then from there finding which one is closest to my sample, which then tells me the class?

Part 2 question(ish), is using this algorithm but without a sample. How would I calculate the "accuracy" of the data set?

I really am looking for broad word answers rather than specifics, but anything that helps me understand is appreciated. I am implementing this in R.

Thanks

Was it helpful?

Solution

Your pseudocode should change this way:

kNN (dataset, sample){
   1. Go through each item in my dataset, and calculate the "distance" 
   from that data item to my specific sample.
   2. Classify the sample as the majority class between K samples in 
   the dataset having minimum distance to the sample.
}

This pseduocode has been illustrated int the following figure.

enter image description here

Suppose the data set consists of two classes A and B, shown as red and blue respectively, and we want to apply KNN with K=5 for to samples, shown with green and purple stars.
KNN computes the distance of each test sample to all the samples and finds five neighbors, having minimum distances to the test sample, and assign the majority class to the test sample.

Accuracy : 1 - (Number of misclassified test samples / Number of test samples)

For implementation in "R" you may see either this or this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top