Question

I'm trying to classify a data set containing two classes using different classifiers (LDA, SVM, KNN) and would like to compare their performance. I've made ROC curves for the LDA by modifying the prior probability.

But how can I do the same for a KNN classifier?

I searched the documentation and found some functions:

  a. Class = knnclassify(Sample, Training, Group, k)
  b. mdl = ClassificationKNN.fit(X,Y,'NumNeighbors',i,'leaveout','On')

I can run (a) and get a confusion matrix using leave-one-out cross-validation, but it does not seem possible to change the prior probability to make an ROC curve.

I haven't tried (b) before, but it creates a model where you can modify mdl.Prior. However, I have no clue how to get a confusion matrix from it.

Is there an option I've missed, or can someone explain how to use these functions to get an ROC curve?

Solution

This is indeed not straightforward, because the output of the k-NN classifier is not a score from which a decision is derived by thresholding, but only a decision based on the majority vote among the k nearest neighbours.

My suggestion: define a score based on the ratio of classes in the neighbourhood (for example, the number of +1 neighbours minus the number of -1 neighbours, divided by k), and then threshold this score to compute the ROC. Loosely speaking, the score expresses how certain the algorithm is; it ranges from -1 (maximum certainty for class -1) to +1 (maximum certainty for class +1).

Example: for k=6, the score is

  • 1 if all six neighbours are of class +1;
  • -1 if all six neighbours are of class -1;
  • 0 if half the neighbours are of class +1 and half the neighbours are of class -1.
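
One way to compute such a score is sketched below, in plain Python rather than MATLAB so that it stays self-contained. The function name knn_vote_score, the Euclidean distance, and the toy data are assumptions made for illustration; the only idea taken from the answer is the score itself, i.e. the difference between the class counts among the k nearest neighbours divided by k.

```python
import math


def knn_vote_score(train_X, train_y, query, k):
    """Return (n_pos - n_neg) / k over the k nearest training points.

    Labels in train_y are assumed to be +1 or -1 (hypothetical helper,
    not part of any library).
    """
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    # Summing the +1/-1 labels of the k nearest points gives n_pos - n_neg.
    return sum(train_y[i] for i in order[:k]) / k


# Toy 1-D data: three points of class -1 on the left, three of class +1 on the right.
X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
y = [-1, -1, -1, +1, +1, +1]

score_left = knn_vote_score(X, y, (1.0,), k=3)    # all three neighbours are -1
score_right = knn_vote_score(X, y, (11.0,), k=3)  # all three neighbours are +1
score_mid = knn_vote_score(X, y, (6.0,), k=6)     # three neighbours of each class
```

For a leave-one-out estimate, the point being scored would have to be removed from train_X and train_y before calling the function; the toy calls above do not do that.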

Once you have computed this score for each datapoint, you can feed it into a standard ROC function.
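
If no ready-made ROC function is at hand, the thresholding step can also be written out directly. The following is a minimal sketch in plain Python; the helper names roc_points and auc and the example scores are made up for illustration, and the code assumes labels of +1 and -1 with both classes present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    Labels are assumed to be +1 (positive) or -1 (negative), with at least
    one of each; a point is predicted positive when score >= threshold.
    """
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Vote scores take only a few distinct values, so each one serves as a threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical vote scores for k = 3 and the corresponding true labels.
scores = [1.0, 1.0, 1/3, 1/3, -1/3, -1.0]
labels = [1, 1, 1, -1, -1, -1]
curve = roc_points(scores, labels)
area = auc(curve)
```

Because the score can only take k+1 distinct values, the resulting curve has at most k+1 steps; a larger k gives a finer curve.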

Licensed under: CC-BY-SA with attribution