Question

Say that I've clustered a training dataset of 1000 instances from 5 classes into 5 clusters (centers) using, for example, k-means. Then I've constructed a confusion matrix by validating on a test dataset. I then want to plot a ROC curve from this. How can I do that?


Solution

ROC curves show the trade-off between the true positive (TP) rate and the false positive (FP) rate. In other words:

"ROC graphs are two-dimensional graphs in which TP rate is plotted on the Y axis and FP rate is plotted on the X axis." (ROC Graphs: Notes and Practical Considerations for Researchers)

A discrete classifier outputs only a class label, so it produces a single point in ROC space. To get a curve you normally need a classifier that produces probabilities or scores. You then vary a parameter of the classifier, typically the decision threshold, so that the TP and FP rates change, and you use the resulting points to draw the ROC curve.
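As a minimal sketch of the threshold sweep, assuming a binary problem where each test instance has a score (higher means "more likely positive") and a 0/1 label. The function names and the toy data are made up for illustration:

```python
def roc_points(scores, labels):
    """Sweep a threshold over the scores and return (FP rate, TP rate) points.

    labels: 1 for positive, 0 for negative.
    scores: higher means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Lowering the threshold admits instances in order of descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all instances sharing this score are counted.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented for this example.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(curve)
print(auc)
```

A classifier that only outputs labels corresponds to one fixed threshold, which is why it contributes just one of these points.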

Let's say you use k-means. K-means assigns cluster membership discretely: a point belongs to ClusterA, or ClusterB, ..., or ClusterE. Producing a ROC curve from k-means is therefore not straightforward. Lee and Fujita describe an algorithm for this, and you should read their paper for the details, but the algorithm is roughly as follows:

  1. Apply k-means.
  2. Calculate TP and FP using the test data.
  3. Change the membership of some data points from one cluster to another.
  4. Calculate TP and FP using the test data again.

As you can see, each change of membership yields another point in ROC space, and these points are then used to draw the ROC curve.
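The steps above can be sketched roughly as follows. This is not Lee and Fujita's algorithm, only one simple way to get the same effect under stated assumptions: the centroids are taken as already fitted by k-means, each cluster has already been matched to a class (for example by majority vote), and the ROC curve is drawn one-vs-rest for a single target cluster. Membership is changed by sweeping a threshold on a distance margin. All names and the toy data are hypothetical:

```python
import math


def membership_scores(points, centroids, target):
    """Margin by which each point prefers the target centroid.

    score = (distance to nearest other centroid) - (distance to target centroid)
    A score above 0 means plain k-means would assign the point to the target.
    """
    scores = []
    for p in points:
        d_target = math.dist(p, centroids[target])
        d_other = min(
            math.dist(p, c) for k, c in enumerate(centroids) if k != target
        )
        scores.append(d_other - d_target)
    return scores


def rates(scores, is_positive, threshold):
    """(FP rate, TP rate) when points with score >= threshold join the target."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    tp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, is_positive) if s >= threshold and not y)
    return fp / neg, tp / pos


def roc_for_cluster(points, true_labels, centroids, target):
    """Move points into the target cluster one score level at a time."""
    scores = membership_scores(points, centroids, target)
    is_positive = [y == target for y in true_labels]
    thresholds = [math.inf] + sorted(set(scores), reverse=True)
    return [rates(scores, is_positive, t) for t in thresholds]


# Hypothetical centroids (as if fitted by k-means) and labelled test data.
centroids = [(0.0, 0.0), (4.0, 0.0)]
test_points = [(0.5, 0.0), (1.5, 0.0), (2.5, 0.0), (1.8, 0.0), (3.0, 0.0), (4.2, 0.0)]
test_labels = [0, 0, 0, 1, 1, 1]

roc = roc_for_cluster(test_points, test_labels, centroids, target=0)
hard_point = rates(
    membership_scores(test_points, centroids, 0),
    [y == 0 for y in test_labels],
    0.0,
)
print(roc)
print(hard_point)
```

Here `hard_point` is the single ROC point of the unmodified k-means assignment, and the other entries in `roc` come from moving points into or out of the target cluster. For the 5-class case in the question, the same procedure would be repeated once per cluster.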

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow