Question

I created a classification model with three target classes and built a confusion matrix to measure the accuracy. Here is the code:

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

data = load_wine()
x = pd.DataFrame(data=data.data, columns=data.feature_names)
y = pd.DataFrame(data=data.target, columns=['target'])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=True, random_state=42)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

mat = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

The output of the above code is:

array([[13,  1,  0],
       [ 0, 14,  0],
       [ 1,  0,  7]])

So the dataset is reasonably balanced and the model reaches an accuracy of almost 94%.
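(For reference, the accuracy can be checked directly from the confusion matrix above, since the diagonal holds the correct predictions:)

import numpy as np

# accuracy = correct predictions / all predictions
acc = np.trace(mat) / mat.sum()   # 34 / 36 ≈ 0.944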

The problem is that when I try to draw the ROC curve for class 0 using the code below, the curve comes out inverted and I get an area under the curve of only 0.05.

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

fpr, tpr, thres = roc_curve(y_test, y_pred, pos_label=0)
roc_auc = auc(fpr, tpr)

plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % roc_auc)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

I tried to calculate and draw the ROC curve manually and got a TPR of 0.9285 and an FPR of 0.04545, and the graph looks right on paper. Could you please help me understand why the code above draws the graph the other way around? I verified the same code for the other two classes and the graph is fine. Thanks in advance.
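(For reference, the manual TPR/FPR for class 0 can be reproduced from the confusion matrix above, where rows are true labels and columns are predictions:)

# class 0 as the positive class
tp = mat[0, 0]            # 13: true 0, predicted 0
fn = mat[0, 1:].sum()     #  1: true 0, predicted 1 or 2
fp = mat[1:, 0].sum()     #  1: true 1 or 2, predicted 0
tn = mat[1:, 1:].sum()    # 21: true 1 or 2, predicted 1 or 2
tpr = tp / (tp + fn)      # 13/14 ≈ 0.9286
fpr = fp / (fp + tn)      # 1/22  ≈ 0.0455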


Answer

The ROC curve is built by sweeping over different decision thresholds, so it should be computed from the predict_proba output of your estimator, not from the hard predictions. In your multiclass example in particular, roc_curve is using the values 0, 1, 2 as a rank-ordering! So there are only four thresholds, and the one between 0 and 1 is the important one here: at that threshold, every sample the model predicts as class 0 is declared negative and all others positive, which gives you the complementary point in ROC space.
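To make that concrete, here is what roc_curve returns when it is given the hard predictions (reusing the variables from the question):

# the hard labels 0, 1, 2 are treated as scores, so with pos_label=0 the
# samples "scored" highest (predicted 1 or 2) are ranked as most positive,
# which is exactly backwards for class 0
fpr, tpr, thres = roc_curve(y_test, y_pred, pos_label=0)
print(thres)           # only four thresholds: one per distinct label value, plus an endpoint
print(auc(fpr, tpr))   # ≈ 0.05, roughly the complement of the expected AUC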

Instead, you should have:

fpr, tpr, thres = roc_curve(y_test, model.predict_proba(x_test)[:, 0], pos_label=0)
# or, equivalently, binarize the labels so class 0 is the positive class:
fpr, tpr, thres = roc_curve(y_test == 0, model.predict_proba(x_test)[:, 0])
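The second form binarizes the labels so that class 0 becomes the positive class (True), which is why pos_label can be left at its default. As a quick sanity check (a minimal sketch, assuming the variables from the question are still in scope), roc_auc_score gives the class-0-vs-rest area directly:

from sklearn.metrics import roc_auc_score

# AUC for class 0 vs. the rest, computed from the predicted probabilities;
# y_test['target'] uses the column name defined in the question
print(roc_auc_score(y_test['target'] == 0, model.predict_proba(x_test)[:, 0]))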