How to graph whether overfitting takes place in a multiclass classifier

Question 1

It seems like loss_ expects an array of shape n_samples, k, whereas staged_predict returns an array of shape [n_samples] (as per the documentation). You probably want to pass in the result of staged_predict_proba or staged_decision_function into loss_.

I think you measure the loss at both train and test sets like so:

for i, pred in enumerate(clf.staged_decision_function(X_test)):
    test_score[i] = clf.loss_(y_test, pred)

for i, pred in enumerate(clf.staged_decision_function(X_train)):
    train_score[i] = clf.loss_(y_train, pred)

plot(test_score)
plot(train_score)
legend(['test score', 'train score'])

Note the second time I call loss_ I passed in the train set. The output looks like what I would expect:

enter image description here

Question 2

You could use something like this. This is an example using knn.

# Setup arrays to store train and test accuracies
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Loop over different values of k
for i, k in enumerate(neighbors):
  # Setup a k-NN Classifier with k neighbors: knn
  knn = KNeighborsClassifier(n_neighbors=k)

  # Fit the classifier to the training data
  knn.fit(X_train, y_train)

  #Compute accuracy on the training set
  train_accuracy[i] = knn.score(X_train, y_train)

  #Compute accuracy on the testing set
  test_accuracy[i] = knn.score(X_test, y_test)

# Generate plot
plt.title('k-NN: Varying Number of Neighbors')
plt.plot(neighbors, test_accuracy, label = 'Testing Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training Accuracy')
plt.legend()
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.show()