Question

I am training an SVM classifier with cross-validation (StratifiedKFold) using the scikit-learn interfaces. For each of the k test sets, I get a classification result. I want a single confusion matrix covering all the results. scikit-learn has a confusion matrix interface: sklearn.metrics.confusion_matrix(y_true, y_pred). My question is how I should accumulate the y_true and y_pred values. They are NumPy arrays. Should I define the size of the arrays based on my k-fold parameter, and then add the y_true and y_pred from each result to those arrays?


Solution

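You can either use an aggregate confusion matrix or compute one matrix for each CV partition. For the aggregate, collect the y_true and y_pred arrays from every fold, concatenate them, and call confusion_matrix once on the result; there is no need to preallocate arrays sized by k. For the per-partition approach, compute the mean and the standard deviation (or standard error) of each component of the matrix across folds as a measure of variability.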
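Both options can be sketched as follows. This is only an illustration under assumptions not in the question: the iris dataset, a linear-kernel SVC, 5 shuffled folds, and the variable names are all placeholders for your own data and settings. It uses the current sklearn.model_selection API for StratifiedKFold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Placeholder data and model (assumptions for illustration only).
X, y = load_iris(return_X_y=True)
labels = np.unique(y)  # fix the label order so every matrix has the same shape
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

y_true_parts, y_pred_parts, fold_matrices = [], [], []
for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # Collect per-fold results in lists; no preallocation is needed.
    y_true_parts.append(y[test_idx])
    y_pred_parts.append(y_pred)
    fold_matrices.append(confusion_matrix(y[test_idx], y_pred, labels=labels))

# Option 1: one aggregate matrix over all held-out predictions.
y_true_all = np.concatenate(y_true_parts)
y_pred_all = np.concatenate(y_pred_parts)
aggregate_cm = confusion_matrix(y_true_all, y_pred_all, labels=labels)

# Option 2: mean and sample standard deviation of each cell across folds.
fold_matrices = np.stack(fold_matrices)  # shape: (n_folds, n_classes, n_classes)
mean_cm = fold_matrices.mean(axis=0)
std_cm = fold_matrices.std(axis=0, ddof=1)

print(aggregate_cm)
print(mean_cm)
print(std_cm)
```

Passing labels explicitly matters for the per-fold matrices: if a class happened to be absent from one fold, confusion_matrix would otherwise return a smaller matrix and the stacking step would fail.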

For the classification report, the code would need to be modified to accept 2-dimensional inputs, so that the predictions for each CV partition can be passed in and the mean score and standard deviation computed for each class.
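Until something like that exists, the per-class mean and standard deviation can be computed by hand. The sketch below is one possible way, not a library feature: it calls precision_recall_fscore_support on each fold and averages afterwards. The dataset, the linear-kernel SVC, and the fold settings are again assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Placeholder data and model (assumptions for illustration only).
X, y = load_iris(return_X_y=True)
labels = np.unique(y)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # Per-class precision, recall and F1 for this fold (support is ignored).
    precision, recall, f1, _ = precision_recall_fscore_support(
        y[test_idx], y_pred, labels=labels, zero_division=0
    )
    fold_scores.append(np.stack([precision, recall, f1]))

# Shape: (n_folds, 3 metrics, n_classes)
fold_scores = np.stack(fold_scores)
mean_scores = fold_scores.mean(axis=0)
std_scores = fold_scores.std(axis=0, ddof=1)

metric_names = ["precision", "recall", "f1"]
for j, label in enumerate(labels):
    parts = [
        f"{name}={mean_scores[i, j]:.3f} +/- {std_scores[i, j]:.3f}"
        for i, name in enumerate(metric_names)
    ]
    print(f"class {label}: " + ", ".join(parts))
```

For a standard error instead of a standard deviation, divide std_scores by the square root of the number of folds.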

Licensed under: CC-BY-SA with attribution