First remark: you are using `SGDClassifier` with the default parameters, which are likely not optimal for this dataset. Try other values as well (especially for `alpha`, the regularization parameter).
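For instance, a small grid search over `alpha` could look like the following sketch (using the small `digits` dataset as a stand-in for MNIST; the grid values are just illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

# Candidate regularization strengths to try (illustrative values).
param_grid = {"alpha": [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SGDClassifier(max_iter=1000, tol=1e-3), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```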
Now to answer your question: it's quite unlikely that a linear model will do very well on a dataset like MNIST, which is a digit image classification task. You might want to try non-linear models such as:

- `SVC(kernel='rbf')` (not scalable though; try it on a small subset of the training set, and it is not incremental / out-of-core)
- `ExtraTreesClassifier(n_estimators=100)` or more, but not out-of-core either. The larger the number of sub-estimators, the longer it will take to train.
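Both non-linear models above can be tried on a small subset like this (a sketch on the `digits` dataset as a stand-in for MNIST; parameter values other than those mentioned above are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Kernel SVM: accurate but does not scale to large training sets.
svc = SVC(kernel="rbf").fit(X_train, y_train)

# Randomized trees: more scalable, but still not out-of-core.
trees = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(svc.score(X_test, y_test), trees.score(X_test, y_test))
```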
You can also try the Nystroem approximation of `SVC(kernel='rbf')`: transform the dataset using a `Nystroem(n_components=1000, gamma=0.05)` fitted on a small subset of the data (e.g. 10000 samples), then pass the whole transformed training set to a linear model such as `SGDClassifier`. It requires 2 passes over the dataset.
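The two passes can be sketched as follows (again on the `digits` dataset, so `n_components`, the subset size, and `gamma` are scaled down from the values above; the `X / 16` scaling is an assumption to bring pixels into [0, 1]):

```python
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

# Pass 1: fit the kernel approximation on a small subset of the data.
nystroem = Nystroem(n_components=300, gamma=0.2, random_state=0).fit(X[:500])

# Pass 2: transform the whole training set and fit a linear model on it.
X_t = nystroem.transform(X)
clf = SGDClassifier(max_iter=1000, tol=1e-3).fit(X_t, y)
print(clf.score(X_t, y))
```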
There is also a pull request for a 1-hidden-layer perceptron on github that should be both faster to compute than `ExtraTreesClassifier` and approach 98% test set accuracy on MNIST (and also provide a `partial_fit` API for out-of-core learning).
Edit: the fluctuation of the `SGDClassifier` score estimate is expected: SGD stands for stochastic gradient descent, which means that examples are considered one at a time. Badly classified samples can cause an update of the model's weights in a way that is detrimental for other samples, so you need to do more than one pass over the data to let the learning rate decrease enough to get a smoother estimate of the validation accuracy. You can use `itertools.repeat` in your for loop to do several passes (e.g. 10) over your dataset.
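A minimal sketch of that loop, assuming you feed the whole training set on each pass via `partial_fit` (`digits` again stands in for MNIST; the validation split is an assumption):

```python
import itertools

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = SGDClassifier()
classes = np.unique(y)  # partial_fit needs the full label set up front
# itertools.repeat yields the same (X, y) pair 10 times: 10 passes over the data.
for X_pass, y_pass in itertools.repeat((X_train, y_train), 10):
    clf.partial_fit(X_pass, y_pass, classes=classes)
print(clf.score(X_val, y_val))
```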