First remark: you are using `SGDClassifier` with the default parameters, which are likely not optimal for this dataset. Try other values as well (especially for `alpha`, the regularization parameter).
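For instance, a small grid search over `alpha` could look like the following sketch (using the small `digits` dataset as a stand-in for MNIST; the grid values are just illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

# Candidate regularization strengths to try (illustrative values).
param_grid = {"alpha": [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SGDClassifier(max_iter=1000, tol=1e-3), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```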
Now to answer your question: it's quite unlikely that a linear model will do very well on a dataset like MNIST, which is a digit image classification task. You might want to try non-linear models such as:

- `SVC(kernel='rbf')` (not scalable though; try it on a small subset of the training set, and it is not incremental / out-of-core)
- `ExtraTreesClassifier(n_estimators=100)` or more, but not out-of-core either. The larger the number of sub-estimators, the longer it will take to train.
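Both non-linear models above can be tried on a small subset like this (a sketch on the `digits` dataset as a stand-in for MNIST; parameter values other than those mentioned above are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Kernel SVM: accurate but does not scale to large training sets.
svc = SVC(kernel="rbf").fit(X_train, y_train)

# Randomized trees: more scalable, but still not out-of-core.
trees = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(svc.score(X_test, y_test), trees.score(X_test, y_test))
```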
You can also try the Nystroem approximation of `SVC(kernel='rbf')`: transform the dataset using a `Nystroem(n_components=1000, gamma=0.05)` fitted on a small subset of the data (e.g. 10000 samples), then pass the whole transformed training set to a linear model such as `SGDClassifier`. It requires 2 passes over the dataset.
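The two passes can be sketched as follows (again on the `digits` dataset, so `n_components`, the subset size, and `gamma` are scaled down from the values above; the `X / 16` scaling is an assumption to bring pixels into [0, 1]):

```python
from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]

# Pass 1: fit the kernel approximation on a small subset of the data.
nystroem = Nystroem(n_components=300, gamma=0.2, random_state=0).fit(X[:500])

# Pass 2: transform the whole training set and fit a linear model on it.
X_t = nystroem.transform(X)
clf = SGDClassifier(max_iter=1000, tol=1e-3).fit(X_t, y)
print(clf.score(X_t, y))
```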
There is also a pull request for a 1-hidden-layer perceptron on github that should be both faster to compute than `ExtraTreesClassifier` and approach 98% test set accuracy on MNIST (and also provide a `partial_fit` API for out-of-core learning).
Edit: the fluctuation of the `SGDClassifier` score estimate is expected: SGD stands for stochastic gradient descent, which means that examples are considered one at a time. Badly classified samples can cause an update of the model's weights in a way that is detrimental for other samples, so you need to do more than one pass over the data to let the learning rate decrease enough to get a smoother estimate of the validation accuracy. You can use `itertools.repeat` in your for loop to do several passes (e.g. 10) over your dataset.
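A minimal sketch of that loop, assuming you feed the whole training set on each pass via `partial_fit` (`digits` again stands in for MNIST; the validation split is an assumption):

```python
import itertools

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = SGDClassifier()
classes = np.unique(y)  # partial_fit needs the full label set up front
# itertools.repeat yields the same (X, y) pair 10 times: 10 passes over the data.
for X_pass, y_pass in itertools.repeat((X_train, y_train), 10):
    clf.partial_fit(X_pass, y_pass, classes=classes)
print(clf.score(X_val, y_val))
```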