Feature Extraction for MNIST Images Using PCA

https://stackoverflow.com/questions/21640801

08-10-2022

Question

I use Matlab to read the MNIST database. Those images are, originally, 28x28 (=784) pixels. So, I have a 2D 784x1000 array (meaning, I have read 1000 images).

Supposing my 2D array's name is IMGS, the Matlab expression: IMGS(:, 1), would give me the first image.

To perform PCA and extract some features from each image (out of the 784 available), I do the following:

I transpose the array IMGS, putting the images in rows and the features (dimensions) in columns, in an array called IMGS_T (IMGS_T(1, :) corresponds to the first image).
I use the princomp function like this: [COEFF, SCORES] = princomp(IMGS_T);

My question is this (it may be a little trivial, but I want to be sure): supposing I want to extract 100 features out of the 784, are the first 100 columns of SCORES all I need?

So, in Matlab terms, all I need is to write: IMGS_PCA = SCORES(:, 1:100)' and I will have created a 100x1000 array, called IMGS_PCA, which holds my 1000 MNIST images in its columns and their 100 most important features in its rows?

Solution

Basically it's correct. Note that in princomp rows of input correspond to observations, and columns to variables.
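
As a small sketch of that orientation, here is the transpose step from the question with random numbers standing in for the real MNIST pixels. Only the 784x1000 shape comes from the question. The names n_obs, n_vars and first_image are introduced here for illustration.

```matlab
% Assumed stand-in data: one image per column, as in the question.
IMGS = rand(784, 1000);

% Transpose so that each row is one observation (image) and each
% column is one variable (pixel), which is the layout princomp expects.
IMGS_T = IMGS';
[n_obs, n_vars] = size(IMGS_T);   % 1000 observations, 784 variables

% Row 1 of IMGS_T is the same image as column 1 of IMGS.
first_image = reshape(IMGS_T(1, :), 28, 28);
```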

To illustrate your procedure,

IMGS = rand(1000,784);
[COEFF, SCORE] = princomp(IMGS);

To check that the function is being used correctly, you can try to recover the original images:

recovered_IMGS = SCORE / COEFF + repmat(mean(IMGS,1), 1000, 1);

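Then IMGS - recovered_IMGS will give you the zero matrix (within numerical error). This works because SCORE holds the centered data projected onto COEFF, so undoing the projection and adding the column means back restores the input.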
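
The round trip can be checked numerically. The sketch below is an assumption-laden stand-in rather than the princomp call itself: it computes the PCA with base-Matlab svd on the centered data, which should match princomp's COEFF and SCORE up to the sign of each column, so it runs without the Statistics Toolbox. It also uses SCORE * COEFF' in place of SCORE / COEFF; the two are equivalent here because COEFF is an orthogonal matrix. The names n, p, mu, X0 and max_err are illustrative.

```matlab
n = 1000; p = 784;
IMGS = rand(n, p);                  % stand-in data, one image per row

mu = mean(IMGS, 1);                 % 1 x p mean image
X0 = IMGS - repmat(mu, n, 1);       % centered data

% PCA via SVD: X0 = U*S*V', and the columns of V are the principal directions.
[U, S, V] = svd(X0, 'econ');
COEFF = V;                          % p x p, orthogonal
SCORE = X0 * COEFF;                 % n x p projections

% Undo the projection and add the mean back.
recovered_IMGS = SCORE * COEFF' + repmat(mu, n, 1);

% Largest absolute deviation from the original data.
max_err = max(abs(IMGS(:) - recovered_IMGS(:)));
fprintf('max abs reconstruction error: %g\n', max_err);
```

The printed error should be on the order of floating-point round-off rather than exactly zero.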

To use only the first 100 features, you can zero out the remaining columns of SCORE:

SCORE(:, 101:784) = 0;

Then use the same code to recover the images:

recovered_IMGS_100 = SCORE / COEFF + repmat(mean(IMGS,1), 1000, 1);

Or, as you mentioned, you can create a separate 100 x 1000 array from the first 100 columns of SCORE to achieve the same result.
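
To make that equivalence concrete, here is a minimal sketch comparing the two routes. As an assumption, it again uses svd on centered random data in place of princomp on real MNIST images. The names k, SCORE_zeroed, rec_zeroed, rec_trunc, diff_routes and rel_err are introduced for illustration only.

```matlab
n = 1000; p = 784; k = 100;
IMGS = rand(n, p);                  % stand-in data, one image per row
mu = mean(IMGS, 1);
X0 = IMGS - repmat(mu, n, 1);       % centered data
[U, S, V] = svd(X0, 'econ');
COEFF = V;
SCORE = X0 * COEFF;

% Route 1: zero the scores beyond the first k, then reconstruct.
SCORE_zeroed = SCORE;
SCORE_zeroed(:, k+1:end) = 0;
rec_zeroed = SCORE_zeroed * COEFF' + repmat(mu, n, 1);

% Route 2: keep only the first k scores as a compact k x n feature array.
IMGS_PCA = SCORE(:, 1:k)';          % 100 x 1000, one image per column
rec_trunc = IMGS_PCA' * COEFF(:, 1:k)' + repmat(mu, n, 1);

% Both routes should agree up to round-off.
diff_routes = max(abs(rec_zeroed(:) - rec_trunc(:)));

% Fraction of the centered data's energy lost by dropping p-k components.
rel_err = norm(IMGS - rec_trunc, 'fro') / norm(X0, 'fro');
```

IMGS_PCA is the reduced representation to feed to a classifier. Mapping it back to pixel space also needs the first k columns of COEFF and the mean image.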

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow