In HMAX, you use all the data at the input image layer. And garbor filter, max-pooling, radial basis kernel functions on all of them. Only at C2 layer, you start to train a subset of the images (mostly with a linear kernel based SVM). The subset is set to training data. And the rest are test data. In one word, training images are first used to build the SVM and then the test images are assigned to digit classes using the majority-voting method.
But this is in fact equivalent as you put the training images at the image layer at first. After all the layers going through, you then put the test images at the image layer to restart for the recognition. Since both training and test image need scaling, and all the operations at previous layers prior to C2 are the same, you just mix them altogether at the beginning.
Although you use all the training and test images at the image layer, you still need to shuffle the data and pick up some of them as the training, and the others as the testing.