Question

I have this 5-5-2 backpropagation neural network I'm training, and after reading this awesome article by LeCun I started to put in practice some of the ideas he suggests.

Currently I'm evaluating it with a 10-fold cross-validation algorithm I made myself, which goes basically like this:

for each epoch      
  for each possible split (training, validation)
    train and validate
  end
  compute mean MSE between all k splits
end

My inputs and outputs are standardized (0-mean, variance 1) and I'm using a tanh activation function. All network algorithms seem to work properly: I used the same implementation to approximate the sin function and it does it pretty good.

Now, the question is as the title implies: should I standardize each train/validation set separately or do I simply need to standardize the whole dataset once?

Note that if I do the latter, the network doesn't produce meaningful predictions, but I prefer having a more "theoretical" answer than just looking at the outputs.

By the way, I implemented it in C, but I'm also comfortable with C++.

Était-ce utile?

La solution

You will most likely be better off standardizing each training set individually. The purpose of cross-validation is to get a sense for how well your algorithm generalizes. When you apply your network to new inputs, the inputs will not be ones that were used to compute your standardization parameters. If you standardize the entire data set at once, you are ignoring the possibility that a new input will fall outside the range of values over which you standardized.

So unless you plan to re-standardize every time you process a new input (which I'm guessing is unlikely), you should only compute the standardization parameters for the training set of the partition being evaluated. Furthermore, you should compute those parameters only on the training set of the partition, not the validation set (i.e., each of the 10-fold partitions will use 90% of the data to calculate standardization parameters).

Autres conseils

So you assume the inputs are normally distribution and are subtracting the mean, dividing by standard deviation, to get N(0,1) distributed inputs?

Yes I agree with @bogatron that you standardize each training set separately, but I would more strongly say it's a "must" to not use the validation set data too. The problem is not values outside the range in the training set; this is fine, the transformation to a standard normal is still defined for any value. You can't compute mean / standard deviation overa ll the data because you can't in any way use the validation data in the training set, even if just via this statistic.

It should further be emphasized that you use the mean from the training set with the validation set, not the mean from the validation set. It has to be the same transformation of features that was used during training. It would not be valid to transform the validation set differently.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top