Encog AI Framework: Backpropagation with Gaussian Noise Injection

Question 1

I think the best way to do this is to create a class that implements the MLDataSet interface. You would then pass a regular BasicMLDataSet (or other data set) to your new MLDataSet implementation. For the size() method, you would return the number of training patterns that you want trained per iteration. Then, on each call to your new MLDataSet to return an MLDataPair, you would randomly select a pair from the provided data set, clone it, add the noise as described, and return it.

Does that sound like it would accomplish what the paper is describing? If you end up implementing this and would like to contribute it to the Encog project, that would be great. I might attempt it myself as well, if you do not.
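For illustration, here is a rough, library-free sketch of the wrapper idea described above. It does not use the real Encog types: Pair and NoisyPairSource are hypothetical stand-ins for MLDataPair and the new MLDataSet implementation, and it assumes the noise is zero-mean Gaussian with standard deviation sigma added to the inputs only, which may not match what the paper does.

```java
import java.util.Random;

/** Hypothetical stand-in for a training pair (input vector plus ideal vector). */
final class Pair {
    final double[] input;
    final double[] ideal;

    Pair(double[] input, double[] ideal) {
        this.input = input;
        this.ideal = ideal;
    }
}

/**
 * Hypothetical stand-in for the noise-injecting data set wrapper.
 * It reports a fixed number of patterns per iteration and serves
 * randomly chosen, cloned, noise-perturbed pairs from the base set.
 */
final class NoisyPairSource {
    private final Pair[] base;
    private final int patternsPerIteration;
    private final double sigma; // assumed: standard deviation of the Gaussian noise
    private final Random rng;

    NoisyPairSource(Pair[] base, int patternsPerIteration, double sigma, long seed) {
        if (base.length == 0) {
            throw new IllegalArgumentException("base data set is empty");
        }
        this.base = base;
        this.patternsPerIteration = patternsPerIteration;
        this.sigma = sigma;
        this.rng = new Random(seed);
    }

    /** Number of patterns to train per iteration, not the size of the base set. */
    int size() {
        return patternsPerIteration;
    }

    /** Picks a random base pair, clones it, and adds noise to the cloned input. */
    Pair next() {
        Pair source = base[rng.nextInt(base.length)];
        double[] noisyInput = source.input.clone();
        for (int i = 0; i < noisyInput.length; i++) {
            noisyInput[i] += sigma * rng.nextGaussian();
        }
        // The ideal vector is cloned too, so callers can never mutate the base set.
        return new Pair(noisyInput, source.ideal.clone());
    }
}
```

In a real Encog version, the same logic would live behind the MLDataSet methods that hand out pairs, with size() overridden in the same way. Because every returned pair is a clone, the original data set is never modified.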

Question 2

I studied the Backprop class and came up with another way that seemed more general and direct. I created an ErrorStructure interface, a NoisyBackpropagation class and a NoisyGradientWorker.

The first class generalizes many techniques of noise injection (a theme that got renewed attention last month, from what I saw in some scientific papers).

The second class is just the Backprop with an ErrorStructure property. The third class is a GradientWorker that receives a NoiseStructure as a parameter and injects noise in the training process.
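To make the shape of this design a bit more concrete, here is a minimal, library-free sketch. Only the name ErrorStructure comes from the description above; the perturb method, GaussianErrorStructure, and NoisyErrorStep are hypothetical, and the sketch assumes the noise is added to the output-layer error before it is propagated backwards. The actual classes may inject noise elsewhere.

```java
import java.util.Random;

/** Hypothetical signature: turns an error vector into a perturbed error vector. */
interface ErrorStructure {
    double[] perturb(double[] error);
}

/** Assumed example implementation: independent zero-mean Gaussian noise per component. */
final class GaussianErrorStructure implements ErrorStructure {
    private final double sigma;
    private final Random rng;

    GaussianErrorStructure(double sigma, long seed) {
        this.sigma = sigma;
        this.rng = new Random(seed);
    }

    @Override
    public double[] perturb(double[] error) {
        double[] out = error.clone(); // leave the caller's array untouched
        for (int i = 0; i < out.length; i++) {
            out[i] += sigma * rng.nextGaussian();
        }
        return out;
    }
}

/** Hypothetical stand-in for the step a noisy gradient worker would perform. */
final class NoisyErrorStep {
    /** Computes (ideal - actual) and passes it through the given error structure. */
    static double[] outputError(double[] ideal, double[] actual, ErrorStructure noise) {
        if (ideal.length != actual.length) {
            throw new IllegalArgumentException("ideal and actual lengths differ");
        }
        double[] error = new double[ideal.length];
        for (int i = 0; i < error.length; i++) {
            error[i] = ideal[i] - actual[i];
        }
        return noise.perturb(error);
    }
}
```

Keeping the noise behind an interface means the training loop does not need to know which injection scheme is in use; swapping schemes is a matter of passing a different implementation.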

Tricky bits:

Some ErrorStructure implementations (in particular one that I hope to publish) will require complex initialization parameters derived from the training set. I couldn't use the more general kind of MLDataSet. This is not strictly necessary, but I'd like to use it someday in the future.
I had to add the SSJ (Stochastic Simulation in Java) package as a dependency; otherwise, computing the NoiseStructure would take too long. I don't know how much SSJ will slow Encog down.

By the way, the MLDataSet adjustment solution seems very useful for some advanced re-sampling schemes.