Question

I am a bit unsure about how to optimize a neural net with 3 or more layers. The input data is quite noisy, and I seem to be projecting that noise into the learning (the data is strongly imbalanced: 90% of examples belong to one class out of five).

However, I would like some feedback on my interpretation of the training curves. I use 50/25/8/8/8 neurons with dropout (keep_prob = 0.9 after the first hidden layer):

[Images: plots of training cost and F1 score over iterations]

Was it helpful?

Solution

The overall logarithmic shape of your F1 score graph indicates that learning is effective and the cost is heading towards some minimum. That's good. I'm assuming that the noise you're referring to is the instability of the graph after approximately 3k iterations: the cost dropping and rising in a zig-zag manner.

This often hints at the learning rate being too large. Backpropagation finds the right gradient, but you take too big a step and end up climbing the wall of the cost function rather than descending into the valley. This is especially evident when a graph oscillates around some middle value. You haven't mentioned the exact learning rate you're using, but try reducing it. A good starting point would be 0.01, but it depends on many factors, so experiment.
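As a quick illustration of that effect, here is a toy gradient-descent sketch (plain Python, on a made-up quadratic loss, nothing from your actual setup) showing how an oversized step produces exactly that growing zig-zag while a small one converges:

```python
def gradient_descent(lr, steps=5, w0=5.0):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w) starting at w0."""
    w = w0
    history = [round(w, 3)]
    for _ in range(steps):
        w -= lr * 2 * w  # one gradient-descent update
        history.append(round(w, 3))
    return history

print(gradient_descent(lr=1.1))   # [5.0, -6.0, 7.2, -8.64, ...] zig-zags and grows
print(gradient_descent(lr=0.01))  # [5.0, 4.9, 4.802, ...] shrinks smoothly
```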

Another issue might be the batch size: that is, how many examples contribute to the calculation of each gradient. If it's too large, you might end up with an average gradient pointing in the wrong direction, and then even a small step (i.e., a low learning rate) won't help. This, too, can manifest itself in a zig-zag pattern. If batch size is one of your tunable parameters, try decreasing it.
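If you're writing your own training loop, the batch size is just the slice width of your shuffled data. A minimal sketch with NumPy (the data is dummy, and train_step is a placeholder for whatever optimizer update you use):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches; batch_size controls how many
    examples are averaged into each gradient estimate."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))   # dummy data: 1000 examples, 50 features
y = rng.integers(0, 5, size=1000)     # dummy labels for 5 classes
for xb, yb in iterate_minibatches(X, y, batch_size=64, rng=rng):
    pass  # train_step(xb, yb) -- placeholder for one optimizer update
```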

The least likely issue is the architecture of your network, in particular the modest number of 8 neurons in the last layers. With so few neurons, each individual neuron has a considerable impact on the final output, and even the small adjustments resulting from a single step of backpropagation can flip the sign of a neuron's activation, changing the predictions for other examples. Try increasing the number of neurons in the last layers; I'd personally suggest trying a 50x50x50 architecture.
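For completeness, here is what that wider architecture could look like in Keras. This is just a sketch: the input dimension of 100 is a placeholder, and note that Keras Dropout takes a drop rate, so your keep_prob = 0.9 becomes rate = 0.1:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),             # placeholder input dimension
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dropout(0.1),                    # keep_prob = 0.9 -> drop rate 0.1
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # five classes, as in the question
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # starting point; reduce if it zig-zags
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```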

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange