I implemented dropout and got a loss plot like this: (image: noisy training loss curve)

Before implementing regularisation, the loss was not noisy at all.

I understand why implementing dropout would increase the noise, since different neurons are active in successive epochs, but does L2 regularisation also add noise? Why or why not?
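(For context, here is a minimal sketch, not my actual training code, of the dropout behaviour I mean: a fresh binary mask is sampled on every forward pass, so a different subset of neurons contributes to each update. The layer size and drop probability are made-up values.)

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(8)   # hypothetical layer output
p_drop = 0.5               # hypothetical dropout probability

for step in range(3):
    # a new mask is drawn every forward pass, so the set of active
    # neurons (and hence the gradient) changes from step to step
    mask = rng.random(activations.shape) > p_drop
    dropped = activations * mask / (1 - p_drop)  # inverted-dropout scaling
    print(f"step {step}: active neurons = {np.flatnonzero(mask)}")
```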


Solution

I think your best bet for answering that question is to keep testing it on other datasets to confirm that your observation holds in multiple situations.

Here is my intuition (although I'm by no means an expert).

The regular loss function is (wx + b - y)^2. The loss function with L2 regularization is (wx + b - y)^2 + lambda * w^2.
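To make the difference concrete, here is a small sketch (with made-up values for x, y, w, b and lambda) comparing the two losses and their gradients with respect to w; the only difference is the extra 2 * lambda * w term in the L2 case.

```python
# Hypothetical single-example, single-feature linear model.
x, y = 1.0, 2.0          # made-up data point
w, b = 3.0, 0.0          # made-up current parameters
lam = 0.5                # made-up L2 strength (lambda)

residual = w * x + b - y

plain_loss = residual ** 2
l2_loss = residual ** 2 + lam * w ** 2

# d/dw of (wx + b - y)^2                 = 2 * x * (wx + b - y)
# d/dw of (wx + b - y)^2 + lam * w^2     = 2 * x * (wx + b - y) + 2 * lam * w
grad_plain = 2 * x * residual
grad_l2 = grad_plain + 2 * lam * w

print(plain_loss, l2_loss)   # 1.0  5.5
print(grad_plain, grad_l2)   # 2.0  5.0
```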

In the first equation, you only have to worry about w in one term. As a simple example, if you update it by 0.1 and the loss improves, you'd be safe to update it again by 0.1 in the next round.

In the L2 equation, you have an additional instance of w to deal with, and it's squared and multiplied by another number (lambda). It's easy to imagine that the same circumstance could make you jump the gap and skip over the point where the gradient is equal to zero.
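To illustrate that intuition, here is a toy sketch (made-up values, with a deliberately large lambda and learning rate) where gradient descent with a fixed step size lands exactly on the minimum of the plain loss but overshoots and bounces around the minimum of the L2 loss, because the extra 2 * lambda * w term makes the gradient, and therefore the step, larger.

```python
# Toy 1-D example: fixed learning rate, made-up data and lambda.
x, y, b = 1.0, 2.0, 0.0   # hypothetical data point, bias held fixed
lr = 0.5                  # deliberately aggressive step size
lam = 0.8                 # deliberately large L2 strength

def step_plain(w):
    # gradient of (wx + b - y)^2 with respect to w
    return w - lr * 2 * x * (w * x + b - y)

def step_l2(w):
    # same gradient plus the extra 2 * lam * w term from the L2 penalty
    return w - lr * (2 * x * (w * x + b - y) + 2 * lam * w)

w_plain = w_l2 = 3.0
for i in range(5):
    w_plain, w_l2 = step_plain(w_plain), step_l2(w_l2)
    print(f"step {i}: plain w = {w_plain:.3f}   L2 w = {w_l2:.3f}")

# The plain-loss w jumps straight to its minimum (w = 2) and stays there;
# the L2 w overshoots its minimum (w ≈ 1.11) and bounces from side to side,
# which shows up as a noisier-looking loss curve.
```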
