Should weights on earlier layers change less than weights on later layers in a neural network
22-10-2019
Question
I'm trying to debug why my neural network isn't working. One of the things I've observed is that the weights between the input layer and the first hidden layer hardly change at all, whereas weights later in the network (eg. the weights between the last hidden layer and the output) change significantly. Is this to be expected or a symptom of an error in my code?
I'm applying backpropagation and gradient descent to alter the weights.
Solution
This is expected and well established: it is the vanishing gradient problem.
https://en.wikipedia.org/wiki/Vanishing_gradient_problem
Backpropagation multiplies the small derivatives of saturating activation functions (such as the sigmoid) layer by layer, so computing the gradients of the "front" layers of an n-layer network involves a product of n small numbers. The gradient (error signal) therefore decreases exponentially with n, and the front layers train very slowly.
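You can see this numerically with a toy experiment. The sketch below (my own illustration, not code from the question; layer count, width, and weight scale are arbitrary choices) backpropagates through a stack of sigmoid layers and prints the gradient norm for each weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_layers = 10   # depth of the toy network (assumed for illustration)
width = 20      # units per layer (assumed for illustration)
weights = [rng.normal(0.0, 0.1, (width, width)) for _ in range(n_layers)]

# Forward pass, keeping each layer's activation for backprop.
a = rng.normal(size=(width, 1))
activations = [a]
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass. For a = sigmoid(z), sigma'(z) = a * (1 - a).
delta = np.ones((width, 1))  # pretend dLoss/dOutput = 1
grad_norms = [0.0] * n_layers
for l in range(n_layers - 1, -1, -1):
    delta = delta * activations[l + 1] * (1 - activations[l + 1])
    grad = delta @ activations[l].T        # dLoss/dW_l
    grad_norms[l] = np.linalg.norm(grad)
    delta = weights[l].T @ delta           # propagate to the previous layer

for l, g in enumerate(grad_norms):
    print(f"layer {l}: gradient norm = {g:.3e}")
```

Each backward step multiplies the error signal by the sigmoid derivative (at most 0.25) and a weight matrix, so the printed norms shrink rapidly from the last layer toward the first, which matches the behaviour you observed.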
While I'm not an expert in neural networks (experts, please add an answer), I'm sure professional neural network implementations are tricky and highly optimized. If you simply implement a vanilla textbook-style neural network, you won't be able to use it to train a large network. Keep it small and you'll be fine.