Question

I am optimizing a loss function using gradient descent. I have tried several different learning rates, but the objective function converges to exactly the same value every time.

Does this mean I am getting stuck in a local minimum? The loss function is non-convex, so it is unlikely that I am converging to the global minimum.


Solution

This is the expected behavior. Runs with different learning rates should converge to the same minimum if they start from the same location.

If you're optimizing a neural network and you want to explore the loss surface, randomize the starting parameters. If you always start the optimization from the same initial point, you will reach the same local extremum unless you increase the step size enough to overshoot it.
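For illustration, here is a minimal sketch (the function f and all names here are assumptions, not the poster's actual setup): plain gradient descent on a non-convex one-dimensional function, restarted from several random initial points. Different starting points can land in different local minima, whereas rerunning from the same point with a different learning rate usually does not.

```python
import numpy as np

def f(x):
    return np.sin(3 * x) + 0.1 * x**2      # non-convex: several local minima

def grad_f(x):
    return 3 * np.cos(3 * x) + 0.2 * x     # analytic derivative of f

def gradient_descent(x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)                # standard gradient-descent update
    return x

rng = np.random.default_rng(0)
for x0 in rng.uniform(-5, 5, size=5):      # random restarts explore different basins
    x_min = gradient_descent(x0)
    print(f"start {x0:+.2f} -> minimum at {x_min:+.2f}, f = {f(x_min):.3f}")
```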

OTHER TIPS

As you said, you are most likely stuck in a local minimum. Change the starting parameters as suggested above and try again. A learning rate that is too large can also hinder convergence, causing the loss to oscillate around the minimum or even to diverge.
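To make the learning-rate effect concrete, here is a minimal sketch on the toy loss f(x) = x**2 (an assumption for illustration, not the poster's loss); its gradient is 2*x:

```python
def descend(lr, x0=1.0, steps=20):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x                    # gradient step on f(x) = x**2
    return x

for lr in (0.1, 0.4, 0.9, 1.1):
    print(f"lr={lr}: x after 20 steps = {descend(lr):+.3e}")
# lr=0.1 and 0.4 shrink x steadily toward 0; lr=0.9 flips sign each step but
# still converges; lr=1.1 makes |x| grow every step, i.e. the iterates diverge.
```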

To help with the choice of starting point, and specifically for the quadratic and cross-entropy costs, see Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015:

(figure from Nielsen's book, not reproduced here)
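The relevant section of that book recommends drawing initial weights from a Gaussian with standard deviation 1/sqrt(n_in) rather than 1, so that neurons are less likely to saturate early in training; whether that is exactly what the missing figure showed is an assumption. A minimal sketch for one dense layer (the layer sizes are arbitrary):

```python
import numpy as np

def init_layer(n_in, n_out, rng=np.random.default_rng()):
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))  # scaled weights
    b = rng.normal(0.0, 1.0, size=n_out)                          # unit-variance biases
    return W, b

W, b = init_layer(n_in=784, n_out=30)   # e.g. a 784 -> 30 dense layer
print(W.std())                          # roughly 1/sqrt(784) ≈ 0.036
```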

This might not work exactly as suggested, but randomizing the initialization is a good thing to try.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange