Question

While taking Andrew Ng's online machine learning course on Coursera, I came across the topic of overfitting. I know it can occur when gradient descent is used for linear or logistic regression, but can it also occur when advanced optimization algorithms such as "Conjugate gradient", "BFGS", and "L-BFGS" are used?


Solution

There is no optimization technique that will eliminate the risk of overfitting entirely. The methods you've listed are all just different ways of fitting the same linear model. A linear model's cost function has a single global minimum, and that minimum doesn't depend on which optimization algorithm you use to find it (unless you add regularization, which changes the cost function itself), so all of the methods you've listed will overfit (or underfit) to the same degree.
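To make that concrete, here is a minimal sketch (my own illustration, using SciPy's optimizers on synthetic data rather than anything from the course) showing that conjugate gradient, BFGS, and L-BFGS all land on essentially the same minimum of an unregularized logistic-regression loss, and therefore fit the training data equally:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic, non-separable classification data (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)

def nll(w):
    """Negative log-likelihood of unregularized logistic regression."""
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

def grad(w):
    """Gradient of the negative log-likelihood: X^T (sigmoid(Xw) - y)."""
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y)

w0 = np.zeros(X.shape[1])
for method in ["CG", "BFGS", "L-BFGS-B"]:
    res = minimize(nll, w0, jac=grad, method=method)
    print(f"{method:8s}  training loss = {res.fun:.4f}")
# All three reach (numerically) the same training loss: the optimizer changes
# how you get to the minimum, not where the minimum is.
```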

Moving from linear models to more complex models, like deep learning, you're even more at risk of overfitting. I've had plenty of convolutional neural networks that badly overfit, even though the weight sharing in convolutional layers is supposed to substantially reduce the chance of overfitting. In summary, there is no silver bullet for overfitting, regardless of model family or optimization technique.

OTHER TIPS

Overfitting is generally a result of your data and the structure of your model. The 'advanced' algorithms you mention have specific uses and may or may not outperform other methods depending on your objectives and your data. Here is a source for some further reading: http://papers.nips.cc/paper/1895-overfitting-in-neural-nets-backpropagation-conjugate-gradient-and-early-stopping.pdf
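The linked paper studies early stopping as a remedy. Here is a minimal, self-contained sketch of that idea (the data, model, and hyperparameters are illustrative assumptions, not from the paper): gradient descent on an over-parameterized polynomial fit, halted when the validation error stops improving.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 60))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)   # noisy target

# Degree-15 polynomial features: flexible enough to fit the noise.
X = np.vander(x, 16, increasing=True)
train, val = np.arange(0, 60, 2), np.arange(1, 60, 2)    # alternating train/val split

w = np.zeros(X.shape[1])
best_w, best_val, patience, bad_epochs = w.copy(), np.inf, 20, 0
for epoch in range(5000):
    grad = X[train].T @ (X[train] @ w - y[train]) / train.size
    w -= 0.1 * grad                                        # plain gradient descent step
    val_err = np.mean((X[val] @ w - y[val]) ** 2)
    if val_err < best_val:
        best_val, best_w, bad_epochs = val_err, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                         # validation error has stalled
            break

w = best_w                                                 # roll back to the best checkpoint
print(f"stopped at epoch {epoch}, best validation MSE = {best_val:.3f}")
```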

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange