Question

In batch gradient descent, one gradient descent update is said to require processing the entire dataset, which I believe makes one epoch. On the other hand, in the mini-batch algorithm an update is made after every mini-batch, and once every mini-batch has been processed, one epoch is completed. So in both cases, an epoch is completed after all the data is processed. I do not quite get what makes the mini-batch algorithm more efficient.

Thanks,


Solution

In short, batch gradient descent is accurate but plays it safe, and therefore is slow. Mini-batch gradient descent is a bit less accurate, but doesn't play it safe and is much faster.

When you do gradient descent, you use an estimate of the gradient to update your weights. When you use batch gradient descent, your gradient estimate is exact, since it is computed over all of your data.

Mini-batch is considered more efficient because you might be able to get, let's say, an ~80% accurate gradient with only 5% of the data (these numbers are made up). So, your weights may not always be updated optimally (if your estimate is not so good), but you will be able to update your weights more often since you don't need to go through all your data at once.
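To make the contrast concrete, here are the two update rules side by side; the per-example loss ℓ, learning rate η, and batch size m are generic notation introduced here purely for illustration:

$$w \;\leftarrow\; w - \eta \,\frac{1}{N}\sum_{i=1}^{N} \nabla_w\, \ell(w;\, x_i, y_i) \qquad \text{(batch: one update per pass over all } N \text{ examples)}$$

$$w \;\leftarrow\; w - \eta \,\frac{1}{m}\sum_{i \in B} \nabla_w\, \ell(w;\, x_i, y_i) \qquad \text{(mini-batch: } |B| = m \ll N\text{, so many cheap updates fit in one epoch)}$$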

The idea is that you update your weights more often with an approximation of your gradient, which often is good enough. The utility of mini-batch becomes more obvious when you start dealing with very large datasets.
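Here is a minimal sketch in Python/NumPy of what that difference in update frequency looks like; the toy linear-regression data, learning rate, and batch size of 50 are made-up values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # 1000 examples, 5 features
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
lr, batch_size, epochs = 0.1, 50, 10

def grad(w, Xb, yb):
    # Gradient of the mean squared error over the given batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Batch gradient descent: one update per epoch, each using all 1000 examples.
w_batch = np.zeros(5)
for _ in range(epochs):
    w_batch -= lr * grad(w_batch, X, y)           # 10 updates in total

# Mini-batch gradient descent: 1000 / 50 = 20 updates per epoch,
# each using only 50 examples, so 200 updates over the same 10 epochs.
w_mini = np.zeros(5)
for _ in range(epochs):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        w_mini -= lr * grad(w_mini, X[idx], y[idx])
```

Both loops see the same data each epoch, but the batch version makes 1 update per epoch while the mini-batch version makes 20, each based on a noisier but much cheaper gradient estimate.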

Licensed under: CC-BY-SA with attribution