Question

I wonder whether one epoch using mini-batch gradient descent is slower than one epoch using just batch gradient descent.

At least I understand that one iteration of mini-batch gradient descent should be faster than one iteration of batch gradient descent.

However, if I understand it correctly, mini-batch gradient descent has to update the weights once per mini-batch, i.e. many times per epoch, so one epoch of training would be slower than with batch gradient descent, which computes the gradient and updates the weights only once per epoch.

Is this correct? If so, is the increase in overall training time something worth worrying about?
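To make the comparison concrete, here is a rough sketch of what I mean by "one epoch" in each case, timed on a toy NumPy linear-regression problem (the data sizes, learning rate, and helper function are made up purely for illustration):

```python
import time
import numpy as np

# Toy linear-regression data (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
n_samples, n_features = 100_000, 20
X = rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

lr = 0.01
batch_size = 128

def mse_gradient(w, Xb, yb):
    """Gradient of mean squared error for a linear model."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# One epoch of (full-)batch gradient descent: one gradient, one weight update.
w = np.zeros(n_features)
t0 = time.perf_counter()
w -= lr * mse_gradient(w, X, y)
t_batch = time.perf_counter() - t0

# One epoch of mini-batch gradient descent: one update per mini-batch,
# i.e. ceil(n_samples / batch_size) updates over the same data.
w = np.zeros(n_features)
t0 = time.perf_counter()
for start in range(0, n_samples, batch_size):
    Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    w -= lr * mse_gradient(w, Xb, yb)
t_minibatch = time.perf_counter() - t0

print(f"one epoch, full batch : {t_batch:.4f} s (1 update)")
print(f"one epoch, mini-batch : {t_minibatch:.4f} s "
      f"({-(-n_samples // batch_size)} updates)")
```

Both versions see every sample exactly once per epoch; the difference I am asking about is whether the many small updates in the mini-batch loop add meaningful overhead compared with the single full-batch update.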

