Question

I have a reasonably large set of images that I want to classify using a neural network. I can't fit them all into memory at once, so I decided to process them in batches of 200. I'm using a cross-entropy cost function with a minimization algorithm from numpy.

My question is: is it correct to pass the learned weights between batches and use them as the starting point for the next minimization? Would this eventually cause my hypothesis to fit all the data, or would each batch simply re-fit the weights to itself? What is the general approach to this kind of problem?

Solution

What you need is mini-batch gradient descent (or stochastic gradient descent), and carrying the learned weights from one batch to the next is exactly the right approach: the same weights are updated incrementally by every batch, so they end up fitting all of the data rather than just the last batch. Shuffle your samples and draw batches of the size you are aiming for, making sure every sample in the dataset gets used; one full pass over the data constitutes an epoch. Train for several epochs, depending on the data. Here is a good blog post on mini-batch gradient descent.
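To make the idea concrete, here is a minimal numpy sketch of mini-batch gradient descent with a cross-entropy loss. It uses a plain softmax (multinomial logistic regression) model instead of a full neural network to keep the example short, and assumes the images have already been flattened into feature vectors; the names `softmax` and `minibatch_sgd` are just for illustration. The key point is that `W` and `b` are created once and updated by every batch in every epoch, which is exactly the "pass the weights between batches" scheme from the question.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def minibatch_sgd(X, y, n_classes, batch_size=200, epochs=10, lr=0.1):
    """Train a softmax classifier with mini-batch gradient descent.

    X: (n_samples, n_features) feature matrix
    y: (n_samples,) integer class labels
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))   # weights persist across all batches and epochs
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]       # one-hot targets for cross-entropy

    for epoch in range(epochs):
        idx = np.random.permutation(n)          # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, Yb = X[batch], Y[batch]
            P = softmax(Xb @ W + b)             # forward pass on this batch only
            # Gradient of the cross-entropy loss w.r.t. W and b.
            grad_W = Xb.T @ (P - Yb) / len(batch)
            grad_b = (P - Yb).mean(axis=0)
            W -= lr * grad_W                    # update the carried-over weights
            b -= lr * grad_b
    return W, b
```

In your setting, where the batches have to be loaded from disk one at a time, the inner loop would load each batch of 200 images instead of slicing an in-memory array, but the weight updates work the same way.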

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange