Question

I need to train a model, but the data after preprocessing is too large to fit in RAM. Is it possible to preprocess the data in batches, train the model on each small batch, save the weights, and then train again on the next batch with the saved weights?

The workflow would look like this:

preprocess -> train -> save weights -> preprocess next batch -> train with saved weights ...

If yes, how can I do it?

Solution

As you said:

Is it possible to preprocess the data in batches, train the model on each small batch

Yes! In practice, neural networks are virtually always trained in batches. You never feed the whole dataset into the model at once; even a medium-sized dataset could crash most machines.

This is how it works: you extract a slice of your dataframe (the batch size is usually in the range of 32-256 observations, but it's very task-specific), train the model on it, and move on to the next slice until the whole dataset has been processed. One full pass over the data is what an epoch is, in deep learning jargon.
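For example, here is a minimal sketch with tf.keras. The helpers `load_raw_chunk` and `preprocess` are hypothetical stand-ins for whatever loading and preprocessing your data needs; a Python generator yields one preprocessed batch at a time, so only a small slice of the dataset is ever held in RAM:

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 64  # usually 32-256 observations, but very task-specific


def load_raw_chunk(i):
    # Hypothetical: in practice, read slice i of the raw data from disk.
    return np.random.rand(512, 11)


def preprocess(raw):
    # Hypothetical: your real preprocessing goes here.
    return raw[:, :10].astype("float32"), raw[:, 10].astype("float32")


def batch_generator(num_chunks):
    """Yield one preprocessed batch at a time."""
    while True:  # Keras expects the generator to loop indefinitely
        for i in range(num_chunks):
            x, y = preprocess(load_raw_chunk(i))
            for start in range(0, len(x), BATCH_SIZE):
                yield x[start:start + BATCH_SIZE], y[start:start + BATCH_SIZE]


# A toy model; the input shape (10 features) is just an assumption here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# steps_per_epoch = number of batches that make up one full pass (one epoch).
model.fit(batch_generator(num_chunks=100), steps_per_epoch=800, epochs=5)
```

Note that `steps_per_epoch` is needed because Keras cannot infer the dataset length from a generator; it tells the framework how many batches constitute one epoch.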

... save the weights, and then train again on the next batch with the saved weights?

There's no need to save and reload the model at each iteration; just save it once it has been trained for the number of epochs of your choice. If the following day you want to keep training, just reload it and keep iterating one batch at a time.
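Continuing the sketch above, saving and resuming could look like this (the file name is just an example, and older TensorFlow versions may use the `.h5` format instead). `load_model` restores the architecture, weights, and optimizer state, so training picks up where it left off:

```python
# Save once training is done for the epochs of your choice.
model.save("my_model.keras")

# Later (e.g. the next day): reload and keep training batch by batch.
model = tf.keras.models.load_model("my_model.keras")
model.fit(batch_generator(num_chunks=100), steps_per_epoch=800, epochs=5)
```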
