Question

How can I continue training a machine learning model after predicting results?

What I mean by that is that I built a model on my dataset of 1 million records, and it took around a day to train.

I extracted the model results using Python, and now I have a function that I can feed my features to and it gives me predictions,

but with time my dataset has become 1.5 million records.

I do not want to redo the whole thing all over again from scratch.

Is there any way to continue on top of the first model I built (the one with 1 million records), so that adjusting the new model based on the new 0.5 million records takes less time than rebuilding everything from scratch on 1.5 million records?

P.S. I am asking about all algorithms; if there is any way to do this for some algorithms, it would be good to know which ones they are.


Solution

This depends on your model type:

Classical using ensemble/stacked models:
If you are using classical machine learning, you can keep your old model (built on the first 1 million records), train a new model on the most recent 500k records, and then combine their predictions in an ensemble or stacked approach.
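As a minimal sketch of the ensemble idea (the dataset, model choice, and soft-voting combination here are illustrative assumptions, not part of the original answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small stand-ins for the original 1M records and the newer 500k records.
X_old, y_old = make_classification(n_samples=1000, random_state=0)
X_new, y_new = make_classification(n_samples=500, random_state=1)

old_model = LogisticRegression().fit(X_old, y_old)  # the model you already have
new_model = LogisticRegression().fit(X_new, y_new)  # trained only on the fresh data

def ensemble_predict(X):
    # Soft voting: average the predicted class probabilities of both
    # models, then pick the most likely class.
    proba = (old_model.predict_proba(X) + new_model.predict_proba(X)) / 2
    return proba.argmax(axis=1)

preds = ensemble_predict(X_new[:10])
```

A stacked approach would instead feed both models' predictions into a second-level model trained to combine them.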

References for Ensemble and Stacking:
https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/

Video Reference: https://www.youtube.com/watch?v=Un9zObFjBH0

AI/NN using transfer learning:
If you are using a neural network (NN) model, you can use the idea of transfer learning. Save your model built on the first 1 million records, then add it as an initial layer to a new NN for analyzing the new data. You can then save the new NN and use it in the next round.
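A related, self-contained way to sketch "continue where the first model left off" is incremental training: scikit-learn's `MLPClassifier` (used here as an assumed stand-in; Keras/TensorFlow is the usual tool for full transfer learning) can keep updating the same network's weights on new records via `partial_fit` instead of retraining from scratch:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Stand-ins for the original 1M records and the newer 500k records.
X_old, y_old = make_classification(n_samples=1000, random_state=0)
X_new, y_new = make_classification(n_samples=500, random_state=1)

# First round: the network trained on the original data.
nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=50, random_state=0)
nn.fit(X_old, y_old)

# Later: continue training the *same* network on only the new records.
# partial_fit updates the existing weights rather than reinitializing them.
for _ in range(5):
    nn.partial_fit(X_new, y_new)

preds = nn.predict(X_new[:10])
```

In a Keras workflow you would instead load the saved model, freeze its layers, and stack new trainable layers on top, as described in the reference below.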

Reference: https://machinelearningmastery.com/transfer-learning-for-deep-learning/

Video Reference: https://www.youtube.com/watch?v=yofjFQddwHE

General guidelines:
If you need to repeat this updating process many times, you can train a new model on every n new records, drop the oldest data/model once your dataset reaches a minimum size, and predict using only the last x models. Tune n and x based on your data, your flexibility, and your need for real-time predictions. If the data is changing over time, it is better to use only the latest data, or to weight the older data lower and the newer data higher.
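Down-weighting older records can be sketched with `sample_weight`, which most scikit-learn estimators accept in `fit` (the data and the 0.5 weight are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_old, y_old = make_classification(n_samples=1000, random_state=0)
X_new, y_new = make_classification(n_samples=500, random_state=1)

X = np.vstack([X_old, X_new])
y = np.concatenate([y_old, y_new])

# Give older records half the influence of newer ones, so the
# model tracks recent behaviour without discarding history.
weights = np.concatenate([np.full(len(y_old), 0.5),
                          np.full(len(y_new), 1.0)])

model = LogisticRegression().fit(X, y, sample_weight=weights)
preds = model.predict(X_new[:5])
```

A decaying weight schedule (e.g. exponential decay by record age) is a common refinement of the same idea.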

Here is a good definition of transfer learning: "Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task."

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange