Question

I am using the xgboost library. My system runs a cron job each night that pulls the data from the database and trains the model. However, I would like to stop retraining the model from scratch every time and instead just fine-tune it with whatever new data has arrived in the database. In sklearn's implementation (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) one could use the warm_start option. Is there an equivalent in xgboost?


Solution

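In the current version of the Python wrapper for xgboost, you can pass either a file name or an existing xgboost model (an instance of class Booster) to the train function through its xgb_model argument. Training then continues from that model instead of starting from scratch.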

OTHER TIPS

Note that XGBoost chooses its splits on the assumption that it has access to the entire dataset, so you would likely lose some predictive power by updating rather than retraining (though of course this may be worth the reduced computational cost).

See the interesting discussion on this XGBoost GitHub issue.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange