Question

I am using the xgboost library. My system runs a cron job each night that pulls the data from the database and trains the model. However, I would like to stop retraining the model from scratch every time and instead just fine-tune it with whatever new data has arrived in the database. In sklearn's implementation (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) one can use the warm_start option. Is there an equivalent in xgboost?

Solution

In the current version of the Python wrapper for xgboost, the train function accepts either a file name or an existing xgboost model (an instance of class Booster) to continue training from.

Other tips

Note that XGBoost chooses its splits on the assumption that it has access to the entire dataset, so you would likely lose some predictive power by updating rather than retraining (though this may be worth the reduced computational cost).

See the interesting discussion of this in an XGBoost GitHub issue.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange