I prototyped an ML model consisting of preprocessing plus multiple stacked regressors. I would like a colleague to develop an API that queries the model. Is there any way to query the model (an sklearn pipeline) without having to install all of its dependencies (XGBoost, LGBM, CatBoost, ...)?

I tried serializing it with Joblib, but deserializing it on another machine requires those dependencies to be installed.

The goal is to turn the sklearn pipeline into a completely inert black box that requires minimal setup. Is that possible?

Solution

You may be able to convert the entire model pipeline to a standardized format. PMML is one such format, and there are tools (e.g. jpmml) that can convert objects from all the modeling packages you named to PMML, though perhaps you've used something else that isn't as easily converted.
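
As a rough illustration of the PMML route, here is a minimal sketch that assumes the `sklearn2pmml` package (the Python wrapper around JPMML-SkLearn) is installed on the machine that trained the model, along with a Java runtime, which the converter needs. The helper name `export_to_pmml` and the output path are hypothetical. Whether every estimator in your stack converts cleanly is not guaranteed, so treat this as something to try rather than a recipe.

```python
def export_to_pmml(fitted_pipeline, pmml_path="model.pmml"):
    """Write an already-fitted sklearn pipeline to a PMML file.

    The import is done lazily so this helper can be defined even on a
    machine where sklearn2pmml is not installed (assumption: it is only
    needed on the training machine, not on the API machine).
    """
    from sklearn2pmml import make_pmml_pipeline, sklearn2pmml

    # Wrap the fitted pipeline in a PMMLPipeline, which is the object
    # type the converter expects.
    pmml_pipeline = make_pmml_pipeline(fitted_pipeline)

    # Runs the Java-based converter and writes the PMML document.
    sklearn2pmml(pmml_pipeline, pmml_path)
    return pmml_path


# Hypothetical usage on the training machine:
#   import joblib
#   pipeline = joblib.load("pipeline.joblib")
#   export_to_pmml(pipeline, "pipeline.pmml")
```

The resulting file would then be scored on the API side with a PMML evaluator instead of the original Python packages, so your colleague trades the XGBoost/LGBM/CatBoost installs for a single scoring library.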

Otherwise, just require the dependencies to be installed, and make that easy: ship a virtual environment, a Docker image, or similar.
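
To make the installation route easy, one option is to record the exact package versions from the environment where the Joblib file was created, so the virtual environment or Docker image can be rebuilt to match. The sketch below uses only the standard library; the function name `pin_requirements` and the package list are assumptions for illustration, so adjust the list to whatever your pipeline actually imports.

```python
from importlib import metadata


def pin_requirements(package_names):
    """Return (pinned, missing) for the given distribution names.

    pinned  -- list of "name==version" strings for installed packages
    missing -- list of names that are not installed in this environment
    """
    pinned, missing = [], []
    for name in package_names:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing


if __name__ == "__main__":
    # Assumed package list; run this in the environment that trained the model.
    names = ["scikit-learn", "joblib", "xgboost", "lightgbm", "catboost"]
    pinned, missing = pin_requirements(names)
    print("\n".join(pinned))
    if missing:
        print("# not installed here:", ", ".join(missing))
```

Saving the printed lines as a `requirements.txt` lets your colleague (or a Docker build) run `pip install -r requirements.txt` and load the Joblib file against the same versions it was saved with.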

Licensed under: CC-BY-SA with attribution