Question

I prototyped an ML model consisting of preprocessing plus multiple stacked regressors. I would like a colleague of mine to develop an API that will query the model. Is there any way to query the model (an sklearn pipeline) without having to download and install all of its dependencies (XGBoost, LGBM, CatBoost, ...)?

I tried to serialize it with Joblib, but deserializing it on another machine still requires the dependencies to be installed there.

The goal is really to turn the sklearn pipeline into a completely inert black box that requires minimal setup. Is it possible?

Solution

You may be able to convert the entire model pipeline to a standardized format. PMML is one such format, and there are tools (e.g. JPMML) to convert objects from each of the modeling packages you named to PMML, though perhaps you've used something else that isn't already easily converted. The API would then only need a PMML evaluator rather than the original Python packages.
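As a rough illustration of the PMML route, here is a minimal sketch that assumes the sklearn2pmml package (the Python front end of JPMML) and a Java runtime are installed on the machine doing the export. The function name export_to_pmml and the file name model.pmml are made up for this example, and whether every step of a given stacked pipeline converts cleanly is not guaranteed.

```python
from pathlib import Path


def export_to_pmml(fitted_pipeline, pmml_path="model.pmml"):
    """Write an already fitted scikit-learn pipeline to a PMML file.

    Assumption: sklearn2pmml and a Java runtime are available on the
    exporting machine. The machine that serves the model does not need them.
    """
    # Imported lazily so this module can be loaded without sklearn2pmml.
    from sklearn2pmml import make_pmml_pipeline, sklearn2pmml

    # Wrap the plain Pipeline in the PMMLPipeline type the converter expects.
    pmml_pipeline = make_pmml_pipeline(fitted_pipeline)

    # Run the conversion and write the XML document to disk.
    sklearn2pmml(pmml_pipeline, str(pmml_path))
    return Path(pmml_path)
```

A call would look like export_to_pmml(my_fitted_pipeline), where my_fitted_pipeline stands for your own fitted pipeline. The resulting file can then be handed to whichever PMML scoring engine the API is built around.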

Otherwise, just require the dependencies to be installed and make that easy, through a virtual environment or Docker image or ...
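For the virtual environment option, something along these lines could be shipped next to the serialized model so that setup is a single call. This is only a sketch using the standard library; the helper names and the idea of a requirements file listing the pinned packages are assumptions, not something from the question.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir):
    """Return the path of the Python interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_model_env(env_dir, requirements=None):
    """Create a virtual environment and optionally install pinned packages.

    `requirements` is a hypothetical path to a pip requirements file that
    lists the model's dependencies with the versions used for training.
    """
    # pip is only bootstrapped when there is something to install.
    builder = venv.EnvBuilder(with_pip=requirements is not None, clear=True)
    builder.create(str(env_dir))

    python = venv_python(env_dir)
    if requirements is not None:
        # Install into the new environment, not the current interpreter.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Your colleague would run create_model_env("model-env", "requirements.txt") once and then start the API with the returned interpreter. Pinning exact versions in the requirements file matters here, because a Joblib file is generally only safe to load with the library versions that produced it.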

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange