# Python 2's cPickle; on Python 3 use the built-in pickle module instead
import cPickle
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X, y)

# save the fitted model
with open('path/to/file', 'wb') as f:
    cPickle.dump(rf, f)

# in your prediction file
with open('path/to/file', 'rb') as f:
    rf = cPickle.load(f)
preds = rf.predict(new_X)
Save python random forest model to file
19-09-2022

Question
In R, after running a "random forest" model, I can use save.image("***.RData") to store the model. Afterwards, I can just load the model to do predictions directly.
Can you do a similar thing in python? I separate the Model and Prediction into two files. And in Model file:
rf = RandomForestRegressor(n_estimators=250, max_features=9, compute_importances=True)
fit = rf.fit(Predx, Predy)
I tried to return rf or fit, but still can't load the model in the prediction file.
Can you separate the model and prediction using the sklearn random forest package?
Solution
Other tips
You can use joblib to save and load a Random Forest from scikit-learn (in fact, any model from scikit-learn). An example:
import joblib
from sklearn.ensemble import RandomForestClassifier
# create RF
rf = RandomForestClassifier()
# fit on some data
rf.fit(X, y)
# save
joblib.dump(rf, "my_random_forest.joblib")
# load
loaded_rf = joblib.load("my_random_forest.joblib")
What's more, joblib.dump has a compress argument, so the model can be compressed. I ran a very simple test on the iris dataset, and compress=3 reduced the file size by about 5.6 times.
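The size difference can be checked directly. A minimal sketch (the exact ratio depends on your data and forest size; the 5.6x figure above was from the answerer's own iris test):

```python
import os
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# dump once uncompressed, once with zlib compression level 3
joblib.dump(rf, "rf_plain.joblib")
joblib.dump(rf, "rf_compressed.joblib", compress=3)

plain = os.path.getsize("rf_plain.joblib")
packed = os.path.getsize("rf_compressed.joblib")
print(f"compressed file is {plain / packed:.1f}x smaller")
```

The compressed file loads back with the same joblib.load call; no flag is needed on the reading side.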
I use dill; it stores all the data and, I think, possibly module information as well. Maybe not. I remember trying to use pickle for storing these really complicated objects and it didn't work for me. cPickle probably does the same job as dill, but I've never tried cPickle; it looks like it works in literally the exact same way. I use the ".obj" extension, but that's by no means conventional. It just made sense for me since I was storing an object.
import dill
from sklearn.ensemble import RandomForestRegressor

wd = "/whatever/you/want/your/working/directory/to/be/"
# note: compute_importances was removed from scikit-learn;
# feature_importances_ is always available after fitting
rf = RandomForestRegressor(n_estimators=250, max_features=9)
rf.fit(Predx, Predy)
dill.dump(rf, open(wd + "filename.obj", "wb"))
By the way, not sure if you use IPython, but sometimes writing a file that way doesn't work, so you have to do:
with open(wd + "filename.obj", "wb") as f:
    dill.dump(rf, f)
To call the object again:
model = dill.load(open(wd + "filename.obj", "rb"))
For storing the model you can also use the .sav format; it stores the complete model and its information.
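The .sav extension is just a naming convention; under the hood it is typically an ordinary pickle file. A minimal sketch (the model and filename here are illustrative):

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor

X, y = load_iris(return_X_y=True)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# save the fitted model to a .sav file
with open("finalized_model.sav", "wb") as f:
    pickle.dump(model, f)

# load it back and predict
with open("finalized_model.sav", "rb") as f:
    loaded = pickle.load(f)
```

Since the file is a plain pickle, pickle.load restores a fully fitted estimator, hyperparameters and learned trees included.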
I'd reiterate that joblib does the job well, and it provides really good compression options (e.g. lzma).
import pickle, dill, joblib

with open("clf.pkl", "wb") as out: pickle.dump(clf, out)
with open("clf.dill", "wb") as out: dill.dump(clf, out)
joblib.dump(clf, "clf.jbl")
joblib.dump(clf, "clf.jbl.lzma")
joblib.dump(clf, "clf.jbl.gz")
!du clf.*
24576 clf.dill
24576 clf.jbl
5120 clf.jbl.gz
3072 clf.jbl.lzma
24576 clf.pkl
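Note that in the comparison above no compress argument is passed: joblib infers the compression method from the file extension (.gz, .lzma, etc.). A round-trip sketch under that assumption, using a small fitted classifier in place of the answer's clf:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# compression is chosen from the extension; loading needs no flag
joblib.dump(clf, "clf.jbl")
joblib.dump(clf, "clf.jbl.lzma")

plain = joblib.load("clf.jbl")
packed = joblib.load("clf.jbl.lzma")
assert (plain.predict(X) == packed.predict(X)).all()
```

Either file restores an identical estimator, so the choice comes down purely to the disk-size versus dump/load-speed trade-off.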