Problem

I know this is a rookie question, but I'm having trouble getting predictions out of a model. I apply a MinMaxScaler() to the training set as seen below...

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Now that I have trained and saved the model, I'm trying to run the model using this code...

filename='cb1.sav'
loaded_model = pickle.load(open(filename, 'rb'))
predicted_y = loaded_model.predict_proba(sys.argv[1:])

print(predicted_y)

I pass all the values for the prediction in as arguments like...

#python3 ./test.py 43 28 0.2348 0.00002934 8

This results in sys.argv holding the values, where sys.argv[1:] represents all the X values in the correct order, skipping the first element since that is the name of the script (test.py).

Question: Do I need to perform any scaling or transformation on the sys.argv[1:] values?


Solution

Yes. Any preprocessing that you did manually before the model object was trained needs to be applied to the prediction data as well; the model expects its inputs in the same format as when you trained it.
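A minimal sketch of what that looks like for your script: save the fitted scaler alongside the model, then at prediction time convert the argv strings to floats, reshape them to a single-row 2D array, and pass them through the same scaler before calling predict_proba. The training values and the "scaler.sav" filename here are stand-ins, not from your code:

```python
import pickle
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training time: fit the scaler (stand-in training data, 5 features
# to match the example command-line call) and persist it with pickle
X_train = np.array([[40.0, 20.0, 0.1, 0.0, 5.0],
                    [50.0, 30.0, 0.3, 0.0001, 10.0]])
scaler = MinMaxScaler().fit(X_train)
with open("scaler.sav", "wb") as f:
    pickle.dump(scaler, f)

# Prediction time: sys.argv values are strings, so convert to float and
# reshape to (1, n_features) before transforming with the SAME fitted scaler
argv = ["43", "28", "0.2348", "0.00002934", "8"]  # stands in for sys.argv[1:]
x = np.array([float(v) for v in argv]).reshape(1, -1)
with open("scaler.sav", "rb") as f:
    loaded_scaler = pickle.load(f)
x_scaled = loaded_scaler.transform(x)
print(x_scaled)  # this is what should go into loaded_model.predict_proba
```

Note that sys.argv gives you strings, so the float conversion and the reshape to a 2D array are needed regardless of scaling.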

The easiest way to do that, if you used sklearn preprocessors, is to also save all of the preprocessing steps and load them in your prediction script. This is even easier if you put them together with your model into a Pipeline and fit and save the entire pipeline as a single object.
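A sketch of the Pipeline approach, using toy data and LogisticRegression as a stand-in for whatever model is actually in cb1.sav: the scaler and classifier are fitted and pickled as one object, so the prediction script only needs to load and call it on the raw values.

```python
import pickle
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the real training set (5 features,
# matching the example command-line call)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Scaling and the classifier live in one Pipeline; fit it once
pipe = Pipeline([("scale", MinMaxScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)

# Save the whole pipeline as a single artifact
with open("cb1.sav", "wb") as f:
    pickle.dump(pipe, f)

# Prediction script: load once, pass raw (unscaled) feature values;
# the pipeline applies the scaler internally before predicting
with open("cb1.sav", "rb") as f:
    loaded = pickle.load(f)

raw = np.array([[43, 28, 0.2348, 0.00002934, 8]])  # values from the example call
print(loaded.predict_proba(raw))
```

With this setup there is no separate scaler file to keep in sync with the model.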

Other tips

Yes, the scaling has to be performed at inference time as well.

Ben Reiniger has an excellent suggestion in terms of using a Pipeline to host the scaler and the model as a convenient single artifact for deployment.

An alternative is to use a BatchNormalization layer as one of your model's layers (if you are building a neural network in e.g. Keras), which also ensures that you end up with a single artifact to deploy.

Additional example here.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange