scikit-learn's decision trees cast their input to float32
for efficiency, but your values won't fit in that type:
>>> np.float32(8.9932064170227995e+41)
inf
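To find which columns are affected, you can compare the data against the largest finite float32 value. This is only a sketch: the array `X` below is made-up illustration data, not your dataset.

```python
import numpy as np

# Hypothetical data: the second column holds a value too large for float32.
X = np.array([[1.0, 8.9932064170227995e+41],
              [2.0, 3.0]])

# Largest finite value representable as float32 (roughly 3.4e38).
f32_max = np.finfo(np.float32).max

# Boolean mask per column: True if any entry would overflow to inf on casting.
overflow_cols = (np.abs(X) > f32_max).any(axis=0)
print(overflow_cols)
```

Any column flagged True here would become inf once the trees cast the input to float32.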
The solution is to standardize your data with sklearn.preprocessing.StandardScaler before fitting the model. Don't forget to apply the same transform before predicting. You can use a sklearn.pipeline.Pipeline to combine standardization and classification in a single object:
rf = Pipeline([("scale", StandardScaler()),
               ("rf", RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=2))])
Or, with the current dev version/next release:
rf = make_pipeline(StandardScaler(),
                   RandomForestClassifier(n_estimators=100, n_jobs=-1, verbose=2))
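For completeness, here is a minimal end-to-end sketch of the pipeline approach, including imports. The synthetic data, the train/test split, and the smaller `n_estimators` are assumptions made only to keep the example short; substitute your own arrays.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with magnitudes far beyond the float32 range.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1e42, size=(200, 3))
y = (X[:, 0] > 5e41).astype(int)

# The scaler runs in float64 and brings every feature to roughly unit scale,
# so the forest's internal cast to float32 no longer overflows.
clf = make_pipeline(StandardScaler(),
                    RandomForestClassifier(n_estimators=10, random_state=0))

clf.fit(X[:150], y[:150])

# The pipeline applies the fitted scaler automatically before predicting.
pred = clf.predict(X[150:])
print(pred.shape)
```

Because the scaler lives inside the pipeline, there is no separate transform step to forget at prediction time.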
(I admit the error message could be improved.)