How is the target variable passed to the final estimator in this pipeline?
17-12-2020
Question
There is a pipeline like the one below, where X is the feature matrix and y is the target variable. I would like to know how y is passed to the final estimator, LinearSVC. As far as I know, StandardScaler returns only the transformed X, so I assumed y was never passed to LinearSVC. However, this code works and I can make predictions. How does y reach the final estimator?
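from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Scale the features, then fit a linear SVM on the scaled data.
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)  # X and y are the features and target defined elsewhere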
Solution
The Pipeline object calls .fit_transform(X, y) on each step in order, up to but not including the last step (the estimator). The final estimator is only fit and is not asked to transform, because its purpose is to make predictions rather than to transform your input array X. The fit_transform method fits a pipeline step and then transforms the data with it in a single call. In your case that means:
- Your StandardScaler is fit on X; y is passed to it as well, but StandardScaler ignores it
- Your StandardScaler transforms your input data X
- Your LinearSVC is fit on the transformed X and your untransformed target y
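The three steps above can be sketched in plain Python to show where y travels. This is only a simplified illustration, not scikit-learn's actual implementation: ToyScaler, ToyClassifier and toy_pipeline_fit are hypothetical names invented for this example, and the toy scaler only centers the columns.

```python
class ToyScaler:
    """Hypothetical stand-in for StandardScaler: centers each column, ignores y."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class ToyClassifier:
    """Hypothetical final estimator that just records what fit() received."""

    def fit(self, X, y):
        self.seen_X_ = X
        self.seen_y_ = y
        return self


def toy_pipeline_fit(steps, X, y):
    """Simplified version of the fitting loop (not sklearn's real code)."""
    *transformers, final_estimator = steps
    Xt = X
    for step in transformers:
        # y is offered to every transformer, but only X is replaced.
        Xt = step.fit_transform(Xt, y)
    # The final estimator receives the transformed X and the original y.
    final_estimator.fit(Xt, y)
    return steps


X = [[1.0, 10.0], [3.0, 30.0]]
y = [0, 1]
scaler, clf = toy_pipeline_fit([ToyScaler(), ToyClassifier()], X, y)
```

After this runs, clf.seen_y_ is the very same y that was handed to the pipeline, while clf.seen_X_ holds the centered features. The point is that y never has to come out of the scaler: the pipeline keeps its own reference to y and hands it to each step directly.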
Since your LinearSVC is the last step of the pipeline and has no transform method, the process stops there. Your Pipeline object now holds fitted versions of both the StandardScaler and the LinearSVC. You can now call .predict() on the pipeline: the fitted StandardScaler performs a .transform() on the input X, and the transformed X is passed to the final estimator, LinearSVC, whose .predict() returns your predicted y array.
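One way to check the prediction path is to compare the pipeline's output with the same two calls made by hand. The sketch below assumes synthetic data from make_classification in place of the question's X and y, and adds random_state and max_iter to LinearSVC purely to keep the run reproducible and likely to converge; neither is in the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data standing in for the question's X and y (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    # random_state and max_iter are additions for this example only.
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=0, max_iter=10000)),
])
svm_clf.fit(X, y)

# Retrieve the fitted steps by the names given in the pipeline.
scaler = svm_clf.named_steps["scaler"]
linear_svc = svm_clf.named_steps["linear_svc"]

# Pipeline prediction versus the same two steps applied manually.
pipeline_pred = svm_clf.predict(X)
manual_pred = linear_svc.predict(scaler.transform(X))
same = np.array_equal(pipeline_pred, manual_pred)
```

Here same should be True, since both paths run the fitted scaler's transform followed by the fitted classifier's predict.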