How is the target variable passed to the final estimator in this pipeline?

https://datascience.stackexchange.com/questions/86536

17-12-2020

Question

Consider the pipeline below, where X holds the features and y is the target variable. I would like to know how y is passed to the final estimator, LinearSVC. As far as I know, StandardScaler returns only the transformed X, so I assumed that y was never passed to LinearSVC. However, this code works and I can make predictions. How does y reach the final estimator?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X (features) and y (target) are assumed to be defined already
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)

Solution

The Pipeline object calls .fit_transform(X, y) on each step in the order you defined them, up to but not including the last step (the estimator). The final estimator is not asked to transform anything, because its purpose is to make predictions rather than to transform your input array X. The fit_transform method fits a step and then transforms the data with that step. The target y is not carried in the transformers' output; the pipeline keeps it and hands the same y to every step's fit call. In your case that means:

Your StandardScaler is fit on X; y is passed to it as well, but StandardScaler ignores it
Your StandardScaler transforms your input data X
Your LinearSVC is fit on the previously transformed X and the untransformed target y
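
The three steps above can be reproduced by hand, which makes it easier to see where y goes. The following is only a sketch: the data comes from make_classification as a stand-in for your X and y, and random_state and max_iter are added purely so that the two fits are reproducible and comparable. Neither is part of the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's X and y (an assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The pipeline from the question; random_state and max_iter are added for reproducibility.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=0, max_iter=10000)),
])
pipe.fit(X, y)

# The same three steps done by hand.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X, y)  # y is accepted here but ignored by StandardScaler
svc = LinearSVC(C=1, loss="hinge", random_state=0, max_iter=10000)
svc.fit(X_scaled, y)                   # the original, untransformed y reaches the estimator

# Compare the coefficients learned by the pipeline with the manual version.
same_coef = np.allclose(pipe.named_steps["linear_svc"].coef_, svc.coef_)
print(same_coef)
```

If the comparison prints True, the pipeline and the manual version learned the same model, which is consistent with y being handed directly to the final estimator's fit call.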

Since your LinearSVC is the last step of the pipeline and doesn't have a transform method, the process stops there. Your Pipeline object now holds fitted versions of both the StandardScaler and the LinearSVC. You can now call .predict() on the pipeline: the fitted StandardScaler performs a .transform() on the input X, and the transformed X is passed to the final estimator, LinearSVC, whose .predict() returns your predicted y array.
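
The prediction path can be checked the same way. This sketch again uses synthetic data from make_classification and an arbitrary train/new split (both assumptions for illustration), and pulls the fitted steps out of the pipeline through named_steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data; the 150/50 split is arbitrary and only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_new = X[150:]

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=0, max_iter=10000)),
])
pipe.fit(X_train, y_train)

# The fitted steps live inside the pipeline and can be looked up by name.
fitted_scaler = pipe.named_steps["scaler"]
fitted_svc = pipe.named_steps["linear_svc"]

# pipe.predict transforms with the fitted scaler, then predicts with the fitted estimator.
pred_pipeline = pipe.predict(X_new)
pred_manual = fitted_svc.predict(fitted_scaler.transform(X_new))

print(np.array_equal(pred_pipeline, pred_manual))
```

Note that no y is involved at prediction time: only X flows through the scaler and into LinearSVC.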

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange