Question

I have a data preparation and model fitting pipeline that takes a dataframe (X_trn) and uses sklearn's make_column_transformer and Pipeline to prepare the data and fit an XGBRegressor. The code looks something like this:

    xgb = XGBRegressor()

    preprocessor = make_column_transformer(
        (Fun1(), List1),
        (Fun2(), List2),
        remainder='passthrough',
    )

    model_pipeline = Pipeline([
        ('preprocessing', preprocessor),
        ('classifier', xgb)
    ])

    model_pipeline.fit(X_trn, Y_trn)

Therefore, the training data fed into the XGBRegressor has no column labels and is reordered by the make_column_transformer function. Given this, how do I extract the feature importances using the XGBRegressor.get_booster().get_score() method?

Currently, the output of get_score() is a dictionary that looks like this: {'f0': 123, 'f10': 222, 'f100': 334, 'f101': 34, … 'f99': 12}

Can I assume that the order of the features reported by get_score() is identical to the order of the features after the make_column_transformer step (i.e., that I have to account for the reordering), such that 'f0' is the 1st feature after make_column_transformer, 'f1' the 2nd, and so on?


Solution

Your assumption is correct. After the column transformation, the columns lose their names, and XGBoost assigns default names ('f0', 'f1', …) that correspond to the positions of the columns in the transformed array.

Additional info: you may also try Eli5:

    from eli5 import show_weights, show_prediction
    show_weights(model)
    show_prediction(model, data_point)

The latter function shows the impact of each feature on the prediction for data_point.

Licensed under: CC-BY-SA with attribution