Question

I have a dataset containing numerical as well as categorical variables.

After I've fit my dataset to a CatBoostClassifier, I want to extract the entire feature set, with the categorical variables encoded in whatever method the classifier decided to encode them.

How can I extract the fully transformed (encoded) features? (similar to what a fit_transform method would return)

Solution

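I don't believe this is possible. CatBoost computes its target encoding per split while the trees are being built, so the same category ends up with different encoded values in different trees, and there is no single transformed feature matrix to extract.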

Before each split is selected in the tree (see Choosing the tree structure), categorical features are transformed to numerical. This is done using various statistics on combinations of categorical features and combinations of categorical and numerical features.

https://catboost.ai/docs/concepts/algorithm-main-stages_cat-to-numberic.html
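To illustrate why no single encoded value exists per category, here is a minimal pure-Python sketch of an ordered target statistic. It is a simplification, not CatBoost's actual implementation: the function name `ordered_target_stat`, the `prior` value of 0.5, and the toy data are assumptions made for illustration, and CatBoost additionally uses random permutations and feature combinations that are not shown here.

```python
from collections import defaultdict


def ordered_target_stat(categories, targets, prior=0.5):
    """Encode each row using only the targets of EARLIER rows
    that share its category (hypothetical helper, simplified)."""
    target_sums = defaultdict(float)  # running sum of targets per category
    counts = defaultdict(int)         # running row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        # The statistic is computed before the current row's target is added,
        # so a row never sees its own label.
        encoded.append((target_sums[cat] + prior) / (counts[cat] + 1))
        target_sums[cat] += y
        counts[cat] += 1
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
labels = [1, 0, 0, 1, 1]
encoded = ordered_target_stat(colors, labels)
print(encoded)  # [0.5, 0.5, 0.75, 0.5, 0.25]
```

Note that "red" is encoded as 0.5, then 0.75, then 0.5 again: the value depends on which rows came before, so a different row ordering (or a different tree) gives a different encoding for the same category.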

However, if you just want to use the CatBoost encoding algorithm, you can use CatBoostEncoder from the category_encoders package.

Licensed under: CC-BY-SA with attribution