Question

I have a dataset containing numerical as well as categorical variables.

After fitting a CatBoostClassifier to my dataset, I want to extract the entire feature set, with the categorical variables encoded by whatever method the classifier chose for them.

How can I extract the fully transformed (encoded) features, similar to what a fit_transform method would return?

Solution

I don't believe this is possible. CatBoost performs its target encoding per split rather than once up front, so the same category ends up with different encoded values in different trees. The documentation describes it this way:

Before each split is selected in the tree (see Choosing the tree structure), categorical features are transformed to numerical. This is done using various statistics on combinations of categorical features and combinations of categorical and numerical features.
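To make the idea concrete, here is a minimal pure-Python sketch of an ordered target statistic for a single categorical column. It assumes the simplified formula (count_in_class + prior) / (total_count + 1), where the counts come only from rows that appear earlier in a given permutation. The function name ordered_target_stats, the prior of 0.5, and the toy data are illustrative assumptions, and this is not CatBoost's actual implementation (which also handles feature combinations, among other things).

```python
def ordered_target_stats(categories, targets, order, prior=0.5):
    """Encode one categorical column with a simplified ordered target statistic.

    Rows are visited in the sequence given by `order`. Each row is encoded
    using only the rows visited before it:
        (count_in_class + prior) / (total_count + 1)
    """
    seen_total = {}     # category -> number of earlier rows with this category
    seen_positive = {}  # category -> number of those earlier rows with target == 1
    encoded = [0.0] * len(categories)

    for i in order:
        cat = categories[i]
        total = seen_total.get(cat, 0)
        positive = seen_positive.get(cat, 0)
        encoded[i] = (positive + prior) / (total + 1)
        # Only after encoding the row do we add it to the running counts.
        seen_total[cat] = total + 1
        seen_positive[cat] = positive + targets[i]
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
labels = [1, 0, 1, 0, 1]

# Two different permutations of the same rows.
run_a = ordered_target_stats(colors, labels, order=[0, 1, 2, 3, 4])
run_b = ordered_target_stats(colors, labels, order=[4, 3, 2, 1, 0])

print(run_a)
print(run_b)
```

The two runs assign different numbers to the same rows (row 2, a "red", gets 0.75 in the first run and 0.25 in the second). That is the sense in which there is no single encoded dataset to extract: the value depends on which rows were seen before, and that varies during training.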

https://catboost.ai/docs/concepts/algorithm-main-stages_cat-to-numberic.html

However, if you just want to apply the CatBoost encoding algorithm as a standalone preprocessing step, you can use CatBoostEncoder from the category_encoders package.

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange