Encoding categorical data with a pre-determined dictionary
13-12-2020
Question
When encoding features, how can I encode my values based on a pre-determined dictionary?
For instance, say my values are [Red, Green, Blue]
and I want to encode them as [-1, 0, 1]:
-1 for Red, 0 for Green, 1 for Blue. I'll apply this to my feature. I believe I can do it with mapping or the apply method, but I'm not sure. Is there a better way to do that?
```
Column  expectedEncoding
Red     -1
Red     -1
Blue     1
Green    0
Red     -1
Blue     1
```
Solution
Assuming you have a pandas DataFrame and one mapping per column, store all mappings in a 2-level dict where the first-level keys are the column names of the DataFrame and the second-level keys are the categories:
mappings = {'fruit': {'banana': -1, 'apple': 1}, 'color': {'yellow': -1, 'red': 1}}
Then, you can do the following:
encoded_data = data.apply(lambda col: col.map(mappings[col.name]))
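As a minimal, runnable sketch of the line above applied to the colors from the question: the sample DataFrame and the column name `Column` are assumptions taken from the question's table, not from real data.

```python
import pandas as pd

# Hypothetical sample data; the column name "Column" follows the question's table.
data = pd.DataFrame({"Column": ["Red", "Red", "Blue", "Green", "Red", "Blue"]})

# One mapping per column: outer keys are column names, inner keys are categories.
mappings = {"Column": {"Red": -1, "Green": 0, "Blue": 1}}

# Pass each column through its own dictionary with Series.map.
encoded_data = data.apply(lambda col: col.map(mappings[col.name]))

print(encoded_data["Column"].tolist())  # [-1, -1, 1, 0, -1, 1]
```

Note that `Series.map` with a plain dict turns any category missing from the dict into NaN, so it can be worth checking `encoded_data.isna().any()` afterwards.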
[EDIT] If you have columns for which you don't have a mapping, you can do one of the following:
data.update(data[list(mappings)].apply(lambda col: col.map(mappings[col.name])))
or, if you want the result in a new DataFrame (e.g., to keep the DataFrame with the original values):
encoded_data = data.copy()
encoded_data.update(data[list(mappings)].apply(lambda col: col.map(mappings[col.name])))
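One thing to be aware of: `DataFrame.update` only copies non-NA values, so a category that is missing from its mapping silently keeps its original value. A possible alternative, sketched here under the assumption that all column names are strings, is to replace the mapped columns with `DataFrame.assign`; the `price` column and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "price" has no mapping and should pass through unchanged.
data = pd.DataFrame({
    "fruit": ["banana", "apple", "apple"],
    "color": ["yellow", "red", "red"],
    "price": [1.5, 2.0, 2.5],
})

mappings = {"fruit": {"banana": -1, "apple": 1}, "color": {"yellow": -1, "red": 1}}

# Build a new DataFrame in which only the mapped columns are replaced;
# the original `data` is left untouched.
encoded_data = data.assign(
    **{col: data[col].map(mapping) for col, mapping in mappings.items()}
)

print(encoded_data)
```

With this variant, unmapped categories show up as NaN in the encoded columns instead of being left as their original strings, which makes gaps in the dictionary easier to spot.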