Problem with converting string to dummy variables

https://datascience.stackexchange.com/questions/81863

14-12-2020

Question

I'm new to data science. I have a dataset I want to work with; I dropped the extra columns and reduced it to four columns (Product, Date, Market, Demand). Product and Market are strings, and I know they must be converted before I can work with the data. I want to convert the strings to dummy variables, but that doesn't seem sensible because the Product column contains 64 different fruits.

I'm confused and don't know what to do with these strings.

Solution

There are several ways to convert a categorical column to a numeric one, and the right choice often depends on the use case. Trial and error can help you see what works best for your problem.

As a specific recommendation, try target encoding and see how it performs. With 64 distinct products it will probably work better than one-hot encoding (which would add 64 columns) or ordinal encoding (which imposes an arbitrary order on the fruits).

Example links:

https://contrib.scikit-learn.org/category_encoders/targetencoder.html

https://maxhalford.github.io/blog/target-encoding/

https://brendanhasz.github.io/2019/03/04/target-encoding
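
As a rough illustration of the target encoding suggested above, here is a minimal sketch in plain pandas. The toy data, the helper name `target_encode`, and the `smoothing` parameter are assumptions made up for this example; only the column names come from the question. Each category is replaced by the mean of Demand for that category, pulled toward the overall mean so that rare categories are not over-trusted.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Product": ["apple", "apple", "banana", "banana", "cherry"],
    "Market": ["north", "south", "north", "north", "south"],
    "Demand": [10.0, 14.0, 20.0, 22.0, 5.0],
})


def target_encode(frame, column, target, smoothing=1.0):
    """Map each category to a smoothed mean of the target (illustrative helper)."""
    global_mean = frame[target].mean()
    stats = frame.groupby(column)[target].agg(["mean", "count"])
    # Blend the per-category mean with the global mean, weighted by category size.
    smoothed = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return frame[column].map(smoothed)


# One numeric column per categorical column, however many categories exist.
df["Product_te"] = target_encode(df, "Product", "Demand")
df["Market_te"] = target_encode(df, "Market", "Demand")
print(df)
```

In a real model the category means should be computed on the training rows only and then applied to validation and test rows; otherwise the target leaks into the features. The `TargetEncoder` in the category_encoders link above is built for that fit-then-transform workflow.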

OTHER TIPS

You can use label encoders for your Product and Market columns. However, one-hot encoding will be the best fit for converting the strings to numbers to feed into an algorithm. I suggest looking at this link for further details: https://towardsdatascience.com/choosing-the-right-encoding-method-label-vs-onehot-encoder-a4434493149b
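
To make the difference between the two options in this answer concrete, here is a small sketch using pandas only. The toy rows are invented for illustration, and the pandas category codes stand in for a label encoder; this is one possible way to do it, not necessarily what the answer had in mind.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the question.
df = pd.DataFrame({
    "Product": ["apple", "banana", "cherry", "apple"],
    "Market": ["north", "south", "north", "south"],
})

# Label encoding: each category becomes one integer code.
label_encoded = df.copy()
for col in ["Product", "Market"]:
    label_encoded[col] = label_encoded[col].astype("category").cat.codes

# One-hot encoding: each distinct category becomes its own indicator column.
one_hot = pd.get_dummies(df, columns=["Product", "Market"])

print(label_encoded)
print(one_hot.columns.tolist())
```

Label encoding keeps one column per feature but implies an order among the categories, while one-hot encoding avoids that order at the cost of one column per category, which means 64 columns for the Product column described in the question.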

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange