Frequency of occurrence - dummy variables
02-11-2019
Question
This is not the first time I have wondered about this. If I have a variable (cities, in this case) that I later want to convert to dummy variables, should I delete the rows whose value occurs fewer than N times?
For example, New York occurs 400+ times, but some cities appear only once or twice.
What should I do with values that have appeared only once or twice?
print(df['cities'].value_counts())
Output:
city1 424
city2 107
city3 35
city4 33
city5 28
city6 24
city7 15
city8 7
city9 4
city10 3
city11 2
city12 1
city13 1
city14 1
city15 1
city16 1
city17 1
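A minimal sketch of two common ways to handle the rare values above before dummy encoding, assuming a pandas DataFrame with a column named 'cities' (the toy data, the threshold of 3, the "other" label and the helper lump_rare are all hypothetical choices for illustration, not taken from the question). Option A drops the rare rows, as the question proposes; option B keeps every row and merges the rare cities into a single "other" level, so the dummy matrix stays small without discarding data.

```python
import pandas as pd


def lump_rare(series: pd.Series, min_count: int, other_label: str = "other") -> pd.Series:
    """Hypothetical helper: replace values seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Series.where keeps values where the condition is True and substitutes other_label elsewhere.
    return series.where(~series.isin(rare), other_label)


# Hypothetical toy data standing in for the real 'cities' column.
df = pd.DataFrame({"cities": ["city1"] * 5 + ["city2"] * 3 + ["city12", "city13"]})
MIN_COUNT = 3  # assumed threshold N; pick it from the value_counts() output

# Option A: drop rows whose city occurs fewer than MIN_COUNT times.
counts = df["cities"].value_counts()
keep = counts[counts >= MIN_COUNT].index
dropped = df[df["cities"].isin(keep)]

# Option B: keep all rows, lump rare cities into "other", then build dummies.
df["cities_lumped"] = lump_rare(df["cities"], min_count=MIN_COUNT)
dummies = pd.get_dummies(df["cities_lumped"])
print(dummies.columns.tolist())  # ['city1', 'city2', 'other']
```

Option B is often preferred when the rare rows still carry useful information in their other columns; the threshold is a judgment call and may be worth tuning with validation.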
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange