What predictive model to use to impute Gender?
01-11-2019
Question
My data looks like this:
birth_date has 634,990 missing values
gender has 328,849 missing values
Both of these are substantial amounts, since I have 900k entries, so I can't simply discard the rows with missing values. For birth_date, someone recommended Multivariate Imputation by Chained Equations (MICE). I don't know what predictive model I should use for gender. Among the non-missing values, there are 5x more males than females.
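To put those counts in proportion, here is a small sketch of the arithmetic. It assumes exactly 900,000 rows (the question only says "900k") and reads "5x more males than females" as a 5:1 ratio, so the results are approximate. The names TOTAL_ROWS, MISSING and missing_share are made up for illustration.

```python
# Figures quoted in the question; the total is an assumption ("900k entries").
TOTAL_ROWS = 900_000
MISSING = {"birth_date": 634_990, "gender": 328_849}


def missing_share(missing: int, total: int) -> float:
    """Fraction of rows that have no value for a column."""
    return missing / total


shares = {col: missing_share(n, TOTAL_ROWS) for col, n in MISSING.items()}

# Rows where gender is observed, split by the assumed 5:1 male-to-female ratio.
observed_gender = TOTAL_ROWS - MISSING["gender"]
approx_female = round(observed_gender / 6)
approx_male = observed_gender - approx_female

for col, share in shares.items():
    print(f"{col}: {share:.1%} missing")
print(f"gender observed: {observed_gender} rows "
      f"(~{approx_male} male, ~{approx_female} female)")
```

Under these assumptions, roughly 71% of birth_date values and 37% of gender values are missing, and whatever model is used for gender would have to learn from a training set that is heavily skewed towards males.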
Can someone tell me what would be best practice here? What would be the best way to fill in the missing values for gender?
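As a point of comparison only, and not a claim about best practice: one simple baseline is to fill each missing gender by drawing at random from the observed values, which keeps the roughly 5:1 ratio intact, and to record which rows were imputed. The sketch below uses only the Python standard library. The function impute_from_observed and the toy column are hypothetical. It ignores every other column, so a model that conditions on other features could do better.

```python
import random
from collections import Counter


def impute_from_observed(values, seed=0):
    """Fill None entries by sampling from the observed value distribution.

    Returns (filled_values, was_imputed_flags). Illustrative baseline only.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to sample from")

    # Category frequencies among the observed values act as sampling weights.
    counts = Counter(observed)
    categories = list(counts)
    weights = [counts[c] for c in categories]

    filled, flags = [], []
    for v in values:
        if v is None:
            filled.append(rng.choices(categories, weights=weights, k=1)[0])
            flags.append(True)
        else:
            filled.append(v)
            flags.append(False)
    return filled, flags


# Toy column with a 5:1 male-to-female ratio and four missing entries.
toy = ["M"] * 5 + ["F"] + [None] * 4
filled, imputed = impute_from_observed(toy, seed=42)
print(filled)
print(imputed)
```

Keeping the imputed flag as its own column lets a downstream model for ride duration or destination treat filled-in values differently from observed ones.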
I'm using the data to predict bike-ride duration and final destination (I know both are shown in the table above).
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange