What predictive model to use to impute Gender?

https://datascience.stackexchange.com/questions/51567

missing-data
predictive-modeling
data-imputation

01-11-2019

Question

My data looks like this:

birth_date has 634,990 missing values
gender has 328,849 missing values

Both of these are substantial amounts, since I have about 900k entries, so I can't simply discard rows with missing values. For birth_date, someone recommended Multivariate Imputation by Chained Equations (MICE). I don't know what predictive model I should use for gender. Among the non-missing values, there are 5x more males than females.

Can someone tell me what would be best practice here? What would be the best way to fill in the missing values for gender?

I'm using the data to predict bike-ride duration and final destination (I know they're shown in the table above).
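To make the setup concrete, here is a minimal sketch of what "using a predictive model to impute gender" could look like. It trains a classifier on the rows where gender is known and predicts it for the rows where it is missing. The column names (start_hour, start_station), the toy data, and the choice of a random forest are all assumptions for illustration, not part of my real data. class_weight="balanced" is one possible way to account for the 5:1 imbalance. The sketch also assumes the feature columns have no missing values, and it fills in a single hard label, which ignores how uncertain each prediction is.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def impute_gender(df, feature_cols, target="gender"):
    """Fill missing values of `target` with predictions from a classifier
    trained on the rows where `target` is known. Returns a copy."""
    out = df.copy()
    known = out[target].notna()
    if known.all():
        return out  # nothing to impute

    # class_weight="balanced" reweights classes inversely to their frequency,
    # so the minority class is not simply ignored.
    clf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    )
    clf.fit(
        out.loc[known, feature_cols],
        out.loc[known, target].to_numpy(dtype=str),
    )
    out.loc[~known, target] = clf.predict(out.loc[~known, feature_cols])
    return out


# Toy data (hypothetical columns): roughly 5x more "M" than "F",
# with about 35% of gender values missing.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "start_hour": rng.integers(0, 24, n),
    "start_station": rng.integers(0, 50, n),
})
df["gender"] = rng.choice(["M", "F"], size=n, p=[5 / 6, 1 / 6])
df.loc[rng.random(n) < 0.35, "gender"] = np.nan

filled = impute_gender(df, ["start_hour", "start_station"])
```

Rows that already had a gender are left untouched, and only the missing ones receive a predicted label. Another option I'm considering is keeping "unknown" as its own category instead of imputing at all.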

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange