Question

My data looks like this:

[Image: table of the data, not preserved]

birth_date has 634,990 missing values
gender has 328,849 missing values

Both of these are substantial amounts, since I have 900k entries, so I can't simply discard rows with missing values. For birth_date, someone recommended using Multivariate Imputation by Chained Equations (MICE). I don't know what predictive model I should use for gender. Among the non-missing values, there are 5x more males than females.
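For reference, here is a minimal sketch of what MICE-style imputation of birth_date could look like, assuming scikit-learn's IterativeImputer (an experimental, MICE-inspired estimator) and assuming birth_date has been reduced to a numeric birth year. The toy data and the columns trip_minutes and start_hour are hypothetical stand-ins, not columns from my table.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, needed to expose IterativeImputer
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 200

# Toy stand-in for the real table; trip_minutes and start_hour are hypothetical.
toy = pd.DataFrame({
    "birth_year": rng.integers(1950, 2005, size=n).astype(float),
    "trip_minutes": rng.gamma(2.0, 8.0, size=n),
    "start_hour": rng.integers(0, 24, size=n).astype(float),
})

# Knock out roughly 70% of birth years, close to the share missing in the real data.
missing_mask = rng.random(n) < 0.7
toy.loc[missing_mask, "birth_year"] = np.nan

# Keep a flag so a downstream model can tell imputed values from observed ones.
toy["birth_year_missing"] = missing_mask.astype(int)

# Each column with gaps is regressed on the other columns, round-robin style.
# sample_posterior=True draws imputations instead of always using the mean prediction.
numeric_cols = ["birth_year", "trip_minutes", "start_hour"]
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
imputed = pd.DataFrame(
    imputer.fit_transform(toy[numeric_cols]),
    columns=numeric_cols,
    index=toy.index,
)
```

With about 70% of birth_date missing, the imputed values are only as good as the relationship between birth year and the other columns, so keeping the birth_year_missing flag alongside the imputed column seems prudent.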

Can someone tell me what the best practice would be here? What would be the best way to fill in the missing values for gender?
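One simple baseline to compare any predictive model against is to not guess gender at all: keep a missing-value flag and treat the gap as its own "unknown" category. This is only a sketch on toy rows, using plain pandas; the names gender_missing and gender_filled are made up for illustration. It also avoids the problem that, with a 5:1 male-to-female ratio, most-frequent-value imputation would label every missing row as male.

```python
import pandas as pd

# Toy rows only; the real table has about 900k entries.
rides = pd.DataFrame({"gender": ["male", None, "female", "male", None, "male"]})

# Record which rows were originally missing before filling anything in.
rides["gender_missing"] = rides["gender"].isna().astype(int)

# Treat "unknown" as a third category instead of guessing male or female.
rides["gender_filled"] = rides["gender"].fillna("unknown")

# One-hot encode so the category can be fed to most models.
gender_dummies = pd.get_dummies(rides["gender_filled"], prefix="gender")
```

If a classifier trained on the labeled rows cannot clearly beat this baseline on the downstream duration and destination predictions, the extra imputation step may not be worth it.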

I'm using the data to predict bike-ride duration and final destination (I know they're shown in the table above).

No correct solution

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange