Question

I have the following dataset: [image]

This is test case 1. My goal is to fill in the data for the missing years. Since age, sex, and smoking do not change, I need to predict the condition and percent values for years 0 through 54. I found a high correlation between the condition and percent variables, so this seems easy, but I am a bit confused. Should I use multivariable regression? What would be the best approach?

Solution

The best approach is to prepare the data first:

  • Remove features (columns) that have no variance (you could use sklearn.feature_selection, e.g. VarianceThreshold)
  • One-hot encode the categorical features
  • Add lag columns that hold the value of a variable from t steps earlier
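The preparation steps above can be sketched roughly as follows. This is only an illustration under assumptions: the column names follow the question, but the values are made up, and the helper `prepare` and the column name `percent_lag1` are hypothetical. The encoding is done before the variance filter here because `VarianceThreshold` expects numeric input.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Made-up toy data; only the column names come from the question.
df = pd.DataFrame({
    "year": [55, 56, 57, 58, 59],
    "age": [40, 40, 40, 40, 40],            # constant, so no variance
    "sex": ["M", "M", "M", "M", "M"],       # constant categorical
    "smoking": ["yes", "yes", "yes", "yes", "yes"],
    "condition": [1.0, 1.2, 1.5, 1.4, 1.8],
    "percent": [10.0, 12.5, 15.0, 14.0, 18.5],
})

def prepare(frame, lag=1):
    """Hypothetical helper: encode, drop constant columns, add a lag column."""
    # One-hot encode the categorical (string) columns.
    encoded = pd.get_dummies(frame, dtype=float)
    # Keep only the columns whose variance is above zero.
    selector = VarianceThreshold(threshold=0.0)
    selector.fit(encoded)
    kept = encoded.columns[selector.get_support()]
    out = encoded[kept].copy()
    # Lag column: the value of percent from `lag` rows (years) earlier.
    out[f"percent_lag{lag}"] = out["percent"].shift(lag)
    # The first `lag` rows have no earlier value, so they are dropped.
    return out.dropna().reset_index(drop=True)

prepared = prepare(df)
print(prepared)
```

In this toy data age, sex, and smoking are constant, so they are removed and only year, condition, percent, and the lag column remain.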

If you have more than one explanatory variable, the model is called multiple linear regression. Instead of a linear regression model, you could also use other learners such as XGBoost or LSTMs.
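As a minimal sketch of multiple linear regression, the example below fits scikit-learn's `LinearRegression` on two explanatory variables. The data is synthetic and built from an assumed exact linear relation (`percent = 0.5 * year + 2.0 * condition + 1.0`), chosen only so the fit is easy to check; it is not taken from the question's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic observations (assumed values, not the asker's data).
year = np.array([55, 56, 57, 58, 59, 60], dtype=float)
condition = np.array([1.0, 1.4, 1.1, 1.9, 1.6, 2.2])
# Assumed exact linear relation used to generate the target.
percent = 0.5 * year + 2.0 * condition + 1.0

# Two explanatory variables -> multiple linear regression.
X = np.column_stack([year, condition])
model = LinearRegression().fit(X, percent)

# Predict percent for a hypothetical earlier year with a given condition.
new_rows = np.array([[54.0, 0.9]])
predicted = model.predict(new_rows)
print(model.coef_, model.intercept_, predicted)
```

Note that a model like this needs a condition value for every year being filled in, so in practice condition would have to be estimated first (for example from year alone) or predicted jointly with percent.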

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange