Predicting time series data
13-12-2020
Question
I have a dataset as follows:
This is test case 1. My goal is to fill in the data for the missing years. Since age, sex, and smoking do not change, I have to predict the condition and percent values for every year from 0 through 54. I found a high correlation between the condition and percent variables. This seems easy, but I am a bit confused now. Should I use multivariable regression? What would be the best way to approach this?
Solution
The best approach is to prepare the data first:
- Remove features (columns) that have no variance (you could use sklearn.feature_selection for this)
- One-hot encode the categorical features
- Add lag columns that hold the values from t steps earlier
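The preparation steps above could be sketched roughly as follows. This is only an illustration: the column names are taken from the question, but the values are made up, the lag is fixed at one step, and the encoding is done before the variance filter because VarianceThreshold expects numeric input.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "year": [0, 1, 2, 3, 4, 5],
    "age": [40, 40, 40, 40, 40, 40],   # constant within this test case
    "sex": ["M"] * 6,                  # constant categorical
    "smoking": ["yes"] * 6,            # constant categorical
    "condition": [1.0, 1.2, 1.5, 1.7, 2.0, 2.4],
    "percent": [10.0, 11.5, 13.0, 14.8, 16.1, 18.0],
})

# One-hot encode the categorical columns so every feature is numeric.
encoded = pd.get_dummies(df, columns=["sex", "smoking"], dtype=float)

# Drop columns with zero variance (here: age and the one-hot columns).
selector = VarianceThreshold(threshold=0.0)
selector.fit(encoded)
kept = encoded.loc[:, selector.get_support()]

# Add lag columns: the value of each series one step earlier.
prepared = kept.sort_values("year").copy()
prepared["condition_lag1"] = prepared["condition"].shift(1)
prepared["percent_lag1"] = prepared["percent"].shift(1)

# The first row has no previous value, so it is dropped.
prepared = prepared.dropna().reset_index(drop=True)
print(prepared)
```

In this toy case age, sex, and smoking are constant, so they are removed and only year, condition, percent, and the two lag columns remain. With several test cases in one table those columns would vary and be kept.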
If you have more than one explanatory variable, the process is called multiple linear regression. Instead of a linear regression model, you could also use other learners such as XGBoost or LSTMs.
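As a minimal sketch of multiple linear regression, the example below fits percent on two explanatory variables, year and condition, using scikit-learn's LinearRegression. The data is synthetic and was built from a known linear rule purely so the fit is easy to check; it is an assumption, not the dataset from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up observations (not the asker's data).
year = np.array([0, 1, 2, 3, 4, 5], dtype=float)
condition = np.array([1.0, 1.5, 1.8, 2.6, 3.1, 3.3])

# Target generated from a known linear rule so the result can be verified.
percent = 5.0 + 0.5 * year + 2.0 * condition

# Two explanatory variables -> multiple linear regression.
X = np.column_stack([year, condition])
model = LinearRegression().fit(X, percent)

# Predict percent for a hypothetical unseen row (year 6, condition 3.9).
X_new = np.array([[6.0, 3.9]])
predicted = model.predict(X_new)

print(model.coef_, model.intercept_, predicted)
```

To fill a missing year in practice, the condition value for that year would itself have to be estimated first (for example from its own lagged values) before it can be used as an input here. Other learners with a scikit-learn style fit/predict interface could be swapped in for LinearRegression.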