Is this a good practice of feature engineering?
31-10-2019
Question
I have a practical question about feature engineering. Say I want to predict house prices using linear regression with a bunch of features, including zip code. By checking the feature importance, I realize zip code is a pretty good feature, so I decide to add some more features based on it: for example, I go to the Census Bureau and get the average income, population, number of schools, and number of hospitals for each zip code. With these four new features, the model performs better. So I add even more zip-related features... and this cycle goes on and on. Eventually the model will be dominated by these zip-related features, right?
My questions:
- Does it make sense to do this in the first place?
- If yes, how do I know when to stop this cycle?
- If not, why not?
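
To make the cycle concrete, here is a minimal sketch of one way it could be checked: attach zip-level columns to each house by lookup, then compare held-out error before and after each addition. Everything in it is an assumption made for illustration and not part of my actual setup: the data is synthetic, the model is plain ordinary least squares fitted with NumPy, and names such as `zip_income`, `zip_noise`, and `cv_rmse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: every number below is invented for illustration.
n_zips, n_houses = 30, 600
zip_income = rng.normal(60.0, 15.0, n_zips)     # hypothetical zip-level average income
zip_noise = rng.normal(0.0, 1.0, (n_zips, 5))   # zip-level columns unrelated to price

zip_id = rng.integers(0, n_zips, n_houses)      # index of the zip each house belongs to
sqft = rng.normal(1500.0, 300.0, n_houses)
price = (100.0 * sqft
         + 2000.0 * zip_income[zip_id]
         + rng.normal(0.0, 20000.0, n_houses))


def cv_rmse(X, y, k=5, seed=0):
    """Mean held-out RMSE of ordinary least squares over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        resid = y[test] - X1[test] @ coef
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Indexing a zip-level array by zip_id plays the role of a join on zip code.
base = sqft.reshape(-1, 1)
with_income = np.column_stack([sqft, zip_income[zip_id]])
with_noise = np.column_stack([with_income, zip_noise[zip_id]])

rmse_base = cv_rmse(base, price)
rmse_income = cv_rmse(with_income, price)
rmse_noise = cv_rmse(with_noise, price)

print(f"sqft only:             {rmse_base:.0f}")
print(f"+ zip income:          {rmse_income:.0f}")
print(f"+ 5 unrelated columns: {rmse_noise:.0f}")
```

In this toy setup the income column is built into the price, so adding it lowers the held-out error, while the unrelated zip-level columns should change it very little. Whether "held-out error stops improving" is the right stopping rule for real data is exactly what I am asking.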
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange