At which step should I use the SMOTE technique for oversampling?
06-12-2019
Question
I want to use the SMOTE technique for oversampling, but I don't know at which step of preprocessing I should apply it.
My preprocessing steps are:
- Handling missing values
- Removing outliers
- Smoothing data
Should I use SMOTE before all of these steps, or is it better to use it after them?
Solution
If you are using Python, you can't apply SMOTE while the data still contains null values.
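The reason is that SMOTE picks a minority-class point, finds its nearest minority-class neighbours by Euclidean distance, and interpolates between them. A minimal pure-Python sketch (the helper names `euclidean` and `check_no_missing` are hypothetical, not a library API) shows that a single missing value makes that distance undefined, which is why implementations reject such input:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors (lists of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def check_no_missing(rows):
    """Hypothetical guard: refuse to run SMOTE on rows that contain NaN."""
    for i, row in enumerate(rows):
        if any(math.isnan(v) for v in row):
            raise ValueError(f"row {i} contains a missing value")

complete = [1.0, 2.0]
with_gap = [1.5, float("nan")]

# Any arithmetic involving NaN yields NaN, so neighbours cannot be ranked.
d = euclidean(complete, with_gap)
print(math.isnan(d))  # True

try:
    check_no_missing([complete, with_gap])
except ValueError as err:
    print("SMOTE input rejected:", err)
```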
In that case, use this order:
- Remove outliers
- Smooth the data
- Impute null values (R has some smart options for this, such as imputation with random forests)
- Apply SMOTE
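The order above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the answerer's exact method: outliers are flagged with a 1.5 * IQR rule, smoothing is a centred 3-point moving average (which assumes the rows are in a meaningful order, e.g. time), median imputation stands in for the random-forest imputation mentioned above, and `smote` is a bare-bones re-implementation rather than a library API. All function names and the sample data are hypothetical.

```python
import math
import random
import statistics

NAN = float("nan")

def is_missing(v):
    return isinstance(v, float) and math.isnan(v)

def column(rows, j):
    """Observed (non-missing) values of column j."""
    return [r[j] for r in rows if not is_missing(r[j])]

def remove_outliers(rows, labels, factor=1.5):
    """Step 1: drop rows with any observed value outside the IQR fences."""
    fences = []
    for j in range(len(rows[0])):
        q1, _, q3 = statistics.quantiles(column(rows, j), n=4)
        iqr = q3 - q1
        fences.append((q1 - factor * iqr, q3 + factor * iqr))
    keep = [i for i, r in enumerate(rows)
            if all(is_missing(v) or lo <= v <= hi for v, (lo, hi) in zip(r, fences))]
    return [rows[i] for i in keep], [labels[i] for i in keep]

def smooth(rows, window=3):
    """Step 2: centred moving average over observed neighbours; gaps stay missing."""
    half = window // 2
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        for i in range(len(rows)):
            if is_missing(rows[i][j]):
                continue
            lo, hi = max(0, i - half), min(len(rows), i + half + 1)
            nbrs = [rows[t][j] for t in range(lo, hi) if not is_missing(rows[t][j])]
            out[i][j] = sum(nbrs) / len(nbrs)
    return out

def impute_median(rows):
    """Step 3: fill gaps with the column median (stand-in for RF imputation)."""
    medians = [statistics.median(column(rows, j)) for j in range(len(rows[0]))]
    return [[medians[j] if is_missing(v) else v for j, v in enumerate(r)] for r in rows]

def smote(rows, labels, minority, n_new, k=2, seed=0):
    """Step 4: interpolate between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    pts = [r for r, y in zip(rows, labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        nbrs = sorted((p for p in pts if p is not base),
                      key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(nbrs)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, nb)])
    return rows + synthetic, labels + [minority] * n_new

# Made-up data: one extreme value (50.0) and two gaps.
rows = [[1.0, 10.0], [1.2, NAN], [0.9, 11.0], [1.1, 10.5],
        [50.0, 10.2], [1.3, 9.8], [NAN, 10.1], [1.0, 10.4]]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

rows, labels = remove_outliers(rows, labels)
rows = smooth(rows)
rows = impute_median(rows)
n_new = labels.count(0) - labels.count(1)
rows, labels = smote(rows, labels, minority=1, n_new=n_new)
print(labels.count(0), labels.count(1))  # 4 4
```

Because SMOTE runs last, every synthetic point is interpolated from rows that are already cleaned and complete. If the data is also split into training and test sets, SMOTE is usually applied to the training portion only, so synthetic samples do not leak into evaluation.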
Removing outliers first lets you smooth and impute more reliably, because extreme values would otherwise distort both the smoothed values and the imputed ones.
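A small worked example with made-up numbers shows the effect. It uses mean imputation and a 3-point moving average as simple stand-ins, and a hypothetical cut-off of 100 in place of a real outlier rule:

```python
import math
import statistics

# Made-up feature column with one extreme value and one gap.
values = [1.0, 2.0, 3.0, 1000.0, 4.0, float("nan")]
observed = [v for v in values if not math.isnan(v)]

# Imputing before outlier removal: the gap is filled with a distorted mean.
fill_before = statistics.mean(observed)

# Imputing after removing the outlier (hypothetical cut-off of 100).
cleaned = [v for v in observed if v < 100]
fill_after = statistics.mean(cleaned)

# A 3-point moving average centred on the value 3.0 is pulled up the same way.
smoothed_before = statistics.mean(observed[1:4])  # 2.0, 3.0, 1000.0
smoothed_after = statistics.mean(cleaned[1:4])    # 2.0, 3.0, 4.0

print(fill_before, fill_after)          # 202.0 2.5
print(smoothed_before, smoothed_after)  # 335.0 3.0
```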
Licensed under: CC-BY-SA with attribution