Question

I want to use the SMOTE technique for oversampling, but I don't know at which step of preprocessing I should apply it.

My preprocessing steps are:

  • Missing values

  • Removing Outliers

  • Smoothing Data

Should I use SMOTE before all of these steps, or is it better to use it after them?

Solution

If you are using Python, you can't apply SMOTE while the data still contains null values, so imputation has to come before it.
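To see why missing values get in the way, recall that SMOTE creates synthetic points by interpolating between a minority sample and one of its nearest neighbors. The following is a minimal sketch in plain Python, not the code of any particular library, and the two sample points are made up. It shows a NaN propagating through both the distance and the interpolation:

```python
import math

# Two hypothetical minority-class samples; the first has a missing feature.
a = [1.0, float("nan")]
b = [2.0, 3.0]

# The neighbor search needs distances, but a single NaN makes the distance NaN.
distance = math.dist(a, b)

# SMOTE-style interpolation: a + gap * (b - a), with gap between 0 and 1.
gap = 0.5
synthetic = [x + gap * (y - x) for x, y in zip(a, b)]

print(distance)   # nan
print(synthetic)  # [1.5, nan]
```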

In this case:

  1. Remove Outliers
  2. Smooth Data
  3. Impute null values (R has some smart options for this, such as random-forest-based imputation)
  4. SMOTE

Removing outliers first lets you do a better job of smoothing and imputing.
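The four steps can be sketched end to end on a toy dataset. This is only an illustration under several assumptions: the data is made up, there is a single numeric feature, outliers are detected with IQR fences, smoothing is done by snapping values to bins, imputation uses the median instead of a random forest, and `smote` is a simplified stand-in written for this example rather than a library implementation. All function names here are hypothetical.

```python
import random
import statistics

def remove_outliers(xs, ys, k=1.5):
    # Drop values outside the IQR fences; keep missing values (None) for step 3.
    observed = [x for x in xs if x is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [(x, y) for x, y in zip(xs, ys) if x is None or low <= x <= high]
    return [x for x, _ in kept], [y for _, y in kept]

def smooth(xs, step=0.5):
    # Smoothing by binning: snap each observed value to the nearest multiple of step.
    return [None if x is None else round(x / step) * step for x in xs]

def impute(xs):
    # Fill missing values with the median of the observed ones.
    median = statistics.median(x for x in xs if x is not None)
    return [median if x is None else x for x in xs]

def smote(xs, ys, minority, k=2, seed=0):
    # Simplified one-feature SMOTE: interpolate between a minority sample
    # and one of its k nearest minority neighbors until the classes balance.
    rng = random.Random(seed)
    minority_xs = [x for x, y in zip(xs, ys) if y == minority]
    needed = (len(xs) - len(minority_xs)) - len(minority_xs)
    new_xs = []
    for _ in range(needed):
        i = rng.randrange(len(minority_xs))
        base = minority_xs[i]
        others = [x for j, x in enumerate(minority_xs) if j != i]
        neighbor = rng.choice(sorted(others, key=lambda x: abs(x - base))[:k])
        new_xs.append(base + rng.random() * (neighbor - base))
    return xs + new_xs, ys + [minority] * len(new_xs)

# Made-up data: class 0 is the majority, class 1 the minority.
xs = [1.0, 1.2, None, 0.9, 1.1, 1.3, 50.0, 1.0, 3.0, 3.6, 4.1]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

xs, ys = remove_outliers(xs, ys)    # 1. the 50.0 row is dropped
xs = smooth(xs)                     # 2. values snapped to 0.5-wide bins
xs = impute(xs)                     # 3. the None is replaced by the median
xs, ys = smote(xs, ys, minority=1)  # 4. four synthetic minority rows are added

print(ys.count(0), ys.count(1))  # 7 7
```

In practice the last step would normally be handed to a library, for example `SMOTE().fit_resample(X, y)` from imbalanced-learn, once the earlier steps have left no missing values in `X`.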

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange