What are the standard parameter values for the SMOTE technique?
19-10-2020
Question
I'm working on an imbalanced two-class data set of 200 samples: the first class has 50 samples and the second has 150.
My questions:
- When I apply SMOTE to my data set, will the total number of samples be greater or smaller than 200?
- Are there standard parameter values for SMOTE or not?
Solution
First of all, if all your variables are numerical, you can use SMOTE; if you have a mix of numerical and categorical variables, you should use SMOTENC instead.
Answers: 1- Your classes are 150-50, so SMOTE with its default settings oversamples the minority class and gives you 150-150. So yes, the total grows: your data set will have 300 samples.
2- You can use the default parameters; they work well for most problems. If you want to tune something, k_neighbors (default 5) is the one to experiment with, while random_state only fixes the random seed so that results are reproducible. All parameters are explained here: imblearn.over_sampling.SMOTE
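To make both answers concrete, here is a minimal sketch of the idea behind SMOTE in plain Python. It is not the imbalanced-learn implementation: `smote_sketch`, its `seed` argument, and the toy Gaussian data are assumptions made up for illustration. It only shows where the 100 extra samples come from and what role `k_neighbors` and the random seed play.

```python
import math
import random


def smote_sketch(minority, n_new, k_neighbors=5, seed=0):
    """Hypothetical helper: create n_new synthetic points, each placed on the
    segment between a minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)  # plays the role of random_state
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data matching the question: 50 minority and 150 majority samples.
data_rng = random.Random(42)
minority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
majority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(150)]

# Balancing up to the majority size means creating 150 - 50 = 100 new points.
n_new = len(majority) - len(minority)
synthetic = smote_sketch(minority, n_new, k_neighbors=5, seed=0)

n_minority = len(minority) + len(synthetic)
total = n_minority + len(majority)
print(n_minority, len(majority), total)  # 150 150 300
```

With imbalanced-learn itself, the equivalent call would be along the lines of `SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)`, where `X` and `y` stand for your own feature matrix and labels.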
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange