Question

I'm working on an imbalanced data set (200 samples) with 2 classes: the first class has 50 samples and the second has 150 samples.

My questions:

  1. When I apply the SMOTE technique to my data set, will the total number of samples be greater or smaller than 200?
  2. Are there standard parameter values for the SMOTE technique?
Solution

First of all, if all your variables are numerical then you can use SMOTE; if you have a mix of numerical and categorical variables, you should use SMOTENC instead.

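Answers: 1- Your classes are 150-50, so with the default settings SMOTE oversamples the minority class until the classes are 150-150. So, YES; your total number of samples will be greater than 200: it grows to 300.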
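To see where the 100 extra samples come from, here is a minimal, standard-library-only sketch of the idea behind SMOTE: each synthetic point is an interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch` and the toy 2-D data are assumptions made for illustration; this is a simplification, not imbalanced-learn's actual implementation.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: build n_new synthetic points by interpolating
    between a random minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy data with the same class sizes as in the question (50 vs. 150).
data_rng = random.Random(42)
minority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
majority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(150)]

# Balancing means adding enough synthetic minority points to match the majority.
n_new = len(majority) - len(minority)
synthetic = smote_sketch(minority, n_new)
balanced_minority = minority + synthetic
total = len(balanced_minority) + len(majority)
print(len(balanced_minority), len(majority), total)  # 150 150 300
```

The original 200 samples are all kept; only new minority points are added, which is why the data set can only grow.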

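2- You can use the default parameters; they work well for most problems. However, you can play with k_neighbors (the number of nearest neighbours used to build each synthetic sample, 5 by default), and set random_state to make the result reproducible. All parameters are explained in the documentation for imblearn.over_sampling.SMOTE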

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange