Question

I'm working on an imbalanced dataset of 200 samples with 2 classes: the first class has 50 samples and the second has 150.

My questions:

  1. When I apply SMOTE to my dataset, will the total number of samples be greater or smaller than 200?
  2. Are there standard parameter values for SMOTE?

Solution

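First of all, if all your variables are numerical, you can use SMOTE; if your data mixes numerical and categorical variables, you should use SMOTENC instead (for purely categorical data, imbalanced-learn also provides SMOTEN).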

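Answers: 1- Your classes are split 150/50. With the default settings, SMOTE generates synthetic minority samples until the minority class matches the majority class, so you end up with 150/150. So yes, your dataset gets larger: the total will be 300 samples. SMOTE only adds samples; it never removes any.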
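To make the arithmetic concrete, here is a from-scratch NumPy sketch of the core SMOTE idea (not imbalanced-learn's actual implementation): each synthetic point lies on the segment between a random minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch` and the Gaussian toy data are hypothetical. Balancing 150/50 requires 150 - 50 = 100 synthetic samples, giving 300 in total.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_new synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)  # random starting points
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(150, 2))  # hypothetical majority class
X_min = rng.normal(3.0, 1.0, size=(50, 2))   # hypothetical minority class

# Oversample the minority class up to the majority size: 150 - 50 = 100 new samples.
n_new = len(X_maj) - len(X_min)
X_syn = smote_sketch(X_min, n_new)

X_res = np.vstack([X_maj, X_min, X_syn])
y_res = np.array([0] * len(X_maj) + [1] * (len(X_min) + n_new))
print(X_res.shape, np.bincount(y_res))  # (300, 2) [150 150]
```

Because every synthetic point is an interpolation between two real minority points, the original 200 samples are all kept and only new ones are appended.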

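2- There is no universal standard, but the default parameters are a reasonable starting point for most problems. The parameters you are most likely to tune are k_neighbors (the number of nearest minority-class neighbours used to build synthetic samples, 5 by default) and sampling_strategy (the target class balance); random_state only fixes the random seed so that results are reproducible. All parameters are explained in the documentation for imblearn.over_sampling.SMOTE.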

Licensed under: CC-BY-SA with attribution