The nearestNeighbors parameter specifies how many nearest neighbors of the currently considered instance are used to build a synthetic instance in between. The default value is 5, so the 5 nearest neighbors of a real, existing instance are the candidates for computing a new synthetic one. In the original SMOTE method, each synthetic instance is interpolated between the real instance and one of these neighbors chosen at random.
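To make the interpolation concrete, here is a minimal Python sketch of the idea from the Chawla et al. paper. It is not Weka's code: the function name `smote_one`, the Euclidean distance, and the restriction to numeric attributes are assumptions made for illustration only.

```python
import math
import random


def smote_one(instance, minority, k=5, rng=None):
    """Create one synthetic point from `instance` and one of its k nearest
    minority-class neighbors (hypothetical helper, numeric attributes only)."""
    rng = rng or random.Random(0)
    # All other minority instances, sorted by Euclidean distance.
    others = [p for p in minority if p is not instance]
    neighbors = sorted(others, key=lambda p: math.dist(instance, p))[:k]
    # Pick one neighbor at random and a random position on the connecting line.
    neighbor = rng.choice(neighbors)
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(instance, neighbor)]


minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [8.0, 9.0]]
synthetic = smote_one(minority[0], minority, k=2, rng=random.Random(42))
print(synthetic)
```

With k=2 the far-away point [8.0, 9.0] is never used, so the synthetic point always lies on the segment between [1.0, 1.0] and one of its two close neighbors.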
The percentage parameter specifies how many synthetic instances are created, as a percentage of the number of instances in the class with fewer instances (this is the default; you can also use the majority class by setting the -C option). The default value is 100. This means that if you have 25 instances in your minority class, another 25 instances are created synthetically from them (using their nearest neighbors' values). With 200%, 50 synthetic instances are created, and so on.
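The arithmetic can be written down as a small Python sketch. The function name `synthetic_count` is hypothetical, and rounding fractional results down is an assumption here, not a statement about how Weka treats percentages that are not multiples of 100.

```python
def synthetic_count(minority_size, percentage=100.0):
    """Number of synthetic instances for a given minority class size.

    Assumption: fractional results are rounded down.
    """
    return int(minority_size * percentage / 100.0)


print(synthetic_count(25))         # 25 synthetic instances at the default 100%
print(synthetic_count(25, 200.0))  # 50 synthetic instances at 200%
```

So with 25 minority instances and 200%, the minority class ends up with 25 real plus 50 synthetic instances, 75 in total.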
For further information, refer to the Weka documentation for SMOTE and to the original paper by Chawla et al. (2002), where the whole method is explained in depth.
In my experience, Weka's SMOTE filter on its own only oversamples: it adds synthetic instances and removes nothing. If you also want to undersample the majority class, you can apply the supervised SpreadSubsample filter afterwards.
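As a rough illustration of what undersampling by a maximum class spread means, here is a plain Python sketch. It is not Weka's implementation: the function name `spread_subsample` and its parameters are hypothetical, and it only mimics the idea of capping every class at `max_spread` times the size of the rarest class.

```python
import random
from collections import defaultdict


def spread_subsample(labels, max_spread=1.0, seed=1):
    """Return indices to keep so that no class has more than
    max_spread times as many instances as the rarest class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    smallest = min(len(indices) for indices in by_class.values())
    cap = int(smallest * max_spread)
    rng = random.Random(seed)
    kept = []
    for indices in by_class.values():
        # Classes within the cap are kept whole; larger ones are sampled down.
        kept.extend(indices if len(indices) <= cap else rng.sample(indices, cap))
    return sorted(kept)


# 100 majority instances and 50 minority instances (e.g. 25 real + 25 synthetic).
labels = ["majority"] * 100 + ["minority"] * 50
kept = spread_subsample(labels, max_spread=1.0)
print(len(kept))
```

With `max_spread=1.0` both classes end up with the same number of instances; a larger value allows a correspondingly larger imbalance.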