Question

In the SMOTE paper, the authors present the logic of creating synthetic examples when all features are nominal (section 6.2, SMOTE-N):

To generate new minority class feature vectors, we can create new set feature values by taking the majority vote of the feature vector in consideration and its k nearest neighbors

Along with this example:

Let F1 = A B C D E be the feature vector under consideration and let its 2 nearest neighbors be

F2 = A F C G N

F3 = H B C D N

The application of SMOTE-N would create the following feature vector: FS = A B C D N
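
The majority vote in this example can be reproduced with a short sketch. The function name `smote_n_majority_vote` is hypothetical, and the tie-breaking behaviour (first-seen value wins) is an assumption of this sketch, not something the quoted passage specifies.

```python
from collections import Counter

def smote_n_majority_vote(sample, neighbors):
    """Return the per-feature majority value across the sample and its neighbors."""
    vectors = [sample] + list(neighbors)
    synthetic = []
    for values in zip(*vectors):
        # most_common(1) picks the most frequent value at this position.
        # Assumption: on equal counts Counter keeps first-seen order, so the
        # sample's own value would win; the quoted rule does not cover ties.
        synthetic.append(Counter(values).most_common(1)[0][0])
    return synthetic

f1 = "A B C D E".split()
f2 = "A F C G N".split()
f3 = "H B C D N".split()
print(" ".join(smote_n_majority_vote(f1, [f2, f3])))  # A B C D N
```

No position is tied in this example, so the assumed tie-breaking rule never comes into play here.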

How would FS be chosen if F3 were H B C I N instead? In that case the fourth feature takes the values D, G and I, so no value has a majority. How does the Value Difference Metric of Cost and Salzberg, as described in the paper, help in this case?
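
To make the problem concrete, here is a minimal sketch that lists, for each feature position, every value sharing the highest vote count. The helper `tied_candidates` is hypothetical. It only exposes the tie and does not resolve it, since the quoted rule gives no tie-breaking procedure.

```python
from collections import Counter

def tied_candidates(sample, neighbors):
    """For each feature position, return all values that share the top vote count."""
    vectors = [sample] + list(neighbors)
    result = []
    for values in zip(*vectors):
        counts = Counter(values)
        top = max(counts.values())
        # Keep every value that reaches the top count; more than one means a tie.
        result.append([v for v, c in counts.items() if c == top])
    return result

f1 = "A B C D E".split()
f2 = "A F C G N".split()
f3 = "H B C I N".split()
for position, candidates in enumerate(tied_candidates(f1, [f2, f3]), start=1):
    print(position, candidates)
```

With the modified F3, positions 1, 2, 3 and 5 each have a single winner (A, B, C, N), while position 4 reports three candidates (D, G, I) with one vote each.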

No correct solution

Licensed under: CC-BY-SA with attribution