Question

I have the following code to calculate the Euclidean distance using weka.core.EuclideanDistance, where both instances consist entirely of missing values, as shown below.

The first instance is all missing values: ?,?,?,?

The second instance is all missing values: ?,?,?,?

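import weka.core.EuclideanDistance;
import weka.core.Instance;

// test is a weka.core.Instances dataset that has already been loaded
EuclideanDistance distance = new EuclideanDistance();
distance.setInstances(test);
Instance first = test.get(0);
Instance second = test.get(1);
double d = distance.distance(first, second);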

However, when I run the code, the result is 4.0. I have no idea where this 4.0 comes from. Can anyone tell me? Thanks in advance!


Solution

Missing values in the k-nearest neighbours algorithm are usually handled per attribute, according to the following rules:

For nominal attributes:

if isMissingValue(a) or isMissingValue(b), then
    distance = 1

For numeric attributes (with a and b normalized to the range [0, 1]):

if isMissingValue(a) and isMissingValue(b), then
    distance = 1

if isMissingValue(a) and !isMissingValue(b), then
    distance = max(b, 1-b)

if !isMissingValue(a) and isMissingValue(b), then
    distance = max(a, 1-a)
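
As a rough illustration, the rules above can be sketched in plain Java without any Weka dependency. This is not Weka's actual code. The class and method names are hypothetical, Double.NaN and null are used here as stand-ins for a missing value, and the handling of two present values (absolute difference for numeric, 0/1 mismatch for nominal) is an assumption added only to make the sketch complete.

```java
class MissingValueDifference {

    // Assumption for this sketch: Double.NaN marks a missing numeric value.
    static boolean isMissing(double v) {
        return Double.isNaN(v);
    }

    // Per-attribute difference for numeric values assumed normalized to [0, 1].
    static double numericDifference(double a, double b) {
        if (isMissing(a) && isMissing(b)) {
            return 1.0;                   // both missing: maximum difference
        }
        if (isMissing(a)) {
            return Math.max(b, 1.0 - b);  // only a missing: largest possible gap to b
        }
        if (isMissing(b)) {
            return Math.max(a, 1.0 - a);  // only b missing: largest possible gap to a
        }
        return Math.abs(a - b);           // assumption: plain absolute difference
    }

    // Per-attribute difference for nominal values; null marks a missing value here.
    static double nominalDifference(String a, String b) {
        if (a == null || b == null) {
            return 1.0;                   // either one missing: maximum difference
        }
        return a.equals(b) ? 0.0 : 1.0;   // assumption: usual 0/1 mismatch
    }

    // Sums the squared per-attribute differences of two numeric instances.
    static double sumOfSquaredDifferences(double[] first, double[] second) {
        double sum = 0.0;
        for (int i = 0; i < first.length; i++) {
            double d = numericDifference(first[i], second[i]);
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double m = Double.NaN;
        double[] first = {m, m, m, m};
        double[] second = {m, m, m, m};
        double sum = sumOfSquaredDifferences(first, second);
        System.out.println(sum);            // prints 4.0
        System.out.println(Math.sqrt(sum)); // prints 2.0
    }
}
```

For two instances with four attributes that are all missing, each attribute contributes 1, so the sum of squared differences in this sketch is 4.0. Whether a particular Weka version returns that sum or its square root is something to confirm in the source.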

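So when both instances consist entirely of missing values, every attribute contributes a distance of 1. You can check the implementation in the source (link provided by Walter).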

Licensed under: CC-BY-SA with attribution