Problem

I have the following code, which calculates the distance between two instances using weka.core.EuclideanDistance. Both instances consist entirely of missing values:

Instance first has all values missing: ?,?,?,?

Instance second has all values missing: ?,?,?,?

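import weka.core.EuclideanDistance;
import weka.core.Instance;
import weka.core.Instances;

// test is an Instances dataset whose first two rows contain only missing values
EuclideanDistance distance = new EuclideanDistance();
distance.setInstances(test);
Instance first = test.get(0);
Instance second = test.get(1);
double d = distance.distance(first, second);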

However, when I run the code, the result is 4.0, and I have no idea where this 4.0 comes from. Can anyone explain? Thanks in advance!

Solution

In k-nearest-neighbours distance computations, missing values are usually handled according to the following rules:

For nominal attributes:

if isMissingValue(a) or isMissingValue(b), then
    distance = 1
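The nominal rule can be sketched in Java as follows. This is a minimal illustration, not Weka's code: the class and method names (NominalDifference, nominalDifference) are hypothetical, a missing label is represented by null, and the equal/unequal label case (0 or 1) is the usual overlap rule, added here only so the function is complete.

```java
public class NominalDifference {

    /**
     * Per-attribute difference for a nominal attribute.
     * Assumption for this sketch: null stands for a missing value.
     * Returns 1 if either value is missing or the labels differ, otherwise 0.
     */
    static double nominalDifference(String a, String b) {
        if (a == null || b == null) {
            return 1.0; // a missing value on either side counts as a full mismatch
        }
        return a.equals(b) ? 0.0 : 1.0;
    }

    public static void main(String[] args) {
        System.out.println(nominalDifference("red", "red"));  // 0.0
        System.out.println(nominalDifference("red", "blue")); // 1.0
        System.out.println(nominalDifference(null, "blue"));  // 1.0
        System.out.println(nominalDifference(null, null));    // 1.0
    }
}
```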

For numeric attributes (assuming the values a and b have been normalized to the range [0, 1]):

if isMissingValue(a) and isMissingValue(b), then
    distance = 1

if isMissingValue(a) and !isMissingValue(b), then
    distance = max(b, 1-b)

if !isMissingValue(a) and isMissingValue(b), then
    distance = max(a, 1-a)
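The numeric rules, and how they add up across attributes, can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Weka's implementation: values are taken to be already normalized to [0, 1], Double.NaN marks a missing value, and the names NumericDifference, numericDifference and squaredDistance are hypothetical. With four attributes missing in both instances, each attribute contributes a difference of 1, so the squared differences sum to 4 × 1² = 4; the Euclidean distance is the square root of that sum.

```java
public class NumericDifference {

    /**
     * Per-attribute difference for a numeric attribute, following the rules above.
     * Assumes a and b are normalized to [0, 1]; Double.NaN marks a missing value.
     */
    static double numericDifference(double a, double b) {
        boolean aMissing = Double.isNaN(a);
        boolean bMissing = Double.isNaN(b);
        if (aMissing && bMissing) {
            return 1.0;                  // nothing is known: assume the maximum difference
        }
        if (aMissing) {
            return Math.max(b, 1.0 - b); // farthest a could lie from b within [0, 1]
        }
        if (bMissing) {
            return Math.max(a, 1.0 - a); // farthest b could lie from a within [0, 1]
        }
        return Math.abs(a - b);          // ordinary case: both values present
    }

    /** Sum of squared per-attribute differences; the Euclidean distance is its square root. */
    static double squaredDistance(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = numericDifference(x[i], y[i]);
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] first  = {Double.NaN, Double.NaN, Double.NaN, Double.NaN};
        double[] second = {Double.NaN, Double.NaN, Double.NaN, Double.NaN};
        double sum = squaredDistance(first, second);
        System.out.println(sum);            // 4.0
        System.out.println(Math.sqrt(sum)); // 2.0
    }
}
```

The max(b, 1-b) rule is pessimistic by design: since the missing value could be anywhere in [0, 1], it assumes the farthest possible position from the known value.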

You can check the implementation in the source (link provided by Walter).

License: CC-BY-SA with attribution