Question

I have a Comparator<Foo> with the following comparison function:

float d = o1.bar - o2.bar;
if (Math.abs(d) <= 0.001) {
    return 0;
} else {
    return d < 0 ? -1 : 1; // inline Math.copySign
}

Essentially, this is supposed to compare two Foos based on their bar property unless the values are close enough, in which case they should be declared equal. (This is important because I'm doing another sort after this, on a different property.)

Obviously, though, this is not a transitive comparator. If there are Foos f1, f2, and f3 with values of bar as 1.999, 2.000, and 2.001, respectively, then, according to my comparator, f1==f2 and f2==f3 but f1 != f3.

Calling sort(myListOfFoo, myFooComparator) gives a "Comparison method violates its general contract!" error very rarely, but deterministically.

How can I use such a comparator with Collections.sort(List, Comparator) without generating this error?


Alternatively, is there some way I can store my data that will allow the comparator to work properly? Roudning each float to the nearest 0.001 at construction would be the simplest solution, except that the Foo.bar field is actually calculated based on an arbitrary distance metric, so it's not that simple.

The actual code is:

float d = metric.distance(vertex, o1)
        - metric.distance(vertex, o2);
if (Math.abs(d) < threshold) {
    return 0;
} else {
    return d < 0 ? -1 : 1; // inline Math.copySign
}

where o1, o2, and vertex are instances of class Point { float x; float y; } and metric is an instance of interface DistanceMetric { float distance(Point p1, Point p2); }. It may be worth noting that this is failing even on the standard Euclidean metric.

Was it helpful?

Solution

I'm afraid that the Java 7 sort implementation will not tolerate a Comparator that exhibits intransitivity. There is nothing you can do about that ... if you want to use the standard Java SE sorting APIs.


But, in fact, using threshold comparison in a sort is actually mathematically incorrect.

The issues when comparing floating point values are that they are often imprecise to start with, and the calculations then typically introduce further small errors into the results. When two results are close enough, the accumulated error may larger than the difference between the values ... meaning that we can't tell whether the ideal numbers (without the errors) are less, equal or greater than each error. We deal with this by treating "close to equals" as being "equal", by comparing using a threshold.

When we are sorting values (i.e. placing them in order), the issue of errors in the values needs to be handled differently. Suppose that

  • we have two numbers v1 ± e1 and v2 ± e2, and

  • when we compare the numbers using threshold comparison, the threshold is greater than mod(e1) + mod(e2)

If it turns out that v1 and v2 are close enough that mod(v1 - v2) < (mod(e1) + mod(e2)) then it doesn't matter if we put v1 ± e1 before or after v2 ± e2 in the ordering. If we observe the ordering of the two numbers (after the sort has completed) by comparing using the threshold, they will show as "equal", which ever order we put them in.

So if we ignored errors and simply ordered the numbers using precise comparisons, we won't put any pair of numbers in an incorrect order as far as we can discern when we use threshold-based comparison.

Now suppose that we have v1 ± e1, v2 ± e2 and v3 ± e3 ... and mod(e1) + mod(e3) is GREATER that our threshold:

  • If we order as above (using precise comparison) we will still end up with the numbers in the right order.

  • If we used "comparison with thresholds" to order the values (and the sort implementation allowed that!), we could end up with the numbers in the order v3 ± e3, v2 ± e2 and v1 ± e1. We have {v1 ± e1, v2 ± e2} and {v2 ± e2, v3 ± e3} are pairwise equal, but we could also have {v1 ± e3, v3 ± e3} incorrectly ordered, even when we compare using threshold-based comparisons!


The bottom line is that you should simply implement your Comparator (for sorting purposes!) to use exact comparisons. Threshold comparison is wrong for this context. This applies irrespective of how the sort algorithm is coded ...

OTHER TIPS

I guess what you actually want to do is remove duplicate values ( by your threshold ) and then sort the rest. Why don't you do a natural sorting first based on non-rounded values, and then use filtering based on your threshold value.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top