Question

I am using a VPTree to optimize some K-Nearest neighbors algorithms.

A VPTree requires that a distance function satisfy the triangle inequality.

The triangle inequality states that the following must be true:

distance(x,z) <= distance(x,y) + distance(y,z)

One of the features used in our distance function is geographic distance, in meters, which is calculated with floating point arithmetic. I found that this feature has been violating the triangle inequality due to inexact floating point calculations.

For Example:

x = -90,-180
y = -90,-162
z = -81,-144
distance(x,z) = 1005162.6564502382
distance(x,y) = 1.2219041408558764E-10
distance(y,z) = 1005162.656450238
distance(x,y) + distance(y,z) = 1005162.6564502381

Obviously the triangle inequality has failed in this case.

I was messing around and found that if I round the distance in meters DOWN to the next integer, ie Math.floor() in java, and then add 5, the result seems to all of a sudden satisfy the triangle inequality in all cases I have tested.

I have tested every lat/long combination that is a multiple of 10, ie 20^6 combinations.

After this change we get the following results for the example above:

x = -90,-180
y = -90,-162
z = -81,-144
distance(x,z) = 1005167
distance(x,y) = 5
distance(y,z) = 1005167
distance(x,y) + distance(y,z) = 1005172

Obviously the triangle inequality no longer fails in this case.

This seems perfect since 5 meters really isn't significant in our use case.

Am I just "forcing" this to work and am still violating some requirement of the triangle inequality or some requirement of VPTrees? Is this something that is known property of floats?

Note that simply rounding DOWN without adding 5 does not work.

For Example:

x = -90,-180
y = -81,-180
z = -72,-180
distance(x,z) = 2009836.0
distance(x,y) = 1005162.0
distance(y,z) = 1004673.0
distance(x,y) + distance(y,z) = 2009835.0

And adding 5:

x = -90,-180
y = -81,-180
z = -72,-180
distance(x,z) = 2009841.0
distance(x,y) = 1005167.0
distance(y,z) = 1004678.0
distance(x,y) + distance(y,z) = 2009845.0

Also note that I have found that this works for any kind of floating point arithmetic, not just geo distance. For example a distance function that calculates a percentage of some maximum value with a single division operation, as long as you always round to a specific number of digits and add 5 to the last digit.

Was it helpful?

Solution

Old thread, but I can’t resist answering.

You method of rounding down and adding five may satisfy the triangle inequality, but it will make the distance between a point and itself greater than zero. That means that:

Distance(x, x) == 5

IMO this is probably as problematic for the VPTree than failing on the triangle inequality property.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top