Question

Short version: Given two k-d trees which contain similar but not identical sets of 2D points, and which might not have the same root, can you exploit the fact they're both trees to match the points in one to the nearest points in the other with fewer lookup steps than if you looked up nodes one by one?

We have a public API that sends geometry (lat-long coords in WKT format, can be either points or polygons) to client apps, they make modifications and send us back the modified WKT. Lately we've been having a problem where a client app is projecting and unprojecting all points before saving, which causes all the points to drift a tiny amount due to floating point math precision limits, even for points the user didn't intend to move. What the client is supposed to be doing is retaining a copy of the original values unmodified and only updating the points the user actually moved.

Obviously this needs to be fixed in the client app. But I'd like to detect when this is happening in the future so that we can "catch" clients that are doing this. It can only be a heuristic, but if we see 90% of the points in are moved just a tiny amount, we could log a warning. If the points are moved more than a tiny amount, we can assume the user meant to move the point.

A complicating factor is that the client is free to serialize the points or polygon vertices back to us in a different order than how we sent it - yet it may still represent the same shape and be valid. Furthermore the user might split polygons, and points or vertices might be deleted or added to the data. If it were just polygons, we could rotate the lists of vertices until they "match up", but given that the polygons might be edited, and that we also have just as much data that is just points and is not polygons, I think it's a simplifying assumption to just treat all of the data as sets of points for the purpose of validation.

One algorithm I came up with to do this is to put one of the sets of points into a k-d tree for fast lookup, and then look up the nearest neighbor of each point in the second set. But I could put them both into k-d trees, and I was wondering if there's a fast algorithm for comparing two k-d trees?

Was it helpful?

Solution

Whatever your distance tolerance is for declaring two points to be the same (call it D), for a given point in the first set (x,y) you can hash and store the integer parts of (x/D,y/D) and store a pointer to the point in the hash table, and then for each point (x,y) in the second set, you can hash the integer parts of (x/D,y/D) along with all neighbors (adding either 0 or +/-1 to each value) and if you find hashed points from the first set then you compare the point in the second set to all points you find from the first set to see if some point in the first set is within distance D of the particular point in the second set. This should run in essentially linear time if your two sets of points each have no pairs of points being within distance D of each other.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top