KD Tree alternative/variant for weighted data

https://stackoverflow.com/questions/16485490

21-04-2022
|

Question

I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.

Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.

I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case (usually a lower-weighted point is only partially subsumed, like in the 1D example above). I could use a range tree to query all points in an NxNxN cube centered on the query point and determine the one with the greatest weight, but the naive implementation of this is wasteful - I'll need to set N to the point with the maximum weight in the entire tree, even though there may not be a point with that weight within the cube, e.g. let's say the point with the maximum weight in the tree is 25, then I'll need to set N to 25 even though the point with the highest weight for any given cube probably has a much lower weight; in the 1D case, if I have a point located at 100 with weight 25 then my naive algorithm would need to set N to 25 even if I'm outside of the point's radius.

To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.

FWIW, I'm coding this in Java.

It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.

Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.

Solution 2

I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I instead store and query the radii directly. This has the added benefit that each dimension does not need to have the same weight (so that the radius is a rectangular box instead of a cubic box), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement, who knows what else he'll come up with).

OTHER TIPS

Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.

Distances in 4D are defined by…

d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h))

…where the second d is whatever your 3D distance was.

To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow