KD-Tree "median of list" construction

https://stackoverflow.com/questions/13516500

01-12-2021
|

Question

I've coded up a KD-Tree in Java using the "median of list" algorithm for constructing a more balanced tree. It seems to work fine when using the data provided by the wiki, note that the wikipedia example uses only X,Y values, so it doesn't evaluate the Z depth.

From wikipedia:

point_list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]

enter image description here

From my java program:

depth=0 id=(7.0, 2.0, 0.0)
├── [left] depth=1 id=(5.0, 4.0, 0.0)
│   ├── [left] depth=2 id=(2.0, 3.0, 0.0)
│   └── [right] depth=2 id=(4.0, 7.0, 0.0)
└── [right] depth=1 id=(9.0, 6.0, 0.0)
    └── [left] depth=2 id=(8.0, 1.0, 0.0)

But when I use the "median of list" approach on this data, it doesn't seem to work properly.

point list = [(1,0,-1), (1,0,-2), (1,0,1), (1,0,2)]

I get a tree like this:

depth=0 id=(1.0, 0.0, 1.0)
├── [left] depth=1 id=(1.0, 0.0, -2.0)
│   └── [left] depth=2 id=(1.0, 0.0, -1.0)
└── [right] depth=1 id=(1.0, 0.0, 2.0)

Which doesn't look correct because (1.0, 0.0, 2.0) is to the right of (1.0, 0.0, 1.0) but they are essentially equal because their Y values are equal. Also, (1.0, 0.0, -1.0) is to the left of (1.0, 0.0, -2.0) and it should be to the right since it's Z value is greater.

I think the problem stems from having equal X and Y values and only variable Z values, so the median of the list doesn't really split the list accurately.

... original code following the wiki's python code ...

private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
    if (list == null || list.size() == 0) return null;

    int axis = depth % k;
    if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
    else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
    else Collections.sort(list, Z_COMPARATOR);

    KdNode node = null;
    if (list.size() > 0) {
        int mediaIndex = list.size() / 2;
        node = new KdNode(k, depth, list.get(mediaIndex));
        if ((mediaIndex - 1) >= 0) {
            List<XYZPoint> less = list.subList(0, mediaIndex);
            if (less.size() > 0) {
                node.lesser = createNode(less, k, depth + 1);
                node.lesser.parent = node;
            }
        }
        if ((mediaIndex + 1) <= (list.size() - 1)) {
            List<XYZPoint> more = list.subList(mediaIndex + 1, list.size());
            if (more.size() > 0) {
                node.greater = createNode(more, k, depth + 1);
                node.greater.parent = node;
            }
        }
    }

    return node;
}

... new code based on my comment ...

private static KdNode createNode(List<XYZPoint> list, int k, int depth) {
    if (list == null || list.size() == 0) return null;

    int axis = depth % k;
    if (axis == X_AXIS) Collections.sort(list, X_COMPARATOR);
    else if (axis == Y_AXIS) Collections.sort(list, Y_COMPARATOR);
    else Collections.sort(list, Z_COMPARATOR);

    KdNode node = null;
    if (list.size() > 0) {
        int medianIndex = list.size() / 2;
        node = new KdNode(k, depth, list.get(medianIndex));
        List<XYZPoint> less = new ArrayList<XYZPoint>(list.size()-1);
        List<XYZPoint> more = new ArrayList<XYZPoint>(list.size()-1);
        //Process list to see where each non-median point lies
        for (int i=0; i<list.size(); i++) {
            if (i==medianIndex) continue;
            XYZPoint p = list.get(i);
            if (KdNode.compareTo(depth, k, p, node.id)<=0) {
                less.add(p);
            } else {
                more.add(p);
            }
        }
        if (less.size() > 0) {
            node.lesser = createNode(less, k, depth + 1);
            node.lesser.parent = node;
        }
        if (more.size() > 0) {
            node.greater = createNode(more, k, depth + 1);
            node.greater.parent = node;
        }
    }

Solution

The problem indeed has to do with equal coordinates and arises from how you split the nodes into less and more parts. Since you have the median index, why not use the index for splitting instead of checking the coordinates? Just change the condition in createNode on line 116 from

if (KdNode.compareTo(depth, k, p, node.id)<=0) {

if (i<medianIndex) {

Btw: there are more efficient algorithms to partition a list into lower, median, upper than sorting. (lower and upper parts do not need to be sorted! see e.g. the implementation of std::nth_element in the C++ stdlib - sorry, I'm that much into Java programming)

OTHER TIPS

I think the essential question at this point is: what, exactly, do you want to do with the KD-tree?

If you just want to find a closest point using X- and Y-distance only, then the algorithm you have is perfectly fine - you'll find at least one of the four points at equal XY-distance from your example.
If you want to find all closest points in XY-distance, then still keep the KD-tree building function the same, but just change all '<' operators in your lookup function to '<='. If you find a KD-tree point exactly at the query point, you'll still need to descend down an arbitrary child of that tree until you find a leaf. Then go up the tree as usual in a KD-tree, always descending down the sibling tree if it could potentially match the shortest distance you have found so far.
If you want to use distance involving X-, Y-, and Z-coordinates, you need to make your tree a 3-dimensional KD-tree, with X-, Y-, and Z-layers alternating (or potentially with some clever scheme for choosing which dimension to subdivide next).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow