Confused about definition of a 'median' when constructing a kd-Tree

https://stackoverflow.com/questions/2927165

05-10-2019
|

Question

Im trying to build a kd-tree for searching through a set of points, but am getting confused about the use of 'median' in the wikipedia article. For ease of use, the wikipedia article states the pseudo-code of kd-tree construction as:

function kdtree (list of points pointList, int depth)
{
    if pointList is empty
        return nil;
    else
    {
        // Select axis based on depth so that axis cycles through all valid values
        var int axis := depth mod k;

        // Sort point list and choose median as pivot element
        select median by axis from pointList;

        // Create node and construct subtrees
        var tree_node node;
        node.location := median;
        node.leftChild := kdtree(points in pointList before median, depth+1);
        node.rightChild := kdtree(points in pointList after median, depth+1);
        return node;
    }
}

I'm getting confused about the "select median..." line, simply because I'm not quite sure what is the 'right' way to apply a median here.

As far as I know, the median of an odd-sized (sorted) list of numbers is the middle element (aka, for a list of 5 things, element number 3, or index 2 in a standard zero-based array), and the median of an even-sized array is the sum of the two 'middle' elements divided by two (aka, for a list of 6 things, the median is the sum of elements 3 and 4 - or 2 and 3, if zero-indexed - divided by 2.).

However, surely that definition does not work here as we are working with a distinct set of points? How then does one choose the correct median for an even-sized list of numbers, especially for a length 2 list?

I appreciate any and all help, thanks!

-Stephen

Solution

It appears to me that you understand the meaning of median, but you are confused with something else. What do you mean be distinct set of points?

The code presented by Wikipedia is a recursive function. You have a set of points, so you create a root node and choose a median of the set. Then you call the function recursively - for the left subtree you pass in a parameter with all the points smaller than the split-value (the median) of the original list, for the right subtree you pass in the equal and larger ones. Then for each subtree a node is created where the same thing happens. It goes like this:

First step (root node):
Original set: 1 2 3 4 5 6 7 8 9 10
Split value (median): 5.5

Second step - left subtree:
Set: 1 2 3 4 5
Split value (median): 3

Second step - right subtree:
Set: 6 7 8 9 10
Split value (median): 8

Third step - left subtree of left subtree:
Set: 1 2
Split value (median): 1.5

Third step - right subtree of left subtree:
Set: 3 4 5
Split value (median): 4

Etc.

So the median is chosen for each node in the tree based on the set of numbers (points, data) which go into that subtree. Hope this helps.

OTHER TIPS

You have to choose an axis with as many element on one side than the other. If the number of points is odd or the points are positioned in such a way that it isn't possible, just choose an axis to give an as even repartition as possible.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow