Returning a list of values from a tree recursion

https://stackoverflow.com/questions/11952667

26-06-2021
|

Question

I'm trying to teach myself about data structures, and I'm implementing a k-d tree in Python. I have a method to search for points in the tree within a certain radius of one point in my k-d tree class:

def within_radius(self, point, radius, result=[]):
    """
    Find all items in the tree within radius of point
    """
    d = self.discriminator

    if in_circle(point, radius, self.data):
        result.append(self.data)

    # Check whether any of the points in the subtrees could be
    # within the circle
    if point[d] - radius < self.data[d] and self.l_child:
        result.append(self.l_child.within_radius(point, radius, result))

    if point[d] + radius > self.data[d] and self.r_child:
        result.append(self.r_child.within_radius(point, radius, result))

    return result

It works, but the list it returns is pretty funky, with duplicate values from the recursive calls with result. What is a good way of "accumulating" the values returned from a tree recursion into a list? I've been thinking about it for a while but I can't really see how.

Solution

I'm not sure if this is the most clean way to do it, but whenever I do recursion like this, I often add a keyword argument which is the list that is to be returned. That way, when I modify the list, I'm always modifying to the same list:

def recurse(max, _output=None):
    if _output is None:
        _output = []

    #do some work here, add to your list
    _output.append(max)

    if max <= 0: #Some condition where the recursion stops
        return _output
    else:        #recurse with new arguments so that we'll stop someday...
        return recurse(max-1, _output=_output)

This works because when the stop condition is True, the _output list is returned and passed all the way back up the stack.

I use an underscored variable name to indicate that it is meant to be used only within the function itself. This is a slight extension to the normal way underscore-prefixed variables are used (in classes to indicate a variable is "private"), but I think it gets the point across...

Note that this isn't very different from what you have. However, with your version, result will persist between calls since result = [] is evaluated when the function is created, not when it is called. Also, your version is appending the return value (which is the list itself). This gets quite convoluted when you think about the list holding multiple references to itself ...

OTHER TIPS

I agree with mgilson's analysis. list is a mutable type and list.append is in-place. Here's what that means:

There are two types: mutable and immutable.

A mutable type lives in the same location on memory, even when you change it. lists and dicts, for example, are mutable types. This means that if you create a list and change it in certain ways, it will still live in the same location in memory. So suppose you create a list called "myList". Let's say this list in memory location 0x9000. Then, doing myList.append(0) will not change the location of the myList in memory. Even if you did myList[0] = 'a', the location will not change - it will still live at 0x9000.

An immutable type will "move" to a different memory location when you attempt to change it in any way. strs and tuples are immutable. This is why you get the following error:

>>> s = 'as'
>>> s[0] = 'b'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

This means that even if you define s = 'as' (and let's say s now lives at memory address 0x5000), and redefine it as s = 'af', the location of s in memory changes.

Now, when you reassign a mutable type, it's location in memory changes. For example,

L = [1,2,3] # say memory location 0x4000 L = [5,6,7] # memory location not 0x4000 anymore

This is where the property of list.append being "in-place" comes into play. "list.append is in-place" means that the new element is added to the list without creating a new list. This is why list.append has no return value, as demonstrated below:

>>> L = [1,2,3]
>>> ret = L.append(4)
>>> print L
[1, 2, 3, 4]
>>> print ret
None

However, if you wanted to create a new list, you can do that as follows:

>>> L = [1,2,3]
>>> ret = L + [4]
>>> print L
[1, 2, 3]
>>> print ret
[1, 2, 3, 4]

So what is happening in your case is that in both recursive calls (left and right), point is appended to the list in each recursive call. This is why you get duplicate values.

You could circumvent this by doing what mgilson suggests, or if you are a lisp fan (this is a very good lisp question), then you could use the [1,2,3] + [4] principle and do this (untested, but should work):

def within_radius(self, point, radius, result=[]):
    """
    Find all items in the tree within radius of point
    """
    d = self.discriminator

    temp = []

    if in_circle(point, radius, self.data):
        temp = [self.data]

    # Check whether any of the points in the subtrees could be
    # within the circle
    if point[d] - radius < self.data[d] and self.l_child:
        temp += self.l_child.within_radius(point, radius, result)

    if point[d] + radius > self.data[d] and self.r_child:
        temp += self.r_child.within_radius(point, radius, result)

    return result+temp

Hope this helps

Here are a few thoughts:

If you want to return only unique results, you should probably use a set and convert it to a list when you return. The one catch is that self.data has to be immutable, for instance a tuple instead of a list.
Because you're threading result through the recursion and adding appending the results of the recursive calls to it, you're explicitly adding each hit to the result at least twice. Threading the result through the recursion will keep you from creating and throwing away data structures, so you can probably just do that.
As mgilson points out, because of the way Python handles default arguments, setting result to an empty list in the function declaration won't do what you think. Every time you call within_radius without explicitly passing in result, the hits will be accumulated for every call, not just for the individual call. (Did that make sense? See this ). mgilson's answer points this out too.

With all that in mind, I'd probably do something like this:

def within_radius(self, point, radius, result=None):
    d = self.discriminator

    result = set() if result is None else result

    if in_circle(point, radius, self.data):
        result.add(self.data)
    if point[d] - radius < self.data[d] and self.l_child:
        self.l_child.within_radius(point, radius, result)
    if point[d] + radius > self.data[d] and self.r_child:
        self.r_child.within_radius(point, radius, result)

    return list(result)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow