Question

I have to extract nodes from a tree structure, in order to have a flat list of all the nodes in the tree. I have thought of two options

The clean, readable but maybe not so efficient, since there is a lot of list merges:

def _recursive_extraction(node):
    store = []

    # .get_subnodes() returns only the direct children
    for subnode in node.get_subnodes():
        store.append(subnode)

        subsubnodes = _recursive_extraction(subnode)
        store += subsubnodes

    return store

And the (probably) more efficient one, but less readable. My boss tried to convince me to use it but I doubt. It passes inside a mutable object that gets implicitly/hiddenly updated, as "pass by reference" in other languages.

def _recursive_extraction(node, store=None):
    if store is None:
        store = []

    for subnode in node.get_subnodes():
        store.append(subnode)

        # Watch out! store is updated in this func call
        _recursive_extraction(subnode, store)        

    return store

Disclaimer: the code has not been tested, it just expresses my ideas.

Disclaimer 2: by "by reference" I mean passing a mutable object that can be updated.

Side edit: It is much more efficient to append the subnodes in bulk out of the loop, something like this:

def _recursive_extraction(node, store=None):
    if store is None:
        store = []

    subnodes = node.get_subnodes()
    store += subnodes

    for subnode in subnodes:

        # Watch out! store is updated in this func call
        _recursive_extraction(subnode, store)        

    return store
Was it helpful?

Solution

It may be a little off-topic, but have you considered using generators?

It is as explicit, as possible. You specify that it should yield given node value, and all nodes extracted from this node:

def _recursive_extraction(node):
    for subnode in node.get_subnodes():
        yield subnode

        yield from _recursive_extraction(subnode)

Also, there is advantage of speicfying ALGORITHM for extraction, without specifying container for extracted data.

Last, but not least, it may be a little faster/more memory efficient than first solution - if you want to iterate once over extracted nodes, then that is what you need, if you want to store it, pass this generator to list(). If constructor is faster than extending list (and I assume it may be), you have your speedup, and you don't need intermediate data structures.

OTHER TIPS

I would generally use the second case. It is hardly "implicit"; you are explicitly passing the list object to the recursive calls, and correctly using the None default for mutable arguments. All Python call pass "by reference", whether the object is mutable or not.

However, if you would like to stick with version 1, note that you can also keep store flat using extend:

store.extend(_recursive_extraction(subnode))

which I think is more efficient than list addition; extend modifies the list, where + creates a new one.

You could also use an iterative approach since Python function calls are a bit slow.

def _iterative_extraction(node):
    stack = [node]
    store = []

    while stack:
        node = stack.pop()

        for subnode in node.get_subnodes():
            store.append(subnode)
            stack.append(subnode)   

    return store

(Same disclaimer, this is not tested.)

Using the store like that is halfway to making it non-recursive.

def extraction(node):
    nodes_so_far = []
    to_do = [node]

    while to_do:
        current_node = to_do.pop(0)
        nodes_so_far.append(current_node)
        to_do = current_node.get_subnodes() + to_do  # Assuming get_subnodes returns a list

    return nodes_so_far

But now that I've written it out, I don't think it's better.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top