Python: Is "pass by reference" acceptable in recursive extraction?

Question 1

It may be a little off-topic, but have you considered using generators?

It is as explicit as possible. You specify that it should yield each subnode, followed by all nodes extracted from that subnode:

def _recursive_extraction(node):
    for subnode in node.get_subnodes():
        yield subnode                              # the child itself
        yield from _recursive_extraction(subnode)  # then everything below it (Python 3.3+)

Also, there is the advantage of specifying the ALGORITHM for extraction without specifying a container for the extracted data.

Last, but not least, it may be a little faster and more memory-efficient than the first solution. If you only want to iterate once over the extracted nodes, the generator is all you need; if you want to store them, pass the generator to list(). If the list() constructor is faster than repeatedly extending a list (and I assume it may be), you get your speedup, and you don't need intermediate data structures.
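A minimal sketch of consuming the generator, assuming a hypothetical `Node` class whose `get_subnodes()` returns a list of children (the question's real node type is not shown). The same generator can be materialized with `list()`, iterated lazily and abandoned early, or fed into a different container, without touching the extraction algorithm:

```python
class Node:
    """Hypothetical stand-in for the question's node type."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_subnodes(self):
        return self.children


def _recursive_extraction(node):
    # Yield each direct child, then everything below that child (pre-order).
    for subnode in node.get_subnodes():
        yield subnode
        yield from _recursive_extraction(subnode)


# Small tree: a -> (b -> (d), c)
tree = Node("a", [Node("b", [Node("d")]), Node("c")])

# Store everything: pass the generator to list().
nodes = list(_recursive_extraction(tree))
print([n.name for n in nodes])  # ['b', 'd', 'c']

# Iterate lazily and stop at the first leaf; the generator is never resumed,
# so node "c" is not visited at all.
first_leaf = next(n for n in _recursive_extraction(tree) if not n.get_subnodes())
print(first_leaf.name)  # d

# A different container, same algorithm.
name_set = {n.name for n in _recursive_extraction(tree)}
```

The container choice stays with the caller, which is the point made above about separating the algorithm from the storage.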

Question 2

I would generally use the second case. It is hardly "implicit": you are explicitly passing the list object to the recursive calls, and correctly using a None default for the mutable argument. Every Python call passes a reference to the object ("by reference" in that sense), whether the object is mutable or not.

However, if you would like to stick with version 1, note that you can also keep store flat by using extend:

store.extend(_recursive_extraction(subnode))

which I think is more efficient than list addition: extend modifies the list in place, whereas + creates a new one.
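A sketch of version 1 rewritten with extend, assuming that version builds and returns a fresh list at each level (the question's code isn't shown, and `Node` is hypothetical). The identity checks at the end show the difference between extend and +:

```python
class Node:
    """Hypothetical node type; only get_subnodes() is used."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_subnodes(self):
        return self.children


def _recursive_extraction(node):
    store = []
    for subnode in node.get_subnodes():
        store.append(subnode)
        # Add the child's results in place instead of store = store + [...]
        store.extend(_recursive_extraction(subnode))
    return store


tree = Node("a", [Node("b", [Node("d")]), Node("c")])
names = [n.name for n in _recursive_extraction(tree)]
print(names)  # ['b', 'd', 'c']

# extend mutates the existing list; + allocates a new one.
xs = [1, 2]
original = xs
xs.extend([3])
print(xs is original)  # True
xs = xs + [4]
print(xs is original)  # False
```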

Question 3

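You could also use an iterative approach, since Python function calls are relatively slow (and an explicit stack is not subject to the interpreter's recursion limit on very deep trees).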

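def _iterative_extraction(node):
    stack = [node]   # nodes whose children still need to be collected
    store = []

    while stack:
        node = stack.pop()

        for subnode in node.get_subnodes():
            store.append(subnode)   # record the child...
            stack.append(subnode)   # ...and visit its children later

    return store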

(Same disclaimer: this is not tested.)

Question 4

Using the store like that is halfway to making it non-recursive.

def extraction(node):
    nodes_so_far = []
    to_do = [node]

    while to_do:
        current_node = to_do.pop(0)  # take the next node from the front
        nodes_so_far.append(current_node)  # note: the starting node is included too
        to_do = current_node.get_subnodes() + to_do  # Assuming get_subnodes returns a list

    return nodes_so_far

But now that I've written it out, I don't think it's better.
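One likely reason it isn't better: to_do.pop(0) and get_subnodes() + to_do both copy or shift the whole pending list, so each step costs time proportional to its length. A sketch using `collections.deque` (swapped in here; it is not part of the answer above) makes both operations cheap while keeping the same visiting order, including the starting node; `Node` is hypothetical as before:

```python
from collections import deque


class Node:
    """Hypothetical node type; get_subnodes() must return a sequence."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_subnodes(self):
        return self.children


def extraction(node):
    nodes_so_far = []
    to_do = deque([node])

    while to_do:
        current_node = to_do.popleft()  # O(1), unlike list.pop(0)
        nodes_so_far.append(current_node)
        # extendleft inserts items one at a time, so reverse to keep child order.
        to_do.extendleft(reversed(current_node.get_subnodes()))

    return nodes_so_far


tree = Node("a", [Node("b", [Node("d")]), Node("c")])
print([n.name for n in extraction(tree)])  # ['a', 'b', 'd', 'c']
```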