Can this recursion be written more efficiently?
-
10-12-2019 - |
Question
I have a hierarchical organization that is a tree where the node is a child if the parent sponsors the child. It seems I can traverse the tree with this code
def get_team(self, person, team):
firstline = User.query(User.sponsor == person.key).fetch(99999999)
if firstline:
for person in firstline:
team.append(person)
newdownline = self.downline(person, team)
return team
Using the above I can get a user's organization just by
downline=user.get_team(user, [])
But is there a more efficient way since I have to do this many times for a single request and that much recursion might be ineffecient? Or will the code be fine since it can traverse the tree correctly? In my first version I used three variables and I found I could rearrange the code to only two variables instead of this:
def downline(self, person, team, teamlist):
firstline = User.query(User.sponsor == person.key).fetch(99999999)
if firstline:
for person in firstline:
teamlist.append(person)
newdownline = self.downline(person, team, teamlist)
team.append(newdownline)
return teamlist
I found that the teamlist variable wasn't really needed so I removed it. The way I did it was first with one variable too many:
people = user.downline(user, [], [])
Solution
Yes, there is a more efficient way to do it, depending on your computational tradeoffs.
You are currently doing a depth-first traversal of your tree, a perfectly fine approach. You could add some speed by caching results at the expense of some RAM usage, so:
if person.id in downline_cache:
team.append(downline_cache[person.id])
else:
downline_cache[person.id] = results
team.append(results)
If the tree is fairly small, you could just cache the whole thing upfront, once per thread. That takes more RAM than what you're doing, but is much faster than doing a depth-first traversal every time you care what the results are. A lot depends on your usage patterns and the amount of data you're storing.
If you use a cache, you must make sure you have some way to deal with the underlying data changing, either with timeouts and 'eventually correct' type guarantees, or keeping track of when and how you must wipe the cache, or elements of the cache.