Question

I am looking for a datastructure to handle cycling through a large number of ordered subroutines, some of which are active, most of which are not.

I am thinking I need an implementation of a python set like object which remains ordered when modified, can be easily iterated over (I don't care if an actual for loop works, as long as there is a simple and efficient way to read each element in order). And most notably can be modified while being iterated over.

This is to allow an implementation of a very specific form of subroutine program (unfortunately I cannot use the built in subroutine modules as my python program is emulating the behaviour of a legacy system precisely so I need finer control.)

Effectively I have a list of subroutines which are referenced by an index number in a dictionary, the subroutines are iterated over in ascending order until the last is reached at which point we return to the lowest number one. However there are a large number of subroutines and only a few at any time will be 'active' and ones which are not active are skipped in the order for running. I considered simply giving each routine an active flag which would be tested before running however the enormous number of potential routines compared to the tiny number of active ones makes this method much to slow. Subroutines are only occasionally activated or deactivated but running them must be very fast. As such I have been looking to keep a list of the keys of the active ones and iterate through that, adding or removing them as they are activated or deactivated but am looking for a reasonably efficient way of maintaining this list.

The main two systems which seem feasible to me are blist sorted sets, however I am not sure if they support being modified while iterated over, and using two heapq and shifting the items between them in order, this will definitely support modification at any time but would require a lot more manual work, and would probably be slower.

However somebody may have a much better answer than either of those so I will appreciate any ideas on how to achieve this with reasonable efficiency and ideally in a 'pythonic' way, though I can deal with it being a little ugly if it does the job.

Was it helpful?

Solution

There's a new Python module called sortedcontainers that would meet your needs. It's designed as a nearly drop-in replacement for blist but is pure-Python and faster in most cases. The built-in iteration support is extremely fast but doesn't allow modifying elements. Instead, I would use the built-in indexing which is also fast.

Something like this:

from sortedcontainers import SortedSet
ss = SortedSet(...)
pos = 0
while pos < len(ss):
    item = ss[pos]

    # ... do something with item ...
    # ... possibly insert or delete items ...

    pos += 1

You'll need to decide what happens if you insert an item such that pos += 1 winds up referencing the same item on successive iterations.

You could also queue the insertions/deletions until after iterating the sorted set.

Disclaimer: I am the author of the sortedcontainers module.

OTHER TIPS

I'd use one heapq, give the subroutines a sort key, and simply add active routines to the heap. The inactive subroutines don't need to be in a heap; they can be in a dictionary, as you are not going to loop over those in order.

Licensed under: CC-BY-SA with attribution
scroll top