Question

I need to track a set of perhaps 10 million numbers in Python. (All numbers are between 0 and 2^32). I'll know before hand the max val of an integer, and, between 0 and max, between 20-80% of the values will be in the set.

My current code uses the built in set. This way too slow. As far as performance is considered, the best way to do this is with a bitarray (such as https://pypi.python.org/pypi/bitarray/ ).

It's easy for me to use a bitarray to build a class with add(n) and remove(n) methods. What I don't know how to do is support for n in bitarray_set:. I think I need to use an iterator or iterable, but I'm not sure how to do that. Is this possible? How?

Était-ce utile?

La solution

bitarrays support an itersearch method that iterates over all positions where one bitarray occurs in another. Use that:

def __iter__(self):
    return self.bits.itersearch(bitarray([True]))
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top