Question

Imagine that you have some counter or other data element that needs to be stored in a field of a binary protocol. The field naturally has some fixed number n of bits and the protocol specifies that you should store the n least significant bits of the counter, so that it wraps around when it is too large. One possible way to implement that is actually taking the modulus by a power of two:

field_value = counter % 2 ** n

But this certainly isn't the most efficient way and maybe not even the easiest to understand, taking into account that the specification is talking about the least significant bits and does not mention a modulus operation. Thus, investigating alternatives is appropriate. Some examples are:

field_value = counter % (1 << n)
field_value = counter & (1 << n) - 1
field_value = counter & ~(-1 << 8)

What is the way preferred by experienced Python programmers to implement such a requirement trying to maximize code clarity without sacrificing too much performance?

There is of course no right or wrong answer to this question, so I would like to use this question to collect all the reasonable implementations of this seemingly trivial requirement. An answer should list the alternatives and shortly describe in what circumstance what alternative would preferably be used.

Was it helpful?

Solution

Bit shifting and bitwise operations are more readable in your case. Because it simply tells the reader, you are doing bitwise operations here. If you use numeric operation, the reader may not be able to understand what does it mean by moduloing that number.

Talking about performance, actually you don't have to worry too much about this in Python. Because operation to Python object itself is expensive enough, by either doing it in numeric operations or bitwise operations, it simply doesn't matter. Here I explain it in a visual way

<-------------- Python object operation cost --------------><- bit op ->
<-------------- Python object operation cost --------------><----- num op ----->

This is just a simple rough idea of what it costs to perform a simplest bit operation or number operation. As you can see Python object operation cost takes the majority, so it doesn't matter you use bitwise or numeric, the difference is too small can be ignored.

If you really need performance, you have to process massive amount of data, you should consider

  1. Write the logic in C/C++ module for Python, you can use library like Boost.Python
  2. Use third party library for mass number processing such as numpy

OTHER TIPS

you should simply throw away the top bits.

#field_value = counter & (1 << n) - 1
field_value = counter & ALLOWED_BIT_WIDTH

If this was implemented in an embedded device, the registers used could be the limiting factor. In my experience this is way it is normally done.

The "limitation" in the protocol is a way of constraining the overhead bandwidth needed by the protocol.

It will be dependent on the python implementation probably, but in CPython 2.6, it looks like this:

In [1]: counter = 0xfedcba9876543210
In [10]: %timeit counter % 2**15
1000000 loops, best of 3: 304 ns per loop

In [11]: %timeit counter % (1<<15)
1000000 loops, best of 3: 302 ns per loop

In [12]: %timeit counter & ((1<<15)-1)
10000000 loops, best of 3: 104 ns per loop

In [13]: %timeit counter & ~(1<<15)
10000000 loops, best of 3: 170 ns per loop

In this case, counter & ((1<<15)-1) is the clear winner. Interesting is that 2**15 and 1<<15 take the same amount of time (more or less); I am guessing Python internally optimizes this case and 2**15 -> 1<<15 anyways.

I once wrote a class that lets you just do this:

 bc = BitSliceLong(counter)
 bc = bc[15:0]

derived from long, but it's a more general implementation (lets you take any range of the bits, not just x:0) and the extra overhead for that makes it slower by an order of magnitude, even though it's using the same method inside.

Edit: BTW, precalculating the values doesn't appear to provide any benefit - the dominant factor here is not the actual math operation. If we do

cx_mask = 2**15
counter % cx_mask

the time is the same as when it had to calculate 2**15. This was also true for our 'best case' - precalculating ((1<<15)-1) has no benefit.

Also, in the previous case, I used a large number that is implemented as a long in python. This is not really a native type - it supports arbitrary length numbers, and so needs to handle nearly anything, so implementing operations is not just a single ALU call - it involves a series of bit-shifting and arithmetic operations.

If you can keep the counter below sys.maxint, you'll be using int types instead, and they both appear to be faster & also more dominated by actual math code:

In [55]: %timeit x % (1<<15)
10000000 loops, best of 3: 53.6 ns per loop

In [56]: %timeit x & ((1<<15)-1)
10000000 loops, best of 3: 49.2 ns per loop

In [57]: %timeit x % (2**15)
10000000 loops, best of 3: 53.9 ns per loop

These are all about the same, so it doesn't matter which one you use here really. (mod slightly slower, but within random variation). It makes sense for div/mod to be an expensive operation on very large numbers, with a more complex algorithm, while for 'small' ints it can be done in hardware.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top