What is the preferred way to take the modulus by a power of 2 of an int in Python

Question 1

Bit shifting and bitwise operations are more readable in your case. Because it simply tells the reader, you are doing bitwise operations here. If you use numeric operation, the reader may not be able to understand what does it mean by moduloing that number.

Talking about performance, actually you don't have to worry too much about this in Python. Because operation to Python object itself is expensive enough, by either doing it in numeric operations or bitwise operations, it simply doesn't matter. Here I explain it in a visual way

<-------------- Python object operation cost --------------><- bit op ->
<-------------- Python object operation cost --------------><----- num op ----->

This is just a simple rough idea of what it costs to perform a simplest bit operation or number operation. As you can see Python object operation cost takes the majority, so it doesn't matter you use bitwise or numeric, the difference is too small can be ignored.

If you really need performance, you have to process massive amount of data, you should consider

Write the logic in C/C++ module for Python, you can use library like Boost.Python
Use third party library for mass number processing such as numpy

Question 2

you should simply throw away the top bits.

#field_value = counter & (1 << n) - 1
field_value = counter & ALLOWED_BIT_WIDTH

If this was implemented in an embedded device, the registers used could be the limiting factor. In my experience this is way it is normally done.

The "limitation" in the protocol is a way of constraining the overhead bandwidth needed by the protocol.

Question 3

It will be dependent on the python implementation probably, but in CPython 2.6, it looks like this:

In [1]: counter = 0xfedcba9876543210
In [10]: %timeit counter % 2**15
1000000 loops, best of 3: 304 ns per loop

In [11]: %timeit counter % (1<<15)
1000000 loops, best of 3: 302 ns per loop

In [12]: %timeit counter & ((1<<15)-1)
10000000 loops, best of 3: 104 ns per loop

In [13]: %timeit counter & ~(1<<15)
10000000 loops, best of 3: 170 ns per loop

In this case, counter & ((1<<15)-1) is the clear winner. Interesting is that 2**15 and 1<<15 take the same amount of time (more or less); I am guessing Python internally optimizes this case and 2**15 -> 1<<15 anyways.

I once wrote a class that lets you just do this:

 bc = BitSliceLong(counter)
 bc = bc[15:0]

derived from long, but it's a more general implementation (lets you take any range of the bits, not just x:0) and the extra overhead for that makes it slower by an order of magnitude, even though it's using the same method inside.

Edit: BTW, precalculating the values doesn't appear to provide any benefit - the dominant factor here is not the actual math operation. If we do

cx_mask = 2**15
counter % cx_mask

the time is the same as when it had to calculate 2**15. This was also true for our 'best case' - precalculating ((1<<15)-1) has no benefit.

Also, in the previous case, I used a large number that is implemented as a long in python. This is not really a native type - it supports arbitrary length numbers, and so needs to handle nearly anything, so implementing operations is not just a single ALU call - it involves a series of bit-shifting and arithmetic operations.

If you can keep the counter below sys.maxint, you'll be using int types instead, and they both appear to be faster & also more dominated by actual math code:

In [55]: %timeit x % (1<<15)
10000000 loops, best of 3: 53.6 ns per loop

In [56]: %timeit x & ((1<<15)-1)
10000000 loops, best of 3: 49.2 ns per loop

In [57]: %timeit x % (2**15)
10000000 loops, best of 3: 53.9 ns per loop

These are all about the same, so it doesn't matter which one you use here really. (mod slightly slower, but within random variation). It makes sense for div/mod to be an expensive operation on very large numbers, with a more complex algorithm, while for 'small' ints it can be done in hardware.