Why does mixing types in Python's struct.pack use more space than needed?

StackOverflow https://stackoverflow.com/questions/21332956

  •  02-10-2022

Question

I have just tried using struct.pack in Python for the first time, and I don't understand its behaviour when I mix types.

When I pack a single char and nothing else, it works as expected, i.e.

struct.pack("b",1)

gives '\x01'. But as soon as I mix in data of a different type, the char is padded to the length of that type, e.g.

struct.pack("bi",1,1)

gives '\x01\x00\x00\x00\x01\x00\x00\x00'.

Is this standard behaviour, and why? Is there a way around it?

Edit

More simply put:

>>> struct.calcsize("b")
1
>>> struct.calcsize("i")
4
>>> struct.calcsize("bi")
8

Solution

In its default (native) mode, struct.pack is designed to mirror structures in memory, not file formats. In memory, accessing multi-byte data at an unaligned address can raise exceptions on some CPUs or cost performance on others.

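That's why compilers align data, placing each field at an offset that is a multiple of its alignment (usually its own size, e.g. 4 for an int, typically giving 4- or 8-byte boundaries), and the struct module does the same in native mode.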

To disable this, set the byte order and alignment with the first character of the format string. In your case, try struct.pack("=bi",1,1), which packs to 5 bytes.
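A minimal sketch comparing the default format with the "=" prefix. The 8-byte size for the default format assumes a typical platform where int is 4 bytes with 4-byte alignment; the 5-byte size for "=" holds everywhere because "=" uses standard sizes and no alignment.

```python
import struct

# Native mode (implicit "@"): pad bytes are inserted before the int.
native = struct.pack("bi", 1, 1)

# "=" keeps native byte order but uses standard sizes and no alignment,
# so the int follows the char immediately.
unaligned = struct.pack("=bi", 1, 1)

print(len(native), len(unaligned))      # typically 8 5
print(struct.unpack("=bi", unaligned))  # (1, 1)
```

Note that the data must be unpacked with the same prefix it was packed with.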

If you don't specify anything, an implicit @ is assumed, which means "native byte order, size and alignment". See the documentation for the other options.
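To see the effect of each prefix at a glance, the sketch below computes the size of the same "bi" format under all five prefix characters. Only "@" applies native alignment, so it is the only one expected to differ (typically 8 bytes); the other four always give 5.

```python
import struct

# Size of the same format under each prefix character.
sizes = {prefix: struct.calcsize(prefix + "bi") for prefix in "@=<>!"}

for prefix, size in sizes.items():
    print(f"{prefix!r}: {size} bytes")
```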

Other answers

Yes, it is.

By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).

If you don't want alignment, just specify a byte order by starting your format string with '=', '<', '>' or '!' ('!' is network byte order, the same as '>').
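The prefixes other than "=" also fix the byte order, which changes where the int's bytes land. A short illustration (in Python 3, pack returns a bytes object, so the output is shown as b'...' rather than the Python 2 strings in the question):

```python
import struct

little = struct.pack("<bi", 1, 1)   # little-endian, no padding
big = struct.pack(">bi", 1, 1)      # big-endian, no padding
network = struct.pack("!bi", 1, 1)  # network order is big-endian

print(little)  # b'\x01\x01\x00\x00\x00'
print(big)     # b'\x01\x00\x00\x00\x01'
print(network == big)  # True
```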

From the manual:

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct.
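One way to check the "correspond exactly to the layout in memory" claim is to describe the same C struct with ctypes, which lays fields out using the platform's C rules. The class name CharThenInt is hypothetical and chosen for illustration; the offset 4 and size 8 are what a typical platform reports.

```python
import ctypes
import struct

class CharThenInt(ctypes.Structure):
    # Mirrors the C declaration: struct { signed char c; int n; };
    _fields_ = [("c", ctypes.c_byte), ("n", ctypes.c_int)]

# Offset of the int field as laid out by the platform's C rules.
print(CharThenInt.n.offset)  # typically 4

# Native struct size vs. the C struct size reported by ctypes.
print(struct.calcsize("@bi"), ctypes.sizeof(CharThenInt))  # typically 8 8
```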

i is a 4-byte integer, and in native mode it must start at an offset that is a multiple of 4 (a word boundary). As such, a shorter field placed before it is followed by pad bytes up to that boundary. You can override this behavior by specifying a byte order without native alignment.
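A sketch that locates the pad bytes in the native "bi" layout. It derives the int's offset from calcsize rather than hard-coding 4, since the alignment is platform-dependent (4 on typical platforms, leaving 3 pad bytes after the char).

```python
import struct

data = struct.pack("@bi", 1, 1)

# The int ends the struct, so its offset is the total size minus its own size.
int_offset = struct.calcsize("@bi") - struct.calcsize("@i")

# Everything between the 1-byte char and the int is padding.
padding = data[1:int_offset]

print(int_offset)    # typically 4
print(len(padding))  # typically 3
print(struct.unpack_from("@i", data, int_offset))  # (1,)
```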

That's why, with more complex structs, the order of the fields matters a lot.
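A common heuristic is to list fields from largest to smallest, so that each field already starts on a suitable boundary. The helper largest_first below is hypothetical; it uses a type's size as a stand-in for its alignment, which holds for the scalar codes used here on common platforms, and assumes single-character codes with no repeat counts.

```python
import struct

def largest_first(codes: str) -> str:
    """Hypothetical helper: reorder single-character format codes by
    decreasing native size, a proxy for decreasing alignment."""
    return "".join(sorted(codes, key=struct.calcsize, reverse=True))

fields = "bibhq"
reordered = largest_first(fields)

print(fields, struct.calcsize("@" + fields))
print(reordered, struct.calcsize("@" + reordered))
```

On typical 64-bit platforms this prints 24 bytes for "bibhq" and 16 for the reordered "qihbb". Reordering changes the binary layout, so it only helps when you control both the packing and the unpacking side.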

See also the Wikipedia article on the topic.

See the documentation for struct; in particular, it says:

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct.

See also the Stack Overflow question "C struct memory layout?" for how C lays out structs in memory.

In short, the integer is 4 bytes and therefore, in native mode, it must start at an offset that is a multiple of 4. If you swap the order of b and i, the problem shouldn't arise.
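A quick check of the swapped order, assuming the typical 4-byte int. One caveat: struct never adds padding at the end of a format, so "ib" is 5 bytes even though a C compiler would report 8 for the equivalent struct. The struct documentation describes ending the format with a zero-count code (here "0i") to align the end the way C does.

```python
import struct

print(struct.calcsize("@bi"))    # char first: pad bytes before the int (typically 8)
print(struct.calcsize("@ib"))    # int first: no padding needed (typically 5)

# A zero-count "i" adds only the trailing padding needed to align
# the end of the struct to an int boundary (typically 8).
print(struct.calcsize("@ib0i"))
```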

Licensed under: CC-BY-SA with attribution