Reading 32bit Packed Binary Data On 64bit System

https://stackoverflow.com/questions/135246

02-07-2019
|

Question

I'm attempting to write a Python C extension that reads packed binary data (it is stored as structs of structs) and then parses it out into Python objects. Everything works as expected on a 32 bit machine (the binary files are always written on 32bit architecture), but not on a 64 bit box. Is there a "preferred" way of doing this?

It would be a lot of code to post but as an example:

struct
{
    WORD    version;
    BOOL    upgrade;
    time_t  time1;
            time_t  time2;
} apparms;

File *fp;
fp = fopen(filePath, "r+b");
fread(&apparms, sizeof(apparms), 1, fp);
return Py_BuildValue("{s:i,s:l,s:l}",
  "sysVersion",apparms.version,
  "powerFailTime", apparms.time1,
  "normKitExpDate", apparms.time2
 );

Now on a 32 bit system this works great, but on a 64 bit my time_t sizes are different (32bit vs 64 bit longs).

Damn, you people are fast.

Patrick, I originally started using the struct package but found it just way to slow for my needs. Plus I was looking for an excuse to write a Python Extension.

I know this is a stupid question but what types do I need to watch out for?

Thanks.

Solution

Explicitly specify that your data types (e.g. integers) are 32-bit. Otherwise if you have two integers next to each other when you read them they will be read as one 64-bit integer.

When you are dealing with cross-platform issues, the two main things to watch out for are:

Bitness. If your packed data is written with 32-bit ints, then all of your code must explicitly specify 32-bit ints when reading and writing.
Byte order. If you move your code from Intel chips to PPC or SPARC, your byte order will be wrong. You will have to import your data and then byte-flip it so that it matches up with the current architecture. Otherwise 12 (0x0000000C) will be read as 201326592 (0x0C000000).

Hopefully this helps.

OTHER TIPS

The 'struct' module should be able to do this, although alignment of structs in the middle of the data is always an issue. It's not very hard to get it right, however: find out (once) what boundary the structs-in-structs align to, then pad (manually, with the 'x' specifier) to that boundary. You can doublecheck your padding by comparing struct.calcsize() with your actual data. It's certainly easier than writing a C extension for it.

In order to keep using Py_BuildValue() like that, you have two options. You can determine the size of time_t at compiletime (in terms of fundamental types, so 'an int' or 'a long' or 'an ssize_t') and then use the right format character to Py_BuildValue -- 'i' for an int, 'l' for a long, 'n' for an ssize_t. Or you can use PyInt_FromSsize_t() manually, in which case the compiler does the upcasting for you, and then use the 'O' format characters to pass the result to Py_BuildValue.

You need to make sure you're using architecture independent members for your struct. For instance an int may be 32 bits on one architecture and 64 bits on another. As others have suggested, use the int32_t style types instead. If your struct contains unaligned members, you may need to deal with padding added by the compiler too.

Another common problem with cross architecture data is endianness. Intel i386 architecture is little-endian, but if you're reading on a completely different machine (e.g. an Alpha or Sparc), you'll have to worry about this too.

The Python struct module deals with both these situations, using the prefix passed as part of the format string.

@ - Use native size, endianness and alignment. i= sizeof(int), l= sizeof(long)
= - Use native endianness, but standard sizes and alignment (i=32 bits, l=64 bits)
< - Little-endian standard sizes/alignment
- Big-endian standard sizes/alignment

In general, if the data passes off your machine, you should nail down the endianness and the size / padding format to something specific — ie. use "<" or ">" as your format. If you want to handle this in your C extension, you may need to add some code to handle it.

What's your code for reading the binary data? Make sure you're copying the data into properly-sized types like int32_t instead of just int.

Why aren't you using the struct package?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow