Is there an elegant way to use struct and namedtuple instead of this?

https://stackoverflow.com/questions/11461125

20-06-2021

Question

I'm reading a binary file made up of records that in C would look like this:

typedef struct _rec_t
{
  char text[20];
  unsigned char index[3];
} rec_t;

Now I'm able to parse this into a tuple with 23 distinct values, but would prefer if I could use namedtuple to combine the first 20 bytes into text and the three remaining bytes into index. How can I achieve that? Basically instead of one tuple of 23 values I'd prefer to have two tuples of 20 and 3 values respectively and access these using a "natural name", i.e. by means of namedtuple.

I am currently using the format "20c3B" for struct.unpack_from().

Note: There are many consecutive records in the string when I call parse_text.

My code (stripped down to the relevant parts):

#!/usr/bin/env python
import sys
import os
import struct
from collections import namedtuple

def parse_text(data):
    fmt = "20c3B"
    l = len(data)
    sz = struct.calcsize(fmt)
    num = l // sz  # integer division: number of whole records
    if not num:
        print("ERROR: no records found.")
        return
    print("Size of record %d - number %d" % (sz, num))
    #rec = namedtuple('rec', 'text index')
    empty = struct.unpack_from(fmt, data)
    # Loop through elements
    # ...

def main():
    if len(sys.argv) < 2:
        print("ERROR: need to give file with texts as argument.")
        sys.exit(1)
    s = os.path.getsize(sys.argv[1])
    f = open(sys.argv[1], "rb")  # binary mode: the file holds raw records
    try:
        data = f.read(s)
        parse_text(data)
    finally:
        f.close()

if __name__ == "__main__":
    main()

Solution

According to the docs: http://docs.python.org/library/struct.html

Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

>>> from struct import unpack
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

So in your case:

>>> import struct
>>> from collections import namedtuple
>>> data = "1"*23
>>> fmt = "20c3B"
>>> Rec = namedtuple('Rec', 'text index') 
>>> r = Rec._make([struct.unpack_from(fmt, data)[0:20], struct.unpack_from(fmt, data)[20:]])
>>> r
Rec(text=('1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'), index=(49, 49, 49))
>>>

Slicing the unpacked values may be a problem. If the format were fmt = "20si", or some other standard format that does not return a run of individual bytes, we would not need to do this.

>>> import struct
>>> from collections import namedtuple
>>> data = "1"*24
>>> fmt = "20si"
>>> Rec = namedtuple('Rec', 'text index') 
>>> r = Rec._make(struct.unpack_from(fmt, data))
>>> r
Rec(text='11111111111111111111', index=825307441)
>>>
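
To tie this back to the original record layout, here is a minimal sketch in Python 3 (where the data is a bytes object rather than a str). It uses the format "20s3B", so the text arrives as a single 20-byte string and only the three index bytes need regrouping. Rec follows the session above; REC_STRUCT and parse_records are names made up for this illustration.

```python
import struct
from collections import namedtuple

Rec = namedtuple("Rec", "text index")
# Hypothetical helper name: 20-byte string followed by three unsigned bytes.
REC_STRUCT = struct.Struct("20s3B")

def parse_records(data):
    """Return a list of Rec tuples, one per 23-byte record in data."""
    if len(data) % REC_STRUCT.size != 0:
        raise ValueError("input length is not a multiple of the record size")
    records = []
    for fields in REC_STRUCT.iter_unpack(data):
        # fields is (text, i0, i1, i2); regroup the last three into one tuple
        records.append(Rec(text=fields[0], index=fields[1:]))
    return records

# Two made-up records, padded with NUL bytes to 20 characters of text
sample = b"hello".ljust(20, b"\x00") + bytes([1, 2, 3])
sample += b"world".ljust(20, b"\x00") + bytes([4, 5, 6])

recs = parse_records(sample)
print(recs[0].text.rstrip(b"\x00"), recs[0].index)  # b'hello' (1, 2, 3)
```

Because the "s" code returns the whole field as one value, only a single slice (fields[1:]) is needed, and struct.unpack_from is not called twice per record.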

Other tips

Why not have parse_text use string slicing (data[:20], data[20:]) to pull apart the two values, and then process each one with struct?

Or take the 23 values and slice them apart into two?
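
The plain slicing idea could look roughly like this. It is only a sketch, written for Python 3 where iterating over bytes yields integers; split_record and the sample record are invented for the example.

```python
from collections import namedtuple

Rec = namedtuple("Rec", "text index")

def split_record(record):
    """Split one 23-byte record into its 20-byte text and 3 index bytes."""
    if len(record) != 23:
        raise ValueError("expected exactly 23 bytes")
    # tuple() over a bytes slice gives the byte values as integers
    return Rec(text=record[:20], index=tuple(record[20:]))

record = b"A" * 20 + bytes([7, 8, 9])  # made-up sample record
r = split_record(record)
print(r.text, r.index)  # b'AAAAAAAAAAAAAAAAAAAA' (7, 8, 9)
```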

I must be missing something. Perhaps you wish to make this happen via the struct module?

Here is my answer. I first wrote it using slicing instead of struct.unpack() but @samy.vilar pointed out that we can just use the "s" format to actually get the string out. (I should have remembered that!)

This answer uses struct.unpack() twice: once to get the strings out, and once to unpack the second string as an integer.

I'm not sure what you want to do with the "3B" item, but I'm guessing you want to unpack that as a 24-bit integer. I prepended a 0 byte to the 3-char string and unpacked the result as a big-endian 32-bit integer, in case that is what you want.
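
As a quick check of the 24-bit idea, here is a small Python 3 sketch. The three bytes in raw are made up; the zero byte goes on the most significant side because the value is read as big-endian. int.from_bytes is shown as an alternative that needs no padding.

```python
import struct

raw = b"\x01\x02\x03"  # hypothetical 3-byte index field

# Pad to 4 bytes on the most significant side, then read a big-endian uint32
n_struct = struct.unpack(">I", b"\x00" + raw)[0]

# Python 3 alternative: convert the 3 bytes directly
n_direct = int.from_bytes(raw, "big")

print(n_struct, n_direct)  # 66051 66051
```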

Slightly tricky: a line like n, = struct.unpack(...) unpacks a length-1 tuple into one variable. In Python, the comma makes the tuple, so with one comma after one name we are using tuple unpacking to unpack a length-1 tuple into a single variable.
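
A tiny illustration of that trailing comma, with a made-up value of 258:

```python
import struct

packed = struct.pack(">I", 258)

result = struct.unpack(">I", packed)  # always a tuple, here (258,)
n, = struct.unpack(">I", packed)      # trailing comma unpacks the single element
(m,) = struct.unpack(">I", packed)    # same thing, with explicit parentheses

print(result, n, m)  # (258,) 258 258
```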

Also, we can use a with statement to open the file, which eliminates the need for the try block. We can also just use f.read() to read the whole file in one go, with no need to compute the size of the file.

import struct
import sys

def parse_text(data):
    fmt = "20s3s"
    l = len(data)
    sz = struct.calcsize(fmt)

    if l % sz != 0:
        print("ERROR: input data not a multiple of record size")
        return

    num_records = l // sz
    if not num_records:
        print("ERROR: zero-length input file.")
        return

    ofs = 0
    while ofs < l:
        s, x = struct.unpack(fmt, data[ofs:ofs+sz])
        # x is a length-3 string; prepend a 0 byte so it can be unpacked as a 32-bit integer
        n, = struct.unpack(">I", b"\x00" + x) # unpack 24-bit big-endian int
        ofs += sz
        # ... do something with s and with n or x

def main():
    if len(sys.argv) != 2:
        print("Usage: program_name <input_file_name>")
        sys.exit(1)

    _, in_fname = sys.argv

    # open in binary mode, since the file holds raw records
    with open(in_fname, "rb") as f:
        data = f.read()
        parse_text(data)

if __name__ == "__main__":
    main()

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow