Question

I'm working on an application writing binary data (ints, doubles, raw bytes) to a file.

The problem is that the data does not seem to be written to the file the way I expect:

>>> import struct
>>> import io
>>> out = io.open("123.bin", "wb+")
>>> format = "!i"
>>> data = struct.pack(format, 1)
>>> out.write(data)
4L
>>> data
'\x00\x00\x00\x01'
>>> out.close()
>>> infile = io.open("123.bin", "rb")
>>> instr = infile.read()
>>> instr
'\x00\x00\x00\x01'
>>> struct.unpack("!I", instr)
(1,)

So everything looks like it's working just fine. But on closer examination, the 123.bin file has the following contents:

$ hexdump 123.bin 
0000000 0000 0100                              
0000004

So it looks like the bytes were swapped by io.write()!

The Python documentation says that io.write() accepts "the given bytes or bytearray object". The problem is that struct.pack returns a str:

>>> type(struct.pack(format, 1))
<type 'str'>

So, what am I doing wrong? How do I convert str to bytes without any charset translation?
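For reference, no conversion should be needed here. In Python 2.6 and later, `bytes` is simply an alias for `str`, so the `str` returned by `struct.pack` already is a byte string, and a file opened in binary mode writes it unchanged. In Python 3, `struct.pack` returns `bytes` directly. A minimal sketch, assuming Python 3 rather than the Python 2 session above:

```python
import struct

# In Python 3, struct.pack returns a bytes object, so it can be passed
# straight to a file opened in binary mode ("wb"); no encoding step is involved.
packed = struct.pack("!i", 1)
print(type(packed).__name__)  # bytes
print(packed)                 # b'\x00\x00\x00\x01'
```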


Solution

This looks like an oddity of hexdump(1). Using xxd(1), I get...

$ xxd 123.bin
0000000: 0000 0001                                ....

...which looks correct.

It looks like you need the -C (canonical hex+ASCII) option to get hexdump(1) to print the bytes one at a time...

$ hexdump -C 123.bin
00000000  00 00 00 01                                       |....|
00000004

...or call it as hd instead.
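If neither tool is handy, the file can also be checked from Python itself. The sketch below prints each byte separately, roughly in the style of `hexdump -C`/`xxd`. It is an approximation: the `canonical_dump` helper is hypothetical and does not reproduce hexdump's exact spacing. Because every byte is shown on its own, no word size or byte order can distort the result.

```python
def canonical_dump(data: bytes) -> str:
    """Format bytes one at a time, roughly like `hexdump -C` / `xxd`.

    Hypothetical helper for illustration; spacing differs from real hexdump.
    """
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Two hex digits per byte, in file order.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII as-is, everything else as ".".
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<47}  |{ascii_part}|")
    return "\n".join(lines)

# The four bytes struct.pack("!i", 1) produces.
print(canonical_dump(b"\x00\x00\x00\x01"))
```

To inspect the real file, pass it the result of `open("123.bin", "rb").read()`. The line should show `00 00 00 01`, matching the xxd and `hexdump -C` output above.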

Other tips

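The problem here isn't with Python but with hexdump. It's treating the data in the file as 16-bit values in the machine's byte order (little-endian on x86). You need to tell hexdump to display the data as individual 8-bit values. The option for that is -C (uppercase, canonical hex+ASCII display). Lowercase -c is the one-byte character display, which shows characters rather than hex.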

The default output format of hexdump is the same as using the -x option, which, according to the man page, does this:

 -x      Two-byte hexadecimal display.  Display the input offset in hexadecimal,
         followed by eight, space separated, four column, zero-filled, two-byte
         quantities of input data, in hexadecimal, per line.

And the byte order used by hexdump is the machine's native byte order (here most likely little-endian), while you asked Python to store the value in network order (big-endian).
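The mismatch can be checked directly with `struct`. The `!` prefix always produces big-endian (network-order) bytes regardless of the host, while `sys.byteorder` reports the host's native order. A short comparison, assuming Python 3.5+ for `bytes.hex()`:

```python
import struct
import sys

value = 1
network = struct.pack("!i", value)  # "!" = network order, i.e. big-endian
little = struct.pack("<i", value)   # explicit little-endian, for comparison

print(sys.byteorder)  # the host's native order: "little" on x86
print(network.hex())  # 00000001
print(little.hex())   # 01000000

# Unpacking with the same format recovers the value either way.
assert struct.unpack("!i", network) == (value,)
assert struct.unpack("<i", little) == (value,)
```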

So the value is stored correctly but interpreted incorrectly by hexdump. You can either pass the -C option to hexdump or use xxd instead.

$ hexdump 123.bin
0000000 0000 0100                              
0000004
$ hexdump -C 123.bin
00000000  00 00 00 01                                       |....|
00000004
Licensed under: CC-BY-SA with attribution