Question

I'm using MessagePack to develop a client SDK. I need clients in Java, ObjC and Python, while my server is in Java. I don't have any problems with the Java and ObjC msgpack libraries, but in Python, when I pack a dictionary with a string value longer than 31 characters, the packed data won't unpack in the other languages. Unpacking the same data in Python works fine, though, and as long as the string is shorter than 32 characters, interoperability is fine too. Below is one Python example that fails:

import umsgpack

myPacket = {u"api_key": u"ad09739ac168ff2a199fb24eb4e24db8"}  # 32-character value
msgPackedPacket = umsgpack.packb(myPacket)

The binary data generated for this is:

<81a76170 695f6b65 79d92061 64303937 33396163 31363866 66326131 39396662 32346562 34653234 646238>

while if I convert a dictionary with the same key and value in ObjC, I get:

<81a76170 695f6b65 79da0020 61643039 37333961 63313638 66663261 31393966 62323465 62346532 34646238>

The ObjC result unpacks fine and the Python one doesn't. Note the extra bytes in the data from ObjC.

A working example is below:

myPacket = {u"api_key": u"ad09739ac168ff2a199fb24eb4e24d"}

Here the value is 30 characters long, and I get the following bytes in Python:

<81a76170 695f6b65 79be6164 30393733 39616331 36386666 32613139 39666232 34656234 65323464>

and for ObjC, I get the following bytes:

<81a76170 695f6b65 79be6164 30393733 39616331 36386666 32613139 39666232 34656234 65323464>

I'm sorry if I'm missing something obvious. I'm looking for any workarounds or suggestions, as I've been stuck on this for more than a day.

Thanks in advance.


Solution

When looking at what characters are encoded by the hexadecimal strings, you can see that the first one decodes to

'\x81\xa7api_key\xd9 ad09739ac168ff2a199fb24eb4e24db8'  # Python's version

while the second one decodes to

'\x81\xa7api_key\xda\x00 ad09739ac168ff2a199fb24eb4e24db8'  # ObjC's version

The third, with its 30-character string, decodes to

'\x81\xa7api_key\xbead09739ac168ff2a199fb24eb4e24d'     # both versions
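To reproduce these decodings yourself, a short Python 3 sketch is enough. It assumes the dumps are pasted exactly as shown in the question (the `<...>` NSData description format); `DUMPS` and `dump_to_bytes` are just illustrative names.

```python
# Hex dumps copied verbatim from the question (NSData "<...>" description format).
DUMPS = {
    "python_32": "<81a76170 695f6b65 79d92061 64303937 33396163 31363866 66326131 39396662 32346562 34653234 646238>",
    "objc_32": "<81a76170 695f6b65 79da0020 61643039 37333961 63313638 66663261 31393966 62323465 62346532 34646238>",
    "both_30": "<81a76170 695f6b65 79be6164 30393733 39616331 36386666 32613139 39666232 34656234 65323464>",
}


def dump_to_bytes(dump):
    """Drop the angle brackets; bytes.fromhex() skips the spaces between groups."""
    return bytes.fromhex(dump.strip("<>"))


for name, dump in DUMPS.items():
    data = dump_to_bytes(dump)
    print(name, len(data), data)
```

Printing the lengths shows the Python packet is 43 bytes and the ObjC one 44, so the two differ by a single byte in total.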

Intrigued by the problem, I googled for MessagePack's spec and came across the format specification.

Now things are getting clearer.

  • \x81 says that the following is a one-element map.
  • \xA7 says that the following is a seven-character string.
  • api_key is that seven-character string.
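The two "fix" types above pack their size into the type byte itself. A minimal sketch of reading them follows; `describe_prefix` is a hypothetical helper that only recognises the types seen in this packet, not a general decoder.

```python
def describe_prefix(byte):
    """Classify a MessagePack type byte (only the fix types seen in this packet)."""
    if 0x80 <= byte <= 0x8F:   # fixmap: 1000xxxx, low 4 bits = number of entries
        return ("fixmap", byte & 0x0F)
    if 0xA0 <= byte <= 0xBF:   # fixstr: 101xxxxx, low 5 bits = length in bytes
        return ("fixstr", byte & 0x1F)
    return ("other", None)


packet = b"\x81\xa7api_key"
print(describe_prefix(packet[0]))  # ('fixmap', 1)
print(describe_prefix(packet[1]))  # ('fixstr', 7)
print(packet[2:2 + 7])             # b'api_key'
```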

So far, so good. Now the differences begin:

  • \xd9 says that a str8 string follows. The byte after \xd9 is \x20 (hex 20 == dec 32 == ASCII space), which denotes the length of that string (32). That's what Python uses, correctly, because str8 can hold strings of up to 255 bytes.
  • \xda says that a str16 string follows. The following two bytes are \x00\x20 (hex 0020 == dec 32, as before). They also denote the length of the following string (32 again). That's what ObjC does, apparently. This is just as legal from the spec's point of view, just a little wasteful (one wasted byte).
  • For strings shorter than 32 bytes, both implementations use the fixstr type, which encodes a length of 0-31 in the bitfield 101xxxxx. For a 30-character string this becomes \xbe (bin 10111110).
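The choice of string header described in these bullets can be sketched in a few lines of Python. This is an illustration, not any library's code: `str_header` and `pack_single_entry` are hypothetical names, and the `use_str8=False` switch only mimics the ObjC output by skipping str8. The sketch reproduces all three packets from the question byte for byte.

```python
import struct


def str_header(length, use_str8=True):
    """Return the MessagePack str header for a UTF-8 payload of `length` bytes.

    use_str8=False is a hypothetical switch that skips str8 (as the ObjC
    library apparently does) and goes straight from fixstr to str16.
    """
    if length <= 31:
        return bytes([0xA0 | length])                # fixstr: 101xxxxx
    if length <= 0xFF and use_str8:
        return b"\xd9" + struct.pack(">B", length)   # str8: 1-byte length
    if length <= 0xFFFF:
        return b"\xda" + struct.pack(">H", length)   # str16: 2-byte big-endian length
    return b"\xdb" + struct.pack(">I", length)       # str32: 4-byte big-endian length


def pack_single_entry(key, value, use_str8=True):
    """Pack a one-entry {key: value} map of strings (a sketch, not a full packer)."""
    k, v = key.encode("utf-8"), value.encode("utf-8")
    return (b"\x81"                                  # fixmap with 1 entry
            + str_header(len(k), use_str8) + k
            + str_header(len(v), use_str8) + v)


token = "ad09739ac168ff2a199fb24eb4e24db8"  # 32 characters
print(pack_single_entry("api_key", token).hex())                  # same bytes as the Python dump
print(pack_single_entry("api_key", token, use_str8=False).hex())  # same bytes as the ObjC dump
print(pack_single_entry("api_key", token[:30]).hex())             # same bytes as the 30-char dump
```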

So it seems that all the serializations are correct, but the deserializer you're using can't handle the str8 type used by Python's serializer. The spec's implementation guidelines note that, because of the format change, not all implementations support str8, so serializers should provide a compatibility mode that doesn't use it. Python's msgpack package doesn't, though.
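To see why the same bytes can fail on the receiving side, here is a minimal string decoder in Python. It assumes, for illustration only, a deserializer written against the older spec in which 0xd9 was not yet assigned; the `read_str` helper and its `knows_str8` flag are hypothetical and not part of any msgpack library.

```python
import struct


def read_str(buf, pos, knows_str8=True):
    """Decode one MessagePack string starting at buf[pos]; return (text, next_pos).

    knows_str8=False imitates a decoder written before str8 (0xd9) existed.
    """
    tag = buf[pos]
    if 0xA0 <= tag <= 0xBF:                     # fixstr: length in the low 5 bits
        length, start = tag & 0x1F, pos + 1
    elif tag == 0xD9 and knows_str8:            # str8: 1-byte length
        length, start = buf[pos + 1], pos + 2
    elif tag == 0xDA:                           # str16: 2-byte big-endian length
        length, start = struct.unpack_from(">H", buf, pos + 1)[0], pos + 3
    elif tag == 0xDB:                           # str32: 4-byte big-endian length
        length, start = struct.unpack_from(">I", buf, pos + 1)[0], pos + 5
    else:
        raise ValueError("unsupported type byte 0x%02x at offset %d" % (tag, pos))
    return buf[start:start + length].decode("utf-8"), start + length


value = b"ad09739ac168ff2a199fb24eb4e24db8"
py_packet = b"\x81\xa7api_key\xd9\x20" + value
objc_packet = b"\x81\xa7api_key\xda\x00\x20" + value

key, pos = read_str(py_packet, 1)                 # offset 1 skips the 0x81 fixmap byte
print(key, read_str(py_packet, pos)[0])           # a str8-aware decoder reads both packets
print(read_str(objc_packet, pos, knows_str8=False)[0])  # str16 still works
try:
    read_str(py_packet, pos, knows_str8=False)    # str8 is rejected
except ValueError as err:
    print(err)
```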

UPDATE: Just a few hours after the bug report, the developer of msgpack-Python added a compatibility switch that makes Python produce str16 serializations instead of str8. Well done!

Licensed under: CC-BY-SA with attribution