BSON: how does a binary protocol work? How do I read a (pseudo-)Backus-Naur Form? [closed]

StackOverflow https://stackoverflow.com/questions/19722170

  •  02-07-2022

Question

I'm reading the MongoDB specs, which use the BSON data format.

Looking at the docs, I'd like to understand how the example BSON at the bottom of their page is encoded:

{"hello": "world"}  →   "\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

{"BSON": ["awesome", 5.05, 1986]}   →   "\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00\x020\x00\x08\x00\x00\x00awesome\x00\x011\x00\x33\x33\x33\x33\x33\x33\x14\x40\x102\x00\xc2\x07\x00\x00\x00\x00"

Solution

I think the question is essentially 'how does a binary protocol work?' or 'how do I read a (pseudo-)Backus-Naur Form?'

You can think of it like this: Your protocol consists of format information that is used to structure the data, and the data itself. What you see in JSON as an opening bracket {, for example, means something like "start a new (sub-)document".

By definition, this 'command' is implicit: a document consists of its total length in bytes (which counts the four length bytes themselves and the trailing terminator), then the content (an e_list), then a \x00 terminator byte. Since the document is 22 bytes long (0x16 in hex), it starts with \x16\x00\x00\x00. Why the three \x00? Because the length is an int32, i.e. a 32-bit integer, so it is padded to a full four bytes. Why \x16\x00\x00\x00 and not \x00\x00\x00\x16? This is called endianness, and BSON uses little-endian: the least significant byte comes first.
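To make the byte order concrete, here is a small sketch using Python's standard struct module (my choice for illustration, not something the BSON docs prescribe). The format "<i" means a little-endian signed 32-bit integer and ">i" the big-endian equivalent; the variable names are made up for this example.

```python
import struct

doc_length = 22  # total size of {"hello": "world"} in BSON, i.e. 0x16

# "<i" = little-endian int32, which is what BSON uses
little_endian = struct.pack("<i", doc_length)

# ">i" = big-endian int32, shown only for comparison
big_endian = struct.pack(">i", doc_length)

# Reading the four bytes back gives the original number again
decoded_length = struct.unpack("<i", little_endian)[0]

print(little_endian)
print(big_endian)
print(decoded_length)
```

The little-endian result is the \x16\x00\x00\x00 that opens the example document.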

Then comes the definition of the content, the e_list. An e_list is either an element followed by another e_list, or empty, which is where the recursion terminates. An element is the type of the value first (a single byte), then the e_name, followed by the actual data. The value of "hello" is "world", which is a string, and strings are identified by \x02 according to the spec, so the \x02 comes next, followed by the e_name "hello" and a null terminator (hello\x00).
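Reading the start of that element back out of the raw bytes could look roughly like this. It is a hand-rolled sketch for this one example, not a general BSON parser, and the variable names are my own.

```python
# The encoded {"hello": "world"} document from the question
raw = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

offset = 4  # skip the int32 document length

# First byte of an element is the type of its value (0x02 = string)
element_type = raw[offset]

# The e_name is a cstring: bytes up to (not including) the next \x00
name_end = raw.index(b"\x00", offset + 1)
e_name = raw[offset + 1:name_end].decode("utf-8")

# The value itself starts right after the e_name's null terminator
value_offset = name_end + 1

print(hex(element_type), e_name, value_offset)
```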

Now comes the actual value, a string, which is defined as int32 (byte*) "\x00": the length of the string, the actual data, and a null terminator, where the length counts the null terminator. "world" plus its terminator is 6 bytes, so the length is \x06\x00\x00\x00, followed by the data world\x00. Last comes the \x00 terminator of the top-level BSON document.
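Putting the pieces together, a minimal encoder that follows the grammar for the first example might look like the following. It only handles flat documents whose values are all strings, and the function names (encode_cstring, encode_string, encode_document) are invented for this sketch; a real driver does far more.

```python
import struct


def encode_cstring(text):
    # cstring ::= (byte*) "\x00"
    return text.encode("utf-8") + b"\x00"


def encode_string(text):
    # string ::= int32 (byte*) "\x00", the int32 counts the trailing \x00
    data = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data


def encode_document(fields):
    # document ::= int32 e_list "\x00"
    e_list = b""
    for name, value in fields.items():
        # element ::= "\x02" e_name string (string values only here)
        e_list += b"\x02" + encode_cstring(name) + encode_string(value)
    # The int32 is the size of the whole document:
    # 4 length bytes + the e_list + 1 terminator byte
    total = 4 + len(e_list) + 1
    return struct.pack("<i", total) + e_list + b"\x00"


encoded = encode_document({"hello": "world"})
print(encoded)
```

For {"hello": "world"} the e_list is 17 bytes (1 type byte, 6 for hello\x00, 4 for the string length, 6 for world\x00), so the total is 4 + 17 + 1 = 22, which matches the \x16 at the front.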

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow