Question

My setup is a Flask-based server. A bird-view of the project would be: the Flask-based server fetches binary data from AWS S3 based on some algorithmic calculations (like figuring out the filenames to fetch from S3), and serves the data to an HTML+JavaScript client.
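
For context, the fetch side amounts to something like the following (a minimal sketch; boto3 is assumed, and the bucket and key names would come from the algorithmic step mentioned above):

import boto3

s3 = boto3.client('s3')

def fetch_chunk(bucket, key):
    """Download one binary object from S3 and return its raw bytes."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    return obj['Body'].read()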

At first, I thought a JSON object would be the best response type. I created a JSON response in the following format:

{
  "payload": [
    {
      "symbol": "sym",
      "exchange": "exch",
      "headerfile": {
        "name": "#name",
        "content": "#binarycontent"
      },
      "datafiles": [
        {
          "name": "#name",
          "content": "#binarycontent"
        },
        {
          "name": "#name",
          "content": "#binarycontent"
        }
      ]
    }
  ],
  "errors": []
}

After structuring this JSON, I learned that JSON doesn't natively support binary data, so I wouldn't be able to embed the binary chunks directly as values.

I realize that I can always convert the bytes into a base64-encoded string and use that string as a value in the JSON. But the resulting string is around 33% larger: 4010 bytes of data encoded to 5348 bytes. While that is insignificant for a single binary chunk, it becomes a concern when a response embeds many such chunks. The extra size means the response takes longer to reach the client, which is a crucial concern for my client's application.
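
The overhead is easy to reproduce (os.urandom merely stands in for a real chunk of the same size):

import base64
import os

raw = os.urandom(4010)           # stands in for a real 4010-byte chunk
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))    # 4010 5348 -> ~33% larger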

Another option I considered was to stream the binary chunks to the client with an application/octet-stream Content-Type. But I am not sure whether that is any better than the solution above. Furthermore, I haven't been able to figure out how to relate the binary chunks to their names in that situation.
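
For illustration, serving a single chunk that way would look roughly like this (a sketch only; the route and the load_chunk helper are hypothetical):

from flask import Flask, Response

app = Flask(__name__)

# The filename can ride along in a Content-Disposition header, but that
# only works for one chunk per response, which is the crux of the problem.
@app.route('/chunk/<name>')
def serve_chunk(name):
    data = load_chunk(name)  # hypothetical lookup returning bytes
    return Response(
        data,
        mimetype='application/octet-stream',
        headers={'Content-Disposition': 'attachment; filename=' + name},
    )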

Is there a solution better than 'convert binary to text and embed into JSON'?


Solution

I solved the problem, and will write down the solution in the hope that it saves someone else some time.

Thank you, @dstromberg and @LukasGraf, for your advice. I checked out BSON first and found it sufficient for my needs, so I never looked into Protocol Buffers in detail.

BSON is available on PyPI in two packages. In pymongo, it comes as a supplement to MongoDB; in bson, it is a standalone package, which suits my needs. However, the latter supports only Python 2. So before rolling my own port, I looked around for a Python 3 implementation and found another implementation of the BSON spec on bsonspec.org: Link to the module.

The simplest usage of that module goes like this:

>>> import bson
warning: module typecheck.py cannot be imported, type checking is skipped
>>> encoded = bson.serialize_to_bytes({'name': 'chunkfile', 'content': b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13'})
>>> print(encoded)
b'1\x00\x00\x00\x02name\x00\n\x00\x00\x00chunkfile\x00\x05content\x00\n\x00\x00\x00\x00\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13\x00'
>>> decoded = bson.parse_bytes(encoded)
>>> print(decoded)
OrderedDict([('name', 'chunkfile'), ('content', b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13')])

As you can see, it accommodates binary data as well. I sent the data from Flask with mimetype=application/bson, and the receiving JavaScript parsed it accurately using this standalone BSON library provided by the MongoDB team.
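
On the Flask side, the response boiled down to something like this (a minimal sketch; the route and document are placeholders, and serialize_to_bytes is the function demonstrated above):

from flask import Flask, Response
import bson  # the Python 3 module from bsonspec.org shown above

app = Flask(__name__)

@app.route('/payload')
def payload():
    # Binary content goes in as plain bytes; no base64 step is needed.
    doc = {'name': 'chunkfile',
           'content': b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13'}
    return Response(bson.serialize_to_bytes(doc),
                    mimetype='application/bson')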

Licensed under: CC-BY-SA with attribution