Question

I am totally new to Python. I have to parse a .txt file that contains network byte order binary encoded numbers (see here for the details on the data). I know that I have to use the package struct.unpack in Python. My questions are the following:

(1) Since I don't really understand how the function struct.unpack works, is it straightforward to parse the data? By that, I mean that if you look at the data structure, it seems that I have to write code for each type of message. But if I look online at the documentation on struct.unpack, it seems more straightforward, though I am not sure how to write the code. A short sample would be appreciated.

(2) What's the best practice once I have parsed the data? I would like to save the parsed file in order to avoid parsing the file each time I need to make a query. Which format would be the most efficient for keeping the parsed file?

Solution

This should be relatively straightforward. I can't comment on how you're actually supposed to get the byte-encoded packets of information, but I can help you parse them.

First, here's a list of some of the packet types you'll be dealing with that I gathered from section 4 of the documentation:

  1. TimeStamp
  2. System Event Message
  3. Stock Related Messages
    1. Stock Directory
    2. Stock Trading Action
    3. Reg SHO Short Sale Price Test Restricted Indicator
    4. Market Participant Position
  4. Add Order Message

The list continues, but as an example, let's see how to decode one of these:


System Event Message

A System Event Message packet has 3 fields and is 6 bytes long in total:

  1. A Message Type, which starts at byte 0, is 1 byte long, with a Value of S (a Single Character)
  2. A TimeStamp, which starts at byte 1, is 4 bytes long, and should be interpreted as an Integer.
  3. An Event Code, which starts at byte 5, is 1 byte long and is a String (Alpha).

Looking up each type in the struct format character table, we'll need to build a format string to represent this sequence. First, we have a Character, then a 4-byte Unsigned Integer, then another Character, which gives "cIc". Because the data is in network byte order, the string also needs the "!" prefix: "!cIc". The prefix selects big-endian byte order and standard sizes with no alignment padding, so the format describes exactly 6 bytes. A bare "cIc" uses the machine's native byte order and alignment, which typically inserts 3 padding bytes before the integer (9 bytes in total) and would misread the real data.

*NOTE: That the Integer is unsigned is documented in Section 3: Data Types of their documentation

Construct a fake packet

A quick way to build a test packet (Python 3, where the "c" format takes a bytes object of length 1):

>>> import struct
>>> import time
>>> data = struct.pack('!cIc', b'S', int(time.time()), b'O')
>>> print(repr(data))  # What does the bytestring look like?
b'SR\x8dn\xa6O'  # Yep, that's 6 bytes alright!

Unpack the data

In this example, we'll use the fake packet above, but in the real world we'd use a real data response:

>>> response_tuple = struct.unpack('!cIc', data)
>>> print(repr(response_tuple))
(b'S', 1385000614, b'O')

In this case, the 3rd item in the tuple (the event code, O here) is a key to be looked up in the documentation's tables called System Event Codes - Daily and System Event Codes - As Needed.
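
As a minimal sketch of that lookup, assuming Python 3 and a network-byte-order format string "!cIc" (the "!" prefix selects big-endian with no padding): the dictionary below is a hypothetical two-entry stand-in for the documentation's event code tables, with descriptions that should be checked against the spec, and `decode_system_event` is a name made up for illustration.

```python
import struct

# "!" = network (big-endian) byte order, standard sizes, no padding.
SYSTEM_EVENT = struct.Struct("!cIc")

# Hypothetical excerpt of the event code tables; verify against the spec.
EVENT_CODES = {
    b"O": "Start of Messages",
    b"C": "End of Messages",
}

def decode_system_event(payload):
    """Unpack a 6-byte System Event Message into a dict."""
    msg_type, timestamp, event_code = SYSTEM_EVENT.unpack(payload)
    if msg_type != b"S":
        raise ValueError("not a System Event Message: %r" % msg_type)
    return {
        "timestamp": timestamp,
        "event_code": event_code.decode("ascii"),
        # Fall back to "unknown" for codes missing from the excerpt.
        "event": EVENT_CODES.get(event_code, "unknown"),
    }

# Build a fake packet with a fixed timestamp and decode it again.
packet = struct.pack("!cIc", b"S", 1385000614, b"O")
event = decode_system_event(packet)
print(SYSTEM_EVENT.size)  # 6
print(event)
```

Compiling the format once with `struct.Struct` also exposes its `size`, which is a cheap way to confirm the format string matches the 6 bytes the documentation promises.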

If you need additional examples, feel free to ask, but that's the gist of it.
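
On the question of writing code for each message type: one possible approach is to keep a single format string per type in a table and dispatch on the first byte, rather than writing separate parsing code each time. The sketch below assumes Python 3 and that you already have one complete message in hand; how the file is split into messages is left open. Only the System Event layout comes from the walkthrough above. The Timestamp layout (type byte `T` plus a 4-byte integer) and the names `LAYOUTS` and `parse_message` are assumptions for illustration.

```python
import struct

# Map each message-type byte to a name and the layout of the whole message.
# The b"T" entry is an assumed layout; check it against the documentation.
LAYOUTS = {
    b"S": ("system_event", struct.Struct("!cIc")),
    b"T": ("timestamp", struct.Struct("!cI")),
}

def parse_message(payload):
    """Return (name, fields) for one message, or None if the type is unknown."""
    msg_type = payload[:1]  # slicing keeps this a bytes object
    entry = LAYOUTS.get(msg_type)
    if entry is None:
        return None
    name, layout = entry
    if len(payload) != layout.size:
        raise ValueError(
            "%s: expected %d bytes, got %d" % (name, layout.size, len(payload))
        )
    return name, layout.unpack(payload)

system_event = parse_message(struct.pack("!cIc", b"S", 1385000614, b"O"))
timestamp = parse_message(struct.pack("!cI", b"T", 34200))
print(system_event)
print(timestamp)
```

Supporting another message type then means adding one entry to the table, provided its fields all map onto struct format characters.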


As for how to store this data: that depends on what you'd like to do with it long term. A database probably makes sense here, since it would let you query the data without re-parsing the file, but without further information I can't say more.
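
If a database does turn out to fit, a minimal sketch with the standard library's sqlite3 module might look like this. The table name, the column names and the sample rows are all made up for illustration, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# Rows as they might come out of the parser: (timestamp, event_code).
parsed = [(1385000614, "O"), (1385000700, "C")]

conn = sqlite3.connect(":memory:")  # pass a file path to persist between runs
conn.execute(
    "CREATE TABLE IF NOT EXISTS system_event (timestamp INTEGER, event_code TEXT)"
)
conn.executemany("INSERT INTO system_event VALUES (?, ?)", parsed)
conn.commit()

# Query the stored rows instead of parsing the raw file again.
rows = conn.execute(
    "SELECT timestamp FROM system_event WHERE event_code = ?", ("O",)
).fetchall()
print(rows)  # [(1385000614,)]
conn.close()
```

Whether this beats a flat file depends on the data volume and on the kinds of queries you plan to run.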

Hope that helps!

Licensed under: CC-BY-SA with attribution