Question

The Objective

I'm trying to achieve the following:

  • capture network traffic containing a conversation in the FIX protocol
  • extract the individual FIX messages from the network traffic into a "nice" format, e.g. CSV
  • do some data analysis on the exported "nice" format data

I have achieved this by:

  • using pcap to capture the network traffic
  • using tshark to print the relevant data as a CSV
  • using Python (pandas) to analyse the data

The Problem

The problem is that some of the captured TCP packets contain more than one FIX message, which means that when I do the export to CSV using tshark I don't get a FIX message per line. This makes consuming the CSV difficult.

This is the tshark command line I'm using to extract the relevant FIX fields as CSV:

tshark -r dump.pcap \
-R '(fix.MsgType[0]=="G" or fix.MsgType[0]=="D" or fix.MsgType[0]=="8" or fix.MsgType[0]=="F") and fix.ClOrdID != "0"' \
-Tfields -Eseparator=, -Eoccurrence=l -e frame.time_relative \
-e fix.MsgType -e fix.SenderCompID \
-e fix.SenderSubID -e fix.Symbol -e fix.Side \
-e fix.Price -e fix.OrderQty -e fix.ClOrdID \
-e fix.OrderID -e fix.OrdStatus

Note that I'm currently using "-Eoccurrence=l" to get just the last occurrence of a named field in the case where there is more than one occurrence of a field in the packet. This is not an acceptable solution as information will get thrown away when there are multiple FIX messages in a packet.

This is what I expect to see per line in the exported CSV file (fields from one FIX message):

16.508949000,D,XXX,XXX,YTZ2,2,97480,34,646427,,

This is what I see when there is more than one FIX message (three in this case) in a TCP packet and the command line flag "-Eoccurrence=a" is used:

16.515886000,F,F,G,XXX,XXX,XXX,XXX,XXX,XXX,XTZ2,2,97015,22,646429,646430,646431,323180,323175,301151,

The Question

Is there a way (not necessarily using tshark) to extract each individual, protocol specific message from a pcap file?


Solution

Better Solution

Using tcpflow allows this to be done properly without leaving the commandline.

My current approach is to use something like:

tshark -nr <input_file> -Y'fix' -w- | tcpdump -r- -l -w- | tcpflow -r- -C -B

tcpflow ensures that the TCP stream is followed, so no FIX messages are missed (in the case where a single TCP packet contains more than 1 FIX message). -C writes to the console and -B ensures binary output. This approach is not unlike following a TCP stream in Wireshark.

The FIX delimiters are preserved which means that I can do some handy grepping on the output, e.g.

... | tcpflow -r- -C -B | grep -P "\x0135=8\x01"

to extract all the execution reports. Note the -P argument to grep, which enables Perl-compatible regular expressions; these are what allow the \x01 delimiter to be matched in the pattern.
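If the reassembled stream is read into Python instead of being piped to grep, the same split-and-filter step can be sketched as below. This is a minimal sketch rather than part of the pipeline above: the helper names split_fix_stream and msg_type are hypothetical, and it assumes every message starts with 8=FIX and ends with the three-digit checksum field 10=nnn followed by the \x01 delimiter.

```python
import re

# One FIX message: starts at "8=FIX" and ends with the checksum field
# "10=nnn" followed by SOH (\x01). Assumption: messages are well formed.
FIX_MESSAGE = re.compile(rb"8=FIX.*?\x0110=\d{3}\x01", re.DOTALL)


def split_fix_stream(stream: bytes) -> list:
    """Return every complete FIX message found in a reassembled byte stream."""
    return FIX_MESSAGE.findall(stream)


def msg_type(message: bytes) -> bytes:
    """Return the value of tag 35 (MsgType), or b"" if the tag is absent."""
    match = re.search(rb"\x0135=([^\x01]*)\x01", message)
    return match.group(1) if match else b""


if __name__ == "__main__":
    # Two made-up messages back to back, as they might appear in one TCP
    # segment (body length and checksum values are illustrative only).
    stream = (
        b"8=FIX.4.0\x019=5\x0135=D\x0110=001\x01"
        b"8=FIX.4.0\x019=5\x0135=8\x0110=002\x01"
    )
    for message in split_fix_stream(stream):
        if msg_type(message) == b"8":  # execution reports only
            print(message)
```

Because the pattern only matches complete messages, a message cut off at the end of the input is silently dropped, which may or may not be what you want for a truncated capture.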

A (Previous) Solution

I'm using Scapy (see also Scapy Documentation, The Very Unofficial Dummies Guide to Scapy) to read in a pcap file and extract each individual FIX message from the packets.

Below is the basis of the code I'm using:

import re

from scapy.all import rdpcap

def ExtractFIX(pcap):
    """A generator that iterates over the packets in a scapy pcap iterable
    and extracts the FIX messages.
    In the case where there are multiple FIX messages in one packet, yield each
    FIX message individually."""
    for packet in pcap:
        # Only consider TCP packets which contain raw data.
        if not packet.haslayer('Raw'):
            continue
        load = packet.getlayer('Raw').load

        # Ignore raw data that doesn't contain FIX.
        if b'FIX' not in load:
            continue

        # The payload is bytes on Python 3; decode it and replace \x01 with '|'.
        load = load.decode('latin-1').replace('\x01', '|')

        # Split out each individual FIX message in the packet at every
        # '|8=FIX' boundary, keeping the trailing '|' with the preceding
        # message (zero-width splits need Python 3.7 or later).
        for subMessage in re.split(r'(?<=\|)(?=8=FIX)', load):
            # Yield each sub message. More often than not, there will only be one.
            assert subMessage[-1:] == '|'
            yield subMessage

pcap = rdpcap('dump.pcap')
for fixMessage in ExtractFIX(pcap):
    print(fixMessage)
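To get from the '|'-delimited strings yielded by ExtractFIX to the one-message-per-line CSV described in the question, something like the following could work. It is a sketch only: fix_to_row and to_csv are hypothetical helper names, the tag numbers are the standard FIX tags for the fields requested in the tshark command, and keeping the first occurrence of a repeated tag is an arbitrary choice.

```python
import csv
import io

# Standard FIX tag numbers for the fields exported by the tshark command.
FIELDS = [
    ("MsgType", "35"), ("SenderCompID", "49"), ("SenderSubID", "50"),
    ("Symbol", "55"), ("Side", "54"), ("Price", "44"), ("OrderQty", "38"),
    ("ClOrdID", "11"), ("OrderID", "37"), ("OrdStatus", "39"),
]


def fix_to_row(message, delimiter="|"):
    """Map one delimited FIX message to a list of the selected field values."""
    tags = {}
    for pair in message.split(delimiter):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            tags.setdefault(tag, value)  # keep the first occurrence of a tag
    # Fields missing from the message become empty CSV cells.
    return [tags.get(tag, "") for _, tag in FIELDS]


def to_csv(messages):
    """Return CSV text with a header row and one row per FIX message."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _ in FIELDS])
    for message in messages:
        writer.writerow(fix_to_row(message))
    return out.getvalue()
```

The resulting text can be written to a file, or wrapped in io.StringIO and passed to pandas.read_csv, for the analysis step.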

I would still like to be able to get other information from the "frame" layer of the network packet, in particular the relative (or reference) time. Unfortunately, this doesn't seem to be available from the Scapy packet object: its topmost layer is the Ether layer, as shown below.

In [229]: pcap[0]
Out[229]: <Ether  dst=00:0f:53:08:14:81 src=24:b6:fd:cd:d5:f7 type=0x800 |<IP  version=4L ihl=5L tos=0x0 len=215 id=16214 flags=DF frag=0L ttl=128 proto=tcp chksum=0xa53d src=10.129.0.25 dst=10.129.0.115 options=[] |<TCP  sport=2634 dport=54611 seq=3296969378 ack=2383325407 dataofs=8L reserved=0L flags=PA window=65319 chksum=0x4b73 urgptr=0 options=[('NOP', None), ('NOP', None), ('Timestamp', (581177, 2013197542))] |<Raw  load='8=FIX.4.0\x019=0139\x0135=U\x0149=XXX\x0134=110169\x015006=20\x0150=XXX\x0143=N\x0152=20121210-00:12:13\x01122=20121210-00:12:13\x015001=6\x01100=SFE\x0155=AP\x015009=F3\x015022=45810\x015023=3\x015057=2\x0110=232\x01' |>>>>
In [245]: pcap[0].summary()
Out[245]: 'Ether / IP / TCP 10.129.0.25:2634 > 10.129.0.115:54611 PA / Raw'
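A possible workaround for the relative time, offered as an untested sketch: it assumes that each packet returned by rdpcap exposes its capture timestamp as a time attribute (seconds since the epoch), which is worth verifying against the Scapy version in use. The helper name relative_times is hypothetical.

```python
def relative_times(packets):
    """Yield (seconds since the first packet, packet) for each packet.

    Assumption: every packet has a `time` attribute holding the capture
    timestamp in epoch seconds. Converting to float may lose precision
    below the microsecond level.
    """
    first = None
    for packet in packets:
        timestamp = float(packet.time)
        if first is None:
            first = timestamp  # the first packet is the reference point
        yield timestamp - first, packet
```

If the assumption holds, iterating over relative_times(pcap) would give a value comparable to tshark's frame.time_relative alongside each packet, as long as the first packet in the list is also the first frame in the capture.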
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow