Question

I need to parse a FIX message as fast as possible.

My approach is below. I keep a ref to the FixString in FP_FixString, then I work through it sequencially getting Tag values as I need them.

The fix message is large (approximately 4,000 chars and with about 100 repeating group of 3 tags that I need to extract).

Am I doing this as efficiently as possible? From a string building and parsing perspective?

public FixParser
{
        string FP_FixString;
        int FP_m;
        int FP_Lastm;

        public void Go()
        {
            using (StreamReader reader = new StreamReader(new GZipStream(File.OpenRead(@"L:\Logs\FIX.4.2-D.messages.current.log.20140512T.gz"), CompressionMode.Decompress)))
            {
                string line = "";
                while ( (line = reader.ReadLine()) != null)
                {

                      InitiateFixParse(ref line);

                      string Symbol;
                      if (!GetTagString(55, out Symbol))
                      return;

                      //DO ALL OTHER PROCESSING OF TAGS HERE
                }
        }


        public void InitiateFixParse(ref string fixString)
        {
            FP_Lastm = fixString.Length - 1;
            FP_FixString = fixString;
            FP_m = 0;
        }

       public bool IsEndMark()
        {
            if (FP_m>=FP_Lastm || FP_FixString[FP_m].Equals('\x01'))
                return true;

            return false;
        }

        public bool NextTag(out int Tag, out string ValueString)
        {
            Tag = 0;
            ValueString = "";
            if(FP_m>=FP_Lastm)
                return false;

            string TagString = "";
            bool GettingTag=true;
            while (!IsEndMark())
            {
                if (FP_FixString[FP_m].Equals('='))
                {
                    GettingTag = false;
                    FP_m++;
                }
                if(GettingTag)
                    TagString = TagString + FP_FixString[FP_m];
                else
                    ValueString = ValueString + FP_FixString[FP_m];
                FP_m++;
            }

            Tag=int.Parse(TagString);
            FP_m++; //Start of next Tag

            return true;
        }

        public bool GetTagString(int Tag, out string ValueString)
        {
            //bool FountIt = false;
            int aTag;
            string aValueString;
            while (NextTag(out aTag, out aValueString))
            {
                if (aTag == Tag)
                {
                    ValueString = aValueString;
                    return true;
                }
            }

            ValueString = "";
            return false;
        }
}
Was it helpful?

Solution 2

If you are dealing with very large files you want to avoid GC pressure as much as possible as already mentioned. StringBuilder helps but you are still creating a StringBuilder instance. Some quick wins:

if (GettingTag)
{
    //TagString = TagString + FP_FixString[FP_m];
    Tag = Tag * 10 + ((byte)FP_FixString[FP_m] -48);
}// This reduces time by about 40%

//Tag = int.Parse(TagString);

The code below reduces the time even further (reduces to 50% of original):

int valueStart = 0;
while (!IsEndMark())
{
    if (FP_FixString[FP_m].Equals('='))
    {
        GettingTag = false;
        FP_m++;
        valueStart = FP_m;
    }
    if (GettingTag)
    {
        //TagString = TagString + FP_FixString[FP_m];
        Tag = Tag * 10 + ((byte)FP_FixString[FP_m] -48);
    }
    //else
        //ValueString = ValueString + FP_FixString[FP_m];
    FP_m++;
}
ValueString = FP_FixString.Substring(valueStart, FP_m - valueStart);

//Tag = int.Parse(TagString);
FP_m++; //Start of next Tag

I havent looked at the code further.

OTHER TIPS

My only suggestion at first glance is to replace string concatenations that are done in loops, such as TagString = TagString + FP_FixString[FP_m];, with StringBuilders i.e.

StringBuilder sb = new StringBuilder();
while (!IsEndMark())
{
   sb.append(FP_FixString[FP_m]);
}
TagString = sb.ToString();

as StringBuilder is much more efficient than concatenation in loops. Usual caveats apply. I agree with @DumbCoder that profiling is a good idea.

As an aside it may be better to store a FIXml representation of the message and use xpath to extract the data if possible (i.e. if you aren't trying to parse automatically created logs) as it can help to solve some repeating groups issues, or rather it did for me here!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top