How to continue a loop despite an error in parsing byte data

https://stackoverflow.com/questions/17188341

01-06-2022

Question

My question is a continuation of this one: (Loop for reading different data types & sizes off very large byte array from file)

I have a raw byte stream stored in a file (rawbytes.txt or bytes.data) that I need to parse and output to a CSV-style text file.

The input of raw bytes (when read as characters/long/int etc.) looks something like this:

A2401028475764B241102847576511001200C...

Parsed it should look like:

OutputA.txt

(Field1,Field2,Field3) - heading

A,240,1028475764

OutputB.txt

(Field1,Field2,Field3,Field4,Field5) - heading

B,241,1028475765,1100,1200

OutputC.txt

C,...//and so on

Essentially, it's a hex-dump-style stream of bytes with no line terminators or gaps between the records that need to be parsed. As seen above, each record consists of different data types, one after another.
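To make the layout concrete, here is a minimal sketch of decoding the payload that follows an 'A' identifier. The layout (two little-endian UInt64 values, 10 ASCII bytes, one ASCII byte, four Int32 values, 43 bytes in all) is an assumption taken from the reads in the full code further down; RecordA and its field names are hypothetical. Reading text fields as raw bytes with ReadBytes and decoding them with Encoding.ASCII keeps every field at a fixed byte width, which ReadChars does not guarantee under BinaryReader's default UTF-8 encoding (one char can span several bytes).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical record type; field names and layout are assumptions
// based on the 'A' case in the question's full code.
public sealed class RecordA
{
    public ulong Field1;
    public ulong Field2;
    public string Text;              // 10 ASCII bytes
    public char Flag;                // 1 ASCII byte
    public int[] Values = new int[4];

    // Reads the 43-byte payload that follows an 'A' identifier byte.
    public static RecordA Read(BinaryReader reader)
    {
        var r = new RecordA();
        r.Field1 = reader.ReadUInt64();               // 8 bytes, little-endian
        r.Field2 = reader.ReadUInt64();               // 8 bytes, little-endian

        // ReadBytes returns fewer bytes (no exception) at end of stream,
        // so check the length explicitly.
        byte[] text = reader.ReadBytes(10);
        if (text.Length != 10) throw new EndOfStreamException();
        r.Text = Encoding.ASCII.GetString(text);

        r.Flag = (char)reader.ReadByte();             // exactly 1 byte, no decoding
        for (int i = 0; i < r.Values.Length; i++)
            r.Values[i] = reader.ReadInt32();         // 4 bytes each
        return r;
    }

    // Builds one comma-separated output line.
    public string ToCsv()
    {
        return Field1 + "," + Field2 + "," + Text + "," + Flag + "," + string.Join(",", Values);
    }
}
```

Usage: call RecordA.Read(reader) right after consuming an 'A' identifier byte, then write ToCsv() to the 'A' output file.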

Here's a snippet of my code. Because no field contains a comma, there is no need to wrap fields in quotes (i.e. a CSV wrapper), so I'm simply using a TextWriter to create the CSV-style text file as follows:

if (File.Exists(fileName))
{
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        while (reader.BaseStream.Position != reader.BaseStream.Length)
        {
            inputCharIdentifier = reader.ReadChar();
            switch (inputCharIdentifier)
            {
                case 'A':
                    field1 = reader.ReadUInt64();
                    field2 = reader.ReadUInt64();
                    field3 = reader.ReadChars(10);
                    string strtmp = new string(field3);
                    //and so on
                    using (TextWriter writer = File.AppendText("outputA.txt"))
                    {
                        writer.WriteLine(field1 + "," + field2 + "," + strtmp);
                    }
                    break;
                case 'B':
                    //code...
                    break;
            }
        }
    }
}
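The snippet above opens the output file with File.AppendText for every record, which works but reopens the file each time. A minimal sketch of an alternative, assuming one output file per identifier named "output" + id + ".txt": keep one writer per identifier open for the whole run and build each line with string.Join. CsvOutputs and WriteRecord are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper: one open writer per record identifier
// ('A' -> outputA.txt, 'B' -> outputB.txt, ...), closed once at the end.
public sealed class CsvOutputs : IDisposable
{
    private readonly Dictionary<char, TextWriter> writers = new Dictionary<char, TextWriter>();
    private readonly string directory;

    public CsvOutputs(string directory)
    {
        this.directory = directory;
    }

    // Appends one comma-separated line to the file for this identifier.
    public void WriteRecord(char id, params object[] fields)
    {
        TextWriter writer;
        if (!writers.TryGetValue(id, out writer))
        {
            writer = File.AppendText(Path.Combine(directory, "output" + id + ".txt"));
            writers.Add(id, writer);
        }

        // Format every field with the invariant culture so doubles always use '.'.
        var parts = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            parts[i] = Convert.ToString(fields[i], CultureInfo.InvariantCulture);
        writer.WriteLine(string.Join(",", parts));
    }

    public void Dispose()
    {
        foreach (var w in writers.Values)
            w.Dispose();
        writers.Clear();
    }
}
```

Wrapping the whole parse loop in `using (var outputs = new CsvOutputs(...))` ensures every file is flushed and closed even if parsing stops early.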

My question arises because some of the raw byte data contains null values, which are difficult to parse past: there is an unknown number of null bytes (or non-null, out-of-place bytes) between consecutive data blocks (each of which starts with A, B or C if the block is not corrupt).

QUESTION

So, how do I add a default case or some other mechanism so that the loop continues despite an error caused by corrupt or faulty data? Would the following code work?

    inputCharIdentifier = reader.ReadChar();
    ...
    default:
        //I need to know what to add here
        //(i.e. the case when the character is not a recognized identifier)
        while (reader.PeekChar() != -1)
        {
            filling = reader.ReadByte();
            //filling is a single byte
            fillingChar = Convert.ToChar(filling);
            if (fillingChar == 'A' || fillingChar == 'B')
                break;
        }
        break;

As for the remaining part, adding code to each switch case (e.g. 'A') so that parsing continues without stopping the program: is there a way to do this without multiple try-catch blocks? For example, the block's identifier is 'A', but the bytes after it are corrupt. In that case I need to either exit the loop or skip over a defined number of bytes, which would be known if the message header correctly identifies the remaining bytes.

[Note: cases A, B and so on have different input sizes; for example, A might be 40 bytes in total while B is 50 bytes. So a fixed-size buffer such as inputBuf[1000] or inputBuf[50], which would only suit blocks that were all the same size, wouldn't work well either, AFAIK.]
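Because each identifier has its own fixed payload length, one way to avoid a try-catch per case is a lookup table of payload sizes: check that the whole block is present before touching it, then read (or simply discard) exactly that many bytes in one call. The sizes below are computed from the reads in the full code further down and assume every char field occupies one byte (ASCII), 8 bytes per UInt64/double and 4 per Int32; treat them as placeholders. BlockSizes and TryReadPayload are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlockSizes
{
    // Payload length in bytes (excluding the 1-byte identifier).
    // Assumed from the question's full code, with one byte per char.
    public static readonly Dictionary<byte, int> Payload = new Dictionary<byte, int>
    {
        { (byte)'A', 8 + 8 + 10 + 1 + 4 * 4 },  // 43
        { (byte)'B', 8 + 8 + 10 + 1 + 4 * 2 },  // 35
        { (byte)'C', 8 + 8 + 10 + 1 + 4 * 2 },  // 35
        { (byte)'S', 8 + 1 + 1 },               // 10
        { (byte)'L', 4 },
        { (byte)'D', 8 + 8 + 10 + 4 * 5 + 8 },  // 54
        { (byte)'E', 8 + 8 + 10 + 4 * 5 + 8 },  // 54
    };

    // Reads one identifier byte (caller must ensure Position < Length).
    // If the identifier is known and its whole payload is present, returns
    // the payload bytes; otherwise returns false without throwing.
    public static bool TryReadPayload(BinaryReader reader, out byte id, out byte[] payload)
    {
        payload = null;
        id = reader.ReadByte();

        int size;
        if (!Payload.TryGetValue(id, out size))
            return false;                       // unknown identifier: caller moves on

        long remaining = reader.BaseStream.Length - reader.BaseStream.Position;
        if (remaining < size)
            return false;                       // truncated block at end of file

        payload = reader.ReadBytes(size);       // read (or skip) the whole block at once
        return true;
    }
}
```

The payload can then be decoded from a MemoryStream wrapped around it, so a bad value inside one block never moves the file position past the block boundary.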

Any suggestions? Please help! I'm relatively new to C# (2 months)...

Update: my entire code is as follows:

using System;
using System.IO;

class Program
{
    const string fileName = "rawbytes.txt";
    static void Main(string[] args)
    {
        try
        {
            var program = new Program();
            program.Parser();
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
        Console.ReadLine();
    }
    public void Parser()
    {
        char inputCharIdentifier = 'Z';
        //only because without initializing inputCharIdentifier I ended up with an error
        //note that in the real code, 'Z' is not a switch-case alphabet
        //it's an "inconsequential character", i.e. i have defined it to be 'Z'
        //just to avoid that error, and to avoid leaving it as a null value
        ulong field1common = 0;
        ulong field2common = 0;
        char[] charArray = new char[10];
        char char1;
        char char2;
        char char3;
        int valint1 = 0;
        int valint2 = 0;
        int valint3 = 0;
        int valint4 = 0;
        int valint5 = 0;
        int valint6 = 0;
        int valint7 = 0;
        double valdouble;
        /*
        char[] filler = new char[53];
        byte[] filling = new byte[4621];
        byte[] unifiller = new byte[8];
        //these values above were temporary measures to manually filter through
        //null bytes - unacceptable for the final program
        */
        if (File.Exists(fileName))
        {
            using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
            {
                while (reader.BaseStream.Position != reader.BaseStream.Length)
                {
                    //inputCharIdentifier = reader.ReadChar();
                    //if (inputCharIdentifier != null)
                    //{
                        try
                        {
                            inputCharIdentifier = reader.ReadChar();
                            try
                            {
                                switch (inputCharIdentifier)
                                {
                                    case 'A':

                                        field1common = reader.ReadUInt64();
                                        field2common = reader.ReadUInt64();
                                        //unifiller = reader.ReadBytes(8);
                                        //charArray = reader.ReadString();
                                        //result.ToString("o");
                                        //Console.WriteLine(result.ToString());
                                        charArray = reader.ReadChars(10);
                                        string charArraystr = new string(charArray);
                                        char1 = reader.ReadChar();
                                        valint1 = reader.ReadInt32();
                                        valint2 = reader.ReadInt32();
                                        valint3 = reader.ReadInt32();
                                        valint4 = reader.ReadInt32();
                                        using (TextWriter writer = File.AppendText("A.txt"))
                                        {
                                            writer.WriteLine(field1common + "," + /*result.ToString("o")*/ field2common + "," + charArraystr + "," + char1 + "," + valint1 + "," + valint2 + "," + valint3 + "," + valint4);
                                            writer.Close();
                                        }
                                        break;


                                    case 'B':
                                    case 'C':

                                        field1common = reader.ReadUInt64();
                                        field2common = reader.ReadUInt64();
                                        //charArray = reader.ReadString();
                                        charArray = reader.ReadChars(10);
                                        string charArraystr2 = new string(charArray);
                                        char1 = reader.ReadChar();
                                        valint1 = reader.ReadInt32();
                                        valint2 = reader.ReadInt32();
                                        using (TextWriter writer = File.AppendText("C.txt"))
                                        {
                                            writer.WriteLine(field1common + "," + field2common + "," + charArraystr2 + "," + char1 + "," + valint1 + "," + valint2);
                                            writer.Close();
                                        }
                                        break;
                                    case 'S':
                                        //market status message
                                        field1common = reader.ReadUInt64();
                                        char2 = reader.ReadChar();
                                        char3 = reader.ReadChar();
                                        break;
                                    case 'L':
                                        reader.ReadBytes(4); //skip the 4-byte payload
                                        break;
                                    case 'D':
                                    case 'E':
                                        field1common = reader.ReadUInt64();
                                        field2common = reader.ReadUInt64();
                                        //charArray = reader.ReadString();
                                        charArray = reader.ReadChars(10);
                                        string charArraystr3 = new string(charArray);
                                        //char1 = reader.ReadChar();
                                        valint1 = reader.ReadInt32();
                                        valint2 = reader.ReadInt32();
                                        valint5 = reader.ReadInt32();
                                        valint7 = reader.ReadInt32();
                                        valint6 = reader.ReadInt32();
                                        valdouble = reader.ReadDouble();
                                        using (TextWriter writer = File.AppendText("D.txt"))
                                        {
                                            writer.WriteLine(field1common + "," + field2common + "," + charArraystr3 + "," + valint1 + "," + valint2 + "," + valint5 + "," + valint7 + "," + valint6 + "," + valdouble);
                                            writer.Close();
                                        }
                                        break;
                                    }
                            }
                            catch (Exception ex)
                            {
                                Console.WriteLine("Parsing didn't work");
                                Console.WriteLine(ex.ToString());
                                break;
                            }
                        }
                        catch (Exception ex)
                        {
                            Console.WriteLine("Here's why the character read attempt didn't work");
                            Console.WriteLine(ex.ToString());

                            continue;
                            //continue;
                        }
                    //}
                }
            }
        }
    }
}

The error I receive is as follows:

    Here's why the character read attempt didn't work

    System.ArgumentException: The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'.
    Parameter name: chars
    at System.Text.Encoding.ThrowCharsOverflow()
    at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
    at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder)
    at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush)
    at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex, Boolean flush)
    at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex)
    at System.IO.BinaryReader.InternalReadOneChar()
    at System.IO.BinaryReader.Read()
    at System.IO.BinaryReader.ReadChar()
    at line 69: i.e. inputCharIdentifier = reader.ReadChar();
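The trace shows the exception coming from BinaryReader's UTF-8 decoder: the reader was created without an explicit encoding, so ReadChar decodes one or more bytes into a char and fails when the bytes at that position don't decode cleanly into a single char. For a one-byte record identifier, one way to sidestep text decoding entirely is to read the raw byte with ReadByte and compare it against the expected ASCII codes; every byte value 0 to 255 is then a legal result. A minimal sketch; IdentifierReader and ReadIdentifier are hypothetical names.

```csharp
using System;
using System.IO;

public static class IdentifierReader
{
    // Returns the next byte as a char without any text decoding:
    // 0x41 -> 'A', a null byte -> '\0', 0xF0 -> '\u00F0'.
    // Returns null when the end of the stream has been reached.
    public static char? ReadIdentifier(BinaryReader reader)
    {
        if (reader.BaseStream.Position >= reader.BaseStream.Length)
            return null;
        byte b = reader.ReadByte();   // reads exactly one byte, never decodes
        return (char)b;               // bytes 0-255 map to U+0000-U+00FF
    }
}
```

In the switch, `case 'A':` still matches because the ASCII code of 'A' (0x41) maps to the same char.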

Update: a sample file that produces the same error is available at the following link: http://www.wikisend.com/download/106394/rawbytes.txt

Notice in particular the 8 unexpected null bytes between successive data blocks, even though the data block header (i.e. inputCharIdentifier) is valid. The number of bytes following such a header is unpredictable and varies. I need to be able to either delete such a run or skip over it to the next non-corrupt data block; in the sample file, that is the last (single) data block, which occurs after the 8 out-of-place null bytes.

The 8 null bytes are located at byte counter 1056 in the file (line 2, column 783, according to Notepad++).

The crux of the problem is that the run of null bytes can be any length (3, 7, 15, 50, etc.); it is always unknown, as a direct result of data corruption. Unlike "traditional" data corruption, where a fixed number of bytes inside a data block (say 50) might be unreadable and can therefore be skipped over by exactly that many bytes, the corruption I face consists of an unknown number of bytes between valid data blocks.
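Since the length of the corrupt run is unknown, one common approach is to resynchronize: starting after the last good block, scan forward byte by byte until a byte is a known identifier and the complete block for it fits in the remaining data. This is a heuristic sketch, not a guarantee: a garbage byte that happens to equal 'A' would be accepted, so a real parser would also sanity-check the decoded fields before trusting the block. It assumes the whole file fits in memory (e.g. via File.ReadAllBytes) and a table of payload sizes per identifier; Resync and FindNextHeader are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class Resync
{
    // Returns the index of the first byte at or after 'start' that is a known
    // identifier whose whole block (1 identifier byte + payload) fits in 'data',
    // or -1 if there is none. Every byte skipped over is treated as corrupt.
    public static int FindNextHeader(byte[] data, int start, IDictionary<byte, int> payloadSizes)
    {
        for (int i = start; i < data.Length; i++)
        {
            int size;
            if (payloadSizes.TryGetValue(data[i], out size) && i + 1 + size <= data.Length)
                return i;
        }
        return -1;
    }
}
```

After a corrupt block, the caller sets its position to the returned index and resumes normal parsing; the length of the junk run never needs to be known in advance.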

Solution

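A char can never be null (it is a value type), so there is no "null" case to handle: a null byte is simply read as '\0', and any byte that is not a recognized identifier can fall through to a default section that ignores it, after which the loop reads the next byte. The exception in the question comes from BinaryReader's default UTF-8 decoding inside ReadChar; creating the reader with Encoding.ASCII, as below, makes every byte decode to exactly one char. I have also included a try...catch inside the loop, just to be completely sure: any error while processing a block makes the code skip to the following iteration.

//requires: using System; using System.IO; using System.Text;
using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.ASCII))
{
    while (reader.BaseStream.Position != reader.BaseStream.Length)
    {
        try
        {
            inputCharIdentifier = reader.ReadChar();
            if (inputCharIdentifier != '\0')
            {
                switch (inputCharIdentifier)
                {
                    case 'A':
                        field1 = reader.ReadUInt64();
                        field2 = reader.ReadUInt64();
                        field3 = reader.ReadChars(10);
                        string strtmp = new string(field3);
                        //and so on
                        using (TextWriter writer = File.AppendText("outputA.txt"))
                        {
                            writer.WriteLine(field1 + "," + field2 + "," + strtmp);
                        }
                        break;
                    case 'B':
                        //code...
                        break;
                    default:
                        //unknown byte: ignore it; the loop reads the next one
                        break;
                }
            }
        }
        catch (Exception)
        {
            //skip to the following iteration; parsing resumes at the current position
        }
    }
}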
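Below is a minimal, self-contained sketch of the same idea that runs against an in-memory buffer. Purely for illustration, it assumes a single record type, 'L' followed by one little-endian Int32 (the question's 'L' case reads 4 bytes); the real layouts are in the question. Every other byte, including null bytes, falls through to default and is skipped, and a block cut off at the end of the data ends the loop via EndOfStreamException. AsciiParser and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class AsciiParser
{
    // Parses 'L' blocks (identifier + one Int32, an assumed layout) and
    // returns one CSV line per block. Unknown bytes are skipped one at a time.
    public static List<string> Parse(byte[] data)
    {
        var lines = new List<string>();
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.ASCII))
        {
            while (reader.BaseStream.Position != reader.BaseStream.Length)
            {
                try
                {
                    // With ASCII each byte decodes to exactly one char
                    // ('?' for 0x80-0xFF), so this avoids the UTF-8 buffer error.
                    char id = reader.ReadChar();
                    switch (id)
                    {
                        case 'L':
                            lines.Add("L," + reader.ReadInt32());
                            break;
                        default:
                            // '\0' or any other unexpected byte: ignore it.
                            break;
                    }
                }
                catch (EndOfStreamException)
                {
                    break;  // the last block was truncated; stop parsing
                }
            }
        }
        return lines;
    }
}
```

Skipping one byte at a time is what lets the loop cross a junk run of any length; the trade-off is that a junk byte equal to a valid identifier will be misread as a block header.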

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow