Reading stream with 2 different readers

https://stackoverflow.com/questions/15168158

16-03-2022

Question

I have a text file that contains a fixed-length table that I am trying to parse. However, the beginning of the file is general information about when this table was generated (e.g. time, date, etc.).

To read this I have attempted to make a FileStream, then read the first part of this file with a StreamReader. I parse out what I need from the top part of the document, and then when I am done, set the stream's position to the first line of the structured data.

Then I attach a TextFieldParser to the stream (with appropriate settings for the fixed length table), and then attempt to read the file. On the first row, it fails, and in the ErrorLine property, it lists off the last half of the third row of the table. I stepped through it and it was on the first row to read, yet the ErrorLine property suggests otherwise.

When debugging, I found that if I tried using my StreamReader.ReadLine() method after I had attached the TextFieldParser to the stream, the first 2 rows show up fine. When I read the third row, however, it returns a line that starts with the first half of the third row (stopping right where the text in ErrorLine would begin) and then appends some part from much later in the document. If I try this before I attach the TextFieldParser, it reads all 3 rows fine.

I have a feeling this has to do with my tying 2 readers to the same stream. I'm not sure how to read this with a structured part and an unstructured part, without just tokenizing the lines myself. I can do that but I assume I am not the first person to want to read part of a stream one way, and a later part of a stream in another.

Why is it skipping like this, and how would you read a text file with different formats?

Example:

Date: 3/1/2013
Time: 3:00 PM
Sensor:  Awesome Thing

Seconds   X        Y          Value
0         5.1      2.8        55
30        4.9      2.5        33
60        5.0      5.3        44

Code tailored for this simplified example:

Boolean setupInfo = true;
DataTable result = new DataTable();
String[] fields;
Double[] dFields;

FileStream stream = File.Open(filePath,FileMode.Open);

StreamReader reader = new StreamReader(stream);

String tempLine;

for(int j = 1; j <= 7; j++)
{
   result.Columns.Add(("Column" + j));
}

//Parse the unstructured part
while(setupInfo)
{
   tempLine = reader.ReadLine();
   if (tempLine == null)
   {
       //End of file reached without finding the table header
       break;
   }
   else if (tempLine.StartsWith("Date:"))
   {
       result.Rows.Add(tempLine);
   }
   else if (tempLine.StartsWith("Time:"))
   {
       result.Rows.Add(tempLine);
   }
   else if (tempLine.StartsWith("Seconds"))
   {
      //break out of this loop because the 
      //next line to be read is the structured part
      setupInfo =  false;
   }
}

//Parse the structured part
TextFieldParser parser = new TextFieldParser(stream);
parser.TextFieldType = FieldType.FixedWidth;
parser.HasFieldsEnclosedInQuotes = false;
parser.SetFieldWidths(10, 10, 10, 10);

while (!parser.EndOfData)
{
   if (reader.Peek() == '*')
   {
       break;
   }
   else
   {
       fields = parser.ReadFields();

       if (parseStrings(fields, out dFields))
       {
           result.Rows.Add(dFields);
       }
   }
}
return result;

Solution

The reason it's skipping is that the StreamReader reads blocks of data from the FileStream rather than reading character by character. For example, the StreamReader might read 4 kilobytes from the FileStream into its own buffer and then parse lines out of that buffer as required to respond to ReadLine() calls. So when you attach the TextFieldParser to the FileStream, it reads from the current file position, which is where the StreamReader's last block read left it, not where the last line you read ended.
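The read-ahead described above can be observed directly. The following is a minimal sketch, not taken from the original post: the class name ReadAheadDemo, the method PositionAfterFirstLine, and the in-memory sample data are all made up for illustration, and a MemoryStream stands in for the file. The exact position reported depends on the StreamReader's buffer size, so no specific value is claimed.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Reads one line through a StreamReader and reports where the
    // underlying stream ended up afterwards.
    public static long PositionAfterFirstLine(out string firstLine)
    {
        // Hypothetical stand-in for the file: 100 short lines.
        var sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
        {
            sb.Append("row ").Append(i).Append('\n');
        }
        byte[] data = Encoding.ASCII.GetBytes(sb.ToString());

        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            firstLine = reader.ReadLine();
            // The reader has already pulled a whole block into its buffer,
            // so the stream position is past the end of the first line.
            return stream.Position;
        }
    }

    public static void Main()
    {
        string first;
        long position = PositionAfterFirstLine(out first);
        Console.WriteLine("Line read: \"" + first + "\" (" + (first.Length + 1) + " bytes with newline)");
        Console.WriteLine("Underlying stream position: " + position);
    }
}
```

A second reader attached to the same stream at this point would start at that position and miss everything the first reader has buffered but not yet returned.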

The solution should be pretty simple: just connect the TextFieldParser to the StreamReader:

TextFieldParser parser = new TextFieldParser(reader);

See the TextFieldParser(TextReader reader) constructor.
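Putting the pieces together, a sketch of the single-reader approach might look like the following. The class SensorFileReader, the method ReadTable, and the fieldWidths parameter are illustrative names that do not appear in the original code, and the widths used in the usage note are an assumption based on the sample in the question, so they would need adjusting for the real file. A width of -1 for the last field tells TextFieldParser to treat it as variable width.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class SensorFileReader
{
    // Reads the free-form header lines and then the fixed-width table
    // through ONE TextReader, so no buffered data is lost in between.
    public static List<string[]> ReadTable(TextReader reader, List<string> header, params int[] fieldWidths)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith("Seconds"))
            {
                break; // the column header line; table rows follow
            }
            if (line.StartsWith("Date:") || line.StartsWith("Time:"))
            {
                header.Add(line);
            }
        }

        var rows = new List<string[]>();
        // The parser continues from wherever the reader currently is.
        // Disposing the parser also closes the reader.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(fieldWidths);
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

With a file on disk this could be called as ReadTable(new StreamReader(filePath), header, 10, 9, 11, -1). Once the parser is attached, the StreamReader should not be read from directly any more, since the parser keeps its own buffer as well.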

Other suggestions

Generally speaking, most streams are consumed as they are read: once data has been read, it is no longer available. You could fork off to multiple streams by writing an intermediary class that derives from Stream and either raises an event or republishes the data to other streams.

In your case you don't need the StreamReader. A better way to check the file contents is the File.ReadLines method. It does not load the whole file; it reads lines lazily, so it only goes as far as the lines you need:

foreach (string line in File.ReadLines(filePath))
{
    if (line.StartsWith("Date:"))
    {
        result.Rows.Add(line);
    }
    else if (line.StartsWith("Time:"))
    {
        result.Rows.Add(line);
    }
    else if (line.StartsWith("Seconds"))
    {
        // Stop before the structured part of the file
        break;
    }
}

EDIT

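You can make it even simpler using LINQ (this needs using System.Linq):

// Take the first matching line; d is null if no line matches
var d = (from line in File.ReadLines(filePath) where line.Contains("Date:") select line).FirstOrDefault();
if (d != null) result.Rows.Add(d);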

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow