Question

I'm looking at parsing a delimited string, something on the order of

a,b,c

But this is a very simple example, and parsing delimited data can get complex; for instance

1,"Your simple algorithm, it fails",True

would blow your naive string.Split implementation to bits. Is there anything I can freely use/steal/copy and paste that offers a relatively bulletproof solution to parsing delimited text? .NET, plox.

Update: I decided to go with the TextFieldParser, which is part of VB.NET's pile of goodies hidden away in Microsoft.VisualBasic.DLL.


Solution

I use this to read from a file:

// Needs a reference to Microsoft.VisualBasic.dll on .NET Framework
string filename = textBox1.Text;
string[] fields;
string[] delimiter = new string[] {"|"};
using (Microsoft.VisualBasic.FileIO.TextFieldParser parser =
       new Microsoft.VisualBasic.FileIO.TextFieldParser(filename)) {
    parser.Delimiters = delimiter;
    // true: a "|" inside a quoted field stays part of that field
    parser.HasFieldsEnclosedInQuotes = true;

    while (!parser.EndOfData) {
        fields = parser.ReadFields();
        // Do what you need with fields
    }
}

I am sure someone here can transform this to parse a string that is in memory.
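One way to do that: TextFieldParser also has a constructor that takes a TextReader, so wrapping the string in a StringReader should be enough. This is a minimal sketch, not a drop-in library; InMemoryDelimited and Parse are illustrative names, and it assumes Microsoft.VisualBasic.FileIO is available (on .NET Framework that means adding a reference to Microsoft.VisualBasic.dll).

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parses delimited text that is already in memory.
public static class InMemoryDelimited
{
    public static List<string[]> Parse(string text, string delimiter)
    {
        var rows = new List<string[]>();

        // TextFieldParser accepts any TextReader, so a StringReader
        // lets it read from a string instead of a file.
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // Delimiters inside "..." are treated as part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Calling InMemoryDelimited.Parse("1,\"Your simple algorithm, it fails\",True", ",") should return a single row whose three fields are 1, Your simple algorithm, it fails and True.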

OTHER TIPS

A very comprehensive library can be found here: FileHelpers

I am not aware of any framework, but a simple state machine works:

  • State 1: Read every char until you hit a " or a ,
    • In case of a ": Move to State 2
    • In case of a ,: Move to State 3
    • In case of the end of the file: Move to State 4
  • State 2: Read every char until you hit a "
    • In case of a ": Move to State 1
    • In case of the end of the file: Either move to State 4 or signal an error because of an unterminated string
  • State 3: Add the current buffer to the output array, move the cursor forward past the , and go back to State 1.
  • State 4: This is the final state; add whatever is left in the buffer to the output array and return the array.

For example:

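using System.Collections.Generic;
using System.Text;

// internalLine holds the single line being parsed
var elements = new List<string>();
var current = new StringBuilder();
var p = 0;

while (p < internalLine.Length) {
    if (internalLine[p] == '"') {
        // Skip the opening quote
        p++;

        // Copy everything up to the closing quote; commas in here are data.
        // The bounds check stops an unterminated quote from running off the end.
        while (p < internalLine.Length && internalLine[p] != '"') {
            current.Append(internalLine[p]);
            p++;
        }

        // Skip the closing quote and the , that follows it
        p += 2;
    }
    else {
        while ((p < internalLine.Length) && (internalLine[p] != ',')) {
            current.Append(internalLine[p]);
            p++;
        }

        // Skip past ,
        p++;
    }

    elements.Add(current.ToString());
    current.Length = 0;
}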
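The four states in the list can also be written out literally, which makes the transitions easier to check against the description. This is a sketch assuming a comma delimiter and double-quote quoting, with no support for escaped quotes ("") inside a quoted field; StateMachineSplitter, State and Split are made-up names. For the end-of-input case in State 2 it picks the "signal an error" option, and unlike the loop above, the end state also flushes the buffer, so a trailing comma yields a final empty field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical names; mirrors States 1-4 from the description above.
public static class StateMachineSplitter
{
    private enum State { Unquoted, Quoted }

    public static List<string> Split(string line)
    {
        var output = new List<string>();
        var buffer = new StringBuilder();
        var state = State.Unquoted;

        foreach (char c in line)
        {
            if (state == State.Unquoted)        // State 1
            {
                if (c == '"')
                {
                    state = State.Quoted;       // move to State 2
                }
                else if (c == ',')
                {
                    // State 3: emit the field, then back to State 1
                    output.Add(buffer.ToString());
                    buffer.Clear();
                }
                else
                {
                    buffer.Append(c);
                }
            }
            else                                // State 2
            {
                if (c == '"')
                    state = State.Unquoted;     // back to State 1
                else
                    buffer.Append(c);           // commas are data here
            }
        }

        // State 4: end of input
        if (state == State.Quoted)
            throw new FormatException("Unterminated quoted field");

        output.Add(buffer.ToString());
        return output;
    }
}
```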

There are some good answers here: Split a string ignoring quoted sections

You might want to rephrase your question to something more precise (e.g. "What code snippet or library can I use to parse CSV data in .NET?").

To do a shameless plug, I've been working on a library for a while called fotelo (Formatted Text Loader) that I use to quickly parse large amounts of text based off of delimiter, position, or regex. For a quick string it is overkill, but if you're working with logs or large amounts, it may be just what you need. It works off a control file model similar to SQL*Loader (kind of the inspiration behind it).

Better late than never (adding to the completeness of SO):

http://www.codeproject.com/KB/database/CsvReader.aspx

This one ff-ing rules.

GJ

I am thinking that a generic framework would need to specify two things: 1. which characters are delimiters, and 2. under what conditions those characters do not count (such as when they are between quotes).

I think you may just be better off writing custom logic every time you need to do something like this.

The simplest way is to walk the string as a char array, looking for your string delimiters (the quote characters) and your split char.

It should be relatively easy to unit test.

You can wrap it in an extension method similar to the basic .Split method.
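A minimal sketch of such an extension method, assuming a single-character delimiter and a single quote character (the two things a generic splitter has to know: what delimits, and when it doesn't count). SplitQuoted and StringSplitExtensions are hypothetical names, not framework methods. Each quote character toggles an in-quotes flag and is dropped from the output; escaped quotes ("") are not handled.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringSplitExtensions
{
    // Hypothetical extension method, used like string.Split but
    // ignoring delimiters that appear between quote characters.
    public static string[] SplitQuoted(this string input, char delimiter = ',', char quote = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        foreach (char c in input)
        {
            if (c == quote)
            {
                // Quotes only switch modes; they are not copied into the field.
                inQuotes = !inQuotes;
            }
            else if (c == delimiter && !inQuotes)
            {
                // Unquoted delimiter: the current field is complete.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Whatever remains is the last field (empty after a trailing delimiter).
        fields.Add(current.ToString());
        return fields.ToArray();
    }
}
```

With this in scope, "a|\"b|c\"".SplitQuoted('|') returns two fields, a and b|c.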

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow