Question

I want to parse a config file sorta thing, like so:

[KEY:Value]     
    [SUBKEY:SubValue]

Now I started with a StreamReader, converting lines into character arrays, when I figured there's gotta be a better way. So I ask you, humble reader, to help me.

One restriction is that it has to work in a Linux/Mono environment (1.2.6 to be exact). I don't have the latest 2.0 release (of Mono), so try to restrict language features to C# 2.0 or C# 1.0.

Was it helpful?

Solution

I considered it, but I'm not going to use XML. I am going to be writing this stuff by hand, and hand editing XML makes my brain hurt. :')

Have you looked at YAML?

You get the benefits of XML without all the pain and suffering. It's used extensively in the ruby community for things like config files, pre-prepared database data, etc

here's an example

customer:
  name: Orion
  age: 26
  addresses:
    - type: Work
      number: 12
      street: Bob Street
    - type: Home
      number: 15
      street: Secret Road

There appears to be a C# library here, which I haven't used personally, but yaml is pretty simple, so "how hard can it be?" :-)

I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs)

OTHER TIPS

I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:

@"(?&ltlevel>\s) | " +
@"(?&ltterm>[^:\s]) | " +
@"(?&ltseparator>:)"

The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.

Protip: For an LL(1) parser (read: easy), tokens cannot share a prefix. If you have abc as a token, you cannot have ace as a token

Note: The article's missing the | characters in its examples, just throw them in.

There is another YAML library for .NET which is under development. Right now it supports reading YAML streams and has been tested on Windows and Mono. Write support is currently being implemented.

Using a library is almost always preferably to rolling your own. Here's a quick list of "Oh I'll never need that/I didn't think about that" points which will end up coming to bite you later down the line:

  • Escaping characters. What if you want a : in the key or ] in the value?
  • Escaping the escape character.
  • Unicode
  • Mix of tabs and spaces (see the problems with Python's white space sensitive syntax)
  • Handling different return character formats
  • Handling syntax error reporting

Like others have suggested, YAML looks like your best bet.

You can also use a stack, and use a push/pop algorithm. This one matches open/closing tags.

public string check()
    {
        ArrayList tags = getTags();


        int stackSize = tags.Count;

        Stack stack = new Stack(stackSize);

        foreach (string tag in tags)
        {
            if (!tag.Contains('/'))
            {
                stack.push(tag);
            }
            else
            {
                if (!stack.isEmpty())
                {
                    string startTag = stack.pop();
                    startTag = startTag.Substring(1, startTag.Length - 1);
                    string endTag = tag.Substring(2, tag.Length - 2);
                    if (!startTag.Equals(endTag))
                    {
                        return "Fout: geen matchende eindtag";
                    }
                }
                else
                {
                    return "Fout: geen matchende openeningstag";
                }
            }
        }

        if (!stack.isEmpty())
        {
            return "Fout: geen matchende eindtag";
        }            
        return "Xml is valid";
    }

You can probably adapt so you can read the contents of your file. Regular expressions are also a good idea.

It looks to me that you would be better off using an XML based config file as there are already .NET classes which can read and store the information for you relatively easily. Is there a reason that this is not possible?

@Bernard: It is true that hand editing XML is tedious, but the structure that you are presenting already looks very similar to XML.

Then yes, has a good method there.

@Gishu

Actually once I'd accommodated for escaped characters my regex ran slightly slower than my hand written top down recursive parser and that's without the nesting (linking sub-items to their parents) and error reporting the hand written parser had.

The regex was a slightly faster to write (though I do have a bit of experience with hand parsers) but that's without good error reporting. Once you add that it becomes slightly harder and longer to do.

I also find the hand written parser easier to understand the intention of. For instance, here is the a snippet of the code:

private static Node ParseNode(TextReader reader)
{
    Node node = new Node();
    int indentation = ParseWhitespace(reader);
    Expect(reader, '[');
    node.Key = ParseTerminatedString(reader, ':');
    node.Value = ParseTerminatedString(reader, ']');
}

Regardless of the persisted format, using a Regex would be the fastest way of parsing. In ruby it'd probably be a few lines of code.

\[KEY:(.*)\] 
\[SUBKEY:(.*)\]

These two would get you the Value and SubValue in the first group. Check out MSDN on how to match a regex against a string.

This is something everyone should have in their kitty. Pre-Regex days would seem like the Ice Age.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top