Question

I'd like to understand how to construct a parser in .NET to process source files. For example, maybe I could begin by learning how to parse SQL or HTML or CSS and then act on the results to be able to format them for readability or something similar.

Where can I learn how to do this? Are there specific books I can refer to? Do I need to learn about lexers/parsers?

Specifically for the .NET platform since I'm comfortable in C#.


Solution

I personally found this article, Grammars and Parsing with C# 2.0, a great introduction to writing lexers/parsers, with examples specifically relating to C#.

I wrote a brief blog post praising it not long ago. The nice thing is that it is aimed at complete beginners to parsing theory (it gives background on the theory as well as the implementation) and proceeds in gradual steps. Of course, if you want to go on to the more advanced ideas in the field you will need other resources, but I think this is an excellent foundation.
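To make the lexer/parser split concrete, here is a minimal sketch of a hand-written lexer and recursive-descent parser for integer arithmetic. It is not taken from the article; the class name TinyCalc and its methods are hypothetical, and for brevity it evaluates the expression directly instead of building a syntax tree, which a formatter would need.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not code from the article.
public static class TinyCalc
{
    // Lexer: turn the input string into a flat list of tokens.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; }
            else if (char.IsDigit(c))
            {
                int start = i;
                while (i < text.Length && char.IsDigit(text[i])) i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if ("+-*/()".IndexOf(c) >= 0) { tokens.Add(c.ToString()); i++; }
            else throw new FormatException("Unexpected character '" + c + "' at position " + i);
        }
        return tokens;
    }

    // Parser: one method per grammar rule.
    //   expr   := term (('+' | '-') term)*
    //   term   := factor (('*' | '/') factor)*
    //   factor := NUMBER | '(' expr ')'
    public static int Evaluate(string text)
    {
        List<string> tokens = Tokenize(text);
        int pos = 0;
        int result = ParseExpr(tokens, ref pos);
        if (pos != tokens.Count) throw new FormatException("Unexpected token '" + tokens[pos] + "'");
        return result;
    }

    private static int ParseExpr(List<string> t, ref int pos)
    {
        int value = ParseTerm(t, ref pos);
        while (pos < t.Count && (t[pos] == "+" || t[pos] == "-"))
        {
            string op = t[pos++];
            int right = ParseTerm(t, ref pos);
            value = op == "+" ? value + right : value - right;
        }
        return value;
    }

    private static int ParseTerm(List<string> t, ref int pos)
    {
        int value = ParseFactor(t, ref pos);
        while (pos < t.Count && (t[pos] == "*" || t[pos] == "/"))
        {
            string op = t[pos++];
            int right = ParseFactor(t, ref pos);
            value = op == "*" ? value * right : value / right;
        }
        return value;
    }

    private static int ParseFactor(List<string> t, ref int pos)
    {
        if (pos >= t.Count) throw new FormatException("Unexpected end of input");
        string tok = t[pos++];
        if (tok == "(")
        {
            int value = ParseExpr(t, ref pos);
            if (pos >= t.Count || t[pos] != ")") throw new FormatException("Expected ')'");
            pos++;
            return value;
        }
        return int.Parse(tok); // throws FormatException if tok is not a number
    }
}
```

Calling TinyCalc.Evaluate("1 + 2 * 3") returns 7, because the term rule binds more tightly than the expr rule. Replacing the int results with node objects would give a tree you could walk to pretty-print the input.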

OTHER TIPS

If you do want to learn how to write the parser, this might not be your answer, but if you just want to parse and work with the parse results, you should definitely look at Irony.net. It's a toolkit that helps you implement languages on .NET.

ANTLR :)

It's a good way to learn about grammars and parsers.

C# has come a long way since 2.0. The recent additions of expression trees and dynamic typing make things a lot more interesting for implementing compilers.

Here is a tutorial on how to create an interpreter in C# 4.0 at CodeProject.com.
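As a small illustration of why expression trees matter here, the following sketch builds the tree for x => (x + 2) * 3 by hand with System.Linq.Expressions and compiles it to a delegate. A parser could emit nodes like these and let the runtime handle code generation. The class name ExpressionTreeDemo is hypothetical and the example is not taken from the linked tutorial.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical illustration of using expression trees as a compiler back end.
public static class ExpressionTreeDemo
{
    // Builds the tree for x => (x + 2) * 3 node by node and compiles it.
    public static Func<int, int> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.Multiply(
            Expression.Add(x, Expression.Constant(2)),
            Expression.Constant(3));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }
}
```

ExpressionTreeDemo.Build()(4) evaluates to 18.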

Even though this may look a bit too advanced, take a look at monadic parser combinators. There's a great blog post on LukeH's WebLog here:

http://blogs.msdn.com/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx

Once you get the basics, they make for very clear parser definitions.
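A rough sketch of the idea, written from scratch rather than copied from the blog post: a parser is a function from input and position to a result (or null on failure), and defining Select and SelectMany for it lets C# query syntax sequence parsers. All names here (Result, Parser, Combinators, Assignment) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Result of a successful parse: the value and the position just past it.
public sealed class Result<T>
{
    public readonly T Value;
    public readonly int Next;
    public Result(T value, int next) { Value = value; Next = next; }
}

// A parser reads from input starting at pos; it returns null on failure.
public delegate Result<T> Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches a single character that satisfies the predicate.
    public static Parser<char> Char(Func<char, bool> predicate)
    {
        return (input, pos) =>
            pos < input.Length && predicate(input[pos])
                ? new Result<char>(input[pos], pos + 1)
                : null;
    }

    // One or more repetitions of p, collected into a string.
    public static Parser<string> Many1(Parser<char> p)
    {
        return (input, pos) =>
        {
            int start = pos;
            Result<char> r;
            while ((r = p(input, pos)) != null) pos = r.Next;
            return pos > start
                ? new Result<string>(input.Substring(start, pos - start), pos)
                : null;
        };
    }

    // Select maps the parsed value; needed for queries with a single 'from'.
    public static Parser<U> Select<T, U>(this Parser<T> p, Func<T, U> f)
    {
        return (input, pos) =>
        {
            Result<T> r = p(input, pos);
            return r == null ? null : new Result<U>(f(r.Value), r.Next);
        };
    }

    // SelectMany runs p, then the parser chosen by 'next', and combines both values.
    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> p, Func<T, Parser<U>> next, Func<T, U, V> project)
    {
        return (input, pos) =>
        {
            Result<T> r1 = p(input, pos);
            if (r1 == null) return null;
            Result<U> r2 = next(r1.Value)(input, r1.Next);
            return r2 == null ? null : new Result<V>(project(r1.Value, r2.Value), r2.Next);
        };
    }

    // Parses text such as "width=42" into a name and a number.
    public static readonly Parser<KeyValuePair<string, int>> Assignment =
        from name in Many1(Char(char.IsLetter))
        from eq in Char(c => c == '=')
        from digits in Many1(Char(char.IsDigit))
        select new KeyValuePair<string, int>(name, int.Parse(digits));
}
```

The Assignment definition reads almost like the grammar rule it implements, which is the main appeal of the approach. Combinators.Assignment("width=42", 0) yields the pair ("width", 42), while an input such as "=42" yields null.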

The best book that I've read for learning the idioms of parsing is "Little Languages"

Little Languages on Amazon

If you can get your hands on the .NET source code for System.Text.RegularExpressions, you will also see a real-world implementation of how to build a parser.

Justin Rogers has some excellent articles on how to build generic parsers on his blog:

Justin's Blog

And finally, if you want to enter the new world of parsers and grammars, you should really be reading up on 'Oslo' and how to use the M language and MGrammar. They will give you a lot of flexibility when it comes to parsing and transforming the resulting object graph into other usable forms.

Justin's articles are probably the easiest way to get up and running with a raw parser built atop .NET.

Licensed under: CC-BY-SA with attribution