Question

I'm evaluating using Coco/R vs. ANTLR for use in a C# project as part of what's essentially a scriptable mail-merge functionality. To parse the (simple) scripts, I'll need a parser.

I've focussed on Coco/R and ANTLR because both seem fairly mature and well-maintained and capable of generating decent C# parsers.

Neither seems trivial to use, however, and simplicity is something I'd appreciate - particularly maintainability by others.

Does anyone have any recommendations to make? What are the pros/cons of either for parsing a small language - or am I looking into the wrong things entirely? How well do these integrate into a typical continuous integration setup? What are the pitfalls?

Related: Well, many questions, such as 1, 2, 3, 4, 5.

Solution

If you're simply merging data into a complicated template, consider Terence Parr's StringTemplate engine. He's the man behind ANTLR. StringTemplate may be better suited and easier to use than a full parser generator. It's a very feature-rich template engine.

There is a C# port available in the downloads.
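
To give a feel for the kind of merge a template engine takes off your hands, here is a minimal sketch in plain Python. It is not StringTemplate's API: the `merge` helper is hypothetical, and the `<name>` placeholder syntax is only loosely modelled on StringTemplate's angle-bracket delimiters. A real engine adds conditionals, iteration over lists, nested templates and so on, which is where it starts to pay off over a one-liner like this.

```python
import re

# Hypothetical helper for illustration only (not part of StringTemplate).
# Matches placeholders such as <surname> or <order_id>.
PLACEHOLDER = re.compile(r"<(\w+)>")

def merge(template, values):
    """Replace every <name> placeholder with the matching entry in values."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than silently emitting a half-merged letter.
            raise KeyError(f"no value supplied for placeholder <{key}>")
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)

letter = merge(
    "Dear <title> <surname>, your order <order_id> has shipped.",
    {"title": "Dr.", "surname": "Jones", "order_id": 1042},
)
print(letter)  # Dear Dr. Jones, your order 1042 has shipped.
```

If the scripts never need more than substitution plus a few conditionals and loops, a template engine is likely to be easier for others to maintain than a generated parser.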

OTHER TIPS

We have used Coco for 2 years, having replaced the ANTLR parser we were formerly using. For a typical big-data query (our application), our experience has been as follows. Caveat: we depend on full UTF-8 handling, with the parser implemented in C++. These numbers are for a language that has some 200 EBNF productions.

  • ANTLR: 260 µs/query and a 108 MEGABYTE memory footprint for the generated parser/lexer
  • Coco: 220 µs/query and a 70 KBYTE memory footprint for the parser/scanner

Initially, Coco had a 1.2 ms startup time and generated several 60 KBYTE tables for mapping UTF-8. We have made many local enhancements to Coco: eliminating the big tables, eliminating the 1.2 ms startup time, and hugely improving the internal documentation (as well as the documentation in the generated code).

Our version of (open source) Coco has a tiny footprint compared to ANTLR, is measurably faster, has no startup delay and just... works. It does not have ANTLR's nice UI, but that never struck us as an issue once we started using Coco.

ANTLR is LL(*), which is as powerful as PEG, though usually much more efficient and flexible. LL(*) degenerates to LL(k) for k>=1 when arbitrary lookahead is not necessary.

Basically, Coco/R generates recursive descent parsers and only supports LL(1) grammars, whereas ANTLR uses backtracking (among other techniques), which allows it to handle more complex grammars. Coco/R parsers are much more lightweight and easier to understand and deploy, but it is sometimes a struggle to get the grammar into a form that Coco/R accepts, given its one-token lookahead constraint. For many common programming language grammars (e.g. C++, SQL), it's not possible at all.
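
To make "recursive descent with one token of lookahead" concrete, here is a small hand-written sketch in Python. It is not Coco/R output, and the grammar, class and function names are all made up for illustration. It does have the general shape of an LL(1) recursive descent parser: one method per grammar rule, and every decision made by inspecting only the current token.

```python
import re

# Assumed toy grammar (illustrative, not from any real tool):
#   expr   = term   { ("+" | "-") term }
#   term   = factor { ("*" | "/") factor }
#   factor = number | "(" expr ")"

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    """Split text into (kind, value) pairs, ending with an 'eof' token."""
    tokens = []
    for number, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif symbol.strip():            # ignore stray whitespace
            tokens.append((symbol, None))
    tokens.append(("eof", None))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def look(self):
        """The single lookahead token kind: all an LL(1) parser may inspect."""
        return self.tokens[self.pos][0]

    def expect(self, kind):
        if self.look != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.look!r}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def expr(self):
        result = self.term()
        while self.look in ("+", "-"):  # decided on one token, no backtracking
            op = self.look
            self.expect(op)
            rhs = self.term()
            result = result + rhs if op == "+" else result - rhs
        return result

    def term(self):
        result = self.factor()
        while self.look in ("*", "/"):
            op = self.look
            self.expect(op)
            rhs = self.factor()
            result = result * rhs if op == "*" else result / rhs
        return result

    def factor(self):
        if self.look == "num":
            return self.expect("num")
        if self.look == "(":
            self.expect("(")
            result = self.expr()
            self.expect(")")
            return result
        raise SyntaxError(f"unexpected token {self.look!r}")

def evaluate(text):
    parser = Parser(text)
    result = parser.expr()
    parser.expect("eof")                # reject trailing input
    return result

print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

The constraint shows up whenever two alternatives of a rule can start with the same token: a parser like this cannot choose between them from `look` alone, so the grammar has to be left-factored or otherwise rewritten first.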

Licensed under: CC-BY-SA with attribution