Question

I'm writing some Extended Backus–Naur Form grammars for document parsing. There are lots of excellent guides for the syntax of these definitions, but very little online about how to design and structure them.

Can anyone suggest good articles (or general tips) on how to approach writing these? There does seem to be an element of style, even when the final parse trees are equivalent.

e.g. things like:

  • Deciding whether to tag newlines explicitly or just treat them as whitespace
  • Naming schemes for your nonterminals
  • Handling optional whitespace in long definitions
  • When to add explicit bad-syntax checks vs. just letting bad input fail to match

Thanks,

Solution

You should work in the direction that you are most comfortable with: bottom-up, top-down, or "sandwich" (do a little of both and meet somewhere in the middle).
Any "group" that can be derived and has a meaning of its own should get its own non-terminal. For example, I would use one non-terminal for all newline-related whitespace, one for all other whitespace, and one for all whitespace (which is simply the union of the first two).
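
As a rough sketch of that whitespace split, the comments below show three rules in an informal EBNF dialect, and the Python constants mirror them one-to-one so the grouping can be tried out. The rule names (`Newline`, `InlineSpace`, `Whitespace`), the escape syntax inside the quoted terminals, and the helper `match_rule` are all assumptions made for illustration; adapt them to whatever notation your parser generator accepts.

```python
import re

# Hypothetical rules (informal EBNF; escape syntax varies between tools):
#   Newline     = "\r\n" | "\n" | "\r" ;
#   InlineSpace = " " | "\t" ;
#   Whitespace  = Newline | InlineSpace ;

# Each constant is a regex fragment mirroring one rule above.
NEWLINE = r"(?:\r\n|\n|\r)"
INLINE_SPACE = r"[ \t]"
WHITESPACE = rf"(?:{NEWLINE}|{INLINE_SPACE})"  # union of the two rules above


def match_rule(pattern: str, text: str, pos: int = 0) -> int:
    """Return the end index of one match of `pattern` at `pos`, or -1."""
    m = re.compile(pattern).match(text, pos)
    return m.end() if m else -1
```

Because `Whitespace` is defined purely in terms of the other two, a rule that cares about line breaks can refer to `Newline` directly, while rules that do not care can use `Whitespace` and stay short.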

A common naming convention in grammars is that non-terminals are, or start with, a capital letter, and terminals start with a lowercase letter (but this of course depends on the language you're designing).

Regarding bad syntax checks, I'm not familiar with the concept. As far as I know, with EBNF you just write everything your language accepts, and only that.
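
To illustrate the "only what the language accepts" approach, here is a minimal sketch. The rule `Pair` and the function `accepts` are hypothetical, and the regular expression is just a stand-in for a real parser. Note that there is no rule describing malformed input; anything that cannot be derived from `Pair` is rejected simply because nothing matches it.

```python
import re

# Hypothetical rules (informal EBNF):
#   Pair   = Name , "=" , Number ;
#   Name   = Letter , { Letter } ;
#   Number = Digit , { Digit } ;
PAIR = re.compile(r"[A-Za-z]+=[0-9]+")


def accepts(text: str) -> bool:
    """True only if the whole input can be derived from Pair."""
    return PAIR.fullmatch(text) is not None
```

With this sketch, `accepts("width=42")` succeeds, while `accepts("width=")` and `accepts("=42")` both fail without any dedicated error rule.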

Generally, just look around at some EBNFs of different languages from different websites, get a feeling of how they look, and then do what feels right to you.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow