Question

I read this fantastic post on the comp.theory list:

http://coding.derkeiler.com/Archive/General/comp.theory/2004-03/0189.html

The poster makes the point that most programming languages define a context-free core, and then have additional algorithms which run on the parse tree to filter out constructs that are illegal in the language:

This separates out the context-free part of the language from the context-sensitive part -- which is generally regarded as good practice (a kind of modular "programming" discipline for language design).

Can you provide a "Hello World" example to illustrate this technique? That is, provde a simple context-sensitive language, identify the context-free core, and then sketch out how to parse an input using the context-free core followed by filtering out illegal constructs in the parse tree.

Can you refer me to any articles or books that discuss this technique?

Was it helpful?

Solution

One of the simplest languages that are not context-free is one where the words are of the type anbncn (a, b, and c repeated the same number of times, that is, abc, aabbcc, aaabbbccc, ...).

You can parse it using a grammar for the context-free language {anbncm}, where the number of c's is not restricted. Once you have the parse tree, you check using a separate algorithm that the number of repetitions of c is equal to the number of repetitions of a and b.

OTHER TIPS

Generally filtering is done also to disambiguate over-approximations of languages. We write ambiguous but clear context-free grammars for programming languages, then use tree walkers or other mechanisms to remove the unwanted derivations.

one reference:

One the other hand, you could also consider a type checker that processes abstract syntax trees as such a filter. Type checkers reject trees produced by a parser based on non-local (context) information. For example:

1 + "1"

is accepted by the grammar because:

E ::= Int | String | E "+" E;

but the type checker says that addition does not work between integers and strings and rejects the sentence from the language. The type checker does this by traversing the tree after parsing and identifying the addition symbol, then possibly looking up valid combinations of operands in a table and if the combination is not a valid it starts complaining. I guess that is typically how compilers work. See the Aho et al. dragon book. It sounds more interesting if you talk about it abstractly :-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top