Question

When writing a ("theoretical") grammar with a rule with an empty right-hand side, one always use a symbol such as ε (or 1) to make this emptiness explicit:

A → ε | a A

Such a grammar in Yacc and others would then look like

a: | 'a' a

or "worse"

a:       { $$ = new_list(); }
 | a 'a' { $$ = $1; $$->append($1); }
 ;

The fact that in "real world grammars" (Yacc, Bison, etc.) this empty right-hand side part of the rule is not explicitly marked as empty troubles me: it is easy to miss the fact that an rhs is empty, or worse: to forget to insert | and actually use a mid-rule action:

a:       { $$ = new_list(); }
   a 'a' { $$ = $1; $$->append($1); }
 ;

1) I don't know of any tool that provides a means to make empty rhs explicit. Are there any?

Future versions of Bison might support a dedicated symbol, with errors when used in a non-empty rhs, and warnings when a implicitly empty rhs is left.

2) Do people consider this useful?

3) What would be the notation you'd suggest?

Currently, the candidate is $empty:

a: $empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

EDIT

The chosen syntax is %empty:

a: %empty { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

Indeed $empty looks like a pseudo-symbol, such as $accept that Bison generates for the initial rule, or the $@n pseudo-symbols for mid-rule actions, or $eof for, well, end-of-file. But it's definitely not a symbol, it is precisely the absence of symbols.

On the other hand % clearly denotes a directive (some kind of attribute/metadata), like %pred.

So it's a minor difference of syntax, but it's more consistent with the overall syntax. Credit goes to Joel E. Denny.

Was it helpful?

Solution 3

I have used epsilon myself, as well as variations on marker for empty productions to which I attached some code in braces.

A reserved symbol in the bison grammar would be useful; I like the proposed $ prefix to avoid collisions with user-named symbols.

OTHER TIPS

I usually just use a comment:

a: /*epsilon*/ { $$ = new_list(); }
 | a 'a'  { $$ = $1; $$->append($1); }
 ;

Works fine with no changes and makes the intent clear....

IMO, this comes under the heading "If it ain't broke, don't fix it"

I'd suggest the following:

Define the declaration:

%empty ID

whose semantics are two-fold:

1) ID may be used as the only non-rule token in a RHS, to indicate that the RHS is an epsilon production; and

2) epsilon productions not marked with ID are to be considered syntax errors.

Thus, with the declaration:

%empty epsilon

epsilon must be used to mark an empty RHS; without any %empty declaration, the status quo holds, where empty RHSs are unmarked (except perhaps with comments).

That would allow users who like to explicitly mark empty RHSs to do so immediately, without having any impact on existing grammar files or users who don't want to explicitly mark empty RHSs in this fashion.

Personally, I'd probably use such a declaration, although to be honest I'm pretty used to using a comment to mark an empty RHS, and I don't believe I've ever accidentally made an empty RHS. So I wouldn't mark it as a priority feature-request, but nor would I object to its implementation.

1) Well, there is the obvious

e: a 'b'
a: 'a'
 | empty
empty:

2) Yes, that would be very helpful.

3) The $accept, $end and $undefined symbols are always defined, and reserved for Bison's internal use exclusively (eg., they cannot appear in the grammar). Bison generates $@n for mid-rule actions, but these cannot be used in the user's grammar either.

The only predefined token that the user may use in the grammar, if I am not mistaken, is error. So why are you not suggesting empty for that dedicated symbol? That would have seemed fairly reasonable. Or are you suggesting introducing $error as well?

Have you considered nothing? I might rather that.

Of course, the production is in some sense not really "empty" if it contains an action, since it's hard in Yacc/Bison to remain unaware of the fact that actions are transformed into nullable non-terminals behind the scenes. And if you (or the book) have been saying "epsilon" all semester in class, perhaps "%epsilon" has more verisimilitude than "%empty".

I muse about subsuming this into a more general assertion mechanism:

lines : %assert(epsilon)
      | %assert(on WORD) lines line ;

line : WORD '\n' ;

%assert(nullable(lines))
%assert(!nullable(line))
%assert(WORD in FIRST(lines))
/* etc. */

The idea being to slightly decrease the pain of figuring out exactly what language yacc/bison has actually implemented after all the heuristics have kicked in. The rest would work more or less as you specified, an option to warn of "empty" rules, unless the "empty" rule contained %assert(epsilon).

In terms of priorities, I would think it a much higher priority for bison to report when it provably has created a parser that could not possibly accept the input grammar (e.g., one or more productions can never fire). At least, that ability was not there last I looked, but I have a pretty old bison :-). And does it still fail to explain in English the problem of productions with common left prefixes that differ by embedded actions? Unless it's gotten a lot better, I would think there's a lot of explanatory improvement left to do that would help more than a check for unintentionally empty rules.

It would be interesting to see some data on the most common errors students run into (I guess I would not have picked this one as a contender!). That would be kind of an interesting experiment: hack the student copy of bison so it sends every run into a database, use some software to clean it up and analyze the most common misunderstandings.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top