BNF grammar + Gold LALR parser, failing to distinguish special case NewLine from Whitespace

https://stackoverflow.com/questions/599409

gold-parser

11-09-2019
|

Question

I want to consider whitespaces and newlines as normal whitespaces.
I want to distinguish newlines from other whitespaces moreover to allow special case.

First attempt to write a compliant grammar fails.

Here is the grammar:

! ------------------------------------------------- Sets

{WS}           = {Whitespace} - {CR} - {LF}
{ID Head}      = {Letter} + [_]
{ID Tail}      = {Alphanumeric} + [_]
{String Chars} = {Printable} + {HT} - ["\]

! ------------------------------------------------- Terminals

! The following defines the Whitespace terminal using the {WS}
! set - which excludes the carriage return and line feed 
! characters

Whitespace    = {WS}+ | {CR}{LF} | {CR} | {LF}
!NewLine       = {CR}{LF} | {CR} | {LF}
MyNewLine      = {CR}{LF} | {CR} | {LF}

Solution

They are ambiguous because they both contain the same sub-set {CR}{LF} | {CR} | {LF}.

Given the input {CR}{LF} the parser has no way to tell which terminal it should match.

A table-driven parser isn't really designed to handle "special cases" directly. If you want to ignore new-lines in some contexts, but attribute meaning to them in others then you'll have to handle that in your reductions (i.e. tokenize the newlines separately, and discard them in your reductions), but that will get ugly.

A (potentially) better solution is to use tokenizer states (possibly controlled from the parser), to change how the newline inputs are tokenized. It's hard to say without fully understanding your grammar. Plus, it's been a few years since I've messed with this stuff.

OTHER TIPS

I think the grammar is ambiguous in the sense that both Whitespace and MyNewLine match new line charachters. Since it throws a wobbly doing it your way, I suggest detecting whitespace and new lines separately and deciding what to do with the newline on a case by case basis.

I am not too experienced in the area, but thats what I remember from my Theory Of Computation class and Compiler Design class.

I hope this helps.

A late answer.

To my dismay, I'm just a recent late bloomer ;-) member.

Keep using the usual Line-Based Grammar Declarations

! ====================================================================
{Whitespace Ch} = {Whitespace} - {CR} - {LF}

Whitespace = {Whitespace Ch}+
Newline    = {CR}{LF} | {CR} | {LF}
! ====================================================================

Whitespace vs. Newline distinction is already taken into account!

Consider addressing your special case when writing your production rules.

For complex case you may even need to define some virtual terminal (advanced technique).

You may elaborate your grammar and ask by posting it again.

Last Edit: Please, share if you've already addressed the issue. Thanks.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow