Question

I have a simple Rascal module that specifies a toy grammar:

module temp

import IO;

import ParseTree;

layout LAYOUT = [\t-\n\r\ ]*;

start syntax Simple 
  =  A B ;

syntax A = "Hello"+ ("joe" "pok")* ;
syntax A= "Hi";
syntax B = "world"*|"wembly";
syntax B =    C | C C*   ;


public void main() {
  println("hello");
  iprint(parse(#start[Simple], "Hello Hello world world world"));
}

This works fine. The problem is that I didn't want to write

syntax B =    C | C C*   ;

I wanted to write

syntax B =  (  C | C C*  )? 

but Rascal rejected it with a parse error, even though all of

syntax B =  (  C  C C*  )? ;

syntax B =  (  C |  C*  )? ;

syntax B =    C | C C*   ;

are accepted fine. Can anyone explain to me what I'm doing wrong?


Solution

A nested sequence (the sequence symbol) always requires brackets in Rascal. The meta notation is defined as

syntax Sym = sequence: "(" Sym+ ")" | opt: Sym "?" | alternative: "(" Sym "|" {Sym "|"}+ ")" | ... ;

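Inside a nested alternative, each side of | must be a single Sym. In your example the second alternative, C C*, is a sequence of two symbols, so it needs its own brackets. That is also why (C C C*)? and (C | C*)? are accepted: the first is one bracketed sequence, and in the second each side of | is a single symbol. So you should have written: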

syntax B = (C | (C C*))?;
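
To see the corrected rule in context, here is a minimal sketch of the module from the question with the bracketed form dropped in. The question never shows a definition of C, so the one below is a placeholder assumption, and the rules for A and B are trimmed to a single definition each to keep the example short.

```rascal
module temp

import IO;
import ParseTree;

layout LAYOUT = [\t-\n\r\ ]*;

start syntax Simple = A B;

syntax A = "Hello"+;

// Assumption: C is not defined in the question; this is a placeholder.
syntax C = "world";

// The nested sequence C C* carries its own brackets inside the alternative.
syntax B = (C | (C C*))?;

public void main() {
  iprint(parse(#start[Simple], "Hello Hello world world world"));
}
```

Removing the inner brackets around C C* should bring back the parse error described in the question.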

What is perhaps confusing is that Rascal uses the | sign in two ways: once to separate top-level alternatives, and once inside a nested alternative:

syntax X = "a" | "b"; // top-level
syntax Y = ("c" | "d"); // nested, will internally generate a new rule: 
syntax ("c" | "d") = "c" | "d";

Finally, top-level alternatives contain sequences without brackets, as in:

syntax B 
  = C
  | C C*
  ;
// or, more concretely:
syntax Exp = left Exp "*" Exp
           > left Exp "+" Exp
           ;

As an aside, we generally avoid using too many nested regular expressions: they are anonymous, which makes the resulting parse trees harder to interpret. Regular expressions are best used for lexical syntax, where we are not so interested in the internal structure anyway.
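
As a rough illustration of that advice, the nested rule could be unfolded into ordinary labeled alternatives, with the optional part moved to the place where B is used. This is only a sketch: the labels single and many and the definition of C are hypothetical names introduced here, not part of the question.

```rascal
// Assumption: C is not defined in the question; this is a placeholder.
syntax C = "world";

// Nested form: the bracketed group is anonymous in the parse tree.
// syntax B = (C | (C C*))?;

// Unfolded form: each alternative is a named, top-level production,
// so sequences need no brackets.
start syntax Simple = A B?;

syntax B
  = single: C
  | many: C C*
  ;
```

With labeled productions, code that walks the parse tree can match on single and many by name instead of on an anonymous nested symbol.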

Licensed under: CC-BY-SA with attribution