Parser doesn't stop parsing arbitrary newlines and whitespaces

https://stackoverflow.com/questions/22876099

28-06-2023

Question

I'm writing a very basic Lisp parser in PEG.js. So far it echoes its input back, as long as the Lisp code is syntactically valid and fits on one line.

I want to extend the parser so that it accepts newlines and extra whitespace anywhere in the code.

Here's the grammar that works as long as everything is on one line:

Start
  = List

Character
  = [^\n" ""("")"]

LeftParenthesis
  = "("

RightParenthesis
  = ")"

WhiteSpace
  = " "

NewLine
  = "\n"

Token
  = token:Character+{return token.join("");}

Tuple
  = left:Token WhiteSpace+ right:List?{
        return left.concat([" "]).concat(right);
    }
  / Token

List
  = left:LeftParenthesis tuple:Tuple right:RightParenthesis{
        return left.concat(tuple).concat(right);
    }
  / Tuple

To allow for newlines and extra whitespace, I tried changing the rule for "Tuple" to:

Tuple
  = left:Token WhiteSpace+ (NewLine* WhiteSpace*)* right:List?{
        return left.concat([" "]).concat(right);
    }
  / Token

But this change sends PEG.js into an infinite loop, even though the addition to the rule doesn't look recursive.
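A plausible explanation for the hang, not confirmed anywhere in this thread: in (NewLine* WhiteSpace*)* the inner sequence succeeds even when it consumes nothing, so the outer * can keep "matching" the empty string forever without any recursion being involved. The JavaScript sketch below imitates that behaviour with tiny hand-rolled combinators. Every name in it (oneOf, star, seq, choice, the cap parameter) is hypothetical and is not part of PEG.js; the cap exists only so the demo terminates.

```javascript
// Hypothetical mini-combinators that imitate PEG matching; not PEG.js API.
const FAILED = null;

// Matches exactly one character out of `chars`, otherwise fails.
const oneOf = (chars) => (input, pos) =>
  pos < input.length && chars.includes(input[pos]) ? pos + 1 : FAILED;

// e*: apply `e` until it fails. The repetition itself never fails.
// `cap` is an assumption added for this demo so a runaway loop is reported.
const star = (e, cap = 1000) => (input, pos) => {
  for (let i = 0; i < cap; i++) {
    const next = e(input, pos);
    if (next === FAILED) return pos;
    pos = next;
  }
  throw new Error("repetition did not terminate");
};

// e1 e2: apply both in order, fail if either fails.
const seq = (a, b) => (input, pos) => {
  const mid = a(input, pos);
  return mid === FAILED ? FAILED : b(input, mid);
};

// e1 / e2: ordered choice.
const choice = (a, b) => (input, pos) => {
  const first = a(input, pos);
  return first === FAILED ? b(input, pos) : first;
};

const NewLine = oneOf("\n");
const WhiteSpace = oneOf(" ");

// (NewLine* WhiteSpace*)*  : the inner sequence never fails, so the loop never ends.
const looping = star(seq(star(NewLine), star(WhiteSpace)));

// (NewLine / WhiteSpace)*  : every iteration must consume one character.
const terminating = star(choice(NewLine, WhiteSpace));
```

Calling looping(" \n  (g y)", 0) runs until the cap and throws, while terminating(" \n  (g y)", 0) stops at the opening parenthesis. Under this reading, writing the rule with a single repetition over an expression that always consumes input, such as (NewLine / WhiteSpace)*, would avoid the hang.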

Note: in case it's unclear what I'm trying to do, I want PEG.js to generate a parser that takes

(f x 
  (g y 
    (h z t)))

and outputs either the same code as a string or just

"(f x (g y (h z t)))"

Either works for me.

What my current working grammar does is take

(f x (g y (h z t)))

and output

"(f x (g y (h z t)))"

While it is trivial to allow a single newline after every "Token" or "Tuple", I want to accept the following as legal code:

(f x

   (g     y (

       h z t) ) )

Solution

It turns out this question is very similar to "Ignore whitespace with PEG.js".

It's all about ignoring whitespace and newlines. I'm still not sure why the grammar above fails, but after some experimenting I got PEG.js to do what I want:

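Start
  = List

// Any character that is not a separator, a parenthesis, or a double quote
Character
  = [^ \t\r\n"("")"]

LeftParenthesis
  = "("

RightParenthesis
  = ")"

// A single space, tab, or line-break character
Separator
  = [ \t\r\n]

// A token is either a nested list or a run of ordinary characters
Token
  = List
  / token:Character+ { return token.join(""); }

// Tokens separated by one or more separators, joined with a single space
Tuple
  = first:Token second:Separator+ rest:Tuple* {
        return rest.length > 0 ? first.concat([" "]).concat(rest) : first;
    }
  / Token

// Separators just inside the parentheses are consumed and dropped
List
  = left:LeftParenthesis Separator* token:Tuple Separator* right:RightParenthesis {
        return [left].concat(token).concat([right]).join("");
    }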

So, when you give the parser the following string:

(f x

   (g     y (

       h z t) ) )

The parser outputs

"(f x (g y (h z t)))"

which is exactly what I wanted.
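For comparison, here is a rough hand-written JavaScript sketch of the same normalization, with no parser generator involved. It is not the code PEG.js generates, and the names normalizeLisp, skipSeparators, parseAtom and parseList are made up for illustration. It follows the grammar above in treating spaces, tabs and line breaks as separators and in rejecting empty lists, but it may differ from the grammar in edge cases such as tokens that touch without a separator.

```javascript
// Illustrative sketch only; all names here are hypothetical.
const SEPARATORS = " \t\r\n";
const DELIMITERS = SEPARATORS + '()"';

function normalizeLisp(input) {
  let pos = 0;

  // Separator*: skip any run of spaces, tabs, and line breaks.
  function skipSeparators() {
    while (pos < input.length && SEPARATORS.includes(input[pos])) pos++;
  }

  // Character+: read an atom up to the next separator, parenthesis, or quote.
  function parseAtom() {
    const start = pos;
    while (pos < input.length && !DELIMITERS.includes(input[pos])) pos++;
    if (pos === start) throw new SyntaxError(`Expected a token at offset ${pos}`);
    return input.slice(start, pos);
  }

  // List: "(" then one or more items (atoms or nested lists), then ")".
  function parseList() {
    if (input[pos] !== "(") throw new SyntaxError(`Expected "(" at offset ${pos}`);
    pos++;
    const items = [];
    skipSeparators();
    while (pos < input.length && input[pos] !== ")") {
      items.push(input[pos] === "(" ? parseList() : parseAtom());
      skipSeparators();
    }
    if (input[pos] !== ")") throw new SyntaxError(`Expected ")" at offset ${pos}`);
    pos++;
    if (items.length === 0) throw new SyntaxError("Empty list");
    // Items are re-joined with exactly one space, dropping the original layout.
    return "(" + items.join(" ") + ")";
  }

  const result = parseList();
  // Like Start = List, the whole input must be a single list.
  if (pos !== input.length) throw new SyntaxError(`Unexpected input at offset ${pos}`);
  return result;
}
```

Passing the multi-line example above to normalizeLisp returns the one-line form "(f x (g y (h z t)))".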

Licensed under: CC-BY-SA with attribution
