Question

I have defined a simple grammar for parsing strings and numbers using Treetop, as below:

grammar Simple
    rule value
        number / string
    end 

    rule string
        word space string
        /
        word
    end

    rule word
        [0-9a-zA-Z]+
    end

    rule number
        [1-9] [0-9]*
    end

    rule space
        ' '+
    end
end

Ruby:

require 'treetop'
Treetop.load('simple')  # assumes the grammar above is saved as simple.treetop
parser = SimpleParser.new
parser.parse('123abc wer') # => nil

I expect the parser to return a string node, but it looks like the parser cannot understand the input. Any ideas would be appreciated.


Solution

In Treetop (and in PEGs in general, actually), the choice operator is ordered, unlike in most other parsing formalisms.

So, in

rule value
  number / string
end

you are telling Treetop to try number first and to fall back to string only if number fails.

Your input starts with 1, which can begin either a number or a string (through word). Because you told Treetop to try number first, it matches 123 as a number, the choice succeeds, and string is never tried. Parsing then stops at the a: the value rule has nothing left to match it with, and since in Treetop it is an error not to consume the entire input, parse returns nothing (nil).
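To make the ordered-choice behaviour concrete, here is a minimal sketch of a hand-rolled PEG matcher in plain Ruby that mirrors the grammar from the question. It is not Treetop's API: the helpers token, seq, choice and parses? and the constants WORD, NUMBER, SPACE and STRING are hypothetical names for illustration, and parsers here only return the end position of a match rather than building syntax nodes.

```ruby
# Hypothetical hand-rolled PEG matcher (NOT Treetop's API) mirroring the
# question's grammar. Each parser is a lambda (input, pos) -> end position,
# or nil on failure. No syntax tree is built.

# Match a regex anchored at the current position.
def token(regex)
  anchored = /\A(?:#{regex.source})/
  ->(input, pos) do
    m = anchored.match(input[pos..-1])
    m && pos + m[0].length
  end
end

# Match parsers one after another; fail as soon as any of them fails.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# PEG ordered choice: return the FIRST alternative that succeeds and never
# come back to try the later ones, even if the overall parse fails later.
def choice(*parsers)
  ->(input, pos) do
    parsers.each do |parser|
      finish = parser.call(input, pos)
      return finish if finish
    end
    nil
  end
end

WORD   = token(/[0-9a-zA-Z]+/)
NUMBER = token(/[1-9][0-9]*/)
SPACE  = token(/ +/)
# string: word space string / word
# (the inner lambda delays the lookup of STRING so the rule can refer to itself)
STRING = choice(seq(WORD, SPACE, ->(inp, at) { STRING.call(inp, at) }), WORD)

# Like Treetop by default, treat the parse as successful only if the
# root rule consumes the entire input.
def parses?(root, input)
  root.call(input, 0) == input.length
end

input = '123abc wer'
value = choice(NUMBER, STRING)   # value: number / string (as in the question)

finish = value.call(input, 0)
p finish                         # => 3          (number matched "123"; string never tried)
p input[finish..-1]              # => "abc wer"  (the unconsumed remainder)
p parses?(value, input)          # => false

swapped = choice(STRING, NUMBER) # value: string / number
p parses?(swapped, input)        # => true
```

The key line is the early return in choice: once number succeeds on 123, the choice is committed, so the leftover "abc wer" makes the whole parse fail instead of prompting a retry with string.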

If you simply reverse the order of the choice (string / number), the entire input will be interpreted as a string instead of a number:

SyntaxNode+String0 offset=0, "123abc wer" (word,space,string):
  SyntaxNode offset=0, "123abc":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "2"
    SyntaxNode offset=2, "3"
    SyntaxNode offset=3, "a"
    SyntaxNode offset=4, "b"
    SyntaxNode offset=5, "c"
  SyntaxNode offset=6, " ":
    SyntaxNode offset=6, " "
  SyntaxNode offset=7, "wer":
    SyntaxNode offset=7, "w"
    SyntaxNode offset=8, "e"
    SyntaxNode offset=9, "r"

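Alternatively, you could keep the order as it is but allow the value rule to be matched multiple times. Either add a new top-level rule like this (as the first rule in the grammar, since Treetop uses the first rule as the root by default):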

rule values
  value+
end

or modify the value rule like this:

rule value
  (number / string)+
end

Either way, you get an AST roughly like this:

SyntaxNode offset=0, "123abc wer":
  SyntaxNode+Number0 offset=0, "123":
    SyntaxNode offset=0, "1"
    SyntaxNode offset=1, "23":
      SyntaxNode offset=1, "2"
      SyntaxNode offset=2, "3"
  SyntaxNode+String0 offset=3, "abc wer" (word,space,string):
    SyntaxNode offset=3, "abc":
      SyntaxNode offset=3, "a"
      SyntaxNode offset=4, "b"
      SyntaxNode offset=5, "c"
    SyntaxNode offset=6, " ":
      SyntaxNode offset=6, " "
    SyntaxNode offset=7, "wer":
      SyntaxNode offset=7, "w"
      SyntaxNode offset=8, "e"
      SyntaxNode offset=9, "r"
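The effect of repetition can be sketched with the same kind of hand-rolled PEG matcher in plain Ruby. Again, this is not Treetop's API: token, seq, choice, parse_values and the rule constants are hypothetical names, and instead of a syntax tree the sketch just records the slice of input that each repetition of value consumed.

```ruby
# Hypothetical hand-rolled PEG matcher (NOT Treetop's API). Each parser is a
# lambda (input, pos) -> end position, or nil on failure.

# Match a regex anchored at the current position.
def token(regex)
  anchored = /\A(?:#{regex.source})/
  ->(input, pos) do
    m = anchored.match(input[pos..-1])
    m && pos + m[0].length
  end
end

# Match parsers one after another; fail as soon as any of them fails.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# PEG ordered choice: the first alternative that succeeds wins.
def choice(*parsers)
  ->(input, pos) do
    parsers.each do |parser|
      finish = parser.call(input, pos)
      return finish if finish
    end
    nil
  end
end

WORD   = token(/[0-9a-zA-Z]+/)
NUMBER = token(/[1-9][0-9]*/)
SPACE  = token(/ +/)
STRING = choice(seq(WORD, SPACE, ->(inp, at) { STRING.call(inp, at) }), WORD)
VALUE  = choice(NUMBER, STRING)  # original order kept: number / string

# values: value+
# Apply VALUE repeatedly from where the previous match ended, collecting the
# text each match consumed. Return nil (as Treetop would) unless the whole
# input is consumed. Every rule here consumes at least one character on
# success, so the loop always makes progress.
def parse_values(input)
  spans = []
  pos = 0
  while pos < input.length && (finish = VALUE.call(input, pos))
    spans << input[pos...finish]
    pos = finish
  end
  spans.empty? || pos != input.length ? nil : spans
end

p parse_values('123abc wer')  # => ["123", "abc wer"]
p parse_values('123 ')        # => nil (the trailing space is never consumed)
```

The two slices correspond to the Number0 and String0 children at the top of the AST: number still wins at offset 0, but the repetition then gets a second chance at offset 3, where number fails and string matches the rest.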
Licensed under: CC-BY-SA with attribution