Question

I am having trouble avoiding left-recursion in this simple expression parser I'm working on. Essentially, I want to parse the equation 'f x y' into two expressions 'f x' and '(f x) y' (with implicit parentheses). How can I do this while avoiding left-recursion and backtracking? Does there have to be an intermediate step?

#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
Treetop.load_from_string DATA.read

parser = ExpressionParser.new

p parser.parse('f x y').value

__END__
grammar Expression
   rule equation
      expression (w+ expression)*
   end
   rule expression
      expression w+ atom
   end
   rule atom
      var / '(' w* expression w* ')'
   end
   rule var
      [a-z]
   end
   rule w
      [\s\n\t\r]
   end
end
Was it helpful?

Solution

You haven't given enough information about your desired result. In particular, do you expect "f(a b) y" to parse as "(f(a(b))) y"? I assume you do... which means that a function not followed by an open parenthesis has arity one.

So you want to say:

rule equation
  expression w* var / expression w* parenthesised_list
end
rule parenthesised_list
  '(' w* ( expression w* )+ ')'
end

If on the other hand you have external (to the grammar) knowledge of the arity of f, and you want to iterate "expression" exactly that many times - as happens in parsing TeX for example - then you will need to use a semantic predicate &{|s| ...} inside the iterated expression list). Beware that the argument passed to the block of a sempred is not a SyntaxNode (which cannot yet be constructed because this sequence sub-rule has not yet succeeded) but the accumulated array of nodes so far in the sequence. The truthiness of the block return value dictates the parse result and can stop the iteration.

Another tool you might consider using is lookahead (!stuff_I_dont_expect_to_follow or &stuff_that_must_follow).

You can also ask such questions in http://groups.google.com/group/treetop-dev

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top