Question

So I've been trying to write a calculator with Scala's parser, and it's been fun, except that I found that operator associativity is backwards, and that when I try to make my grammar left-recursive, even though it's completely unambiguous, I get a stack overflow.

To clarify, if I have a rule like: def subtract: Parser[Int] = num ~ "-" ~ add { x => x._1._1 - x._2 } then evaluating 7 - 4 - 3 comes out to be 6 instead of 0.

The way I have actually implemented this is that I am composing a binary tree where operators are non-leaf nodes, and leaf nodes are numbers. The way I evaluate the tree is left child (operator) right child. When constructing the tree for 7 - 4 - 5, what I would like for it to look like is:

-
-   5
7   4   NULL   NULL

where - is the root, its children are - and 5, and the second -'s children are 7 and 4.

However, the only tree I can construct easily is

-
7   -
NULL   NULL   4   5

which is different, and not what I want.

Basically, the easy parenthesization is 7 - (4 - 5) whereas I want (7 - 4) - 5.

How can I hack this? I feel like I should be able to write a calculator with correct operator precedence regardless. Should I tokenize everything first and then reverse my tokens? Is it ok for me to just flip my tree by taking all left children of right children and making them the right child of the right child's parent and making the parent the left child of the ex-right child? It seems good at a first approximation, but I haven't really thought about it too deeply. I feel like there must just be some case that I'm missing.

My impression is that I can only make an LL parser with the scala parsers. If you know another way, please tell me!

Was it helpful?

Solution

Scala's standard implementation of parser combinators (the Parsers trait) do not support left-recursive grammars. You can, however, use PackratParsers if you need left recursion. That said, if your grammar is a simple arithmetic expression parser, you most definitely do not need left recursion.

Edit

There are ways to use right recursion and still keep left associativity, and if you are keen on that, just look up arithmetic expressions and recursive descent parsers. And, of course, as, I said, you can use PackratParsers, which allow left recursion.

But the easiest way to handle associativity without using PackratParsers is to avoid using recursion. Just use one of the repetition operators to get a List, and then foldLeft or foldRight as required. Simple example:

trait Tree
case class Node(op: String, left: Tree, right: Tree) extends Tree
case class Leaf(value: Int) extends Tree

import scala.util.parsing.combinator.RegexParsers

object P extends RegexParsers {
  def expr = term ~ (("+" | "-") ~ term).* ^^ mkTree
  def term = "\\d+".r ^^ (_.toInt)
  def mkTree(input: Int ~ List[String ~ Int]): Tree = input match {
    case first ~ rest => ((Leaf(first): Tree) /: rest)(combine)
  }
  def combine(acc: Tree, next: String ~ Int) = next match {
    case op ~ y => Node(op, acc, Leaf(y))
  }
}

You can find other, more complete, examples on the scala-dist repository.

OTHER TIPS

I'm interpreting your question as follows:

If you write rules like def expression = number ~ "-" ~ expression and then evalute on each node of the syntax tree, then you find that in 3 - 5 - 4, the 5 - 4 is computed first, giving 1 as a result, and then 3 - 1 is computed giving 2 as a result.

On the other hand, if you write rules like def expression = expression ~ "-" ~ number, the rules are left-recursive and overflow the stack.

There are three solutions to this problem:

  1. Post-process the abstract syntax tree to convert it from a right-associative tree to a left-associative tree. If you're using actions on the grammar rules to do the computation immediately, this won't work for you.

  2. Define the rule as def expression = repsep(number, "-") and then when evaluating the computation, loop over the parsed numbers (which will appear in a flat list) in whichever direction provides you the associativity you need. You can't use this if more than one kind of operator will appear, since the operator will be thrown away.

  3. Define the rule as def expression = number ~ ( "-" ~ number) *. You'll have an initial number, plus a set of operator-number pairs in a flat list, to process in any direction you want (though left-to-right is probably easier here).

  4. Use PackratParsers as Daniel Sobral suggested. This is probably your best and simplest choice.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top