Scala Parser Combinators tricks for recursive bnf?
-
30-09-2020 - |
Question
Im trying to match this syntax:
pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+
My scala packrat parser combinator looks like this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
def pgm = repsep(expr,";")
def expr :Parser[Any]= ident | expr~"."~num
def num = numericLit
def parse(input: String) =
phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def main(args: Array[String]) {
val prg = "x.1.2.3;" +
"y.4.1.1;" +
"z;" +
"n.1.10.30"
parse(prg);
}
}
But this doesnt work. Either it "matches greedy" and tells me:
[1.2] failure: end of input expected
x.1.2.3;y.4.1.1;z;n.1.10.30
or if I change the |
to a |||
I get a stackoverflow:
Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...
I kindoff understand why I get the errors; what can I do to parse a syntax like the above? It doesnt seem that esoteric to me
EDIT: Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html I found out that my program didnt actually use the new packrat parser.
Ie. change Parser[Any]
to PackratParser[Any]
and use lazy val
instead of def
I rewrote the above to this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
lazy val pgm : PackratParser[Any] = repsep(expr,";")
lazy val expr :PackratParser[Any]= expr~"."~num | ident
lazy val num = numericLit
def parse(input: String) =
phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def main(args: Array[String]) {
val prg = "x.1.2.3 ;" +
"y.4.1.1;" +
"z;" +
"n.1.10.30"
parse(prg);
}
}
Solution
The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says
Using PackratParsers is very similar to using Parsers:
- any class/trait that extends Parsers (directly or through a subclass) can mix in PackratParsers. Example: object MyGrammar extends StandardTokenParsers with PackratParsers
- each grammar production previously declared as a def without formal parameters becomes a lazy val, and its type is changed from Parser[Elem] to PackratParser[Elem]. So, for example, def production: Parser[Int] = {...} becomes lazy val production: PackratParser[Int] = {...}
- Important: using PackratParsers is not an all or nothing decision. They can be free mixed with regular Parsers in a single grammar.
I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
lazy val pgm:PackratParser[Any] = repsep(expr,";")
lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)
def parse(input: String) = phrase(expr)(lex(input)) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}
OTHER TIPS
The production
expr ::= ID | expr . [0-9]+
is left recursive. It expands to
expr ::= ID
expr ::= expr . [0-9]+
where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.
You should rewrite your grammar avoiding left recursive productions.
expr ::= ID {. [0-9]+}