Scala parser combinators converting list of characters to strings

Question

The problem is that anyChar* produces a List[String] (each string here is a single character), and calling toString on a list of strings gives "List(...)", not the string you'd get by concatenating its contents. In addition, the case text => pattern matches the result of the entire letter ~ (anyChar*) sequence, not just the anyChar* part.
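To make the first point concrete, here is a minimal sketch that needs only the standard library. The list below is a hand-written stand-in for what anyChar* would return, and the object and value names are made up for illustration.

```scala
object MkStringDemo {
  // Stand-in for the result of anyChar*: a list of one-character strings.
  val chars: List[String] = List("h", "i", "_", "1")

  // toString renders the collection itself, wrapper and separators included.
  val viaToString: String = chars.toString // "List(h, i, _, 1)"

  // mkString with no separator concatenates the elements.
  val viaMkString: String = chars.mkString // "hi_1"
}
```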

It's possible to address both of these issues pretty straightforwardly:

case class Model(name: String) {
  override def toString : String = "[model " + name + "]"
}

import scala.util.parsing.combinator._
// Needed for the postfix anyChar* below on Scala 2.13 and later.
import scala.language.postfixOps

object ModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

  def name: Parser[String] = letter ~ (anyChar*) ^^ {
    case first ~ rest => (first :: rest).mkString
  }

  def anyChar = letter | digit | "_".r | "-".r
  def letter = """[a-zA-Z]""".r
  def digit = """\d""".r
}

We just prepend the first character string to the list of the rest, and then call mkString on the resulting list, which concatenates its contents. This works as expected:

scala> ModelParser.parseAll(ModelParser.model, "[model [name helloWorld]]")
res0: ModelParser.ParseResult[Model] = [1.26] parsed: [model helloWorld]

As you note, it would be possible (and possibly clearer and more performant) to let the regular expressions do more of the work:

object ModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

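  // A letter first, then any mix of letters, digits, underscores and hyphens,
  // matching the letter ~ (anyChar*) rule above.
  def name: Parser[String] = """[a-zA-Z][a-zA-Z\d_-]*""".r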
}
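One place where the two styles behave differently is whitespace. By default RegexParsers skips leading whitespace before every literal and regex parser, so a name assembled from one-character parsers will silently swallow spaces between characters, while a single regex will not. The sketch below assumes the scala-parser-combinators module is on the classpath; the object names are hypothetical, only the name rule is reproduced, and rep(anyChar) is used in place of the equivalent anyChar*.

```scala
import scala.util.parsing.combinator._

// Character-by-character version of the name rule.
object PerCharName extends RegexParsers {
  def letter: Parser[String] = """[a-zA-Z]""".r
  def anyChar: Parser[String] = """[a-zA-Z\d_-]""".r
  def name: Parser[String] = letter ~ rep(anyChar) ^^ {
    case first ~ rest => (first :: rest).mkString
  }
}

// Single-regex version of the name rule.
object SingleRegexName extends RegexParsers {
  def name: Parser[String] = """[a-zA-Z][a-zA-Z\d_-]*""".r
}

object WhitespaceDemo {
  val input = "hello World"

  // Whitespace is skipped before each one-character parser,
  // so this succeeds with "helloWorld".
  val perChar = PerCharName.parseAll(PerCharName.name, input)

  // The regex stops at the space, leaving "World" unconsumed,
  // so parseAll reports a failure.
  val singleRegex = SingleRegexName.parseAll(SingleRegexName.name, input)
}
```

If names must not contain spaces, this is one more reason to prefer the single-regex form; overriding skipWhitespace is the other option.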

This regex-based version also illustrates how the parser combinator library uses implicit conversions to cut down on the verbosity of writing parsers. As you say, def hello = "hello" defines a String, and "[a-zA-Z]+".r defines a Regex (via the r method on StringOps). Either can be used as a parser, though, because RegexParsers defines implicit conversions to Parser[String] from String (this one is named literal) and from Regex (named regex).
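To see the conversions at work, you can write the same parsers both ways. This is only a sketch: it assumes the scala-parser-combinators module is on the classpath, and the object and method names are invented for the example. Only literal and regex come from RegexParsers.

```scala
import scala.util.parsing.combinator._

object ImplicitDemo extends RegexParsers {
  // Implicit form: the String and the Regex are converted automatically
  // because a Parser[String] is expected.
  def helloImplicit: Parser[String] = "hello"
  def wordImplicit: Parser[String] = "[a-zA-Z]+".r

  // Explicit form: the same conversions, called by name.
  def helloExplicit: Parser[String] = literal("hello")
  def wordExplicit: Parser[String] = regex("[a-zA-Z]+".r)

  // The conversions also apply inside combinator expressions.
  def greetingImplicit: Parser[String] = "hello" ~> "[a-zA-Z]+".r
  def greetingExplicit: Parser[String] = helloExplicit ~> wordExplicit
}
```

Both greeting parsers accept the same input; for instance, ImplicitDemo.parseAll(ImplicitDemo.greetingImplicit, "hello world") succeeds with the result "world", and so does the explicit variant.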