Question

The problem

I'll start with a simplified parsing problem. Suppose I've got a list of strings that I want to parse into a list of integers, and that I want to accumulate errors. This is pretty easy in Scalaz 7:

val lines = List("12", "13", "13a", "14", "foo")

def parseLines(lines: List[String]) = lines.traverseU(_.parseInt.toValidationNel)

We can confirm that it works as expected:

scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
For input string: "13a"
For input string: "foo"

This is nice, but suppose the list is pretty long, and that I decide I want to capture more information about the context of the errors in order to make clean-up easier. For the sake of simplicity I'll just use (zero-indexed) line numbers here to represent the position, but the context could also include a file name or other information.

Passing around position

One simple approach is to pass the position to my line parser:

type Position = Int

case class InvalidLine(pos: Position, message: String) extends Throwable(
  f"At $pos%d: $message%s"
)

def parseLine(line: String, pos: Position) = line.parseInt.leftMap(
  _ => InvalidLine(pos, f"$line%s is not an integer!")
)

def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU(
  (parseLine _).tupled andThen (_.toValidationNel)
)

This also works:

scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: 13a is not an integer!
At 4: foo is not an integer!

But in more complex situations passing the position around like this gets unpleasant.

Wrapping errors

Another option is to wrap the error produced by the line parser:

case class InvalidLine(pos: Position, underlying: Throwable) extends Throwable(
  f"At $pos%d: ${underlying.getMessage}%s",
  underlying
)

def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
  case (line, pos) => line.parseInt.leftMap(InvalidLine(pos, _)).toValidationNel
}

And again, it works just fine:

scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: For input string: "13a"
At 4: For input string: "foo"

But sometimes I have a nice error ADT and this kind of wrapping doesn't feel particularly elegant.

Returning "partial" errors

A third approach would be to have my line parser return a partial error that needs to be combined with some additional information (the position, in this case). I'll use Reader here, but we could just as well represent the failure type as something like Position => Throwable. We can reuse our first (non-wrapping) InvalidLine above.

def parseLine(line: String) = line.parseInt.leftMap(
  error => Reader(InvalidLine((_: Position), error.getMessage))
)

def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
  case (line, pos) => parseLine(line).leftMap(_.run(pos)).toValidationNel
}

Once again this produces the desired output, but also feels kind of verbose and clunky.

Question

I come across this kind of problem all the time—I'm parsing some messy data and want nice helpful error messages, but I also don't want to thread a bunch of position information through all of my parsing logic.

Is there some reason to prefer one of the approaches above? Are there better approaches?

No correct solution

OTHER TIPS

I use a combination of your first and second options with locally-requested stackless exceptions for control flow. This is the best thing I've found to keep error handling both completely bulletproof and mostly out of the way. The basic form looks like this:

Ok.or[InvalidLine]{ bad =>
  if (somethingWentWrong) bad(InvalidLine(x))
  else y.parse(bad)  // Parsers should know about sending back info!
}

where bad throws an exception when called that returns the data passed to it, and the output is a custom Either-like type. If it becomes important to inject additional context from the outer scope, adding an extra transformer step is all it takes to add context:

Ok.or[Invalid].explain(i: InvalidLine => Invalid(i, myFile)) { bad =>
  // Parsing logic
}

Actually creating the classes to make this work is a little more fiddly than I want to post here (especially since there are additional considerations in all of my actual working code which obscures the details), but this is the logic.

Oh, and since this ends up just being an apply method on a class, you can always

val validate = Ok.or[Invalid].explain(/* blah */)

validate { bad => parseA }
validate { bad => parseB }

and all the usual tricks.

(I suppose it's not fully obvious that the type signature of bad is bad: InvalidLine => Nothing, and the type signature of the apply is (InvalidLine => Nothing) => T.)

An overly-simplified solution could be:

import scala.util.{Try, Success, Failure}

def parseLines(lines: List[String]): List[Try[Int]] =
  lines map { l => Try (l.toInt) }

val lines = List("12", "13", "13a", "14", "foo")
println("LINES: " + lines)

val parsedLines = parseLines(lines)
println("PARSED: " + parsedLines)

val anyFailed: Boolean = parsedLines.exists(_.isFailure)
println("FAILURES EXIST?: " + anyFailed)

val failures: List[Throwable] = parsedLines.filter(_.isFailure).map{ case Failure(e) => e }
println("FAILURES: " + failures)

val parsedWithIndex = parsedLines.zipWithIndex
println("PARSED LINES WITH INDEX: " + parsedWithIndex)

val failuresWithIndex = parsedWithIndex.filter{ case (v, i) => v.isFailure }
println("FAILURES WITH INDEX: " + failuresWithIndex)

Prints:

LINES: List(12, 13, 13a, 14, foo)

PARSED: List(Success(12), Success(13), Failure(java.lang.NumberFormatException: For input string: "13a"), Success(14), Failure(java.lang.NumberFormatException: For input string: "foo"))

FAILURES EXIST?: true

FAILURES: List(java.lang.NumberFormatException: For input string: "13a", java.lang.NumberFormatException: For input string: "foo")

PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))

FAILURES WITH INDEX: List((Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))

Given that you could wrap all this in a helper class, abstract parsing function, generalize input and output types and even define error type whether it's an exception or something else.

What I'm suggesting is a simple map based approach, the exact types could be defined based on a task.

The annoying thing is that you have to keep a reference to parsedWithIndex in order to be able to get indexes and exceptions unless your exceptions will hold indexes and other context information.

Example of an implementation:

case class Transformer[From, To](input: List[From], f: From => To) {
  import scala.util.{Try, Success, Failure}

  lazy val transformedWithIndex: List[(Try[To], Int)] =
    input map { l => Try ( f(l) ) } zipWithIndex

  def failuresWithIndex =
    transformedWithIndex.filter { case (v, i) => v.isFailure }

  lazy val failuresExist: Boolean =
    ! failuresWithIndex.isEmpty

  def successfulOnly: List[To] =
    for {
      (e, _) <- transformedWithIndex
      value <- e.toOption
    } yield value
}

val lines = List("12", "13", "13a", "14", "foo")

val res = Transformer(lines, (l: String) => l.toInt)

println("FAILURES EXIST?: " + res.failuresExist)
println("PARSED LINES WITH INDEX: " + res.transformedWithIndex)
println("SUCCESSFUL ONLY: " + res.successfulOnly)

Prints:

FAILURES EXIST?: true

PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))

SUCCESSFUL ONLY: List(12, 13, 14)

Try can be replaced with Either or your own custom Failure.

This does feel a bit more object oriented rather than functional.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top