Question

I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.

I don't want to clutter up my parser grammar with generic comment and blank line logic. I'd like to remove these as a preprocessing step. After converting the file to an iterator over lines, I can do something like this:

Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)

How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.

Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?

Solution

The simplest way to do this is to use the fromLines method on PagedSeq:

import scala.collection.immutable.PagedSeq
import scala.io.Source
import scala.util.parsing.input.PagedSeqReader

// Strip comments and surrounding whitespace, then drop lines that end up empty.
val lines = Source.fromFile("file.txt").getLines().map(
  _.replaceAll("#.*", "").trim
).filterNot(_.isEmpty)

val reader = new PagedSeqReader(PagedSeq.fromLines(lines))

You now have a scala.util.parsing.input.Reader (a Reader[Char], to be precise) that you can plug into your parser. This is essentially what happens when you parse a java.io.Reader anyway: it immediately gets wrapped in a PagedSeqReader.
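To make the "plug into your parser" step concrete, here is a minimal sketch. The `key = value` grammar, the `KeyValueParser` object, and the `parseLines` helper are all hypothetical stand-ins for your own grammar, and the lines come from an in-memory iterator rather than a file so the snippet can be run as-is. It assumes a Scala version where PagedSeq is still available under scala.collection.immutable; on newer versions the class may need to be imported from scala.util.parsing.input instead.

```scala
import scala.collection.immutable.PagedSeq
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.PagedSeqReader

// Hypothetical grammar for illustration: a sequence of `key = value` pairs.
object KeyValueParser extends RegexParsers {
  def key: Parser[String]   = """[A-Za-z_]\w*""".r
  def value: Parser[String] = """[^\s=]+""".r
  def pair: Parser[(String, String)] =
    key ~ ("=" ~> value) ^^ { case k ~ v => (k, v) }
  def pairs: Parser[List[(String, String)]] = rep(pair)

  // Strips comments and blank lines, then parses whatever text is left.
  def parseLines(raw: Iterator[String]): Either[String, List[(String, String)]] = {
    val lines  = raw.map(_.replaceAll("#.*", "").trim).filterNot(_.isEmpty)
    val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
    // parseAll has an overload that accepts a Reader[Char] directly.
    parseAll(pairs, reader) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
  }
}

object Demo extends App {
  val raw = Iterator("# header comment", "", "name = widget  # trailing", "   ", "size = 42")
  println(KeyValueParser.parseLines(raw)) // prints Right(List((name,widget), (size,42)))
}
```

The grammar itself never mentions "#" or blank lines; the preprocessing step has already removed them by the time the reader is built.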

OTHER TIPS

Not the prettiest code you'll ever write, but you could go through a new Source as follows:

import scala.io.Source

// Platform line separator, re-appended to every line that is kept.
val SEP = System.getProperty("line.separator")
def lineMap(fileName : String, trans : String=>String) : Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line) + SEP
    ).toIterable
  )
}

Explanation: each transformed line is a String, which flatMap treats as a sequence of characters, so the result is an iterator over characters. You can turn that into an Iterable and use it to build a new Source. You need to append SEP because getLines strips the line terminator from each line (using \n may not work, as Source may not separate the lines properly).

If you want to apply filtering too, i.e. remove some of the lines, you could for instance try:

// whenever `trans` returns `None`, the line is dropped.
def lineMapFilter(fileName : String, trans : String=>Option[String]) : Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line).map(_ + SEP).getOrElse("")
    ).toIterable
  )
}

As an example:

lineMapFilter("in.txt", line => if(line.isEmpty) None else Some(line.reverse))

...will remove empty lines and reverse non-empty ones.
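For the comment-stripping case in the question, the `trans` argument could look something like the sketch below. The names `CommentStripping` and `stripComment` are hypothetical; the logic is the same replaceAll/trim/isEmpty chain from the question, repackaged to return an Option.

```scala
object CommentStripping {
  // Removes everything from "#" to the end of the line, trims the rest,
  // and returns None when nothing is left so the line gets dropped.
  def stripComment(line: String): Option[String] = {
    val cleaned = line.replaceAll("#.*", "").trim
    if (cleaned.isEmpty) None else Some(cleaned)
  }
}
```

It would then be passed as lineMapFilter("file.txt", CommentStripping.stripComment). To feed the resulting Source to a combinator parser, one option is to wrap it with PagedSeq.fromSource and a PagedSeqReader.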

Licensed under: CC-BY-SA with attribution