Parse a date from a CharSequence with standard patterns…

https://stackoverflow.com/questions/4885738

28-10-2019

Question

I'm writing a parser for a command line interface of an external tool and I'm using Scala's parser combinators library. As part of this I need to parse a standard date of the format EEE MMM d HH:mm:ss yyyy Z.

Scala's parser combinators are "stream-based" and work with CharSequences instead of Strings. That makes it hard for me to use either java.text.DateFormat or DateTimeFormat from JodaTime, since they both work with Strings.

As of now, I have had to write my own regex parser to parse the date, but I would much rather incorporate the work that has been done in JodaTime into my parser. I really don't want to reinvent the wheel. I've been looking at the source code of JodaTime and I'm not really sure why it needs to work with Strings instead of just CharSequences. Am I missing something?

Solution 3

This is my solution right now:

I forked joda-time and made small changes for it to work on CharSequences instead of Strings. It's over here https://github.com/hedefalk/joda-time/commit/ef3bdafd89b334fb052ce0dd192613683b3486a4

Then I could write a DateParser like this:

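import org.joda.time.{DateTime, MutableDateTime}
import org.joda.time.format.DateTimeFormat
import scala.util.parsing.combinator.RegexParsers

trait DateParsers extends RegexParsers {
  def dateTime(pattern: String): Parser[DateTime] = new Parser[DateTime] {
    val dateFormat = DateTimeFormat.forPattern(pattern)

    // Relies on the forked joda-time, where parseInto accepts a CharSequence.
    // Returns the parsed value and the new position (negative on failure).
    def jodaParse(text: CharSequence, offset: Int) = {
      val mutableDateTime = new MutableDateTime
      val newPos = dateFormat.parseInto(mutableDateTime, text, offset)
      (mutableDateTime.toDateTime, newPos)
    }

    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      // Skip leading whitespace the same way RegexParsers does for literals.
      val start = handleWhiteSpace(source, offset)
      val (dateTime, endPos) = jodaParse(source, start)
      if (endPos >= 0)
        Success(dateTime, in.drop(endPos - offset))
      else
        Failure("Failed to parse date", in.drop(start - offset))
    }
  }
}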

Then I can use this trait to have production rules like:

private[this] def dateRow = "date:" ~> dateTime("EEE MMM d HH:mm:ss yyyy Z")

Am I overworking this? I'm really tired right now…

Other tips

Got it, now. Ok, there's a simpler solution than forking. Here:

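import org.joda.time.{DateTime, MutableDateTime}
import org.joda.time.format.DateTimeFormat
import scala.util.parsing.combinator.RegexParsers

trait DateParsers extends RegexParsers {
  def dateTime(pattern: String): Parser[DateTime] = new Parser[DateTime] {
    val dateFormat = DateTimeFormat.forPattern(pattern)

    def jodaParse(text: CharSequence, offset: Int) = {
      val mutableDateTime = new MutableDateTime
      // Copy only a bounded window into a String. The length is an estimate
      // taken from the underlying parser, clamped to the end of the input.
      val end = math.min(text.length, offset + dateFormat.getParser.estimateParsedLength)
      val maxInput = text.subSequence(offset, end).toString
      val newPos = dateFormat.parseInto(mutableDateTime, maxInput, 0)
      // parseInto signals failure with a negative value, so only shift successes.
      (mutableDateTime.toDateTime, if (newPos >= 0) newPos + offset else newPos)
    }

    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      val (dateTime, endPos) = jodaParse(source, start)
      if (endPos >= 0)
        Success(dateTime, in.drop(endPos - offset))
      else
        Failure("Failed to parse date", in.drop(start - offset))
    }
  }
}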
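
The offset arithmetic in the window approach is easy to get wrong, so here is a minimal sketch of just that part, with no Joda-Time dependency. The helper names boundedWindow and toAbsolute are hypothetical and not part of either library. The sketch assumes the convention documented for Joda-Time's parseInto: a non-negative result is the new position, and a negative result is the bitwise complement of the position where parsing failed.

```scala
object WindowOffsets {
  // Copy at most maxLen characters starting at offset into a String,
  // clamping the end so a short input cannot overrun the sequence.
  def boundedWindow(text: CharSequence, offset: Int, maxLen: Int): String = {
    val end = math.min(text.length(), offset + maxLen)
    text.subSequence(offset, end).toString
  }

  // Translate a window-relative result into a position in the full input.
  // Successes are shifted by offset; failures stay negative, re-encoded as
  // the complement of the absolute failure position.
  def toAbsolute(windowPos: Int, offset: Int): Int =
    if (windowPos >= 0) windowPos + offset
    else ~(~windowPos + offset)

  def main(args: Array[String]): Unit = {
    val input: CharSequence = "date: Mon Jan 3 10:00:00 2011 +0100 trailing"
    println(boundedWindow(input, 6, 30))
    println(toAbsolute(29, 6))
    println(toAbsolute(~4, 6))
  }
}
```

Simply adding the offset to a negative result can turn a failure into what looks like a valid position, which is why the two cases are handled separately.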

I'm not sure what you are asking. Are you asking why RegexParsers.parse()'s in parameter takes a CharSequence? If so, there's another overloaded RegexParsers.parse() that takes a Reader, for which you can write a simple conversion function like so:

def stringToReader(str: String): Reader = new StringReader(str)
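
As a rough illustration of how the two overloads sit side by side, here is a small sketch. NumberParser and its number rule are a hypothetical toy grammar invented for this example, and it assumes the scala-parser-combinators module is on the classpath.

```scala
import java.io.{Reader, StringReader}
import scala.util.parsing.combinator.RegexParsers

// Hypothetical toy grammar, only here to exercise the two parse overloads.
object NumberParser extends RegexParsers {
  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  def stringToReader(str: String): Reader = new StringReader(str)

  // Resolves to the overload that takes a java.lang.CharSequence.
  def fromCharSequence(s: CharSequence): ParseResult[Int] = parse(number, s)

  // Resolves to the overload that takes a java.io.Reader.
  def fromReader(s: String): ParseResult[Int] = parse(number, stringToReader(s))
}
```

Both entry points run the same production rule, so the choice only affects how the input is supplied.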

As to the date format, I find it perfectly fine to define it as a token in the parser.

Hope this helps.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow