Question

If you want to create a pipe with more than 22 fields from a smaller one in Scalding, you are limited by Scala tuples, which cannot have more than 22 elements.

Is there a way to use collections instead of tuples? I imagine something like in the following example, which sadly doesn't work:

input.read.mapTo('line -> aLotOfFields) { line: String =>
  (1 to 24).map(_.toString)
}.write(output)

Solution

Actually, you can. It's covered in the FAQ: https://github.com/twitter/scalding/wiki/Frequently-asked-questions#what-if-i-have-more-than-22-fields-in-my-data-set

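import cascading.tuple.Tuple

// 24 field names: 'field_1 ... 'field_24
val toFields = (1 to 24).map(f => Symbol("field_" + f)).toList

input
  .read
  .mapTo('line -> toFields) { line: String =>
    // A cascading Tuple is not bound by Scala's 22-element limit; its constructor takes Object varargs
    new Tuple((1 to 24).map(_.toString).map(_.asInstanceOf[AnyRef]): _*)
  }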

The final map(_.asInstanceOf[AnyRef]) looks ugly, so if you find a better solution, please let me know.
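
One possible way to avoid the cast is a type ascription, which widens each String to AnyRef at compile time without a runtime cast. The sketch below uses only the standard library; the names `WideTuple` and `fieldValues` are hypothetical and not part of Scalding or Cascading.

```scala
object WideTuple {
  // Builds n string values typed as AnyRef (java.lang.Object), the element
  // type that a Java varargs parameter such as Tuple(Object...) expects.
  def fieldValues(n: Int): Seq[AnyRef] =
    (1 to n).map(i => (i.toString: AnyRef))
}

// With cascading on the classpath, the call in the answer could then read:
//   new Tuple(WideTuple.fieldValues(24): _*)
```

Because String is already a subtype of AnyRef, the ascription only changes the inferred element type of the collection, which is what lets `: _*` match the Object varargs.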

Other tips

Wrap your tuples in case classes. This also makes your code more readable than tuples and more type-safe than collections.
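
One way to read this tip, sketched with hypothetical names (`Records`, `Address`, `Customer` and `parse` are illustrative and not from the original post): group related values into small case classes and nest them, so that no single tuple or class comes near 22 members. How such a record maps onto pipe fields depends on the Scalding API in use, which the tip does not specify.

```scala
object Records {
  // Hypothetical grouping of related columns into small, named records.
  final case class Address(street: String, city: String, zip: String)
  final case class Customer(id: Long, name: String, address: Address)

  // Parses one comma-separated line into a nested record.
  // Assumes at least five columns and no quoting or escaping.
  def parse(line: String): Customer = {
    val cols = line.split(",", -1)
    Customer(cols(0).toLong, cols(1), Address(cols(2), cols(3), cols(4)))
  }
}
```

Each value now has a name and a static type, so a mistake such as swapping two columns of different types is caught by the compiler rather than at runtime.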

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow