Question

If you want to create a pipe with more than 22 fields from a smaller one in Scalding, you are limited by Scala tuples, which cannot have more than 22 elements.

Is there a way to use collections instead of tuples? I imagine something like the following example, which unfortunately doesn't work:

input.read.mapTo('line -> aLotOfFields) { line: String =>
  (1 to 24).map(_.toString)
}.write(output)

Solution

Actually, you can. It's covered in the Scalding FAQ: https://github.com/twitter/scalding/wiki/Frequently-asked-questions#what-if-i-have-more-than-22-fields-in-my-data-set

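import cascading.tuple.Tuple

// 24 output field names: 'field_1 ... 'field_24
val toFields = (1 to 24).map(f => Symbol("field_" + f)).toList

input
  .read
  .mapTo('line -> toFields) { line: String =>
    // Cascading's Tuple takes Object varargs, so the values are passed as AnyRef
    new Tuple((1 to 24).map(_.toString).map(_.asInstanceOf[AnyRef]): _*)
  }
  .write(output)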
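The field list and the number of emitted values have to agree, and a mismatch would presumably only surface when the job runs. A minimal stdlib-only sketch of keeping both in sync, assuming tab-separated input; `numFields` and `toValues` are hypothetical names, not part of Scalding:

```scala
// Hypothetical constant: the number of output columns.
val numFields = 24

// Field names built as in the answer: 'field_1 ... 'field_24.
val toFields: List[Symbol] = (1 to numFields).map(f => Symbol("field_" + f)).toList

// Hypothetical helper: split one tab-separated input line into values,
// failing fast if the count does not match the declared field list.
def toValues(line: String): Seq[AnyRef] = {
  val values: Seq[AnyRef] = line.split("\t", -1).toIndexedSeq
  require(values.size == toFields.size,
    s"expected ${toFields.size} values, got ${values.size}")
  values
}

val row = toValues((1 to numFields).mkString("\t"))
```

Inside `mapTo`, the result of such a helper could then be splatted into `new Tuple(...: _*)` the same way as above.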

The final map(_.asInstanceOf[AnyRef]) looks ugly, so if you find a better solution, please let me know.
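One possible way to avoid the cast, sketched under the assumption that Cascading's `Tuple(Object...)` constructor behaves like any other Java varargs method: a `Seq[String]` already conforms to `Seq[AnyRef]` because `Seq` is covariant, and `Int.box` turns primitives into `java.lang.Integer`. To keep the sketch runnable without Cascading, `java.util.Arrays.asList` (also Java varargs) stands in for `Tuple`:

```scala
// Strings are already AnyRef; since Seq is covariant, a type annotation is enough.
val strings: Seq[AnyRef] = (1 to 24).map(_.toString)

// Ints are primitives; Int.box converts each one to a java.lang.Integer.
val boxedInts: Seq[AnyRef] = (1 to 24).map(Int.box)

// Stand-in for `new Tuple(values: _*)`: another Java varargs call.
val fromStrings = java.util.Arrays.asList(strings: _*)
val fromInts = java.util.Arrays.asList(boxedInts: _*)
```

In the Scalding job this would read `new Tuple((1 to 24).map(_.toString): Seq[AnyRef]: _*)` or, more readably, bind the typed `Seq[AnyRef]` to a `val` first.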

Other tips

Wrap your tuples in case classes. Besides getting around the limit, this makes your code more readable than raw tuples and more type-safe than collections.
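A plain-Scala sketch of that idea, assuming a hypothetical 24-column schema of strings: the columns are grouped into two nested case classes of 12, so no single tuple or case class comes near 22 members. The class names and `recordFromSeq` are illustrative only, and how Scalding maps such nested classes onto fields is not shown here:

```scala
// Hypothetical schema: 24 string columns split into two groups of 12.
final case class FirstHalf(
  f1: String, f2: String, f3: String, f4: String, f5: String, f6: String,
  f7: String, f8: String, f9: String, f10: String, f11: String, f12: String)

final case class SecondHalf(
  f13: String, f14: String, f15: String, f16: String, f17: String, f18: String,
  f19: String, f20: String, f21: String, f22: String, f23: String, f24: String)

// The full record nests both groups.
final case class Record(first: FirstHalf, second: SecondHalf)

// Hypothetical constructor from 24 raw values; rejects any other arity.
def recordFromSeq(v: Seq[String]): Record = {
  require(v.size == 24, s"expected 24 values, got ${v.size}")
  Record(
    FirstHalf(v(0), v(1), v(2), v(3), v(4), v(5), v(6), v(7), v(8), v(9), v(10), v(11)),
    SecondHalf(v(12), v(13), v(14), v(15), v(16), v(17), v(18), v(19), v(20), v(21), v(22), v(23)))
}

val record = recordFromSeq((1 to 24).map(_.toString))
```

Since Scala 2.11 a single case class may have more than 22 fields, while tuples and functions are still capped at 22; grouping by meaning (rather than by position, as here) is what gives the readability benefit.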

Licensed under: CC-BY-SA with attribution