Question

I am trying to declare a hashmap as Spark (v0.9.1) accumulator. The docs state that "Spark natively supports accumulators of numeric value types and standard mutable collections..." (link).

However, this doesn't seem to be working for me when I try to create a HashMap[String, Boolean]:

scala> import collection.mutable.HashMap
import collection.mutable.HashMap

scala> val accum = sc.accumulator("test" -> true)(HashMap)
<console>:13: error: type mismatch;
 found   : scala.collection.mutable.HashMap.type
 required: org.apache.spark.AccumulatorParam[(String, Boolean)]
           val accum = sc.accumulator("test" -> true)(HashMap)

Solution

First of all, you should pass an actual HashMap[String, Boolean] rather than a (String, Boolean) tuple:

sc.accumulator(HashMap("t" -> true))

You may also need to write your own AccumulatorParam, because I didn't find an out-of-the-box implicit for HashMap:

import org.apache.spark.AccumulatorParam

implicit object iHashMap extends AccumulatorParam[HashMap[String, Boolean]] {
  // Initial value for a fresh accumulator on each worker
  def zero(m: HashMap[String, Boolean]) = HashMap[String, Boolean]()
  // Merge partial results; on duplicate keys the entries from m2 win
  def addInPlace(m1: HashMap[String, Boolean], m2: HashMap[String, Boolean]) = m1 ++ m2
}

The merge behaviour of ++ (entries from the second map overwrite those from the first on duplicate keys) is probably not what you would expect, but I hope you get the gist.
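With that implicit in scope, the accumulator can then be created and updated as usual. Here is a minimal sketch against the Spark 0.9 accumulator API (the RDD contents are made up for illustration):

// Assumes the implicit iHashMap defined above is in scope
val accum = sc.accumulator(HashMap[String, Boolean]())

sc.parallelize(Seq("a", "b", "c")).foreach { s =>
  // Each += merges a one-entry map into the accumulator via addInPlace
  accum += HashMap(s -> true)
}

accum.value  // HashMap(a -> true, b -> true, c -> true)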

You may also find some really useful examples here: https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala

Other tips

I remember having the same issue; here's a small gist that uses a HashMap[String, Int] as an accumulator in Spark: HashMapParam.scala

If a key k already exists in the accumulator with a value v1 and we try to put k -> v2 in it, the resulting accumulator will contain k -> v1 + v2.

This does not completely answer your question but could be helpful to build your own implementation.
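For reference, a merge-by-sum AccumulatorParam along those lines might look like the following. This is a rough sketch, not the actual code from the gist (the object name is made up):

import org.apache.spark.AccumulatorParam
import scala.collection.mutable.HashMap

implicit object SumHashMapParam extends AccumulatorParam[HashMap[String, Int]] {
  def zero(m: HashMap[String, Int]) = HashMap[String, Int]()
  // On duplicate keys, add the values instead of overwriting
  def addInPlace(m1: HashMap[String, Int], m2: HashMap[String, Int]) = {
    m2.foreach { case (k, v) => m1(k) = m1.getOrElse(k, 0) + v }
    m1
  }
}

With this in scope, adding HashMap("k" -> 2) to the accumulator twice yields k -> 4 rather than k -> 2.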

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow