Question

I am trying to declare a HashMap as a Spark (v0.9.1) accumulator. The docs state that "Spark natively supports accumulators of numeric value types and standard mutable collections..." (link).

However, this doesn't seem to be working for me when I try to create a HashMap[String, Boolean]:

scala> import collection.mutable.HashMap
import collection.mutable.HashMap

scala> val accum = sc.accumulator("test" -> true)(HashMap)
<console>:13: error: type mismatch;
 found   : scala.collection.mutable.HashMap.type
 required: org.apache.spark.AccumulatorParam[(String, Boolean)]
           val accum = sc.accumulator("test" -> true)(HashMap)

Solution

First of all, you should pass an actual HashMap[String, Boolean] rather than a single (String, Boolean) pair:

sc.accumulator(HashMap("t" -> true))

And you will likely need to write your own AccumulatorParam, because I didn't find an out-of-the-box implicit for HashMap:

import org.apache.spark.AccumulatorParam
import scala.collection.mutable.HashMap

implicit object iHashMap extends AccumulatorParam[HashMap[String, Boolean]] {
  def zero(m: HashMap[String, Boolean]) = HashMap[String, Boolean]()  // start value on each worker
  def addInPlace(m1: HashMap[String, Boolean], m2: HashMap[String, Boolean]) = m1 ++ m2  // entries in m2 win on key clashes
}

The merge behaviour (entries from the second map simply overwrite clashing keys in the first) is probably not what you would expect, but I hope you catch the gist.
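
For completeness, here is a minimal usage sketch, assuming a live SparkContext named sc and the implicit above in scope (the sample RDD is made up for illustration):

// iHashMap is picked up implicitly by sc.accumulator
val accum = sc.accumulator(HashMap[String, Boolean]())

sc.parallelize(Seq("a", "b")).foreach { s =>
  accum += HashMap(s -> true)  // each += is merged via addInPlace
}

println(accum.value)  // e.g. Map(a -> true, b -> true)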

You may also find some really useful examples here: https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala

OTHER TIPS

I remember having the same issue. Here's a small gist that uses a HashMap[String, Int] as an accumulator in Spark: HashMapParam.scala

If a key k already exists in the accumulator with a value v1 and we try to put k -> v2 in it, the resulting accumulator will contain k -> v1 + v2.

This does not completely answer your question but could be helpful to build your own implementation.
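
As a rough sketch of that merge rule (my own illustration of the semantics described above, not the code from the linked gist), an AccumulatorParam that sums values on key collisions could look like this:

import org.apache.spark.AccumulatorParam
import scala.collection.mutable.HashMap

// Sketch: merge maps by summing values for keys present in both
implicit object SumHashMapParam extends AccumulatorParam[HashMap[String, Int]] {
  def zero(m: HashMap[String, Int]) = HashMap[String, Int]()
  def addInPlace(m1: HashMap[String, Int], m2: HashMap[String, Int]) = {
    for ((k, v2) <- m2) m1(k) = m1.getOrElse(k, 0) + v2  // k -> v1 + v2
    m1
  }
}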

Licensed under: CC-BY-SA with attribution