Question

I am trying to declare a hashmap as Spark (v0.9.1) accumulator. The docs state that "Spark natively supports accumulators of numeric value types and standard mutable collections..." (link).

However, this doesn't seem to be working for me when I try to create a HashMap[String, Boolean]:

scala> import collection.mutable.HashMap
import collection.mutable.HashMap

scala> val accum = sc.accumulator("test" -> true)(HashMap)
<console>:13: error: type mismatch;
 found   : scala.collection.mutable.HashMap.type
 required: org.apache.spark.AccumulatorParam[(String, Boolean)]
           val accum = sc.accumulator("test" -> true)(HashMap)

Solution

First of all, you should pass an actual HashMap[String, Boolean] rather than a (String, Boolean) tuple:

sc.accumulator(HashMap("t" -> true))

And you may need to write your own accumulator, because I couldn't find an out-of-the-box implicit for HashMap:

import org.apache.spark.AccumulatorParam

implicit object iHashMap extends AccumulatorParam[HashMap[String, Boolean]] {
  def zero(m: HashMap[String, Boolean]) = HashMap[String, Boolean]()
  def addInPlace(m1: HashMap[String, Boolean], m2: HashMap[String, Boolean]) = m1 ++ m2
}

The behaviour is probably not what you would expect, but I hope you get the gist.
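One reason the behaviour may surprise you: addInPlace above uses m1 ++ m2, and in Scala ++ lets the right-hand map silently overwrite values for duplicate keys rather than combining them. This is plain Scala behaviour and can be checked without Spark:

```scala
import scala.collection.mutable.HashMap

// With ++, the right-hand map wins on duplicate keys --
// so accumulating "t" -> false over "t" -> true simply replaces it.
val merged = HashMap("t" -> true, "u" -> true) ++ HashMap("t" -> false)

println(merged) // contains "t" -> false and "u" -> true
```

If you want colliding values to be combined instead of overwritten, you have to encode that rule yourself in addInPlace.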

You may also find some really useful examples here: https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala

Other tips

I remember having the same issue, here's a small gist to use a HashMap[String, Int] as accumulator in Spark: HashMapParam.scala

If a key k already exists in the accumulator with a value v1 and we try to put k -> v2 in it, the resulting accumulator will contain k -> v1 + v2.
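The merge rule described above (summing values on a key collision) can be sketched in plain Scala. This is an illustrative helper only, not the linked gist itself; the real HashMapParam wraps logic like this in an AccumulableParam so Spark can use it:

```scala
import scala.collection.mutable.HashMap

// Sketch of the collision rule: when a key exists in both maps,
// the two values are summed instead of one overwriting the other.
def addInPlace(m1: HashMap[String, Int], m2: HashMap[String, Int]): HashMap[String, Int] = {
  for ((k, v) <- m2) m1(k) = m1.getOrElse(k, 0) + v
  m1
}

val acc = addInPlace(HashMap("k" -> 1), HashMap("k" -> 2, "j" -> 5))
println(acc) // contains "k" -> 3 and "j" -> 5
```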

This does not completely answer your question but could be helpful to build your own implementation.

Licensed under: CC-BY-SA with attribution