- You could do it with one mapping instead of two.
- There's no reason for unzip, which generates two collections, when you only need one half of its result. Just doing .map(_._2) should get you what you need.
- Perhaps more importantly, if you plan on accessing the values repeatedly, you would want to use map instead of mapValues, because mapValues only creates a view of the new map, meaning that the averaging would be recomputed on every access.
- Finally, you don't want to declare the return type as Iterable, since then you lose the fact that groupBy gives you a Map.
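To make the mapValues point concrete, here is a rough sketch that counts how often the averaging function actually runs. The object name MapValuesDemo and the method countCalls are made up for this illustration, and I'm assuming the standard library behaviour where mapValues gives you a lazy view (on Scala 2.13 the call still compiles but prints a deprecation warning).

```scala
object MapValuesDemo {
  // Hypothetical helper: returns (calls made via mapValues, calls made via map).
  def countCalls(): (Int, Int) = {
    val grouped = Map("a" -> Seq(1, 2), "b" -> Seq(3))

    // mapValues: the function is not run up front, only when a value is looked up.
    var lazyCalls = 0
    val viaMapValues = grouped.mapValues { xs =>
      lazyCalls += 1
      xs.sum.toDouble / xs.size
    }
    viaMapValues("a"); viaMapValues("a"); viaMapValues("a")

    // map: the function runs once per entry while the new map is built.
    var strictCalls = 0
    val viaMap = grouped.map { case (k, xs) =>
      strictCalls += 1
      k -> (xs.sum.toDouble / xs.size)
    }
    viaMap("a"); viaMap("a"); viaMap("a")

    (lazyCalls, strictCalls)
  }
}
```

Calling MapValuesDemo.countCalls() should give (3, 2): three lookups through the view recompute the average three times, while map computes each of the two averages exactly once no matter how often you read them.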
So maybe you want this:
def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]) = {
  ts.groupBy(_._1).map {
    case (key, pairs) =>
      val values = pairs.map(_._2)
      key -> (num.toDouble(values.sum) / values.size)
  }
}

groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// res0: scala.collection.immutable.Map[String,Double] = Map(b -> 3.0, a -> 1.5)
A do-it-yourself version:
If you do stuff like this a lot, you can define your own collection methods. Here I define groupByKey, which takes a Traversable[(K, V)] and returns a Map[K, Traversable[V]], and avg, which averages a collection of numbers:
// Note: TraversableLike and CanBuildFrom exist only up to Scala 2.12.
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
import scala.collection.immutable
import scala.collection.mutable

implicit class EnrichedWithGroupByKey[K, V, Repr](val self: TraversableLike[(K, V), Repr]) extends AnyVal {
  def groupByKey[That](implicit bf: CanBuildFrom[Repr, V, That]): Map[K, That] = {
    // One builder per key, so each group keeps the source collection's type.
    val m = mutable.Map.empty[K, mutable.Builder[V, That]]
    for ((key, value) <- self) {
      val bldr = m.getOrElseUpdate(key, bf(self.asInstanceOf[Repr]))
      bldr += value
    }
    // Freeze the builders into an immutable result map.
    val b = immutable.Map.newBuilder[K, That]
    for ((k, v) <- m)
      b += (k -> v.result)
    b.result
  }
}

implicit class EnrichedWithAvg[A](val self: Traversable[A])(implicit num: Numeric[A]) {
  def avg = {
    assert(self.nonEmpty, "cannot average an empty collection")
    num.toDouble(self.sum) / self.size
  }
}
Then you can just do:
def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]) = {
  ts.groupByKey.map { case (k, vs) => k -> vs.avg }
}

groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// res0: scala.collection.immutable.Map[String,Double] = Map(b -> 3.0, a -> 1.5)
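One caveat: the groupByKey enrichment above depends on CanBuildFrom, which is gone from the standard collections as of Scala 2.13. If you are on 2.13 or later, the built-in groupMap covers the same ground, so something along these lines might be enough. This is only a sketch assuming Scala 2.13+, and the wrapper object GroupMapAvg is a name I made up for the example.

```scala
object GroupMapAvg {
  // Assumes Scala 2.13+: groupMap groups by the first function and
  // keeps only what the second function extracts from each element.
  def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]): Map[T, Double] =
    ts.groupMap(_._1)(_._2).map { case (key, values) =>
      // groupMap never produces an empty group, so dividing by size is safe here.
      key -> (num.toDouble(values.sum) / values.size)
    }
}
```

With the same input as before, GroupMapAvg.groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3))) should come out equal to Map("a" -> 1.5, "b" -> 3.0).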