Question

I need a method that groups a list of tuples by key and averages the values for each key.

Here is my current implementation:

  // (a, 1), (a, 2), (b, 3) -> (a, 1.5), (b, 3)
  def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]): Iterable[(T, Double)] = {
    ts.groupBy(_._1)
      .mapValues { _.unzip._2 }
      .mapValues { xs => num.toDouble(xs.sum) / xs.size }
  }

Is there a better way in terms of performance or simplicity?


Solution

  1. You could do it with one mapping instead of two.
  2. There's no reason for unzip, which generates two collections, when you only need one half of its result. Just doing .map(_._2) should get you what you need.
  3. Perhaps more importantly, if you plan on accessing the values repeatedly, you would want to use map instead of mapValues because mapValues only creates a view of the new map, meaning that the averaging would be recomputed on every access.
  4. Finally, you don't want to declare the return type as Iterable since then you lose the fact that groupBy gives you a Map.
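To make point 3 concrete, here is a small sketch that counts how often the averaging function runs. The object name MapValuesDemo, the counter, and the sample data are made up for illustration. Note that on Scala 2.13 and later, calling mapValues directly on a Map is deprecated in favour of .view.mapValues, but the lazy behaviour is the same.

```scala
object MapValuesDemo {
  // Counts how many times the averaging function actually runs.
  var calls = 0

  def avg(xs: List[Int]): Double = {
    calls += 1
    xs.sum.toDouble / xs.size
  }

  val grouped: Map[String, List[Int]] = Map("a" -> List(1, 2), "b" -> List(3))

  // mapValues: nothing is computed when the view is built...
  val viaMapValues = grouped.mapValues(avg)
  val callsAfterBuildingView = calls // 0

  // ...and avg runs again on every single lookup.
  viaMapValues("a"); viaMapValues("a"); viaMapValues("a")
  val callsAfterViewLookups = calls // 3

  // map: avg runs once per key up front, and lookups are plain reads.
  calls = 0
  val viaMap = grouped.map { case (k, xs) => k -> avg(xs) }
  val callsAfterBuildingMap = calls // 2 (one per key)

  viaMap("a"); viaMap("a"); viaMap("a")
  val callsAfterMapLookups = calls // still 2
}
```

So if the result is read more than once, the strict map version pays for the averaging only once.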

So maybe you want this:

def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]) = {
  ts.groupBy(_._1).map { 
    case (key, pairs) =>
      val values = pairs.map(_._2)
      key -> (num.toDouble(values.sum) / values.size)
  }
}

groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// res0: scala.collection.immutable.Map[String,Double] = Map(b -> 3.0, a -> 1.5)
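If the intermediate per-key collections built by groupBy turn out to matter for performance, one possible variation (not part of the answer above, just a sketch) is to keep a running sum and count per key in a single pass. The object name SinglePassAvg is made up for illustration, and whether this is actually faster would need measuring on real data.

```scala
object SinglePassAvg {
  def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]): Map[T, Double] = {
    // Accumulate (running sum, count) per key in one traversal of the input.
    val totals = ts.foldLeft(Map.empty[T, (U, Int)]) { case (acc, (key, value)) =>
      val (sum, count) = acc.getOrElse(key, (num.zero, 0))
      acc.updated(key, (num.plus(sum, value), count + 1))
    }
    // Every key present has count >= 1, so the division is safe.
    totals.map { case (key, (sum, count)) => key -> (num.toDouble(sum) / count) }
  }
}

SinglePassAvg.groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// Map(a -> 1.5, b -> 3.0) (key order may differ)
```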

A roll-your-own version:

If you do this kind of thing a lot, you can define your own collection methods. Here I define groupByKey, which takes a Traversable[(K, V)] and returns a Map[K, Traversable[V]], and avg:

import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
import scala.collection.immutable
import scala.collection.mutable

implicit class EnrichedWithGroupByKey[K, V, Repr](val self: TraversableLike[(K, V), Repr]) extends AnyVal {
  def groupByKey[That](implicit bf: CanBuildFrom[Repr, V, That]): Map[K, That] = {
    val m = mutable.Map.empty[K, mutable.Builder[V, That]]
    for ((key, value) <- self) {
      val bldr = m.getOrElseUpdate(key, bf(self.asInstanceOf[Repr]))
      bldr += value
    }
    val b = immutable.Map.newBuilder[K, That]
    for ((k, v) <- m)
      b += (k -> v.result)
    b.result
  }
}

implicit class EnrichedWithAvg[A](val self: Traversable[A])(implicit num: Numeric[A]) {
  def avg = {
    assert(self.nonEmpty, "cannot average an empty collection")
    num.toDouble(self.sum) / self.size
  }
}

Then you can just do:

def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]) = {
  ts.groupByKey.map{ case (k,vs) => k -> vs.avg }
}

groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// res0: scala.collection.immutable.Map[String,Double] = Map(b -> 3.0, a -> 1.5)
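One caveat: CanBuildFrom and TraversableLike were removed in the Scala 2.13 collections redesign, so the enrichment above only compiles on 2.12 and earlier. Assuming Scala 2.13 or later, the standard library's groupMap covers what groupByKey does here. A minimal sketch (the object name GroupMapAvg is made up for illustration):

```scala
object GroupMapAvg {
  def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]): Map[T, Double] =
    // groupMap groups by the first function and keeps only what the second one extracts,
    // so each key maps directly to its values rather than to the original pairs.
    ts.groupMap(_._1)(_._2).map { case (key, values) =>
      key -> (num.toDouble(values.sum) / values.size)
    }
}

GroupMapAvg.groupByAndAvg(Vector(("a", 1), ("a", 2), ("b", 3)))
// Map(a -> 1.5, b -> 3.0) (key order may differ)
```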

Other tips

Just another way to do it (similar):

def groupByAndAvg[T, U](ts: Iterable[(T, U)])(implicit num: Numeric[U]): Iterable[(T, Double)] = {
  ts.groupBy(_._1)
    .map { case (k, v) =>
      // Convert the sum to Double first: `/` is not defined for a generic Numeric[U].
      (k, num.toDouble(v.map(_._2).sum) / v.size) }
}

Licensed under: CC-BY-SA with attribution