How to sum a list of tuples by keys

Question 1

I guess this is the most simple immutable form without usage of any additional framework on top of scala.

UPD Actually forgot about final toList. This makes totally different picture in terms of perfomance, because of the mapValues view return type

You can try foldLeft, tailrec, something mutable and they have better perfomance

import annotation.tailrec

@tailrec
final def tailSum[A](tuples: List[(A, Long)], acc: Map[A, Long] = Map.empty[A, Long]): List[(A, Long)] = tuples match {
  case (k, v) :: tail => tailSum(tail, acc + (k -> (v + acc.get(k).getOrElse(0L))))
  case Nil => acc.toList
}

def foldLeftSum[A](tuples: List[(A, Long)]) = tuples.foldLeft(Map.empty[A, Long])({
  case (acc, (k, v)) => acc + (k -> (v + acc.get(k).getOrElse(0L)))
}).toList

def mutableSum[A](tuples: List[(A, Long)])  = {
  val m = scala.collection.mutable.Map.empty[A, Long].withDefault(_ => 0L)
  for ((k, v) <- tuples) m += (k -> (v + m(k)))
  m.toList
}

Updated perfomance testing is here https://gist.github.com/baskakov/8437895, briefly:

scala>     avgTime("default", sumByKeys(tuples))
default avg time is 63 ms

scala>     avgTime("tailrec", tailSum(tuples))
tailrec avg time is 48 ms

scala>     avgTime("foldleft", foldLeftSum(tuples))
foldleft avg time is 45 ms

scala>     avgTime("mutableSum", mutableSum(tuples))
mutableSum avg time is 41 ms

Question 2

The best I can think of gets you slightly better performance and save two characters:

def sumByKeys[A](tuples: List[(A, Long)]) : List[(A, Long)] = {
  tuples.groupBy(_._1).mapValues(_.unzip._2.sum)
}

On my machine with Bask.ws' benchmark it took 11ms instead of 13ms without the unzip.

EDIT: In fact I think the performance must be the same... don't know where those 2ms come from

Question 3

A solution very similar to yours:

def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] =
  tuples groupBy (_._1) map { case (k, v) => (k, v.map(_._2).sum) } toList

val l: List[(String, Long)] = List(("A", 1), ("B", 2), ("A", 3))

sumByKeys(l)
// result:
// List[(String, Long)] = List((A,4), (B,2))

What's interesting is that in your solution you use def mapValues[C](f: (B) ⇒ C): Map[A, C] which according to docs has "lazy" evaluation: "Transforms this map by applying a function to every retrieved value."

On the other hand def map[B](f: (A) ⇒ B): Map[B] will build a new collection: "Builds a new collection by applying a function to all elements of this immutable map."

So depending on your needs you could be lazily evaluating a large map, or eagerly evaluating a small one.

Question 4

Using reduce,

def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
  tuples groupBy(_._1) map { _._2 reduce { (a,b) => (a._1, a._2+b._2) } } toList
}

a short for

def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
  tuples groupBy(_._1) map { case(k,v) => v reduce { (a,b) => (a._1, a._2+b._2) } } toList
}