Question

Inspired by this question, I'd like to implement a Multiset in Scala. I'd like a MultiSet[A] to:

  • Support adding, removing, union, intersection and difference
  • Be an A => Int, providing the count of each element

Here's one approach, extending Set:

import scala.collection.immutable.Map
import scala.collection.immutable.Set
import scala.collection.SetLike
import scala.collection.mutable.Builder

class MultiSet[A](private val counts: Map[A, Int] = Map.empty[A, Int])
extends SetLike[A, MultiSet[A]] with Set[A] {
  override def +(elem: A): MultiSet[A] = {
    val count = this.counts.getOrElse(elem, 0) + 1
    new MultiSet(this.counts + (elem -> count)) 
  }
  override def -(elem: A): MultiSet[A] = this.counts.get(elem) match {
    case None => this
    case Some(1) => new MultiSet(this.counts - elem)
    case Some(n) => new MultiSet(this.counts + (elem -> (n - 1)))
  }
  override def contains(elem: A): Boolean = this.counts.contains(elem)
  override def empty: MultiSet[A] = new MultiSet[A]
  override def iterator: Iterator[A] = {
    for ((elem, count) <- this.counts.iterator; _ <- 1 to count) yield elem
  }
  override def newBuilder: Builder[A,MultiSet[A]] = new Builder[A, MultiSet[A]] {
    var multiSet = empty
    def +=(elem: A): this.type = {this.multiSet += elem; this}
    def clear(): Unit = this.multiSet = empty
    def result(): MultiSet[A] = this.multiSet
  }
  override def seq: MultiSet[A] = this
}

object MultiSet {
  def empty[A]: MultiSet[A] = new MultiSet[A]
  def apply[A](elem: A, elems: A*): MultiSet[A] = MultiSet.empty + elem ++ elems
  def apply[A](elems: Seq[A]): MultiSet[A] = MultiSet.empty ++ elems
  def apply[A](elem: (A, Int), elems: (A, Int)*) = new MultiSet((elem +: elems).toMap)
  def apply[A](elems: Map[A, Int]): MultiSet[A] = new MultiSet(elems)
}

Extending Set is nice, because it means that MultiSet automatically gets definitions for union, difference, etc. All the following will hold:

// add
assert(
  MultiSet("X" -> 3, "Y" -> 1) + "X" ===
    MultiSet("X" -> 4, "Y" -> 1))
assert(
  MultiSet("X" -> 3, "Y" -> 1) + "Z" ===
    MultiSet("X" -> 3, "Y" -> 1, "Z" -> 1))

// remove
assert(
  MultiSet("a" -> 2, "b" -> 5) - "b" ===
    MultiSet("a" -> 2, "b" -> 4))
assert(
  MultiSet("a" -> 2, "b" -> 5) - "c" ===
    MultiSet("a" -> 2, "b" -> 5))

// add all
assert(
  MultiSet(10 -> 1, 100 -> 3) ++ MultiSet(10 -> 1, 1 -> 7) ===
    MultiSet(100 -> 3, 10 -> 2, 1 -> 7))

// remove all
assert(
  MultiSet("a" -> 2, "b" -> 5) -- MultiSet("a" -> 3) ===
    MultiSet("b" -> 5))

However, I would have to override some inherited methods like union and intersect because they would do the wrong things for multisets, e.g. the following would not hold:

// union (takes max of values)
assert(
  MultiSet(10 -> 5, 1 -> 1).union(MultiSet(10 -> 3, 1 -> 7)) ===
    MultiSet(10 -> 5, 1 -> 7))

// intersection (takes min of values)
assert(
  MultiSet(10 -> 5, 100 -> 3).intersect(MultiSet(10 -> 1, 1 -> 7)) ===
    MultiSet(10 -> 1))

Another problem with extending Set is that then I can't have MultiSet be an A => Int because I'll get the error: illegal inheritance; class MultiSet inherits different type instances of trait Function1: A => Int and A => Boolean. I could work around this by declaring a separate, say, count method, but I'd really prefer that the class just be an A => Int.

Another approach would be to inherit from Map[A, Int], which would give me the A => Int that I want, but then I would have to define all of my own ++, --, etc. since in Map these would be defined over (A, Int) pairs, but for a multiset they need to be defined over As.

I guess a third approach would be to give up on both Set and Map, and just implement a new subclass of Iterable or whatever.

What would you recommend? What's the best way to fit a MulitSet within the Scala collections framework?

Was it helpful?

Solution

However, I would have to override some inherited methods like intersect because they would do the wrong things for multisets

I think you'll have to bite the bullet and do just that.

Another problem with extending Set is that then I can't have MultiSet be an A => Int

Indeed, you cannot inherit twice the same trait (here Function1) with different type parameters. And actually in this case, this is not just an annoying technical limitation, but doing so would in fact not make much sense because when calling apply there would be no way to know which overload you want to call: def apply( key: A ): Boolean or def apply( key: A ): Int? You can't tell as the argument lists are the same.

However, what you could do is add an implicit conversion from MultiSet[A] to A => Int. This way, a MultiSet is treated by default as a A => Boolean (as any set), but can be coerced to a A => Int when nedded (in particular it could be passed directly to a function expecting a A => Int function).

class MultiSet[A] ... {
  ...
  def count(elem: A): Int = counts.getOrElse( elem, 0 )
}
object MultiSet {
  ...
  implicit def toCountFunc[A]( ms: MultiSet[A] ): A => Int = { 
    (x: A) => ms.count( x )
  } 
}

Some test i the REPL:

scala> val ms = MultiSet("a" -> 2, "b" -> 5)
ms: MultiSet[String] = Set(a, a, b, b, b, b, b)
scala> ms("a")
res17: Boolean = true
scala> ms("c")
res18: Boolean = false

scala> def testExists( f: String => Boolean, keys: String *) {
     |   println( keys.map( f ).toList )
     | }
testExists: (f: String => Boolean, keys: String*)Unit
scala> testExists( ms, "a", "c" )
List(true, false)

scala> def testCounts( f: String => Int, keys: String *) {
     |   println( keys.map( f ).toList )
     | }
testCounts: (f: String => Int, keys: String*)Unit
scala> testCounts( ms, "a", "c" )
List(2, 0)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top