Inspired by this question, I'd like to implement a Multiset in Scala. I'd like a MultiSet[A]
to:
- Support adding, removing, union, intersection and difference
- Be an
A => Int
, providing the count of each element
Here's one approach, extending Set
:
import scala.collection.immutable.Map
import scala.collection.immutable.Set
import scala.collection.SetLike
import scala.collection.mutable.Builder
class MultiSet[A](private val counts: Map[A, Int] = Map.empty[A, Int])
extends SetLike[A, MultiSet[A]] with Set[A] {
override def +(elem: A): MultiSet[A] = {
val count = this.counts.getOrElse(elem, 0) + 1
new MultiSet(this.counts + (elem -> count))
}
override def -(elem: A): MultiSet[A] = this.counts.get(elem) match {
case None => this
case Some(1) => new MultiSet(this.counts - elem)
case Some(n) => new MultiSet(this.counts + (elem -> (n - 1)))
}
override def contains(elem: A): Boolean = this.counts.contains(elem)
override def empty: MultiSet[A] = new MultiSet[A]
override def iterator: Iterator[A] = {
for ((elem, count) <- this.counts.iterator; _ <- 1 to count) yield elem
}
override def newBuilder: Builder[A,MultiSet[A]] = new Builder[A, MultiSet[A]] {
var multiSet = empty
def +=(elem: A): this.type = {this.multiSet += elem; this}
def clear(): Unit = this.multiSet = empty
def result(): MultiSet[A] = this.multiSet
}
override def seq: MultiSet[A] = this
}
object MultiSet {
def empty[A]: MultiSet[A] = new MultiSet[A]
def apply[A](elem: A, elems: A*): MultiSet[A] = MultiSet.empty + elem ++ elems
def apply[A](elems: Seq[A]): MultiSet[A] = MultiSet.empty ++ elems
def apply[A](elem: (A, Int), elems: (A, Int)*) = new MultiSet((elem +: elems).toMap)
def apply[A](elems: Map[A, Int]): MultiSet[A] = new MultiSet(elems)
}
Extending Set
is nice, because it means that MultiSet
automatically gets definitions for union, difference, etc. All the following will hold:
// add
assert(
MultiSet("X" -> 3, "Y" -> 1) + "X" ===
MultiSet("X" -> 4, "Y" -> 1))
assert(
MultiSet("X" -> 3, "Y" -> 1) + "Z" ===
MultiSet("X" -> 3, "Y" -> 1, "Z" -> 1))
// remove
assert(
MultiSet("a" -> 2, "b" -> 5) - "b" ===
MultiSet("a" -> 2, "b" -> 4))
assert(
MultiSet("a" -> 2, "b" -> 5) - "c" ===
MultiSet("a" -> 2, "b" -> 5))
// add all
assert(
MultiSet(10 -> 1, 100 -> 3) ++ MultiSet(10 -> 1, 1 -> 7) ===
MultiSet(100 -> 3, 10 -> 2, 1 -> 7))
// remove all
assert(
MultiSet("a" -> 2, "b" -> 5) -- MultiSet("a" -> 3) ===
MultiSet("b" -> 5))
However, I would have to override some inherited methods like union
and intersect
because they would do the wrong things for multisets, e.g. the following would not hold:
// union (takes max of values)
assert(
MultiSet(10 -> 5, 1 -> 1).union(MultiSet(10 -> 3, 1 -> 7)) ===
MultiSet(10 -> 5, 1 -> 7))
// intersection (takes min of values)
assert(
MultiSet(10 -> 5, 100 -> 3).intersect(MultiSet(10 -> 1, 1 -> 7)) ===
MultiSet(10 -> 1))
Another problem with extending Set
is that then I can't have MultiSet
be an A => Int
because I'll get the error: illegal inheritance; class MultiSet inherits different type instances of trait Function1: A => Int and A => Boolean
. I could work around this by declaring a separate, say, count
method, but I'd really prefer that the class just be an A => Int
.
Another approach would be to inherit from Map[A, Int]
, which would give me the A => Int
that I want, but then I would have to define all of my own ++
, --
, etc. since in Map
these would be defined over (A, Int)
pairs, but for a multiset they need to be defined over A
s.
I guess a third approach would be to give up on both Set
and Map
, and just implement a new subclass of Iterable
or whatever.
What would you recommend? What's the best way to fit a MulitSet
within the Scala collections framework?