The problem was that I was using mapValues
on an existing Map to generate the new Map. I thought mapValues
works similarly to map
but in reality mapValues
only creates a View on the existing Map instead of a new Map.
Why does a lookup in Map in this particular code take such a surprisingly long time?
-
19-09-2022 - |
Question
I have the following data structures:
val set: scala.collection.immutable.Set[String] = ...
val test1: scala.collection.immutable.Map[String,scala.collection.immutable.Set[String]] = ...
val test2: Array[scala.collection.immutable.Set[String]] = ...
set
contains about 60,000 entires. test1
has two entries ("one" and "two") and each is a Set of Strings similar to set
. test2
is similar to test1
but the keys are 0 and 1 instead.
Running test1.get("one").get.contains("somestring")
takes a long time (about 1 second) but running test2(0).contains("somestring")
is very fast.
I don't quite understand why there is such a big difference. Any ideas?
Solution
OTHER TIPS
This:
test2(0)
Runs really fast because it's an array, it knows exactly where 0
is, the map has to find the "one" key first and then get's it's pair.
Running this code to generate some collections like you mentioned:
import scala.collection.mutable.{HashSet, HashMap}
import scala.util.Random
def genSet(count: Int = 100 * 1000, stringSize: Int = 10): Set[String] = {
val set = new HashSet[String]
set.sizeHint(count)
for(i <- 1 to count) {
set.add(i.toString)
}
set.toSet
}
def genSetMap(count: Int = 2, keySize: Int = 10)
(f: => Set[String]): Map[String, Set[String]] = {
val map = new HashMap[String, Set[String]]
map.sizeHint(count)
for(i <- 1 to count) {
map.put(i.toString, genSet())
}
map.toMap
}
The following test, using 100.000 elements per set, still runs instantly:
val map = genSetMap(2, 10){ genSet(100*1000) }
map("2").contains("99999") // res2: Boolean = true
So I suspect there is some peculiarity in your actual code that is causing it not to generate a set, but some other intermediate collection that doesn't have fast searching. Can you provide a more concrete example of the actual code you are seeing the behavior with?