Why Java HashMap get(key) works faster when keys are read using same HashMap's Iterator than when keys are read using a Set's Iterator?

StackOverflow https://stackoverflow.com/questions/20631998

  •  18-09-2022
  •  | 
  •  

Question

For HashMap<Integer,Integer>, after inserting it with 10000000 unique random values. I perform get(), using the hashmap's keySet(), like in the following code snippet:

HashMap<Integer, Integer> hashmap = 
                        new HashMap<Integer, Integer>(10000000, 0.99f);

// ... Code to put unique 10000000 associations into the hashmap ...

int iteration = 100;
long startTime, totalTime = 0;

while(iteration > 0) {
    for(Integer key: hashmap.keySet()) {
       startTime = System.currentTimeMillis();
       hashmap.get(key);
       totalTime += (System.currentTimeMillis() - startTime);
    }
    iteration--;
}
System.out.println(totalTime/100 + " ms");

Running the above code, I get: 225 ms

Now, if I change the above code to use a set instead, like in following snippet:

Set<Integer> set = new HashSet<Integer>(hashmap.keySet());
while(iteration > 0) {
    for(Integer key: set) {
       startTime = System.currentTimeMillis();
       hashmap.get(key);
       totalTime += (System.currentTimeMillis() - startTime);
    }
    iteration--;
}
System.out.println(totalTime/100 + " ms");

And after running this code, I get: 414 ms

Why such difference in performance?

P.S.: I used following JVM arguments:

-Xms2048m -Xmx4096m -XX:MaxPermSize=256m
Was it helpful?

Solution

When you read a large data structure (larger than 32 KB) how you read that data structure impact performance.

These are the typical sizes and speeds of you caches.

L1:   32 KB, 4 clock cycles.
L2:  256 KB, 11 clock cycles.
L3: 3-30 MB, 40-75 clock cycles.
Main memory: up to 2TB, 200-500 clock cycles.

This means cache locality is very important. i.e. if you are reading something from L1 it can be 20x faster than reading from L3.

In your case you are using a Hash data structure. This is designed for random access and random arrangement which unfortunately mean it has very poor cacheability. Access memory randomly and it is likely to be in the slower areas of memory.

However, there is an exception to this. If you access the same data multiple times, e.g. get a key you just got from an iterator, or you are scanning through a collection in order e.g. this is what the iterator does for HashMap (not so much a TreeMap) it is much more likely that the next piece of data you will access is on the same cache line (each cache line is 64-bytes long) or the next one. These type of accesses perform much better as CPU are designed to perform vector operations very fast.

BTW Your working set is the set of keys, if your values are different objects I would expect this to be much slower if you actually look at those objects (as this increases the size of your working set and how much memory would be required to cache it)

OTHER TIPS

Millisecond accuracy is not enough to measure a single get(). Read the time at the start of the loop, and at the end of the loop - don't try and increment it in sections inside as doing so will cause massive potential accuracy errors drowning out any real result.

Make sure you run through your loop 50 times without doing the timing (to warm up the JVM, ensure everything is compiled etc.) and then run it again timing the whole loop process:

Set<Integer> set = new HashSet<Integer>(hashmap.keySet());
startTime = System.currentTimeMillis();
while(iteration > 0) {
    for(Integer key: set) {
       hashmap.get(key);
    }
    iteration--;
}
totalTime = (System.currentTimeMillis() - startTime);
System.out.println(totalTime + " ms");

How did your code not get a divide by 0 error when you divided by iteration?

This

   startTime = System.currentTimeMillis();
   hashmap.get(key);
   totalTime += (System.currentTimeMillis() - startTime);

is a ridiculous attempt at microbenchmarking. It uses currentTimeMillis() whose precision is 1 ms, and actual accuracy above 10 ms, to measure a nanosecond operation. Even nanoTime alone would not help you because its accuracy is typically in the microseconds.

Also, the code performs no warmup whatsoever.

You should better use a proper microbenchmarking tool if you want to measure something as elusive as the performance of a single map#get call.

The logic to find out the performance of theses two classes is not correct.

Measure the time taken (preferably at nano sec accuracy) for the complete iteration over the keyset instead of measuring it for every call to get method. To prove your facts, the results should be consistent.Only this can prove your fact.

Also, performance heavily depends on JVM and GC configuration.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top