Question

I read that TreeSet is slower than HashSet (adding elements to a TreeSet is slower), so I wrote a performance test. I'm trying to find out whether it's better to add elements to a HashSet and then move them into a TreeSet, or to put them into the TreeSet in the first place. It looks like inserting elements into a HashSet is faster, but only when I insert a large number of elements. Why? I've read that if I don't need the elements sorted I should always use HashSet, but apparently it is sometimes slower.

When I insert a constant value (1) instead of random numbers, TreeSet is also faster because there is no sorting. So how can I know when to use HashSet and when to use TreeSet?

My second question: when I create a TreeSet like this, why don't I have access to the NavigableSet methods?

Set<Integer> treeSet = new TreeSet<Integer>();     // can't call treeSet.lower(e)
TreeSet<Integer> treeSet = new TreeSet<Integer>(); // can call treeSet.lower(e)

Thanks for helping me with this.

Here are the results (posted as screenshots, one per run):

5 000 000 (random numbers): [image]

5 000 000 (numbers "1"): [image]

500 000 (random numbers): [image]

50 000 (random numbers): [image]

Here is my code:

package performancetest;

import java.text.DecimalFormat;
import java.util.HashSet;
import java.util.InputMismatchException;
import java.util.Scanner;
import java.util.TreeSet;

public class HashSet_vs_TreeSet {
private static DecimalFormat df = new DecimalFormat("#.#####");
private static double hashTime, treeTime, hashToTreeTime;
private static int numbers = 0;

public static void main(String[] args){
    start();
    hashSetTest();
    treeSetTest();
    fromHashToTreeSet();
    difference();
}

/**
 * In this method I tried calling "start()" instead of "System.exit(1)", but when I made, say, three mistakes,
 * then after a correct input the methods hashSetTest(), treeSetTest(), fromHashToTreeSet() and difference()
 * ran 4 times. I wrote this method precisely so that it would repeat itself after an input mistake.
 */
public static void start(){
    System.out.print("Enter a number (I advise at least 50 000): ");
    Scanner scan = new Scanner(System.in);
    try{
        numbers = scan.nextInt();
    }catch(InputMismatchException e){
        System.out.println("ERROR: You need to enter a number");
        System.exit(1);
    }
    System.out.println(numbers + " numbers in the range 0-99 were randomly generated into " +
            "\n1)HashSet\n2)TreeSet\n3)HashSet and then moved to TreeSet\n");
}

public static void hashSetTest(){
    /**
     * I tried HashSet<Integer> hashSet = new HashSet<Integer>();
     * but changing the load factor didn't make any difference, so what is it good for?
     */
    HashSet<Integer> hashSet = new HashSet<Integer>(5000,(float) 0.5);
    double start = System.currentTimeMillis() * 0.001;

    for (int i = 0; i < numbers; i++) {
        hashSet.add((int)(Math.random() * 100));
    }

    hashTime = (System.currentTimeMillis() * 0.001) - start;

    System.out.println("HashSet takes " + df.format(hashTime) + "s");
}

public static void treeSetTest(){
    TreeSet<Integer> treeSet = new TreeSet<Integer>();
    double start = System.currentTimeMillis() * 0.001;

    for (int i = 0; i < numbers; i++) {
        treeSet.add((int)(Math.random() * 100));
    }

    treeTime = (System.currentTimeMillis() * 0.001) - start;

    System.out.println("TreeSet takes " + df.format(treeTime) + "s");
}

public static void fromHashToTreeSet(){
    HashSet<Integer> hashSet = new HashSet<Integer>();
    double start = System.currentTimeMillis() * 0.001;

    for (int i = 0; i < numbers; i++) {
        hashSet.add((int)(Math.random() * 100));
    }

    TreeSet<Integer> treeSet = new TreeSet<Integer>(hashSet);
    hashToTreeTime = (System.currentTimeMillis() * 0.001) - start;

    System.out.println("TreeSet from HashSet takes " + df.format(hashToTreeTime) + "s");
}

public static void difference(){
    double differenceSec = 0;
    double differenceTimes = 0;

    if(hashTime < treeTime){
        differenceSec = (treeTime - hashTime);
        differenceTimes = (treeTime / hashTime);
        System.out.println("\nHashSet takes " + df.format(differenceSec) + "s less than TreeSet, it is " + df.format(differenceTimes) + " times faster");
    }else{
        differenceSec = (hashTime - treeTime);
        differenceTimes = (hashTime / treeTime);
        System.out.println("\nTreeSet takes " + df.format(differenceSec) + "s less than HashSet, it is " + df.format(differenceTimes) + " times faster");
    }
}
}

Solution

Well, when you talk about the performance of TreeSet and HashSet, you should clearly understand how these structures are organized and what the consequences of that organization are.

A TreeSet is a structure in which all elements are organized in a binary search tree (in Java, a red-black tree, via the TreeMap that backs it). Thus adding a member or accessing it costs O(log(N)).

On the other hand, a HashSet is a structure similar to an array. The difference is that in an array the index is a unique number, while in a HashSet each key has to be translated into an index with the help of a hash function. A hash function may produce the same result for different input data; this situation is called a hash collision. A good hash function (yes, they can be good or bad) produces as many unique results as possible on a given set of input data.

So accessing data in a hash set costs one evaluation of the hash function (in Java this is usually .hashCode()) plus possible collision resolution. That is why it is estimated as O(1) on average, i.e. a constant number of operations.

You should understand that O(1) is not always less than O(log(n)); it is smaller only asymptotically, for large enough n. The proper choice of hash function also matters.
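To make the idea of a collision concrete, here is a minimal sketch. The class name HashCollisionDemo and the helper bucketIndex are made up for illustration, and the index computation is a simplification: it assumes a power-of-two table length and leaves out the extra bit mixing that the real HashMap performs.

```java
import java.util.HashSet;
import java.util.Set;

class HashCollisionDemo {

    // Simplified bucket selection: mask the hash down to a table index.
    // Assumes the table length is a power of two; the real HashMap also
    // mixes the hash bits first, which is omitted here.
    static int bucketIndex(Object key, int tableLength) {
        return key.hashCode() & (tableLength - 1);
    }

    public static void main(String[] args) {
        // Two different strings with the same hashCode (a collision).
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // Both therefore map to the same bucket...
        System.out.println(bucketIndex("Aa", 16) == bucketIndex("BB", 16)); // true

        // ...yet the set still stores both, because equals() tells them apart.
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        System.out.println(set.size()); // 2
    }
}
```

Resolving a collision means comparing the new key against the keys already sitting in that bucket, which is the extra cost on top of computing the hash.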
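Applied to the test in the question: the values come from (int)(Math.random() * 100), so however many insertions are made, neither set can ever hold more than 100 distinct elements. The n in O(log(n)) is therefore tiny, and a lookup in the tree needs only a handful of comparisons. A small sketch that checks this (the class name SmallSetDemo, the helper fill and the fixed seed are assumptions added for repeatability, not part of the original test):

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

class SmallSetDemo {

    // Inserts `count` pseudo-random values from 0..99 and returns the final size.
    static int fill(Set<Integer> set, int count, long seed) {
        Random random = new Random(seed); // fixed seed, an assumption for repeatability
        for (int i = 0; i < count; i++) {
            set.add(random.nextInt(100));
        }
        return set.size();
    }

    public static void main(String[] args) {
        // Millions of add() calls, but the resulting sets stay at 100 elements or fewer.
        System.out.println(fill(new HashSet<>(), 5_000_000, 42L));
        System.out.println(fill(new TreeSet<>(), 5_000_000, 42L));
    }
}
```

With so few distinct elements, the measured times are dominated by constant factors (hashing, boxing, random number generation) rather than by the asymptotic behaviour of either structure.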

OTHER TIPS

0) JVM benchmarking is really complicated. Almost always you are not measuring what you think you are measuring. For microbenchmarking there is JMH (http://openjdk.java.net/projects/code-tools/jmh/), written by engineers at Oracle. You may also try other benchmarking frameworks and guides.

JIT compiler warm-up, initial memory allocation, the garbage collector and a LOT of other things may invalidate your benchmark.
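If pulling in a framework is not an option, a hand-rolled test can at least reduce some of these effects. The following is only a rough sketch, not a substitute for JMH: the class name WarmupBenchmarkSketch, the helpers timeFill and bestOf, and the warm-up and round counts are all arbitrary choices made for illustration. It generates the input up front so random number generation is not timed, uses System.nanoTime() instead of currentTimeMillis(), and discards a few warm-up rounds before measuring.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

class WarmupBenchmarkSketch {

    // Times one fill of a fresh set, in nanoseconds.
    static long timeFill(Supplier<Set<Integer>> factory, int[] values) {
        Set<Integer> set = factory.get();
        long start = System.nanoTime();
        for (int v : values) {
            set.add(v);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop is less likely to be optimised away.
        if (set.isEmpty() && values.length > 0) {
            throw new IllegalStateException("unexpected empty set");
        }
        return elapsed;
    }

    // Runs warm-up rounds first, then returns the best of the measured rounds.
    static long bestOf(Supplier<Set<Integer>> factory, int[] values, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            timeFill(factory, values); // result deliberately ignored
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            best = Math.min(best, timeFill(factory, values));
        }
        return best;
    }

    public static void main(String[] args) {
        // Same value range as in the question, generated before timing starts.
        int[] values = new Random(1L).ints(500_000, 0, 100).toArray();
        System.out.println("HashSet best ns: " + bestOf(HashSet::new, values, 5, 10));
        System.out.println("TreeSet best ns: " + bestOf(TreeSet::new, values, 5, 10));
    }
}
```

Even with these precautions the numbers should be read as rough indications only; garbage collection pauses and JIT decisions can still skew individual runs.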

1) See also Hashset vs Treeset regarding your first question

2) Set<Integer> treeSet = new TreeSet<Integer>(); //cant type treeSet.lower(e: E)

That's how it works. You declared treeSet as a Set, and Set does not extend NavigableSet (it is the other way round), so the compiler only lets you call Set methods. You may cast explicitly if you want to. But if you want to access NavigableSet methods, why not declare treeSet as a NavigableSet?

Set<Integer> treeSet = new TreeSet<Integer>(); 
Integer lower = ((NavigableSet<Integer>) treeSet).lower(5); // this should work; 5 is just an example argument
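Declaring the variable as NavigableSet avoids the cast altogether. A small sketch (the class name NavigableSetDemo, the helper lowerThan and the sample values are made up for illustration):

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class NavigableSetDemo {

    // Returns the greatest element strictly less than `value`, or null if there is none.
    static Integer lowerThan(NavigableSet<Integer> set, int value) {
        return set.lower(value);
    }

    public static void main(String[] args) {
        // The declared type is NavigableSet, so lower(), floor() etc. are visible.
        NavigableSet<Integer> treeSet = new TreeSet<>(Arrays.asList(1, 3, 5, 7));
        System.out.println(lowerThan(treeSet, 5)); // 3
        System.out.println(treeSet.floor(5));      // 5
        System.out.println(lowerThan(treeSet, 1)); // null
    }
}
```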

Try running this code, which I took from codeforces.ru. It demonstrates how badly HashSet/HashMap can behave: it took 1.3-1.4 s to add 10^5 values. According to the linked topic, shuffling won't help (I haven't tried it). TreeSet will never show such terrible performance.

import java.io.*;
import java.util.*;
import static java.lang.Math.*;

public class Main implements Runnable {

    public static void main(String[] args) {
        new Thread(null, new Main(), "", 16 * (1L << 20)).start();
    }

    public void run() {
        try {
            long t1 = System.currentTimeMillis();
            solve();
            long t2 = System.currentTimeMillis();
            System.out.println("Time = " + (t2 - t1));
        } catch (Throwable t) {
            t.printStackTrace(System.err);
            System.exit(-1);
        }
    }

    // solution

    int hashinv(int h) {
        h ^= (h >>> 4) ^ (h >>> 7) ^ (h >>> 8) ^ (h >>> 14) ^ (h >>> 15)
                ^ (h >>> 18) ^ (h >>> 19) ^ (h >>> 20) ^ (h >>> 21)
                ^ (h >>> 23) ^ (h >>> 26) ^ (h >>> 28);
        return h;
    }

    int bitreverse(int h) {
        int res = 0;
        for (int i = 0; i < 31; i++)
            if ((h & (1 << i)) != 0)
                res |= (1 << (30 - i));
        return res;
    }

    void solve() throws IOException {
        final int size = 100000;
        int[] s = new int[size];
        for (int i = 0, val = 0; i < size; i++) {
            s[i] = Integer.MAX_VALUE;
            while (s[i] > 1000000000)
                s[i] = hashinv(bitreverse(val++));
        }
        long startTime = System.currentTimeMillis();
        HashSet<Integer> h = new HashSet<Integer>(size);
        for (int i = 0; i < size; i++)
                h.add(s[i]);
        System.out.println("HashSet adding time = " + (System.currentTimeMillis() - startTime));
    }

}
Licensed under: CC-BY-SA with attribution