HashSet and TreeSet performance test

Question 1

Well, when you talk about the performance of TreeSet and HashSet, you should clearly understand how these structures are organized and what the consequences of that organization are.

Typically TreeSet is a structure where all elements are organized in a binary search tree (in Java it is a balanced red-black tree). Thus adding a member or looking one up is O(log(N)).
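A small sketch of what the tree organization gives you in practice: elements come back in sorted order no matter how they were inserted. The class and method names here (TreeSetOrderDemo, sortedCopy) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // Inserts the values in the given order and returns them in TreeSet iteration order.
    static List<Integer> sortedCopy(int... values) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int v : values) {
            set.add(v); // each add walks down the tree: O(log N)
        }
        return new ArrayList<>(set); // iteration is ascending; duplicates were dropped
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(42, 7, 19, 7, 3)); // prints [3, 7, 19, 42]
    }
}
```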

On the other hand, HashSet is a structure similar to an array. The difference is that in an array the index is a unique number, while in a HashSet each key needs to be translated into an index with the help of a hash function. A hash function may produce the same result for different inputs; this situation is called a hash collision. A good hash function (yes, they can be good or bad) produces as many distinct results as possible on a given set of input data.
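To make "hash collision" concrete, here is a minimal sketch using two short strings that happen to share a hash code under String.hashCode(). The helper names (collide, distinctCount) are hypothetical. The set still treats the keys as different elements, because equals() has the final word.

```java
import java.util.HashSet;
import java.util.Set;

class HashCollisionDemo {
    // True when two different strings produce the same hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Number of distinct elements a HashSet keeps for the given keys.
    static int distinctCount(String... keys) {
        Set<String> set = new HashSet<>();
        for (String k : keys) {
            set.add(k);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints 2112 2112
        System.out.println(collide("Aa", "BB"));       // prints true
        System.out.println(distinctCount("Aa", "BB")); // prints 2
    }
}
```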

So accessing data in a hash set costs the calculation of a hash function (in Java this is usually .hashCode()) plus possible collision resolution. That is why it is estimated as O(1) on average, i.e. a constant number of operations.
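The two costs mentioned above (computing the index, then resolving collisions) can be sketched with a toy chained hash set. This is a deliberately simplified model, not how java.util.HashSet is actually implemented; ToyHashSet and all of its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ToyHashSet {
    private final List<List<Integer>> buckets = new ArrayList<>();

    ToyHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: translate the key into a bucket index (assumed scheme: hash modulo capacity).
    int indexOf(Integer key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Step 2: resolve collisions by scanning the chain stored in that bucket.
    boolean add(Integer key) {
        List<Integer> chain = buckets.get(indexOf(key));
        if (chain.contains(key)) {
            return false; // already present
        }
        chain.add(key);
        return true;
    }

    boolean contains(Integer key) {
        return buckets.get(indexOf(key)).contains(key);
    }

    // How many elements share the bucket that this key maps to.
    int chainLength(Integer key) {
        return buckets.get(indexOf(key)).size();
    }

    public static void main(String[] args) {
        ToyHashSet set = new ToyHashSet(8);
        set.add(1);
        set.add(9);
        set.add(17); // 1, 9 and 17 all map to bucket 1
        set.add(2);
        System.out.println(set.chainLength(1)); // prints 3
        System.out.println(set.chainLength(2)); // prints 1
    }
}
```

The longer a chain gets, the further a lookup drifts from "constant time", which is exactly why the quality of the hash function matters.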

You should understand that O(1) is not always faster than O(log(n)); it is faster asymptotically, for big enough n. Also, a proper choice of hash function does matter.
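To put some numbers on this: log2(n) grows very slowly, so for realistic sizes a balanced tree needs only a couple of dozen comparisons per lookup, and constant factors (hashing cost, collisions, cache behaviour) can easily decide the outcome. The sketch below just computes floor(log2(n)); the class name is made up, and the value is only a rough estimate of tree depth, not an exact count for a red-black tree.

```java
class TreeDepthEstimate {
    // floor(log2(n)) for n >= 1, used as a rough estimate of balanced-tree depth.
    static int floorLog2(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(floorLog2(1_000));     // prints 9
        System.out.println(floorLog2(100_000));   // prints 16
        System.out.println(floorLog2(1_000_000)); // prints 19
    }
}
```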

Question 2

0) JVM benchmarking is really complicated. Almost always you are not measuring what you think you are measuring. There is JMH (http://openjdk.java.net/projects/code-tools/jmh/), a microbenchmarking harness from the folks at Oracle. You may also try other benchmarking frameworks and guides.

JIT compiler warm-up, initial memory allocation, the garbage collector and a LOT of other things may invalidate your benchmark.
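If you still want a quick hand-rolled measurement, a minimal sketch could at least discard a few warm-up rounds and repeat the measurement several times. All names here (NaiveSetTiming, timeFill, bestOf) and the round counts are assumptions for illustration; this does not replace a proper harness such as JMH, and the printed numbers should be taken with a grain of salt.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

class NaiveSetTiming {
    // Fills a fresh set with n integers and returns the elapsed nanoseconds.
    static long timeFill(Supplier<Set<Integer>> factory, int n) {
        Set<Integer> set = factory.get();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        long elapsed = System.nanoTime() - start;
        if (set.size() != n) { // use the result so the loop cannot be optimized away
            throw new IllegalStateException("unexpected size");
        }
        return elapsed;
    }

    // Runs some discarded warm-up rounds, then returns the best of the measured rounds.
    static long bestOf(Supplier<Set<Integer>> factory, int n, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            timeFill(factory, n);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            best = Math.min(best, timeFill(factory, n));
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("HashSet ns: " + bestOf(HashSet::new, n, 5, 10));
        System.out.println("TreeSet ns: " + bestOf(TreeSet::new, n, 5, 10));
    }
}
```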

1) See also "Hashset vs Treeset" regarding your first question.

2) Set<Integer> treeSet = new TreeSet<Integer>(); //cant type treeSet.lower(e: E)

That's how it works. You declared treeSet as Set, and Set does not declare NavigableSet methods such as lower() (NavigableSet extends Set, not the other way around). You may cast explicitly if you want to. But if you want to access NavigableSet methods, why not declare treeSet as NavigableSet in the first place?

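Set<Integer> treeSet = new TreeSet<Integer>();
// lower(e) returns the greatest element strictly less than e, or null; 5 is just an example argument
Integer lower = ((NavigableSet<Integer>) treeSet).lower(5); // this should work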
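For comparison, here is a sketch of the variant without the cast, where the variable is declared as NavigableSet from the start. The class name and sample values are made up for the example.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class NavigableSetDemo {
    // Builds a small sample set declared with the NavigableSet interface type.
    static NavigableSet<Integer> sample() {
        NavigableSet<Integer> set = new TreeSet<>();
        set.add(10);
        set.add(20);
        set.add(30);
        return set;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = sample();
        System.out.println(set.lower(20));  // prints 10 (strictly less than 20)
        System.out.println(set.floor(20));  // prints 20 (less than or equal to 20)
        System.out.println(set.higher(30)); // prints null (nothing greater than 30)
    }
}
```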

Question 3

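Try running this code. I took it from codeforces.ru. It demonstrates how badly HashSet/HashMap may behave: it took 1.3 - 1.4 sec to add 10^5 values. According to the linked topic, shuffling won't help (I haven't tried it). TreeSet will never show such terrible performance. Note that the keys appear to be crafted against the hash-mixing step that HashMap used in Java 7 and earlier, so timings on newer JDKs may differ.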

import java.io.IOException;
import java.util.HashSet;

public class Main implements Runnable {

    public static void main(String[] args) {
        new Thread(null, new Main(), "", 16 * (1L << 20)).start();
    }

    public void run() {
        try {
            long t1 = System.currentTimeMillis();
            solve();
            long t2 = System.currentTimeMillis();
            System.out.println("Time = " + (t2 - t1));
        } catch (Throwable t) {
            t.printStackTrace(System.err);
            System.exit(-1);
        }
    }

    // solution: generate keys designed to collide inside HashSet, then time the inserts

    int hashinv(int h) {
        h ^= (h >>> 4) ^ (h >>> 7) ^ (h >>> 8) ^ (h >>> 14) ^ (h >>> 15)
                ^ (h >>> 18) ^ (h >>> 19) ^ (h >>> 20) ^ (h >>> 21)
                ^ (h >>> 23) ^ (h >>> 26) ^ (h >>> 28);
        return h;
    }

    int bitreverse(int h) {
        int res = 0;
        for (int i = 0; i < 31; i++)
            if ((h & (1 << i)) != 0)
                res |= (1 << (30 - i));
        return res;
    }

    void solve() throws IOException {
        final int size = 100000;
        int[] s = new int[size];
        for (int i = 0, val = 0; i < size; i++) {
            s[i] = Integer.MAX_VALUE;
            while (s[i] > 1000000000)
                s[i] = hashinv(bitreverse(val++));
        }
        long startTime = System.currentTimeMillis();
        HashSet<Integer> h = new HashSet<Integer>(size);
        for (int i = 0; i < size; i++)
            h.add(s[i]);
        System.out.println("HashSet adding time = " + (System.currentTimeMillis() - startTime));
    }

}