I recently needed to do many-many lookups on a set of integer pairs for solving a problem in a programming contest. I program in Java. Normally I would use arrays for pairs of integers, but because sets in Java do not work with arrays, I decided to use lists this time. I later discovered that my solution was slow, but another contestant's solution using Point
instead was much faster.
I implemented a small benchmark to compare the two. I discovered that Set.contains()
on a Set<List<Integer>>
(where each List<Integer>
is of size 2) is on average about 4.5 times slower than on a Set<Point>
, regardless of whether the pair is actually contained in the set. I thought the reason could stem from different implementations of hashCode()
and additionally implemented a custom Point
class that uses List.hashCode()
. This appears to be part of the reason, but the difference with an actual List<Integer>
is still considerable.
This is my benchmark code:
import java.awt.Point;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
public class IntPairTiming {
public static final int SIZE = 1_000_000;
public static final int TRIES = 100_000_000;
public static void main(String[] args) {
Random random = new Random();
// Pairs to store
Set<Point> pointSet = new HashSet<>();
Set<List<Integer>> listSet = new HashSet<>();
Set<MyPoint> myPointSet = new HashSet<>();
for (int i = 0; i < SIZE; i++) {
Point point = new Point(random.nextInt(), random.nextInt());
pointSet.add(point);
List<Integer> list = Arrays.asList(point.x, point.y);
listSet.add(list);
myPointSet.add(new MyPoint(point.x, point.y, list));
}
// Pair to lookup
Point point = new Point(random.nextInt(), random.nextInt());
List<Integer> list = Arrays.asList(point.x, point.y);
MyPoint myPoint = new MyPoint(point.x, point.y, list);
pointSet.add(point);
listSet.add(list);
myPointSet.add(myPoint);
// Time set of points
timeLookup(pointSet, point, "Set of Points");
// Time set of lists
timeLookup(listSet, list, "Set of Lists");
// Time set of mypoints
timeLookup(myPointSet, myPoint, "Set of MyPoints");
}
private static <T> void timeLookup(Set<T> set, T obj, String what) {
System.out.println("-----------");
System.out.printf("%s:%n", what);
System.out.printf("Contains = %s%n", set.contains(obj));
long s = System.nanoTime();
for (int i = 0; i < TRIES; i++) set.contains(obj);
long e = System.nanoTime();
System.out.printf("Average lookup = %sns%n", (e - s) / TRIES);
}
private static class MyPoint extends Point {
private final List<Integer> list;
public MyPoint(int x, int y, List<Integer> list) {
super(x, y);
this.list = list;
}
@Override
public int hashCode() {
return this.list.hashCode();
}
}
}
Output:
-----------
Set of Points:
Contains = true
Average lookup = 12ns
-----------
Set of Lists:
Contains = true
Average lookup = 55ns
-----------
Set of MyPoints:
Contains = true
Average lookup = 25ns
Why is there such a noticeable difference? From my understanding, both hashCode()
and equals()
should perform more or less equivalently on a Point
and a List<Integer>
of size 2. What if we need to lookup tuples of more than two elements, would a custom class similar to Point
always be more efficient than a list?