Question

I am working in a java-based system where I need to set an id for certain elements in the visual display. One category of elements is Strings, so I decided to use the String.hashCode() method to get a unique identifier for these elements.

The problem I ran into, however, is that the system I am working in borks if the id is negative and String.hashCode often returns negative values. One quick solution is to just use Math.abs() around the hashcode call to guarantee a positive result. What I was wondering about this approach is what are the chances of two distinct elements having the same hashcode?

For example, if one string returns a hashcode of -10 and another string returns a hashcode of 10 an error would occur. In my system we're talking about collections of objects that aren't more than 30 elements large typically so I don't think this would really be an issue, but I am curious as to what the math says.

Was it helpful?

Solution

Hash codes can be thought of as pseudo-random numbers. Statistically, with a positive int hash code the chance of a collision between any two elements reaches 50% when the population size is about 54K (and 77K for any int). See Birthday Problem Probability Table for collision probabilities of various hash code sizes.

Also, your idea to use Math.abs() alone is flawed: It does not always return a positive number! In 2's compliment arithmetic, the absolute value of Integer.MIN_VALUE is itself! Famously, the hash code of "polygenelubricants" is this value.

OTHER TIPS

Hashes are not unique, hence they are not apropriate for uniqueId.

As to probability of hash collision, you could read about birthday paradox. Actually (from what I recall) when drawing from an uniform distribution of N values, you should expect collision after drawing $\sqrt(N)$ (you could get collision much earlier). The problem is that Java's implementation of hashCode (and especially when hashing short strings) doesnt provide uniform distribution, so you'll get collision much earlier.

You already can get two strings with the same hashcode. This should be obvious if you think that you have an infinite number of strings and only 2^32 possible hashcodes.

You just make it a little more probable when taking the absolute value. The risk is small but if you need an unique id, this isn't the right approach.

What you can do when you only have 30-50 values as you said is register each String you get into an HashMap together with a running counter as value:

HashMap StringMap = new HashMap<String,Integer>();

StringMap.add("Test",1);
StringMap.add("AnotherTest",2);

You can then get your unique ID by calling this:

StringMap.get("Test"); //returns 1
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top