Question

I am trying to test out how String hashcode works by adding a new character and do the reverse to subtract it out, however it didn't give me the right answer when i did the calculation.

For example

int PRIME = 31;

//this will print 1687846330 
System.out.println("mlkjihgfedcb".hashCode());   

//this will print 783628775 which equals to "mlkjihgfedcba".hashCode();
System.out.println(("mlkjihgfedcb".hashCode() * PRIME + (int) 'a')); 

//this will print 25278344 which doesn't equals to "mlkjihgfedcb".hashCode()
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') / PRIME); 

I am wondering did i do the math right on my last step above ? overflow shouldn't matter for the calculation right ?

Was it helpful?

Solution

Hash functions are not reversible functions. As proof, consider the Pigeonhole principle, you can only store values in the integer range in a hashCode but there are an infinite number of Strings. Therefore, there must be multiple strings with the same hashCode (and there are).

OTHER TIPS

Multiplying a big number by 31 leads to integer overflow, which cannot be reversed using division by 31.

But hold on though: arithmetic using int in Java is equivalent to arithmetic modulo 232. Since 31 is odd it has a multiplicative inverse modulo 232. Therefore multiplication by 31 can be reversed with multiplication by the said "inverse." That is:

int inverse = -1108378657;
// This will print the hashCode of mlkjihgfedcb
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') * inverse); 

The specific reason this fails is

1687846330 * 31 + 97 = 52323236327

The number 52,323,236,327 is much larger than what an int can hold, so information is lost off the left-hand end. That information cannot be recovered by inverting the hashCode() algorithm.

If you do the arithmetic you'll see that the hash code you got for the second string is exactly 52323236327 mod 231

More important, the mapping from objects to hash codes is intended to be opaque and non-reversible. Just because it's implemented in a specific way today does not mean the next version of the library has to do it the same way. The only guarantee is that the possibility of collision is extremely low (but non-zero).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top