Is String.hashCode() portable across VMs, JDKs and OSs?

https://stackoverflow.com/questions/190376

06-07-2019

Question

An interesting issue came up recently. We came across some code that uses hashCode() as a salt source for MD5 hashing, which raises the question: will hashCode() return the same value for the same object on different VMs, different JDK versions and operating systems? Even if it's not guaranteed, has it changed at any point up until now?

EDIT: I really mean String.hashCode() rather than the more general Object.hashCode(), which of course can be overridden.

Solution

No. From http://tecfa.unige.ch/guides/java/langspec-1.0/javalang.doc1.html:

The general contract of hashCode is as follows:

Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode must consistently return the same integer. The integer may be positive, negative, or zero. This integer does not, however, have to remain consistent from one Java application to another, or from one execution of an application to another execution of the same application. [...]

OTHER TIPS

It depends on the type:

If you've got a type which hasn't overridden hashCode() then it will probably return a different hashCode() each time you run the program.
If you've got a type which overrides hashCode() but doesn't document how it's calculated, it's perfectly legitimate for an object with the same data to return a different hash on each run, so long as it returns the same hash for repeated calls within the same run.
If you've got a type which overrides hashCode() in a documented manner, i.e. the algorithm is part of the documented behaviour, then you're probably safe. (java.lang.String documents this, for example.) However, I'd still steer clear of relying on this on general principle, personally.
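A minimal sketch of the first and third cases, for illustration only. The class name HashKinds and its helper method are hypothetical. It contrasts the default hash from Object with two hashes whose algorithms are written into the API documentation (String and List):

```java
import java.util.Arrays;
import java.util.List;

class HashKinds {
    // java.util.List#hashCode documents its algorithm:
    // start at 1, then hash = 31 * hash + elementHash for each element.
    static int documentedListHash(List<Integer> values) {
        return values.hashCode();
    }

    public static void main(String[] args) {
        // No override: the value is typically identity-based and
        // may differ from one run of the program to the next.
        Object plain = new Object();
        System.out.println("Object: " + plain.hashCode());

        // Documented algorithms: the value follows from the contents alone.
        System.out.println("String: " + "a".hashCode());
        System.out.println("List:   " + documentedListHash(Arrays.asList(1, 2)));
    }
}
```

Running this twice should show the Object line changing (or at least being free to change), while the String and List lines stay fixed.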

Just a cautionary tale from the .NET world: I've seen at least a few people in a world of pain through using the result of string.GetHashCode() as their password hash in a database. The algorithm changed between .NET 1.1 and 2.0, and suddenly all the hashes are "wrong". (Jeffrey Richter documents an almost identical case in CLR via C#.) When a hash does need to be stored, I'd prefer it to be calculated in a way which is always guaranteed to be stable - e.g. MD5 or a custom interface implemented by your types with a guarantee of stability.
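As a rough sketch of the "stable by definition" approach, the following computes an MD5 digest over an explicitly chosen byte encoding. The class and method names (StableDigest, md5Hex) are made up for this example, and UTF-8 is an assumption; what matters is that both the algorithm and the encoding are pinned down rather than left to a platform default. It only illustrates stability and is not a password-storage scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StableDigest {
    // Returns the MD5 digest of the UTF-8 bytes of the input as lowercase hex.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello"));
    }
}
```

Unlike hashCode(), the result here depends only on the input bytes and the published MD5 algorithm, so it can be stored and compared across runs, JVMs and languages.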

According to the docs: the hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

I am not certain whether this is a formal specification or just Sun's implementation. At the very least, it should be the same on all existing Sun VMs, regardless of platform or operating system.
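The formula can be checked by hand. In the sketch below (the names StringHashDemo and manualHash are hypothetical), s[i] is the i-th char of the string, ^ in the formula means exponentiation rather than XOR, and the arithmetic is ordinary int arithmetic that wraps on overflow. The loop evaluates the polynomial with Horner's rule, which is equivalent to the written form:

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // using Horner's rule with wrapping int arithmetic.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "hello", "Stack Overflow"};
        for (String s : samples) {
            // Print the hand-computed value next to the library value.
            System.out.println("\"" + s + "\": " + manualHash(s) + " vs " + s.hashCode());
        }
    }
}
```

If the two columns ever disagreed on some JVM, that JVM would not be following the documented formula.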

No. Hash algorithms are not guaranteed unless otherwise specified. For instance, deserialisation of hash structures needs to recalculate hash codes, and these values should not be stored in the serialised form.

I would like to add that you can override hashCode() (don't forget equals() if you do that) so that your business objects return the same hash code everywhere, provided the override is computed only from field values whose own hashes are stable. Those objects will then at least have a predictable hash code.
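One way this could look, as a sketch with a hypothetical CustomerId class: the hash is derived only from a String (whose algorithm is documented) and an int, so no identity-based value leaks into the result.

```java
import java.util.Objects;

final class CustomerId {
    private final String region;
    private final int number;

    CustomerId(String region, int number) {
        this.region = Objects.requireNonNull(region);
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CustomerId)) return false;
        CustomerId that = (CustomerId) other;
        return number == that.number && region.equals(that.region);
    }

    @Override
    public int hashCode() {
        // Combines a documented String hash with a plain int value.
        return 31 * region.hashCode() + number;
    }
}
```

Fields such as enums or objects that inherit Object.hashCode() would break the predictability, because their hashes are not defined by their contents.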

Licensed under: CC-BY-SA with attribution
