Testing string equality using hashCode()

https://stackoverflow.com/questions/1465621

13-09-2019
|

Question

Is there any reason why a Java string cannot be tested for equality using it's hashCode method? So basically, instead of....

"hello".equals("hello")

You could use...

"hello".hashCode() == "hello".hashCode()

This would be useful because once a string has calculated it's hashcode then comparing a string would be as efficient as comparing an int as the string caches the hashcode and it is quite likely that the string is in the string pool anyway, if you designed it that way.

Solution

because: hashCodes of two objects must be equal if the objects are equal, however, if two objects are unequal, the hashCode can still be equal.

(modified after comment)

OTHER TIPS

Let me give you a counter example. Try this,

public static void main(String[] args) {
    String str1 = "0-42L";
    String str2 = "0-43-";

    System.out.println("String equality: " + str1.equals(str2));
    System.out.println("HashCode eqauality: " + (str1.hashCode() == str2.hashCode()));
}

The result on my Java,

String equality: false
HashCode eqauality: true

as many said hashCode does not guaranty uniqueness. in fact, it cannot do that for a very simple reason.

hashCode returns an int, which means there are 2^32 possible values (around 4,000,000,000), but there are surely more than 2^32 possible strings, which means at least two strings have the same hashcode value.

this is called Pigeonhole principle.

Others have pointed out why it won't work. So I'll just add the addendum that the gain would be minimal anyway.

When you compare two strings in Java, the String equals function first checks if they are two references to the same object. If so, it immediately returns true. Then it checks if the lengths are equal. If not, it returns false. Only then does it start comparing character-by-character.

If you're manipulating data in memory, the same-object compare may quickly handle the "same" case, and that's a quick, umm, 4-byte integer compare I think. (Someone correct me if I have the length of an object handle wrong.)

For most unequal strings, I'd bet the length compare quickly finds them not equal. If you're comparing two names of things -- customers, cities, products, whatever -- they'll usually have unequal length. So a simple int compare quickly disposes of them.

The worst case for performance is going to be two long, identical, but not the same object strings. Then it has to do the object handle compare, false, keep checking. The length compare, true, keep checking. Then character by character through the entire length of the string to verify that yes indeed they are equal all the way to the end.

You can get the effect you want using String.intern() (which is implemented using a hash table.)

You can compare the return values of intern() using the == operator. If they refer to the same string then the original strings were equivalent (i.e. equals() would have returned true), and it requires only a pointer comparison (which has the same cost as an int comparison.)

String a = "Hello";
String b = "Hel" + "lo";

System.out.println(a.equals(b));
System.out.println(a == b);

String a2 = a.intern();
String b2 = b.intern();

System.out.println(a2.equals(b2));
System.out.println(a2 == b2);

Output:

true
false
true
true

The hashCode value isn't unique, which means the Strings may not actually match. To improve performance, often implementations of equals will perform a hashCode check before performing more laborious checks.

Very simple reason: risk of collisions... A hash code will have a lot less possible values than a string. It depends a bit of the kind of hash you generate but let's take a very simple example, where you would add the ordinal values of letters, multiplied with it's position: a=1, b=2, etc. Thus, 'hello' would translate to: h: 8x1=8, e: 5x2=10, l: 12x3=36, l: 12x4=48, o: 15x5=75. 8+10+36+48+75=177.

Are there other string values that could end as 177 hashed? Of course! Plenty of options. Feel free to calculate a few.

Still, this hashing method used a simple method. Java and .NET use a more complex hashing algorithm with a lot smaller chance of such collisions. But still, there's a chance that two different strings will result in the same hash value, thus this method is less reliable.

Two different String can easily generate same hash Code or different hash Code. If u want a equality test hash Code won't give an unique result. When we use String class it will return different value of hash Code. So String buffer class should be apply to have same hash Code for every concated object.

There is no reason not to use hashCode as you describe.

However, you must be aware of collisions. There is a chance - a small chance admittedly - that two different strings do hash to the same value. Consider doing a hashCode at first, and if equal also do the full comparison using the equals().

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow