Frage

I heard that hashing (ie converting a string or object to a number) is used for strings and such because it is easier to compare numbers than strings. If true, what is the reason for this ?

War es hilfreich?

Lösung

This is not necessarily the case, but probably the case most of the time.

Consider the following situation:

I want to compare the string "apples" vs. "oranges". If I only want to determine "apples" == "oranges", I need only compare the first character of each string: 'a' != 'o' => "apples" != "oranges". If I hash the string and then do the comparison, it is significantly slower as I have to parse both strings and feed them into a hashing algorithm before comparing the resultant integers.

If, however, I need to do this comparison many times, and perhaps I'm comparing "oranges" to "orangutans" a lot, then if I hash all the strings once and do the comparisons of integers many times, it will work out faster. This is the principle that a hash map is based on.

Note, however, that hashing a string is useful for direct equals comparisons, it cannot determine if strings are lexigraphically greater or less than each other, and so ordering strings is not possible via the hashing method. (This is why HashMap in Java is unordered).

Andere Tipps

Comparing two numbers is magnitudes faster than comparing two strings (representing the same numbers). Comparing two numbers simply require comparing individual bits and could be done super fast using any of AND, XOR, 2's complements, etc.

Comparing two strings is very slow and expensive. Most algorithms require iterating through entire string and matching each character.

For example let's say we want to compare 9 with 12 (false). For numeric comparison, let's assume the algorithm compares individual bit. 9 = 1001 12 = 1100

Here, the worst case algorithm will compare 4 bits.

Now if we represent "9" and "12" as strings, they will be stored in memory as 16 bits each (Recall: Java uses UTF-16 to represent strings in memory) and have to be passed to a String comparison algorithm. In fact, Java's actual String comparison function is below:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

As you can see, there's a lot more going around for String comparison.

Comparing primitive numbers is definitely faster than comparing strings because it's just one computer instruction while comparing strings in Java is a method. But hashing in Java is used for a different reason, Object.hashCode() is used in hash tables for quick search in collections.

In general, most Computers have a single instruction to compare integers, longs etc and will take at most a couple of instruction cycles. Strings are normally compared by a utility function/method (there may be the odd exception to this rule).

in Java for example a String is basically represented as

     /** The value is used for character storage. */
     private final char value[];

     /** The offset is the first index of the storage that is used. */
     private final int offset;

     /** The count is the number of characters in the String. */
     private final int count;

And the equals method is

if (this == anObject) {
    return true;
}
if (anObject instanceof String) {
    String anotherString = (String)anObject;
    int n = count;
    if (n == anotherString.count) {
        char v1[] = value;
        char v2[] = anotherString.value;
        int i = offset;
        int j = anotherString.offset;
        while (n-- != 0) {
            if (v1[i++] != v2[j++])
                return false;
        }
        return true;
    }
}
return false;

The equals Method does both this == anObject and n == anotherString.count, both essentially integer compares, even before it starts comparing characters. It is going take a lot longer than a single instruction that a integer compare takes


The C string compare is simpler / faster than the Java equivalent but it will contain some sort of loop and multiple instructions for each pass through the loop.

This will take longer than the one instruction that a Integer Compare requires

Yes, but that has nothing to do with hashing.

Comparing numbers involves simple hardware instructions that compare bits.

Comparing strings involves (a) iterating through both strings until you find differing characters (unlike numbers, which are fixed-size) and (b) lots of Unicode magic (different strings of different lengths can actually be equal, and different characters in different code blocks compare differently).


Hashing is typically used to convert a string into an array index.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top