Quick and Simple Hash Code Combinations

https://stackoverflow.com/questions/1646807

22-07-2019
|

Question

Can people recommend quick and simple ways to combine the hash codes of two objects. I am not too worried about collisions since I have a Hash Table which will handle that efficiently I just want something that generates a code quickly as possible.

Reading around SO and the web there seem to be a few main candidates:

XORing
XORing with Prime Multiplication
Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
Building a String and then using the String classes Hash Code method

What would people recommend and why?

Solution

I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.

I usually use:

unchecked
{
    int hash = 17;
    hash = hash * 31 + firstField.GetHashCode();
    hash = hash * 31 + secondField.GetHashCode();
    return hash;
}

That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.

OTHER TIPS

While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17 and factor of 31 as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue, and the number of items being jointly hashed are a few dozen or less.

For hashing an integer tuple {x, y} where -1000 <= x <= 1000 and -1000 <= y <= 1000, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} -> {0, 31}, {1, 1} -> {0, 32}, etc. If we expand the coverage to also include n-tuples where 3 <= n <= 25, it does less terrible with a collision rate of about 38%. But we can do much better.

public static int CustomHash(int seed, int factor, params int[] vals)
{
    int hash = seed;
    foreach (int i in vals)
    {
        hash = (hash * factor) + i;
    }
    return hash;
}

I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i. Allowed ranges were 2 <= n <= 25 (where n was random but biased toward the lower end of the range) and -1000 <= i <= 1000. At least 12 million unique collision tests were performed for each seed and factor pair.

After about 7 hours running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009, factor = 9176, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it peforms quite well in all common int and char hashing scenarios. It also seems to work fine with integers of much greater magnitudes.

It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor although it likely helps. 1009 noted above is in fact prime, but 9176 is not. I explicitly tested variations on this where I changed factor to various primes near 9176 (while leaving seed = 1009) and they all performed worse than the above solution.

Lastly, I also compared against the generic ReSharper recommendation function family of hash = (hash * factor) ^ i; and the original CustomHash() as noted above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used in my opinion.

If you are using .NET Core 2.1, consider using the System.HashCode struct to help with producing composite hash codes. It has two modes of operation: Add and Combine.

An example using Combine, which is usually simpler and works for up to eight items:

public override int GetHashCode()
{
    return HashCode.Combine(object1, object2);
}

An example of using Add:

public override int GetHashCode()
{
    var hash = new HashCode();
    hash.Add(this.object1);
    hash.Add(this.object2);
    return hash.ToHashCode();
}

Pros:

Part of .NET itself (though, see con below)
Looks to have good performance and mixing characteristics, based on the work the author and the reviewers did before merging this into the corefx repo
Handles nulls automatically
Overloads that take IEqualityComparer instances

Cons:

As of August 2018, only available when targeting .NET Core 2.1 or later.
- As of April 2019, part of .NET Standard 2.1 Preview. I do not know when .NET Standard 2.1 Preview will be released, and I do not know for sure whether HashCode will be part of that.
General purpose, so it won't handle super-specific cases as well as hand-crafted code

I presume that .NET Framework team did a decent job in testing their System.String.GetHashCode() implementation, so I would use it:

// System.String.GetHashCode(): http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4
// System.Web.Util.StringUtil.GetStringHashCode(System.String): http://referencesource.microsoft.com/#System.Web/Util/StringUtil.cs,c97063570b4e791a
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;

    int i = 0;
    foreach (var hashCode in hashCodes)
    {
        if (i % 2 == 0)
            hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hashCode;
        else
            hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ hashCode;

        ++i;
    }

    return hash1 + (hash2 * 1566083941);
}

Another implementation is from System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32) and System.Array.CombineHashCodes(System.Int32, System.Int32) methods. This one is simpler, but probably doesn't have such a good distribution as the method above:

// System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,21fb74ad8bb43f6b
// System.Array.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#mscorlib/system/array.cs,87d117c8cc772cca
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash = 5381;

    foreach (var hashCode in hashCodes)
        hash = ((hash << 5) + hash) ^ hashCode;

    return hash;
}

Use the combination logic in tuple. The example is using c#7 tuples.

(field1, field2).GetHashCode();

If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.

The situation I'm suggesting this for is where you want to do

H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.

of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.

If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:

finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;

Of course, some prototyping ought to give you an idea of performance and clustering.

Assuming you have a relevant toString() function (where your different fields shall appear), I would just return its hashcode:

this.toString().hashCode();

This is not very fast, but it should avoid collisions quite well.

I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow