Is it possible to write a hash code function for an comparer that matches many-to-many?

https://stackoverflow.com/questions/10885675

12-06-2021
|

Question

Can I write a hash code function for the following comparer logic?

Two instances of My are equal if at least two properites from (A, B, C) match.

The Equals part is simple, but I'm stumped on the hash code part, and part of me is thinking it might not be possible.

class MyOtherComparer : IEqualityComparer<My>
{
    public bool Equals(My x, My y)
    {
        if (Object.ReferenceEquals(x, y)) 
            return true;      

        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) 
            return false;

        int matches = 0;

        if(x.A == y.A) matches++;

        if(x.B == y.B) matches++;

        if(x.C == y.C) matches++;

        // match on two out of three
        return  (matches > 1)
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.
    public int GetHashCode(My x)
    {
       // ???
    }
}

UPDATE: In addition to the correct answer by Reed Copsey, a very important point about the general usefulness of a fuzzy comparer is clearly stated by Ethan Brown - please see his answer as well for a full understanding of what underlies this Question/Answer.

Solution

Yes, it's possible. The simplest implementation would be to always return a constant.

public int GetHashCode(My x) 
{ 
   return 0;
}

The GetHashCode documentation states:

Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.

However, you are perfectly free to return the same hash code for two objects which are not equal, as well.

That being said, this will potentially cause some algorithms to perform very poorly, as you'll get lots of hash collisions. However, given the nature of your odd/unique equality check, this may be required.

Note that this is going to be problematic in any case. Given your logic, it's possible to have three objects, where comparer.Equals(foo, bar)==true and comparer.Equals(foo, baz)==true but comparer.Equals(baz, bar)==false. This is likely to be problematic in many cases where IEqualityComparer<T> is used.

OTHER TIPS

A hash code has to be the same for two equal objects, but it does not have to be different for two different objects. You could return the same value for all objects to satisfy the IEqualityComparer consumers, but there's no way I know of to get any speed benefits from the hash in your situation.

Suppose that we have 2 objects A,B. Each of them have properties p1,p2 and p3. Suppose A.p1 == B.p1 and A.p3 == B.p3 , if the hash function depends on p2 , it'll be different for A and B so they are not equal. If you want to calculate the hash function based on the p1 and p3, there are many examples which hash function wouldn't return the right hash value and many equal objects will be not equal. I think we can't have a variable function. You can use a constant one but if you want to use it as a hash key in a dictionary or hash table you will not gain near O(1) complexity.

Can I write a hash code function for the following comparer logic?

Yes. You can always write a hash code for anything. The question is how efficient it will be. No matter what, you can always have:

public int GetHashCode()
{
  return 0;
}

It will always work, but it's horribly * inefficient*.

The core problem with arriving at a non-constant hash function is that you can't ensure transitivity across equality. Normally, equality is considered transitive. That is, A=B and B=C implies that A=C (which further implies that A, B, and C would all have the same hash code). However, with your definition of equality, you could have A=B, B=C and A!=C. Ideally, unequal elements would have different hash codes, so A and C would have different hash codes; but they can't because they both equal B, so they have to all have the same hash code.

The only way you could arrive at a non-constant hash function is if you knew something about your total collection. You would have to partition the collection into "equality bins" where each element in a bin is equal to some other element in the bin (including the possibility of a bin of one). Once you have done this partitioning, then you can use that to generate a non-constant algorithm (assuming you get more than one bin) for generating a hash code.

The thing about the idea of equality bins is that there could be many such configurations of bins. As a selection criteria, you may want to maximize the number of bins (to improve hashtable lookup performance). The degenerative case (as indicated in Reed Copsey's correct answer) is that you put everything in the same bin (though, as supercat points out in the comments below, the name "equality bins" then becomes misleading). That doesn't violate any of the constraints of hash values, but it will lead to poor performance in algorithms that expect has values to produce a non-degenerative partitioning.

As supercat points out below, to satisfy the constraints of hash values, the following must be true: if two elements are in two different bins, they must not be equal (however, two elements in the same bin do not have to be equal).

Seeing that your real problem was dealing with the Except extension method, I decided to suggest something for you, although not really an answer.

public class EqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _comparer;
    private readonly Func<T, int> _hashCoder;

    public EqualityComparer(Func<T, T, bool> comparer, Func<T, int> hashCoder = null)
    {
        if (comparer == null)
        {
            throw new ArgumentNullException("comparer");
        }

        this._comparer = comparer;
        this._hashCoder = hashCoder ?? (x => 0);
    }

    public bool Equals(T x, T y)
    {
        return this._comparer(x, y);
    }

    public int GetHashCode(T obj)
    {
        return this._hashCoder(obj);
    }
}

And then you can use it like so:

arr1.Except(arr2, new EqualityComparer<dynamic>((x, y) =>
     {
         if (ReferenceEquals(x, y))
             return true;

         if (ReferenceEquals(x, null) ||
             ReferenceEquals(y, null))
             return false;

         var matches = 0;

         if (x.A == y.A) matches++;
         if (x.B == y.B) matches++;
         if (x.C == y.C) matches++;

         return (matches > 1);
     }));

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow