Question

I have an immutable Value Object, IPathwayModule, whose value is defined by:

  • (int) Block;
  • (Entity) Module, identified by (string) ModuleId;
  • (enum) Status; and
  • (entity) Class, identified by (string) ClassId - which may be null.

Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.

private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        int hx = GetHashCode(x);
        int hy = GetHashCode(y);
        return hx == hy;
    }

    public int GetHashCode(IPathwayModule obj)
    {
        int h;
        if (obj.Class != null)
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
        }
        else
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
        }
        return h;
    }
}

IPathwayModule is definitely immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.

I suppose my questions are:

  • Am I using the interface correctly in this case?
  • Are there cases where I might not see the desired behaviour?
  • Is there any way to improve the robustness, performance?
  • Are there any good practices that I am not following?
Was it helpful?

Solution 7

Thanks to all who responded. I have aggregated the feedback from everyone who responded and my improved IEqualityComparer now looks like:

private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        if (x == y) return true;
        if (x == null || y == null) return false;

        if ((x.Class == null) ^ (y.Class == null)) return false;

        if (x.Class == null) //and implicitly y.Class == null
        {
            return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
        }
        return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
    }
    public int GetHashCode(IPathwayModule obj)
    {
        unchecked {
            int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
            if (obj.Class != null)
            {
               h ^= obj.Class.ClassId.GetHashCode();
            }
            return h;
        }
    }
}

OTHER TIPS

Don't do the Equals in terms of the Hash function's results it's too fragile. Rather do a field value comparison for each of the fields. Something like:

return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...

Also, the hash functions results aren't really amenable to addition. Try using the ^ operator instead.

return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...

You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...

The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.

GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int. The int itself is a good hash code. The same for enum values. Your GetHashCode could be then implemented like this:

public int GetHashCode(IPathwayModule obj)
{
    unchecked {
        int h = obj.Block + obj.Module.ModeleId.GetHashCode() + (int) obj.Status;
        if (obj.class != null)
           h += obj.Class.ClassId.GetHashCode();
        return h;
    }
}

The 'unchecked' block is necessary because there may be overflows in the arithmetic operations.

You shouldn't use GetHashCode() as the main way of comparison objects. Compare it field-wise.

There could be multiple objects with the same hash code (this is called 'hash code collisions').

Also, be careful when add together multiple integer values, since you can easily cause an OverflowException. Use 'exclusive or' (^) to combine hashcodes or wrap code into 'unchecked' block.

You should implement better versions of Equals and GetHashCode.

For instance, the hash code of enums is simply their numerical value.

In other words, with these two enums:

public enum A { x, y, z }
public enum B { k, l, m }

Then with your implementation, the following value type:

public struct AB {
    public A;
    public B;
}

the following two values would be considered equal:

AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };

I'm assuming you don't want that.

Also, passing the value types as interfaces will box them, this could have performance concerns, although probably not much. You might consider making the IEqualityComparer implementation take your value types directly.

  1. Assuming that two objects are equal because their hash code is equal is wrong. You need to compare all members individually
  2. It is proabably better to use ^ rather than + to combine the hash codes.

If I understand you well, you'd like to hear some comments on your code. Here're my remarks:

  1. GetHashCode should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions
  2. You compare hashcodes. That's good, but only do this if the underlying object overrides the GetHashCode. If not, use properties and their hashcodes and combine them.
  3. Hash codes are important, they make a quick compare possible. But if hash codes are equal, the object can still be different. This happens rarely. But you'll need to compare the fields of your object if hash codes are equal.
  4. You say your value types are immutable, but you reference objects (.Class), which are not immutable
  5. Always optimize comparison by adding reference comparison as first test. References unequal, the objects are unequal, then the structs are unequal.

Point 5 depends on whether the you want the objects that you reference in your value type to return not equal when not the same reference.

EDIT: you compare many strings. The string comparison is optimized in C#. You can, as others suggested, better use == with them in your comparison. For the GetHashCode, use OR ^ as suggested by others as well.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top