Questions about IEqualityComparer<T> / List<T>.Distinct()
18-03-2021
Question
Here is the equality comparer I just wrote because I wanted a distinct set of items from a list containing entities.
class InvoiceComparer : IEqualityComparer<Invoice>
{
    public bool Equals(Invoice x, Invoice y)
    {
        // A
        if (Object.ReferenceEquals(x, y)) return true;
        // B
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) return false;
        // C
        return x.TxnID == y.TxnID;
    }
    public int GetHashCode(Invoice obj)
    {
        if (Object.ReferenceEquals(obj, null)) return 0;
        // Hash the same field that Equals compares, so equal invoices get equal hash codes.
        return obj.TxnID.GetHashCode();
    }
}
- Why does Distinct require a comparer as opposed to a Func<T,T,bool>?
- Are (A) and (B) anything other than optimizations, and are there scenarios when they would not act the expected way, due to subtleness in comparing references?
- If I wanted to, could I replace (C) with return GetHashCode(x) == GetHashCode(y)?
Solution
- So that it can use hash codes to run in O(n) rather than O(n²).
- (A) is an optimization. (B) is necessary; otherwise, it would throw a NullReferenceException. If Invoice is a struct, however, they're both unnecessary and slower.
- No. Hash codes are not unique.
OTHER TIPS
A is a simple and quick way to check whether both references point to the same object, i.e. the same memory address.
B - if one of the references is null, it obviously makes no sense to do an equality comparison.
C - no, sometimes GetHashCode() can return the same value for different objects (hash collision), so you should do an equality comparison.
Regarding the same hash code value for different objects, MSDN:
If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
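To make the collision point concrete, here is a small sketch using two hypothetical string comparers. Both use a deliberately coarse hash (the string length) so that collisions are easy to reproduce; real hash functions collide far less often, but the contract quoted above still allows it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Correct: coarse hash, but Equals still compares the actual values.
class LengthHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string s) => s == null ? 0 : s.Length;
}

// Broken: treats "same hash code" as "equal", as in the proposed replacement for (C).
class HashOnlyComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => GetHashCode(x) == GetHashCode(y);
    public int GetHashCode(string s) => s == null ? 0 : s.Length;
}

static class CollisionDemo
{
    public static int CountDistinct(IEqualityComparer<string> comparer)
    {
        // All three strings have length 5, so they all share one hash code.
        var ids = new[] { "INV-1", "INV-2", "INV-1" };
        return ids.Distinct(comparer).Count();
    }

    static void Main()
    {
        Console.WriteLine(CountDistinct(new LengthHashComparer())); // 2
        Console.WriteLine(CountDistinct(new HashOnlyComparer()));   // 1: "INV-2" is wrongly dropped
    }
}
```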
Distinct() basically works on the notion of "not equal". Therefore, if your list contains non-primitive types that do not already define the equality you want (by overriding Equals and GetHashCode), you must implement your own equality comparer.
At A, you check whether the objects are identical, i.e. the very same instance. If two objects are equal, they don't have to be identical, but if they are identical, you can be sure that they are equal. So the A part can improve the method's efficiency in some cases.
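The difference between identical and equal can be sketched as follows. The Invoice class and the SameInvoice helper are hypothetical; the helper only mirrors the shape of the comparer's Equals from the question.

```csharp
using System;

// Hypothetical stand-in for the entity; TxnID is assumed to be a string.
class Invoice
{
    public string TxnID { get; set; }
}

static class IdentityDemo
{
    // Same structure as Equals in the question: identity check, null check, value check.
    public static bool SameInvoice(Invoice x, Invoice y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.TxnID == y.TxnID;
    }

    static void Main()
    {
        var a = new Invoice { TxnID = "A1" };
        var b = new Invoice { TxnID = "A1" }; // equal to a, but a separate instance
        var alias = a;                        // identical to a

        Console.WriteLine(ReferenceEquals(a, alias)); // True:  identical
        Console.WriteLine(ReferenceEquals(a, b));     // False: not identical
        Console.WriteLine(SameInvoice(a, b));         // True:  equal by TxnID
        Console.WriteLine(SameInvoice(null, null));   // True:  caught by the identity check
        Console.WriteLine(SameInvoice(a, null));      // False: caught by the null check
    }
}
```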