Question

In short, I am looking for guidance on which of the following two methods should be preferred (and why):

static IEnumerable<T> DistinctA<T>(this IEnumerable<T> xs)
{
    return new HashSet<T>(xs);
}

static IEnumerable<T> DistinctB<T>(this IEnumerable<T> xs) where T : IEquatable<T>
{
    return new HashSet<T>(xs);
}

  • Argument in favour of DistinctA: The constraint on T is not required, because HashSet<T> does not require it: instances of any T are guaranteed to be convertible to System.Object, which already provides Equals and GetHashCode. (Strictly speaking, IEquatable<T> only adds a strongly typed Equals(T); GetHashCode always comes from System.Object.) While the non-generic Equals(object) will cause boxing with value types, that's not what I'm concerned about here.

  • Argument in favour of DistinctB: The generic parameter constraint, while not strictly necessary, makes visible to callers that the method will compare instances of T, and is therefore a signal that Equals and GetHashCode should work correctly for T. (After all, defining a new type without explicitly implementing Equals and GetHashCode happens very easily, so the constraint might help catch some errors early.)
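To make the two arguments concrete, here is a minimal sketch of the failure mode that the constraint is meant to catch. The types PointNoEquality and PointWithEquality and the classes DistinctExtensions and DistinctDemo are hypothetical names introduced only for this illustration; the two extension methods are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type that relies on the default reference equality of System.Object.
public sealed class PointNoEquality
{
    public int X { get; }
    public PointNoEquality(int x) { X = x; }
}

// Hypothetical type that implements IEquatable<T> and keeps Equals and GetHashCode consistent.
public sealed class PointWithEquality : IEquatable<PointWithEquality>
{
    public int X { get; }
    public PointWithEquality(int x) { X = x; }

    public bool Equals(PointWithEquality other) => other != null && X == other.X;
    public override bool Equals(object obj) => Equals(obj as PointWithEquality);
    public override int GetHashCode() => X.GetHashCode();
}

public static class DistinctExtensions
{
    // Unconstrained: accepts any T, including types with reference equality only.
    public static IEnumerable<T> DistinctA<T>(this IEnumerable<T> xs)
    {
        return new HashSet<T>(xs);
    }

    // Constrained: callers must supply a T that implements IEquatable<T>.
    public static IEnumerable<T> DistinctB<T>(this IEnumerable<T> xs) where T : IEquatable<T>
    {
        return new HashSet<T>(xs);
    }
}

public static class DistinctDemo
{
    // Two instances with the same X are still "distinct" under reference equality.
    public static int CountA() =>
        new[] { new PointNoEquality(1), new PointNoEquality(1) }.DistinctA().Count();

    // With value-based equality, the two instances collapse into one.
    public static int CountB() =>
        new[] { new PointWithEquality(1), new PointWithEquality(1) }.DistinctB().Count();

    // This call would be rejected at compile time, because PointNoEquality
    // does not implement IEquatable<PointNoEquality>:
    // new[] { new PointNoEquality(1) }.DistinctB();

    public static void Main()
    {
        Console.WriteLine(CountA()); // prints 2
        Console.WriteLine(CountB()); // prints 1
    }
}
```

Note that the constraint only guarantees that a strongly typed Equals(T) exists. It cannot verify that GetHashCode was overridden consistently, so DistinctB narrows the class of mistakes rather than eliminating it.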

Question: Is there an established and documented best practice that recommends when to specify this particular constraint (T : IEquatable<T>), and when not to? And if not, is one of the above arguments flawed in any way? (In that case, I'd prefer well-thought-out arguments, not just personal opinions.)


Solution 2

Start by considering when it might matter which of the two mechanisms is used; I can think of only two:

  1. When the code is being translated to another language (either a subsequent version of C#, a related language like Java, or a completely dissimilar language such as Haskell). In this case the second definition is clearly better, because it gives the translator, whether automated or manual, more information.
  2. When a user unfamiliar with the code is reading it to learn how to invoke the method. Again, I believe the second is clearly better, because it makes more information readily available to such a user.

I cannot think of any circumstance in which the first definition would be preferred and where the choice actually matters beyond personal preference.

Other thoughts?

OTHER TIPS

While my comment on Pieter's answer still holds, I've rethought the exact case of Distinct that you refer to.

This is a LINQ method contract, not just a method.

LINQ is meant to be a common facade implemented by various providers. Linq2Objects may require IEquatable, and Linq2Sql may require it too, but Linq2Sql may just as well ignore IEquatable completely, because the comparison is performed by the database's SQL engine.

Therefore, at the layer of LINQ method definitions, it does not make sense to specify the requirement for IEquatable. It would constrain future LINQ providers to something they do not need to care about in their specific domains. LINQ is very often about domain specificity: in many cases the LINQ expressions and parameters are never actually run as code, but are analyzed and translated into other constructs such as SQL or XPath. Omission of constraints is reasonable in this case, because you cannot know what a future, unknown domain provider will need to request from the caller. Linq-to-Mushrooms may need to use an IToxinEquality instead of IEquatable!
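As a small illustration of the point that a LINQ call may only be recorded and not executed, the sketch below inspects what Queryable.Distinct produces for an in-memory IQueryable. The class name QueryableDistinctDemo and the method DescribeDistinct are hypothetical names chosen for this example; a real provider such as Linq2Sql would walk the same kind of expression tree and translate it instead of running any equality code.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDistinctDemo
{
    // Returns the name of the method recorded at the root of the expression tree.
    public static string DescribeDistinct()
    {
        IQueryable<int> source = new[] { 1, 1, 2 }.AsQueryable();

        // No comparison happens here: Distinct only appends a method call
        // node to the expression tree held by the query.
        IQueryable<int> query = source.Distinct();

        var call = (MethodCallExpression)query.Expression;
        return call.Method.DeclaringType.Name + "." + call.Method.Name;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeDistinct()); // prints Queryable.Distinct
    }
}
```

Because the provider decides what "distinct" means when it interprets that node, a constraint on the element type at this layer would bind every provider to one particular notion of equality.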

However, when you are designing interfaces or contracts that will clearly be used as runnable code (not just as expression trees that serve as configuration or declarations), I see no valid reason not to provide the constraints.

Licensed under: CC-BY-SA with attribution