Question

I have an L2E (LINQ to Entities) query that returns data containing duplicate objects, and I need to remove them. Two objects should be treated as duplicates if their IDs are the same. I tried q.Distinct(), but it still returned duplicate objects. Then I tried implementing my own IEqualityComparer and passing it to the Distinct() method. The call failed with the following message:

LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[DAL.MyDOClass] Distinct[MyDOClass](System.Linq.IQueryable`1[DAL.MyDOClass], System.Collections.Generic.IEqualityComparer`1[DAL.MyDOClass])' method, and this method cannot be translated into a store expression.

And here is the implementation of EqualityComparer:

internal class MyDOClassComparer : EqualityComparer<MyDOClass>
{
    public override bool Equals(MyDOClass x, MyDOClass y)
    {
        return x.Id == y.Id;
    }

    public override int GetHashCode(MyDOClass obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

So how do I write my own IEqualityComparer properly?

Solution

An EqualityComparer is not the way to go here: LINQ to Entities cannot translate it to SQL, so it can only filter your result set in memory, e.g.:

var objects = yourResults.AsEnumerable().Distinct(yourEqualityComparer);

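Instead, you can use the GroupBy method to group by ID and take the first element of each group, so that the database returns only one entry per ID. Inside a projection, LINQ to Entities typically accepts FirstOrDefault rather than First, e.g.:

var objects = yourResults.GroupBy(o => o.Id).Select(g => g.FirstOrDefault());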
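
To make the grouping approach concrete, here is a minimal sketch that runs the same query shape against an in-memory list wrapped with AsQueryable(), so it can be tried without a database. MyDOClass is a stand-in for the real entity, and the Name property, the DedupeById helper and the sample data are all made up for illustration. Against a real LINQ to Entities query the provider would translate the expression to SQL instead of running it in the application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity from the question; Name is a hypothetical extra column.
public class MyDOClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, not part of any library.
public static class DedupeById
{
    // Keeps one object per Id: group on the key, then take the first element
    // of each group. Groups are never empty, so FirstOrDefault() returns the
    // same element First() would.
    public static List<MyDOClass> Run(IQueryable<MyDOClass> source)
    {
        return source
            .GroupBy(o => o.Id)
            .Select(g => g.FirstOrDefault())
            .ToList();
    }

    // Made-up sample data in which Id 1 appears twice.
    public static IQueryable<MyDOClass> Sample()
    {
        return new List<MyDOClass>
        {
            new MyDOClass { Id = 1, Name = "a" },
            new MyDOClass { Id = 2, Name = "b" },
            new MyDOClass { Id = 1, Name = "a (duplicate)" },
        }.AsQueryable();
    }
}
```

Calling DedupeById.Run(DedupeById.Sample()) yields two objects, one per Id. Which duplicate survives is only predictable here because the list is in memory; a database query gives no such guarantee unless you order the rows first.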

OTHER TIPS

rich.okelly and Ladislav Mrnka are both correct in different ways.

Both their answers deal with the fact that the IEqualityComparer<T>'s methods won't be translated to SQL.

I think it's worth looking at the pros and cons of each, which will take a bit more than a comment.

rich's approach rewrites the query into a different query with the same end result. The SQL it produces should be more or less what you would write by hand to do this efficiently.

Ladislav's approach pulls the data out of the database just before the distinct step, after which an in-memory comparer will work.

Since the database is great at the sort of grouping and filtering that rich's approach depends upon, it will likely be the most performant in this case. You could find, though, that whatever happens before the grouping is complex enough that LINQ to Entities doesn't generate a single clean query but instead produces a bunch of queries and then does some of the work in memory, which could be pretty nasty.

Generally, grouping is more expensive than Distinct when working in memory (especially if you bring the data into memory with ToList() rather than AsEnumerable()). So if you were already going to bring the data into memory at this stage because of some other requirement, Ladislav's approach would be more performant.

It would also be the only choice if your equality definition was something that didn't relate well to what is available just in the database, and of course it allows you to switch equality definitions if you wanted to do so based on an IEqualityComparer<T> passed as a parameter.
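
The point about switching equality definitions can be sketched as follows. This is only an illustration under stated assumptions: KeyComparer and DistinctInMemory are hypothetical names invented here, not part of LINQ or Entity Framework. The idea is that the caller chooses the comparer, and the helper moves the query into memory with AsEnumerable() before applying it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic comparer: two items are equal when their keys are equal.
public sealed class KeyComparer<T, TKey> : IEqualityComparer<T> where T : class
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer = EqualityComparer<TKey>.Default;

    public KeyComparer(Func<T, TKey> keySelector)
    {
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));
    }

    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x == null || y == null) return false; // exactly one is null
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

// Hypothetical extension method, named for illustration only.
public static class QueryExtensions
{
    // AsEnumerable() switches to LINQ to Objects, so everything after it,
    // including the comparer, runs in the application rather than in SQL.
    public static IEnumerable<T> DistinctInMemory<T>(
        this IQueryable<T> query, IEqualityComparer<T> comparer)
    {
        return query.AsEnumerable().Distinct(comparer);
    }
}
```

For the question's entity, a call might look like query.DistinctInMemory(new KeyComparer<MyDOClass, int>(o => o.Id)). Passing a different key selector changes the definition of "duplicate" without touching the query itself.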

In all, I'd say rich's answer is the most likely to be the best choice here, but the pros and cons of Ladislav's compared to rich's make it well worth studying and considering too.

You will not. The Distinct operator is executed on the database, so any code you write in your application cannot be used (you cannot move your equality comparer logic to SQL) unless you are happy to load all the non-distinct values and do the distinct filtering in your application.

var query = (from x in context.EntitySet where ... select x).ToList()
                                                            .Distinct(yourComparer);

A late answer, but you can do better: if the DAL class is partial (it usually is when it maps to a DB object), you can extend it like this:

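public partial class MyDOClass : IEquatable<MyDOClass>
{
    // Objects with the same Id get the same hash code.
    public override int GetHashCode()
    {
        return Id;
    }

    // Keep Equals(object) consistent with the typed overload below.
    public override bool Equals(object obj)
    {
        return Equals(obj as MyDOClass);
    }

    // Two objects are equal when their Ids match; null is never equal.
    public bool Equals(MyDOClass other)
    {
        return other != null && Id == other.Id;
    }
}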

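Distinct() will then work without the comparer overload, as long as it runs in memory (for example after AsEnumerable()). A Distinct() that is translated to SQL does not call these methods.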

If the class is not partial, you can create an IEqualityComparer class like this:

internal class MyDOClassComparer : IEqualityComparer<MyDOClass>
{
    // Two objects are equal when their Ids match.
    public bool Equals(MyDOClass x, MyDOClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    // Hash the Id of the object passed in, not of the comparer itself.
    public int GetHashCode(MyDOClass obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

In this case, pass an instance of the comparer to the Distinct overload that accepts one, again on an in-memory sequence.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow