Question

I can retrieve duplicates using this query:

var duplicates = grpDupes
    .GroupBy(i => new { i.Email })
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);

But I am interested in finding duplicates by Email or Address or xyz. If I modify the above query to

GroupBy(i => new { i.Email, i.Address })

then it becomes an AND condition. Any help?


Solution 2

You could use EXISTS in SQL, which is Any in LINQ:

var duplicates = grpDupes
    .Where(i => (i.Email.Trim() != "" || i.Address.Trim() != "")  && grpDupes
        .Any(i2 => i.ID != i2.ID && 
            ((i.Email.Trim()   != "" && i.Email   == i2.Email) || 
             (i.Address.Trim() != "" && i.Address == i2.Address))));

Note that I've used ID as the primary key column. If you don't have one, use whichever column(s) uniquely identify a row.

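If you use a database-driven LINQ provider such as LINQ to SQL or LINQ to Entities, this is efficient because the query is translated to SQL and executed by the database. With LINQ to Objects, the nested Any compares every pair of rows, so the cost grows quadratically with the number of rows.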
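To see the Any-based query run end to end, here is a minimal in-memory sketch. The sample rows and the tuple shape (ID, Email, Address) are assumptions made for illustration. With a database provider, grpDupes would be a table or entity set instead.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows; ID stands in for the primary key.
var grpDupes = new[]
{
    (ID: 1, Email: "a@x.com", Address: "1 Main St"),
    (ID: 2, Email: "a@x.com", Address: "2 Oak Ave"), // shares Email with 1
    (ID: 3, Email: "c@x.com", Address: "2 Oak Ave"), // shares Address with 2
    (ID: 4, Email: "d@x.com", Address: "9 Elm Rd"),  // unique
    (ID: 5, Email: "",        Address: ""),          // blank, never a match
    (ID: 6, Email: "",        Address: ""),          // blank, never a match
};

// A row is a duplicate if some other row has the same non-blank
// Email or the same non-blank Address.
var duplicateIds = grpDupes
    .Where(i => grpDupes.Any(i2 => i.ID != i2.ID &&
        ((i.Email.Trim()   != "" && i.Email   == i2.Email) ||
         (i.Address.Trim() != "" && i.Address == i2.Address))))
    .Select(i => i.ID)
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIds)); // prints "1, 2, 3"
```

Rows 5 and 6 share blank values but are not reported, because blank fields are excluded from matching.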

OTHER TIPS

You can use the GroupBy overload that accepts an IEqualityComparer<T>.

using System;
using System.Collections.Generic;

/// <summary>
/// Factory class which creates an EqualityComparer based on lambda expressions.
/// </summary>
/// <typeparam name="T">The type of which a new equality comparer is to be created.</typeparam>
public static class EqualityComparerFactory<T>
{
    private class MyComparer : IEqualityComparer<T>
    {
        private readonly Func<T, int> _getHashCodeFunc;
        private readonly Func<T, T, bool> _equalsFunc;

        public MyComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
        {
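            // Without a hash function every item hashes to 0, so each
            // comparison falls through to Equals (correct, but slow).
            _getHashCodeFunc = getHashCodeFunc ?? (a => 0);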
            _equalsFunc = equalsFunc;
        }

        public bool Equals(T x, T y)
        {
            return _equalsFunc(x, y);
        }

        public int GetHashCode(T obj)
        {
            return _getHashCodeFunc(obj);
        }
    }

    /// <summary>
    /// Creates an <see cref="IEqualityComparer{T}" /> based on an equality function and optionally on a hash function.
    /// </summary>
    /// <param name="equalsFunc">The equality function.</param>
    /// <param name="getHashCodeFunc">The hash function.</param>
    /// <returns>
    /// A typed Equality Comparer.
    /// </returns>
    public static IEqualityComparer<T> CreateComparer(Func<T, T, bool> equalsFunc, Func<T, int> getHashCodeFunc = null)
    {
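        // Plain null check in place of the original ArgumentValidator helper,
        // which is not part of .NET.
        if (equalsFunc == null) throw new ArgumentNullException(nameof(equalsFunc));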

        return new MyComparer(equalsFunc, getHashCodeFunc);
    }
}

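Sample usage:

var comparer = EqualityComparerFactory<YourClassHere>.CreateComparer(
    (a, b) => a.Address == b.Address || a.Email == b.Email);

var duplicates = data
    .GroupBy(a => a, comparer)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);

Be aware that an OR comparison is not transitive: A can equal B and B can equal C without A equalling C. GroupBy compares each item against the first item of each existing group, so the resulting groups can depend on the order of the input.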
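If rows linked through a chain (row 1 shares an Email with row 2, row 2 shares an Address with row 3) should all land in one group, one alternative to a comparer is a union-find pass. This is a different technique from the answer above, shown only as a minimal sketch. The sample rows are hypothetical, and the code assumes Email and Address are never null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows.
var data = new[]
{
    (ID: 1, Email: "a@x.com", Address: "1 Main St"),
    (ID: 2, Email: "a@x.com", Address: "2 Oak Ave"),
    (ID: 3, Email: "c@x.com", Address: "2 Oak Ave"),
    (ID: 4, Email: "d@x.com", Address: "9 Elm Rd"),
};

// parent[i] points towards the representative of row i's cluster.
var parent = Enumerable.Range(0, data.Length).ToArray();

int Find(int i)
{
    while (parent[i] != i)
    {
        parent[i] = parent[parent[i]]; // path halving
        i = parent[i];
    }
    return i;
}

void Union(int x, int y) => parent[Find(x)] = Find(y);

// Link each row to the first row seen with the same Email or Address.
var firstByEmail = new Dictionary<string, int>();
var firstByAddress = new Dictionary<string, int>();
for (var row = 0; row < data.Length; row++)
{
    if (firstByEmail.TryGetValue(data[row].Email, out var sameEmail)) Union(row, sameEmail);
    else firstByEmail[data[row].Email] = row;

    if (firstByAddress.TryGetValue(data[row].Address, out var sameAddress)) Union(row, sameAddress);
    else firstByAddress[data[row].Address] = row;
}

// Keep only clusters with more than one row and report their IDs.
var clusters = Enumerable.Range(0, data.Length)
    .GroupBy(i => Find(i))
    .Where(g => g.Count() > 1)
    .Select(g => g.Select(i => data[i].ID).ToList())
    .ToList();

foreach (var cluster in clusters)
    Console.WriteLine(string.Join(", ", cluster)); // prints "1, 2, 3"
```

Rows 1 and 3 share neither field directly, yet they end up together because both are linked to row 2. Blank values would also link rows here, so filter them out first if that is not wanted.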

I'd keep this very simple by using .ToLookup().

How about this?

var emailLookup = grpDupes.ToLookup(x => x.Email);
var addressLookup = grpDupes.ToLookup(x => x.Address);

var duplicates = grpDupes
    .Where(x =>
        emailLookup[x.Email].Count() > 1 || addressLookup[x.Address].Count() > 1);
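
One thing to watch with the lookup approach is that rows with an empty Email or Address all share the same key and would be reported as duplicates of each other. A minimal sketch that indexes only non-blank values, assuming hypothetical sample rows shaped as (ID, Email, Address):

```csharp
using System;
using System.Linq;

// Hypothetical sample rows.
var grpDupes = new[]
{
    (ID: 1, Email: "a@x.com", Address: "1 Main St"),
    (ID: 2, Email: "a@x.com", Address: "2 Oak Ave"),
    (ID: 3, Email: "c@x.com", Address: "2 Oak Ave"),
    (ID: 4, Email: "d@x.com", Address: "9 Elm Rd"),
    (ID: 5, Email: "",        Address: ""),
    (ID: 6, Email: "",        Address: ""),
};

// Index only non-blank values so that empty fields never count as matches.
var emailLookup = grpDupes
    .Where(x => !string.IsNullOrWhiteSpace(x.Email))
    .ToLookup(x => x.Email);
var addressLookup = grpDupes
    .Where(x => !string.IsNullOrWhiteSpace(x.Address))
    .ToLookup(x => x.Address);

// A key that was never indexed yields an empty sequence, so Count() is 0.
var duplicateIds = grpDupes
    .Where(x => emailLookup[x.Email].Count() > 1 ||
                addressLookup[x.Address].Count() > 1)
    .Select(x => x.ID)
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIds)); // prints "1, 2, 3"
```

Each lookup is built in a single pass, so this stays roughly linear in the number of rows when run in memory.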
Licensed under: CC-BY-SA with attribution