Question

If I have two collections of type T, and an IEqualityComparer that compares a subset of their properties, which collection would the resulting elements of an Intersect or Union come from?

The tests I've run so far suggest the following:

  • the item(s) from col1 win
  • if col1 or col2 contain duplicate items (as defined by the comparer) within themselves, the first entry (within col1, then col2) wins.

I'm aware this shouldn't be an issue, as (by definition) I should be viewing the resulting objects as equal. It just occurred to me that using Union with a custom comparer could be a bit neater than the equivalent Join - though this only holds true if the above assumptions are guaranteeed.

    class DummyComparer : IEqualityComparer<Dummy>
    {
        public bool Equals(Dummy x, Dummy y)
        {
            return x.ID == y.ID;
        }

        public int GetHashCode(Dummy obj)
        {
            return obj.ID.GetHashCode();
        }
    }

    class Dummy
    {
        public int ID { get; set; }
        public string Name { get; set; }
    }

    [Test]
    public void UnionTest()
    {
        var comparer = new DummyComparer();

        var d1 = new Dummy { ID = 0, Name = "test0" };
        var d2 = new Dummy { ID = 0, Name = "test1" };
        var d3 = new Dummy { ID = 1, Name = "test2" };
        var d4 = new Dummy { ID = 1, Name = "test3" };

        var col1 = new Dummy[] { d1, d3 };
        var col2 = new Dummy[] { d2, d4 };

        var x1 = col1.Union(col2, comparer).ToList();
        var x2 = col2.Union(col1, comparer).ToList();

        var y1 = col1.Except(col2, comparer).ToList();
        var y2 = col2.Except(col1, comparer).ToList();

        var z1 = col1.Intersect(col2, comparer).ToList();
        var z2 = col2.Intersect(col1, comparer).ToList();

        Assert.AreEqual(2, x1.Count);
        Assert.Contains(d1, x1);
        Assert.Contains(d3, x1);

        Assert.AreEqual(2, x2.Count);
        Assert.Contains(d2, x2);
        Assert.Contains(d4, x2);

        Assert.AreEqual(0, y1.Count);
        Assert.AreEqual(0, y2.Count);

        Assert.AreEqual(2, z1.Count);
        Assert.Contains(d1, z1);
        Assert.Contains(d3, z1);

        Assert.AreEqual(2, z2.Count);
        Assert.Contains(d2, z2);
        Assert.Contains(d4, z2);
    }
Was it helpful?

Solution

The first collection should win always.

MSDN:

When the object returned by this method is enumerated, Union enumerates first and second in that order and yields each element that has not already been yielded.

Here is the implementation of Union(ILSPY, .NET 4), the first collection is enumerated first:

// System.Linq.Enumerable
private static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource current in first)
    {
        if (set.Add(current))
        {
            yield return current;
        }
    }
    foreach (TSource current2 in second)
    {
        if (set.Add(current2))
        {
            yield return current2;
        }
    }
    yield break;
}

The same applies to Intersect (and other similar methods in Linq-To-Objects as well):

When the object returned by this method is enumerated, Intersect enumerates first, collecting all distinct elements of that sequence. It then enumerates second, marking those elements that occur in both sequences. Finally, the marked elements are yielded in the order in which they were collected.

Update: As Rawling has mentioned in his comment MSDN lies at the documentation of Intersect. I've looked at Intersect with ILSpy and it enumerates the second collection first and only then the first, even if is documented the other way around.

Actually Jon Skeet has also mentioned this "lie" in EduLinq: http://msmvps.com/blogs/jon_skeet/archive/2010/12/30/reimplementing-linq-to-objects-part-16-intersect-and-build-fiddling.aspx (in his words: "This is demonstrably incorrect.")

However, even if it isn't implemented as expected it will still return the element of the first collection as you can see in the implementation:

// System.Linq.Enumerable
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource current in second)
    {
        set.Add(current);
    }
    foreach (TSource current2 in first)
    {
        if (set.Remove(current2))
        {
            yield return current2;
        }
    }
    yield break;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top