Question

I have this function to repeat a sequence:

public static List<T> Repeat<T>(this IEnumerable<T> lst, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}

Now if I do:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person()).Repeat(10); 
int i = f.Distinct().Count();

I expect i to be 100, but it's giving me 1000! My question strictly is: why is this happening? Shouldn't LINQ be smart enough to figure out that it's the first 100 selected persons that I need to concatenate with the variable ret? I have a feeling that the Concat is being given precedence over the Select when the query is executed at ret.ToList().

Edit:

If I do this I get the correct result as expected:

var f = d.Select(t => new Person()).ToList().Repeat(10); 
int i = f.Distinct().Count(); //prints 100

Edit again:

I have not overridden Equals. I'm just trying to get 100 unique persons (by reference, of course). My question is: can someone explain why LINQ does not perform the Select first and then the concatenation (at execution time, of course)?


Solution

The problem is that unless you call ToList, d.Select(t => new Person()) is re-enumerated each time Repeat goes through the sequence, and every pass creates a fresh set of Person objects. With 10 passes over 100 elements, that makes 1000 distinct instances. This behavior is known as deferred execution.
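A minimal sketch of this re-enumeration, assuming a hypothetical empty Person class (the question does not show its definition) and a helper named CountSelectorCalls that is invented here purely for illustration. It counts how often the selector runs when the same query is enumerated several times:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical stand-in for the Person class from the question.
    public sealed class Person { }

    // Returns how many times the selector ran after enumerating the query `passes` times.
    public static int CountSelectorCalls(int passes, bool materialize)
    {
        var calls = 0;
        IEnumerable<Person> query = Enumerable.Range(1, 100).Select(_ =>
        {
            calls++;              // counts every Person that gets constructed
            return new Person();
        });

        if (materialize)
            query = query.ToList(); // runs the selector once per element, right here

        for (var i = 0; i < passes; i++)
            foreach (var unused in query) { } // a deferred query re-runs the selector on every pass

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(10, materialize: false)); // 1000
        Console.WriteLine(CountSelectorCalls(10, materialize: true));  // 100
    }
}
```

Without materialization the selector runs once per element per pass; with ToList it runs once per element in total.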

In general, LINQ does not assume that enumerating a sequence twice will produce the same elements, or even a sequence of the same length. If this effect is not desirable, you can always "materialize" the sequence inside your Repeat method by calling ToList right away, like this:

public static List<T> Repeat<T>(this IEnumerable<T> lstEnum, int count) {
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");

    var lst = lstEnum.ToList(); // Enumerate only once
    var ret = Enumerable.Empty<T>();

    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);

    return ret.ToList();
}

OTHER TIPS

I could break my problem down to something simpler:

var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());

Now essentially I am doing this:

f = f.Concat(f);

Mind you, the query hasn't been executed up to this point. At the time of execution, f is still the unexecuted d.Select(t => new Person()). So the last statement, at the time of execution, can be broken down to:

f = f.Concat(f); 
//which is 
f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));

which obviously creates 100 + 100 = 200 new Person instances. So

f.Distinct().ToList(); //yields 200, not 100

which is the correct behaviour.
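The breakdown above can be checked with a short sketch. Person is again a hypothetical empty class with no Equals override, and DistinctAfterSelfConcat is a helper name made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // Hypothetical stand-in; Distinct falls back to reference equality for it.
    public sealed class Person { }

    // Concatenates a sequence of 100 persons with itself and counts the distinct references.
    public static int DistinctAfterSelfConcat(bool materialize)
    {
        IEnumerable<Person> f = Enumerable.Range(1, 100).Select(_ => new Person());

        if (materialize)
            f = f.ToList(); // the 100 persons now exist exactly once

        f = f.Concat(f);    // both operands refer to the same underlying sequence object

        return f.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctAfterSelfConcat(materialize: false)); // 200
        Console.WriteLine(DistinctAfterSelfConcat(materialize: true));  // 100
    }
}
```

In the deferred case each operand of Concat runs the Select on its own, so 200 separate objects are created. In the materialized case both operands walk the same list.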

Edit: I could rewrite the extension method as simply as this:

public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
{
    source = source.ToArray();
    return Enumerable.Range(0, times).SelectMany(_ => source);
}

I used dasblinkenlight's suggestion to fix the issue.
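As a rough usage check of the rewritten method, here is a sketch that wraps it in a hypothetical RepeatDemo class with an assumed empty Person type. Measure is an illustrative helper, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatDemo
{
    // Hypothetical stand-in for the Person class from the question.
    public sealed class Person { }

    // The rewritten extension method from this answer, unchanged.
    public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
    {
        source = source.ToArray(); // buffered as soon as Repeat is called (the method is not an iterator)
        return Enumerable.Range(0, times).SelectMany(_ => source);
    }

    // Repeats 100 freshly created persons and reports total and distinct counts.
    public static (int Total, int Unique) Measure(int times)
    {
        var f = Enumerable.Range(1, 100).Select(_ => new Person()).Repeat(times).ToList();
        return (f.Count, f.Distinct().Count());
    }

    public static void Main()
    {
        var (total, unique) = Measure(10);
        Console.WriteLine($"{total} items, {unique} distinct"); // 1000 items, 100 distinct
    }
}
```

Because ToArray runs before the SelectMany query is built, the returned sequence can be enumerated any number of times without creating new Person objects. Unlike the original version, this one returns a lazy IEnumerable<T> rather than a List<T>.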

Each Person object is a separate object. All 1000 are distinct.

What is the definition of equality for the Person type? If you don't override it, that definition will be reference equality, meaning all 1000 objects are distinct.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow