Question

Background

The background for this is that I had a recent conversation in the comments with another clearly knowledgeable user about how LINQ is compiled. I first "summarized" and said LINQ was compiled to a for loop. While this isn't correct, my understanding from other stacks such as this one is that the LINQ query is compiled to a lambda with a loop inside of it. This is then called when the variable is enumerated for the first time (after which the results are stored). The other user said that LINQ takes additional optimizations such as hashing. I couldn't find any supporting documentation either for or against this.

I know this seems like a really obscure point, but I have always felt that if I don't completely understand how something works, it's going to be difficult to understand why I'm not using it correctly.

The Question

So, let's take the following very simple example:

var productNames = 
    from p in products 
    where p.Id > 100 && p.Id < 5000
    select p.ProductName;

What is this statement actually compiled to in CLR? What optimizations does LINQ take over me just writing a function that manually parses the results? Is this just semantics or is there more to it than that?

Clarification

Clearly I'm asking this question because I don't understand what the inside of the LINQ "black box" looks like. Even though I understand that LINQ is complicated (and powerful), I'm mostly looking for a basic understanding of either the CLR or a functional equivalent to a LINQ statement. There are great sites out there for helping understand how to create a LINQ statement but very few of these seem to give any guidance on how those are actually compiled or run.

Side Note - I will absolutely read through the Jon Skeet series on LINQ to Objects.

Side Note 2 - I shouldn't have tagged this as LINQ to SQL. I understand how ORMs and micro-ORMs work. That is really beside the point of the question.

Solution

For LINQ to Objects, this is compiled into a set of static method calls:

var productNames = 
    from p in products 
    where p.Id > 100 && p.Id < 5000
    select p.ProductName;

Becomes:

IEnumerable<string> productNames = products
                                       .Where(p => p.Id > 100 && p.Id < 5000)
                                       .Select(p => p.ProductName);

This uses extension methods defined in the Enumerable type, so is actually compiled to:

IEnumerable<string> productNames = 
     Enumerable.Select(
        Enumerable.Where(products, p => p.Id > 100 && p.Id < 5000),
        p => p.ProductName
     );

The compiler turns the lambda expressions into methods. The lambda in the where clause becomes a method wrapped in a Func<Product, bool> delegate, and the lambda in the select becomes one wrapped in a Func<Product, string>.

For a thorough explanation, see Jon Skeet's blog series: Reimplementing LINQ to Objects. He walks through the entire process of how this works, including the compiler transformations (from query syntax to method calls), how the methods are implemented, etc.
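
As a rough functional equivalent, Where and Select can be pictured as iterator methods wrapped around a plain foreach loop. The sketch below is a simplification in the spirit of that blog series, not the framework's actual source: the names SketchEnumerable, WhereSketch, SelectSketch and ManualLoop are made up for illustration, the Product class is assumed to have only the two members used in the question, and the real Enumerable methods add argument checks and special cases that are omitted here.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the Product type from the question.
public sealed class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; } = "";
}

// Hypothetical, simplified stand-ins for Enumerable.Where and Enumerable.Select.
public static class SketchEnumerable
{
    // An iterator block: nothing in here runs until the result is enumerated.
    public static IEnumerable<T> WhereSketch<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item; // hand back one matching item at a time
            }
        }
    }

    public static IEnumerable<TResult> SelectSketch<T, TResult>(IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
        {
            yield return selector(item); // project each item as it streams through
        }
    }

    // The hand-written loop the question compares against. Unlike the
    // iterators above, it runs immediately and stores its results in a list.
    public static List<string> ManualLoop(IEnumerable<Product> products)
    {
        var names = new List<string>();
        foreach (Product p in products)
        {
            if (p.Id > 100 && p.Id < 5000)
            {
                names.Add(p.ProductName);
            }
        }
        return names;
    }
}
```

Calling SelectSketch(WhereSketch(products, p => p.Id > 100 && p.Id < 5000), p => p.ProductName) produces the same names as ManualLoop(products). The differences in this sketch are that the iterator version does no work until it is enumerated and repeats the work on every enumeration rather than storing the results, and that neither version hashes anything.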

Note that LINQ to SQL and other IQueryable<T> implementations are different. There the lambda is compiled into an Expression<T> (an expression tree) rather than a method. That tree is passed to the query provider, which translates it in some manner (how is up to the provider) into calls that, in the case of an ORM, typically run on the server.
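
To make the difference concrete, here is a small sketch of the same filter lambda assigned to an Expression<Func<Product, bool>> instead of a delegate. The ExpressionDemo class and its method names are hypothetical, and Product is assumed to have the shape used in the question. A real query provider would walk the tree and translate it, which is not shown; this only shows that the lambda arrives as inspectable data.

```csharp
using System;
using System.Linq.Expressions;

// Assumed shape of the Product type from the question.
public sealed class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; } = "";
}

// Hypothetical demo class, for illustration only.
public static class ExpressionDemo
{
    // Because the target type is Expression<...>, the compiler emits code
    // that builds a tree describing the lambda instead of a method body.
    public static Expression<Func<Product, bool>> Filter()
    {
        return p => p.Id > 100 && p.Id < 5000;
    }

    // The root of the body is the && operator, exposed as ExpressionType.AndAlso.
    public static string BodyKind()
    {
        return Filter().Body.NodeType.ToString();
    }

    // A tree can still be turned into an ordinary delegate and run in memory.
    public static bool RunInMemory(Product p)
    {
        Func<Product, bool> compiled = Filter().Compile();
        return compiled(p);
    }
}
```

Here BodyKind() returns "AndAlso", and RunInMemory behaves like the plain delegate version of the filter. A provider such as LINQ to SQL never needs to call Compile(); it reads nodes like this one and decides for itself what to generate.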


For example, this method:

    private static IEnumerable<string> ProductNames(IEnumerable<Product> products)
    {
        var productNames =
            from p in products
            where p.Id > 100 && p.Id < 5000
            select p.ProductName;
        return productNames;
    }

gets compiled to the following IL:

  .method private hidebysig static class [mscorlib]System.Collections.Generic.IEnumerable`1<string> ProductNames(class [mscorlib]System.Collections.Generic.IEnumerable`1<class ConsoleApplication3.Product> products) cil managed
{
    .maxstack 3
    .locals init (
        [0] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable,
        [1] class [mscorlib]System.Collections.Generic.IEnumerable`1<string> enumerable2)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
    L_0007: dup 
    L_0008: brtrue.s L_001d
    L_000a: pop 
    L_000b: ldnull 
    L_000c: ldftn bool ConsoleApplication3.Program::<ProductNames>b__2(class ConsoleApplication3.Product)
    L_0012: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool>::.ctor(object, native int)
    L_0017: dup 
    L_0018: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, bool> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate3
    L_001d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> [System.Core]System.Linq.Enumerable::Where<class ConsoleApplication3.Product>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, bool>)
    L_0022: ldsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
    L_0027: dup 
    L_0028: brtrue.s L_003d
    L_002a: pop 
    L_002b: ldnull 
    L_002c: ldftn string ConsoleApplication3.Program::<ProductNames>b__4(class ConsoleApplication3.Product)
    L_0032: newobj instance void [mscorlib]System.Func`2<class ConsoleApplication3.Product, string>::.ctor(object, native int)
    L_0037: dup 
    L_0038: stsfld class [mscorlib]System.Func`2<class ConsoleApplication3.Product, string> ConsoleApplication3.Program::CS$<>9__CachedAnonymousMethodDelegate5
    L_003d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!1> [System.Core]System.Linq.Enumerable::Select<class ConsoleApplication3.Product, string>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>, class [mscorlib]System.Func`2<!!0, !!1>)
    L_0042: stloc.0 
    L_0043: ldloc.0 
    L_0044: stloc.1 
    L_0045: br.s L_0047
    L_0047: ldloc.1 
    L_0048: ret 
}

Note that these are normal call instructions for the method calls. The lambdas get converted into other methods, such as:

[CompilerGenerated]
private static bool <ProductNames>b__2(Product p)
{
    return ((p.Id > 100) && (p.Id < 0x1388));
}
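
Read back into C#, the IL above amounts to roughly the following. This is a hand-written approximation, not decompiler output: the names LoweredProgram, cachedPredicate, cachedSelector, IdInRange and NameOf are invented stand-ins for the compiler-generated CS$<>9__CachedAnonymousMethodDelegate fields and <ProductNames>b__ methods, and Product is assumed to have the shape used in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Product type from the question.
public sealed class Product
{
    public int Id { get; set; }
    public string ProductName { get; set; } = "";
}

public static class LoweredProgram
{
    // Stand-ins for the compiler-generated static fields that cache the delegates.
    private static Func<Product, bool> cachedPredicate;
    private static Func<Product, string> cachedSelector;

    // Stand-in for <ProductNames>b__2: the body of the where lambda.
    private static bool IdInRange(Product p)
    {
        return p.Id > 100 && p.Id < 5000;
    }

    // Stand-in for <ProductNames>b__4: the body of the select lambda.
    private static string NameOf(Product p)
    {
        return p.ProductName;
    }

    public static IEnumerable<string> ProductNames(IEnumerable<Product> products)
    {
        // ldsfld / brtrue.s / newobj / stsfld: create each delegate once, then reuse it.
        if (cachedPredicate == null)
        {
            cachedPredicate = new Func<Product, bool>(IdInRange);
        }
        if (cachedSelector == null)
        {
            cachedSelector = new Func<Product, string>(NameOf);
        }

        // The two call instructions: plain static calls into System.Linq.Enumerable.
        return Enumerable.Select(Enumerable.Where(products, cachedPredicate), cachedSelector);
    }
}
```

The only thing cached here is the pair of delegate objects, so they are not reallocated on every call. The query results themselves are not stored anywhere.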

OTHER TIPS

Query syntax is just syntactic sugar for method syntax; it effectively gets compiled to this:

var productNames = products.Where(p => p.Id > 100 && p.Id < 5000).Select(p => p.ProductName);

What those functions actually do depends on which flavour of LINQ you're using, e.g. LINQ to Objects (which chains together in-memory handlers) or LINQ to SQL (which converts the query to a SQL query).

Licensed under: CC-BY-SA with attribution