Anonymous methods vs. lambda expression [duplicate]

Question 1

What is an anonymous method? Is it really anonymous? Does it have a name? All good questions, so let's start with them and work our way up to lambda expressions as we go along.

When you do this:

public void TestSomething()
{
    Test(delegate { Debug.WriteLine("Test"); });
}

What actually happens?

The compiler first decides to take the "body" of the method, which is this:

Debug.WriteLine("Test");

and separate out that into a method.

Two questions the compiler now has to answer:

Where should I put the method?
What should the signature of the method look like?

The second question is easily answered. The delegate { part answers that. The method takes no parameters (nothing between delegate and {), and since we don't care about its name (hence the "anonymous" part), we can declare the method as such:

public void SomeOddMethod()
{
    Debug.WriteLine("Test");
}

But why did it do all this?

Let's look at what a delegate, such as Action really is.

A delegate is, if we for a moment disregard the fact that delegates in .NET are actually linked list of multiple single "delegates", a reference (pointer) to two things:

An object instance
A method on that object instance

So, with that knowledge, the first piece of code could actually be rewritten as this:

public void TestSomething()
{
    Test(new Action(this.SomeOddMethod));
}

private void SomeOddMethod()
{
    Debug.WriteLine("Test");
}

Now, the problem with this is that the compiler has no way of knowing what Test actually does with the delegate it is given, and since one half of the delegate is a reference to the instance on which the method is to be called, this in the above example, we don't know how much data will be referenced.

For instance, consider if the above code was part of a really huge object, but an object that only live temporarily. Also consider that Test would store that delegate somewhere it would live for a long time. That "long time" would tie itself up to the life of that huge object as well, keeping a reference to that for a long time as well, probably not good.

So the compiler does more than just create a method, it also creates a class to hold it. This answers the first question, where should I put it?.

The code above can thus be rewritten as follows:

public void TestSomething()
{
    var temp = new SomeClass;
    Test(new Action(temp.SomeOddMethod));
}

private class SomeClass
{
    private void SomeOddMethod()
    {
        Debug.WriteLine("Test");
    }
}

That is, for this example, what an anonymous method is really all about.

Things get a bit more hairy if you start using local variables, consider this example:

public void Test()
{
    int x = 10;
    Test(delegate { Debug.WriteLine("x=" + x); });
}

This is what happens under the hood, or at least something very close to it:

public void TestSomething()
{
    var temp = new SomeClass;
    temp.x = 10;
    Test(new Action(temp.SomeOddMethod));
}

private class SomeClass
{
    public int x;

    private void SomeOddMethod()
    {
        Debug.WriteLine("x=" + x);
    }
}

The compiler creates a class, lifts all the variables that the method requires into that class, and rewrites all access to the local variables to be access to fields on the anonymous type.

The name of the class, and the method, are a bit odd, let's ask LINQPad what it would be:

void Main()
{
    int x = 10;
    Test(delegate { Debug.WriteLine("x=" + x); });
}

public void Test(Action action)
{
    action();
}

If I ask LINQPad to output the IL (Intermediate Language) of this program, I get this:

// var temp = new UserQuery+<>c__DisplayClass1();
IL_0000:  newobj      UserQuery+<>c__DisplayClass1..ctor
IL_0005:  stloc.0     // CS$<>8__locals2
IL_0006:  ldloc.0     // CS$<>8__locals2

// temp.x = 10;
IL_0007:  ldc.i4.s    0A 
IL_0009:  stfld       UserQuery+<>c__DisplayClass1.x

// var action = new Action(temp.<Main>b__0);
IL_000E:  ldarg.0     
IL_000F:  ldloc.0     // CS$<>8__locals2
IL_0010:  ldftn       UserQuery+<>c__DisplayClass1.<Main>b__0
IL_0016:  newobj      System.Action..ctor

// Test(action);
IL_001B:  call        UserQuery.Test

Test:
IL_0000:  ldarg.1     
IL_0001:  callvirt    System.Action.Invoke
IL_0006:  ret         

<>c__DisplayClass1.<Main>b__0:
IL_0000:  ldstr       "x="
IL_0005:  ldarg.0     
IL_0006:  ldfld       UserQuery+<>c__DisplayClass1.x
IL_000B:  box         System.Int32
IL_0010:  call        System.String.Concat
IL_0015:  call        System.Diagnostics.Debug.WriteLine
IL_001A:  ret         

<>c__DisplayClass1..ctor:
IL_0000:  ldarg.0     
IL_0001:  call        System.Object..ctor
IL_0006:  ret

Here you can see that the name of the class is UserQuery+<>c__DisplayClass1, and the name of the method is <Main>b__0. I edited in the C# code that produced this code, LINQPad doesn't produce anything but the IL in the example above.

The less-than and greater-than signs are there to ensure that you cannot by accident create a type and/or method that matches what the compiler produced for you.

So that's basically what an anonymous method is.

So what is this?

Test(() => Debug.WriteLine("Test"));

Well, in this case it's the same, it's a shortcut for producing an anonymous method.

You can write this in two ways:

() => { ... code here ... }
() => ... single expression here ...

In its first form you can write all the code you would do in a normal method body. In its second form you're allowed to write one expression or statement.

However, in this case the compiler will treat this:

() => ...

the same way as this:

delegate { ... }

They're still anonymous methods, it's just that the () => syntax is a shortcut to getting to it.

So if it's a shortcut to getting to it, why do we have it?

Well, it makes life a bit easier for the purpose of which it was added, which is LINQ.

Consider this LINQ statement:

var customers = from customer in db.Customers
                where customer.Name == "ACME"
                select customer.Address;

This code is rewritten as follows:

var customers =
    db.Customers
      .Where(customer => customer.Name == "ACME")
      .Select(customer => customer.Address");

If you were to use the delegate { ... } syntax, you would have to rewrite the expressions with return ... and so on, and they'd look more funky. The lambda syntax was thus added to make life easier for us programmers when writing code like the above.

So what are expressions?

So far I have not shown how Test has been defined, but let's define Test for the above code:

public void Test(Action action)

This should suffice. It says that "I need a delegate, it is of type Action (taking no parameters, returning no values)".

However, Microsoft also added a different way to define this method:

public void Test(Expression<Func<....>> expr)

Note that I dropped a part there, the .... part, let's get back to that ¹.

This code, paired with this call:

Test(() => x + 10);

will not actually pass in a delegate, nor anything that can be called (immediately). Instead, the compiler will rewrite this code to something similar (but not at all like) the below code:

var operand1 = new VariableReferenceOperand("x");
var operand2 = new ConstantOperand(10);
var expression = new AdditionOperator(operand1, operand2);
Test(expression);

Basically the compiler will build up an Expression<Func<...>> object, containing references to the variables, the literal values, the operators used, etc. and pass that object tree to the method.

Why?

Well, consider the db.Customers.Where(...) part above.

Wouldn't it be nice if, instead of downloading all customers (and all their data) from the database to the client, looping through them all, finding out which customer has the right name, etc. the code would actually ask the database to find that single, correct, customer at once?

That's the purpose behind expression. The Entity Framework, Linq2SQL, or any other such LINQ-supporting database layer, will take that expression, analyze it, pick it apart, and write up a properly formatted SQL to be executed against the database.

This it could never do if we were still giving it delegates to methods containing IL. It can only do this because of a couple of things:

The syntax allowed in a lambda expression suitable for an Expression<Func<...>> is limited (no statements, etc.)
The lambda syntax without the curly brackets, which tells the compiler that this is a simpler form of code

So, let's summarize:

Anonymous methods are really not all that anonymous, they end up as a named type, with a named method, only you do not have to name those things yourself
It's a lot of compiler magic under the hood that moves things around so that you don't have to
Expressions and Delegates are two ways to look at some of the same things
Expressions are meant for frameworks that wants to know what the code does and how, so that they can use that knowledge to optimize the process (like writing a SQL statement)
Delegates are meant for frameworks that are only concerned about being able to call the method

Footnotes:

The .... part for such a simple expression is meant for the type of return value you get from the expression. The () => ... simple expression ... only allows expressions, that is, something that returns a value, and it cannot be multiple statements. As such, a valid expression type is this: Expression<Func<int>>, basically, the expression is a function (method) returning an integer value.

Note that the "expression that returns a value" is a limit for Expression<...> parameters or types, but not of delegates. This is entirely legal code if the parameter type of Test is an Action:
```
Test(() => Debug.WriteLine("Test"));
```
Obviously, Debug.WriteLine("Test") doesn't return anything, but this is legal. If the method Test required an expression however, it would not be, as an expression must return a value.

Question 2

There is one subtle difference you should be aware of. Consider the following queries (using the proverbial NorthWind).

Customers.Where(delegate(Customers c) { return c.City == "London";});
Customers.Where(c => c.City == "London");

The first one uses an anonymous delegate and the second one uses a lambda expression. If you evaluate the results of both you will see the same thing. However, looking at the generated SQL, we'll see quite a different tale. The first generates

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [Customers] AS [t0]

Whereas the second generates

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [Customers] AS [t0]
WHERE [t0].[City] = @p0

Notice in the first case, the where clause is not passed to the database. Why is this? The compiler is able to determine that the lambda expression is a simple single-line expression which can be retained as an expression tree, whereas the anonymous delegate is not a lambda expression and thus not wrappable as an Expression<Func<T>>. As a result in the first case, the best match for the Where extension method is the one that extends IEnumerable rather than the IQueryable version which requires an Expression<Func<T, bool>>.

At this point, there's little use to the anonymous delegate. It's more verbose and less flexible. In general, I would recommend always using the lambda syntax over the anonymous delegate syntax and pick up parsability and syntactic terseness as well.

Question 3

To be precise, what you name 'anonymous delegate' is actually an anonymous method.

Well, both lambdas and anonymous methods are just syntax sugar. The compiler will generate at least a 'normal' method for you, although sometimes (in the case of a closure) it will generate a nested class with the no-longer-anonymous method in it.