There's value in both of the answers here, and I would give the mark to phoog's, since it answers the practical concern most people have when they ask about this (variants of it have come up before). But both are also incomplete.
There are four ways of looking at the code in question, and all four are important, and the answers have only looked at two (though phoog's entailed a lot about one other).
I'll start with the part of the question that was ignored so far:
Also the following code:
int i = 0;
bool b = i == null; // Always false
Is there an implicit object cast going on? such that:
int i = 0;
bool b = (object)i == null;
Well, yes and no. It depends on the level at which we are looking at it, and we actually have to look at it at different levels at different times, so saying so is not mere pedantry.
C# is four things:
- It is a computer language in its own right. We can reason in it and about it and examine whether or not something follows its rules, and what it means according to those rules.
- It is a way of producing CIL, which is another language in its own right, to which the above apply.
- Via CIL, it is a way of producing machine code, either at runtime or through Ngen, which is also a language in its own right.
- It is a way of telling a computer to do something, which is usually the main point of the exercise.
So far the answers have looked at points 2 and 3, but the full picture looks at all four.
And the most important points are actually points 1 and 4.
Point 1 is important because C# is after all the language we are looking at, and the view colleagues are most likely to look at. Since programming is partly instructing a computer to do something, and partly expressing one's intent as one does so (medium- and high-level programming languages are for people first, computers second), the actual source code is important.
Point 4 is important because that is after all our final goal. It is not the same thing as looking at the assembly of the machine code (as phoog's answer did) because machine code is not the final answer as to what changes and optimisations are done:
- CPUs do optimisations of their own. This is particularly relevant when branching comes up.
- Two pieces of assembly that when considered purely as a theoretical language are equivalent may differ in how well they treat CPU caches.
- Two pieces of assembly that when considered purely as a theoretical language are equivalent may differ in that one performs unaligned reads that cause performance problems, incorrect results, exceptions or screens-of-death.
- Two pieces of assembly that when considered purely as a theoretical language are equivalent may differ in performance because one uses an instruction that the CPU happens to perform faster than the other's logically-equivalent instruction.
- And so on...
Now, all that said, in the cases we're looking at now, the machine code is about as far as we need to look to reason about the machine's behaviour. In general though, machine code is not the final answer every time. Still, phoog's answer isn't at fault for implying rather than stating the impact here; I only mention it because I'm aiming to write about the different conceptual levels at which both phoog and xxbbcc are correct in different ways.
Coming back to our code of bool b = i == null where i is of type int.
In C# null is defined as a literal value that is the default for all reference types, and for nullable value types. It can be compared with any value for reference equality - that is, the question "Are X and Y the same instance" can be asked with null as the value for X and the answer is true if Y is not an instance, and false otherwise.
To make this comparison with a value type, we must box the value type, just as we must in any of those cases where we need to treat a value type as a reference type.
If the value type is a nullable value type, and it is null (HasValue returns false), then boxing produces a null reference. In all other cases boxing a value type creates a reference to a new object on the heap, of type object, which holds a copy of the same value and can be unboxed back to it.
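The two boxing outcomes described above can be observed directly from C#. The following is only a minimal sketch: the class and method names (BoxingDemo, BoxedIntIsNull, BoxedNullableIsNull) are made up for illustration, and the boxing is written out explicitly through an object variable so that the compiler has nothing to fold away.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes a plain int and reports whether the resulting reference is null.
    public static bool BoxedIntIsNull(int value)
    {
        object boxed = value;   // boxing a non-nullable value always yields an object
        return boxed == null;   // reference comparison against null
    }

    // Boxes a nullable int and reports whether the resulting reference is null.
    public static bool BoxedNullableIsNull(int? value)
    {
        object boxed = value;   // a null reference when HasValue is false
        return boxed == null;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedIntIsNull(0));          // False
        Console.WriteLine(BoxedNullableIsNull(null));  // True
        Console.WriteLine(BoxedNullableIsNull(0));     // False
    }
}
```

Note that a nullable holding 0 still boxes to a real object; only the "no value" case boxes to a null reference.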
Therefore the answer at the conceptual level of C# is "yes, i is implicitly boxed to create a new object that is then compared to null [which hence will always return false]".
At the next level, we have CIL.
In CIL, null is a value with a natural word-size (32-bit in a 32-bit process, 64-bit in a 64-bit process) bit pattern of all-zero (hence brfalse, brzero and brnull all just being aliases for the same bytecode), which is a valid value for managed pointers, pointers, natural integers and any other means of giving an address.
Also in CIL, boxing is done to an equivalent boxed type; it's not just object, but boxed type of int, boxed type of float, etc. This is hidden from C# because it's not very useful (you can't do anything with these types other than those things you can do on object and unbox back to the equivalent unboxed type), but is more precisely defined in CIL because CIL has to provide the implementation of "how can boxing be done on lots of different types?".
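Although C# only lets us see the boxed value as object, the fact that it remembers its exact unboxed type is still visible at runtime. The following is a small sketch, with hypothetical names (UnboxDemo, BoxedTypeName, CanUnboxAsLong), showing that a boxed int reports itself as System.Int32 and can only be unboxed back to int:

```csharp
using System;

public static class UnboxDemo
{
    // Returns the runtime type name of a boxed value.
    public static string BoxedTypeName(object boxed) => boxed.GetType().FullName;

    // Tries to unbox as long. A boxed int is not a boxed long, so the
    // unbox fails with InvalidCastException even though int converts to long.
    public static bool CanUnboxAsLong(object boxed)
    {
        try
        {
            _ = (long)boxed;
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        object boxed = 42;
        Console.WriteLine(BoxedTypeName(boxed));   // System.Int32
        Console.WriteLine((int)boxed);             // 42
        Console.WriteLine(CanUnboxAsLong(boxed));  // False
    }
}
```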
The equivalent code in CIL would at a minimum be:
ldc.i4.0 // push the value 0 onto the stack.
box [mscorlib]System.Int32 // pop the top value from the stack, box it as boxed Int32,
// and push that boxed value onto the stack.
ldnull // push null (all zeros) onto the stack
ceq // pop the top two values from the stack, if they are equal
// push 1 onto the stack, otherwise push 0 onto the stack.
//Instructions that actually act on "b" here, probably a stloc so it can be loaded as needed later.
I say "at a minimum" as there might be some loading from and storing to the locals array for the method in question.
So, at the CIL level the answer is also "yes, i is implicitly boxed to create a new object that is then compared to null [which hence will always return false]".
However, this is not actually the CIL that would be produced. In a release build it would be:
ldc.i4.0 // push the value 0 onto the stack.
//Instructions that actually act on "b" here, probably a stloc so it can be loaded as needed later.
That is, it will optimise the code that always produces false to code that just produces false. Even in a debug build we would likely have some optimisation.
But I wasn't lying when I said that in CIL the code for comparing an integer with null involves boxing; it does, but the C# compiler can see that this code is a waste of time, and just replaces it with code that loads false into b. Indeed, if b isn't used later on, it might just cut out the whole thing. (Conversely, if i is used later on, it will still load 0 into it at some point, rather than cut it out as in the example above).
This is the first time we've come up against compiler optimisation here, and it's time to examine just what that means.
Compiler optimisation comes down to a simple observation; if a piece of code can be rewritten as a different piece of code that has the same effects as seen from the outside, but is faster and/or uses less memory and/or results in a smaller executable, then only a moron will complain if you produced the faster/smaller/lighter version instead.
This simple observation becomes complicated by two things. The first is what to do when given the choice between a faster version and a lighter version. Some compilers give options for weighing these choices (most C++ compilers do), but C# does not. The other is what does "as seen from the outside" mean? It used to be simple: "any output produced, interactions with other processes, or operations on volatile* variables". It gets a bit more complicated when you have multiple threads, one of which is performing garbage collection, all of which are of course "outside" of each other, in that this increases the number of cases where an optimisation (esp. if it involves reordering) could affect what is observed. Still, none of that applies here.
The C# compiler does not do a lot of optimisation, since the jitter is going to do a lot anyway, so the downside of optimisation (1. all work is a chance for a bug so if you don't do a particular optimisation you won't have a bug related to that optimisation. 2. the more you optimise something the more you can confuse the developer looking at it) becomes more significant if a given optimisation would be done by the next layer anyway.
Still, it does do that optimisation.
Indeed, it will optimise away whole sections. Take the code:
public static void Main(string[] args)
{
    int i = 0;
    if(i == null)
    {
        Console.WriteLine("wow");
        Console.WriteLine("didn't expect that");
    }
    else
    {
        Console.WriteLine("ok");
        Console.WriteLine("expected");
    }
}
Compile it, then decompile it back into C# and you get:
public static void Main(string[] args)
{
    Console.WriteLine("ok");
    Console.WriteLine("expected");
}
Because the compiler can remove entire sections of code it knows will never be hit.
So, while in both C# and IL, comparing a value type to null involves boxing, the C# compiler will remove such pointless cruft and no boxing will actually happen. It will also issue warning CS0472, because if you put obviously pointless cruft in your code something was likely wrong with your thinking, and you should look at it and figure out what you really meant to do.
It's worth at this point also looking at what would happen if i was of type int?, which can be boxed to a null. There is still an optimisation made:
- Most of the time the boxing and comparison gets replaced by a call to the HasValue property. This is more efficient than boxing.
- Sometimes the compiler can (due to knowledge of the value in question) optimise even that away.
(The matter of assembly is irrelevant at this stage, since the boxing and comparison has already been removed).
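As a rough illustration of why that replacement is safe, the sketch below writes the check both ways and confirms they agree for every input tried. The names (NullableDemo, IsNullByComparison, IsNullByHasValue) are hypothetical, and the code only shows that the two forms give the same answer; it does not show what IL the compiler actually emits for either.

```csharp
using System;

public static class NullableDemo
{
    // The check written as a comparison with null.
    public static bool IsNullByComparison(int? value) => value == null;

    // The same check written against HasValue directly.
    public static bool IsNullByHasValue(int? value) => !value.HasValue;

    public static void Main()
    {
        foreach (int? v in new int?[] { null, 0, 7 })
        {
            // The two forms agree for every value.
            Console.WriteLine(IsNullByComparison(v) == IsNullByHasValue(v)); // True
        }
    }
}
```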
Now, if we have the case of a generic method (or method of a generic class) that accepts both value and reference type parameters, this optimisation cannot be done by the C# compiler, because generic methods aren't instantiated into their particular specialised form at compile time (unlike the otherwise similar C++ templates), but at jitting time.
For this reason, the IL produced will always include the boxing operation (unless there was another reason why it could be optimised away even in the case of reference types).
The jitter though, has much the same knowledge of the fact that boxing a non-nullable value type will never produce a null value, that the C# compiler did with our first example. It is also much more aggressive in optimisation than the C# compiler ever is.
This is where we get the behaviour that phoog described in their answer: In the code produced for a value-type type parameter, the boxing operation is completely removed (with a reference-type parameter the boxing operation is essentially a no-op and also removed). The check is removed, as the answer is known, and indeed entire sections of code that would be executed only if that check had returned true, are also removed.
The case phoog didn't examine is that of a nullable value type. Here, at a very minimum the boxing and comparison will be replaced with a call to HasValue, which in turn will be inlined to a read of the internal field in the struct. Possibly (if it's known that the value is never null, or if it's known that it's always null) that will be removed, along with one whole section of code that would never be executed anyway.
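For reference, the kind of generic code under discussion looks something like the sketch below. IsNull is a hypothetical helper, not something from the question or the other answers. The program can only show the observable results for each kind of type argument; whether the box is actually elided in the machine code is a matter for the jitter and is not visible from the output.

```csharp
using System;

public static class GenericNullDemo
{
    // An unconstrained type parameter may be compared with null.
    // The result is always false when T is a non-nullable value type.
    public static bool IsNull<T>(T value) => value == null;

    public static void Main()
    {
        Console.WriteLine(IsNull(0));             // False: non-nullable value type
        Console.WriteLine(IsNull<int?>(null));    // True:  nullable without a value
        Console.WriteLine(IsNull<int?>(5));       // False: nullable with a value
        Console.WriteLine(IsNull<string>(null));  // True:  null reference
        Console.WriteLine(IsNull("text"));        // False: non-null reference
    }
}
```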
Summary
There are two more specific questions behind your question, and you may be interested in one or both of them.
Question 1: I am interested in how C# functions as a language, and I want to know if as far as C# is concerned, comparing a non-nullable value-type with null boxes that value type.
Answer 1: Yes, a comparison with null can only be done with a reference type - including a boxed value type - and so there is always a boxing operation.
Question 2: I have generic code which compares a value with null, because I want to do something only if it's a reference type or nullable value type, and if the value is equal to null. Will my code pay the performance penalty of a boxing operation in the cases where the type compared is a value type?
Answer 2: No. In those cases where the C# compiler cannot optimise away the code from the IL it produces, the jitter still can. For non-nullable value types the entire boxing operation, comparison, and code-path only taken when the comparison with null returned true, will all be removed from the machine code produced, and thus from the work the computer does. Furthermore, if it's a nullable value type, the boxing and comparison will be replaced with an examination of the field in the value that indicates whether HasValue is true or not.
*Note that this definition of volatile is related to, but not the same as, that in .NET, for reasons that are also related to how greater support for multi-threaded execution has complicated things from how they were in the 1960s.