Does unboxing just return a pointer to the value within the boxed object on the heap?

https://stackoverflow.com/questions/2978647

24-10-2019
|

Question

I this MSDN Magazine article, the author states (emphasis mine):

Note that boxing always creates a new object and copies the unboxed value's bits to the object. On the other hand, unboxing simply returns a pointer to the data within a boxed object: no memory copy occurs. However, it is commonly the case that your code will cause the data pointed to by the unboxed reference to be copied anyway.

I'm confused by the sentence I've bolded and the sentence that follows it. From everything else I've read, including this MSDN page, I've never before heard that unboxing just returns a pointer to the value on the heap. I was under the impression that unboxing would result in you having a variable containing a copy of the value on the stack, just as you began with. After all, if my variable contains "a pointer to the value on the heap", then I haven't got a value type, I've got a pointer.

Can someone explain what this means? Was the author on crack? (There is at least one other glaring error in the article). And if this is true, what are the cases where "your code will cause the data pointed to by the unboxed reference to be copied anyway"?

I just noticed that the article is nearly 10 years old, so maybe this is something that changed very early on in the life of .Net.

Solution

The article is accurate. It however talks about what really goes on, not what the IL looks like that the compiler generates. After all, a .NET program never executes IL, it executes the machine code that's generated from the IL by the JIT compiler.

And the unbox opcode indeed generates code that produces a pointer to the bits on the heap that represents the value type value. The JIT generates a call to a small helper function in the CLR named "JIT_Unbox". clr\src\vm\jithelpers.cpp if you got the SSCLI20 source code. The Object::GetData() function returns the pointer.

From there, the value most commonly first gets copied into a CPU register. Which then may get stored somewhere. It doesn't have to be the stack, it could be a member of a reference type object (the gc heap). Or a static variable (the loader heap). Or it could be pushed on the stack (method call). Or the CPU register could be used as-is when the value is used in an expression.

While debugging, right-click the editor window and choose "Go To Disassembly" to see the machine code.

OTHER TIPS

The author of the original article must have been referring to what's happening down at the IL level. There exist two unboxing opcodes: unbox and unbox.any.

According to MSDN, regarding unbox.any:

When applied to the boxed form of a value type, the unbox.any instruction extracts the value contained within obj (of type O), and is therefore equivalent to unbox followed by ldobj.

and regarding unbox:

[...] unbox is not required to copy the value type from the object. Typically it simply computes the address of the value type that is already present inside of the boxed object.

So, the author knew what he was talking about.

This little fact about unbox makes it possible to do certain nifty optimizations when working directly with IL. For example, if you have a boxed int which you need to pass to a function accepting a ref int, you can just emit an unbox opcode, and the reference to the int will be ready in the stack for the function to operate upon. In this case the function will change the actual contents of the boxing object, something which is quite impossible at the C# level. It saves you from the need to allocate space for a temporary local variable, unbox the int in there, pass a ref to the int to the function, and then create a new boxing object to re-box the int, discarding the old box.

Of course, when you are working at the C# level, you cannot do any such optimizations, so what will usually be happening is that the code generated by the compiler will almost always be copying the variable from the boxed object prior to making any further use of it.

Boxing is the act of casting a value-type instance to a reference-type instance (either an object or an interface), and reference types are allocated on the heap.

According to 'C# 4.0 in a Nutshell': " ...unboxing copies the contents of the object back into a value-type instance" and that implies on the stack.

In the article you reference, the author states:

public static void Main() {

   Int32 v = 5;    // Create an unboxed value type variable
   Object o = v;   // o refers to a boxed version of v
   v = 123;        // Changes the unboxed value to 123

   Console.WriteLine(v + ", " + (Int32) o);    // Displays "123, 5"
}

From this code, can you guess how many boxing operations occur? You might be surprised to discover that the answer is three! Let's analyze the code carefully to really understand what's going on. First, an Int32 unboxed value type (v) is created and initialized to 5. Then an Object reference type (o) is created and it wants to point to v. But reference types must always point to objects in the heap, so C# generated the proper IL code to box v and stored the address of the boxed version of v in o. Now 123 is unboxed and the referenced data is copied into the unboxed value type v; this has no effect on the boxed version of v, so the boxed version keeps its value of 5. Note that this example shows how o is unboxed (which returns a pointer to the data in o), and then the data in o is memory copied to the unboxed value type v.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow