How does the runtime know the exact type of a boxed value type?

https://stackoverflow.com//questions/25035332

21-12-2019

Question

I understand what boxing is. A value type is boxed to an object/reference type and is then stored on the managed heap as an object. But I can't get my head around unboxing.

Unboxing converts your object/reference type back to the value type:

int i = 123;          // A value type
object box = i;       // Boxing
int j = (int)box;     // Unboxing

Alright. But if I try to unbox the value into a different value type, for example long in the example above, it throws an InvalidCastException:

long d = (long)box;

This makes me think that maybe the runtime implicitly knows the actual type of the value boxed inside the "box" object. If I am right, I wonder where this type information is stored.

EDIT:

What confuses me is that int is implicitly convertible to long:

int i = 123;
long lng = i;

is perfectly fine because it has no boxing/unboxing involved.

Solution

When a value is boxed it gets an object header. The kind that any type that derives from System.Object has. The value follows that header. The header contains two fields, one is the "syncblk", it has various uses that are beyond the scope of the question. The second field describes the type of object.

That's the one you are asking about. It has various names in the literature, most commonly "type handle" or "method table pointer". The latter is the most accurate description: it is a pointer to the info the CLR keeps track of whenever it loads a type. Lots of framework features depend on it. Object.GetType() of course. Any cast in your code, as well as the is and as operators, uses it. These casts are safe, so you can't turn a Dog into a Cat; the type handle provides this guarantee. The method table pointer for your boxed int points to the method table for System.Int32.
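A minimal sketch of how that type information surfaces in C# code. The class and method names (BoxedTypeDemo and its members) are made up for illustration; only GetType, is and as are part of the language and framework. It shows what you can observe from C#, not how the CLR lays out the header.

```csharp
using System;

public static class BoxedTypeDemo
{
    // Reads the type recorded for the boxed value via Object.GetType().
    public static string RuntimeTypeName(object box) => box.GetType().FullName;

    // 'is' checks the exact boxed type; no numeric conversion is attempted.
    public static bool IsInt(object box) => box is int;
    public static bool IsLong(object box) => box is long;

    // 'as' needs a nullable target for value types and yields null on a mismatch.
    public static long? AsLong(object box) => box as long?;
}
```

With object box = 123, RuntimeTypeName(box) returns "System.Int32", IsInt(box) is true, IsLong(box) is false, and AsLong(box) is null.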

Boxing was very common in .NET 1.x, before generics became available. All of the common collection types stored object instead of T. So putting an element in the collection required (implicit) boxing, getting it out again required explicit unboxing with a cast.

To make this efficient, it was pretty important that the jitter didn't need to consider the possibility that a conversion would be required, because that requires a lot more work. So the C# language included the rule that unboxing to another type is illegal. All that's needed now is a check on the type handle to ensure it is the expected type. The jitter directly compares the method table pointer to the one for System.Int32 in your case. And the value embedded in the object can be copied directly without any conversion concerns. Pretty fast, as fast as it can possibly be; this can all be done with inline machine code without any CLR call.
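The rule can be seen from C# with a short sketch. UnboxDemo and its method names are hypothetical helpers, not framework APIs. Unboxing to the exact type and then converting works; unboxing straight to long does not.

```csharp
using System;

public static class UnboxDemo
{
    // Unbox to the exact boxed type (int), then let the ordinary
    // implicit int-to-long conversion run on the unboxed value.
    public static long UnboxThenWiden(object box) => (int)box;

    // Returns true when a direct unbox to long is rejected at run time.
    public static bool DirectUnboxToLongFails(object box)
    {
        try
        {
            long ignored = (long)box; // type check only, no conversion
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

For a boxed int, UnboxThenWiden succeeds and DirectUnboxToLongFails returns true; for a boxed long, the direct unbox succeeds.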

This rule is specific to C#; VB.NET doesn't have it. A typical trade-off between those two languages: C#'s focus is on speed, VB.NET's on convenience. Converting to another type when unboxing isn't otherwise a problem, since all simple value types implement IConvertible. You write it explicitly in your code, using the Convert helper class:

        int i = 123;                    // A value type
        object box = i;                 // Boxing
        long j = Convert.ToInt64(box);  // Conversion + unboxing

Which is pretty similar to the code that the VB.NET compiler auto-generates.

OTHER TIPS

That's because the box instruction records the value type's token in the resulting object (see MSDN). When you unbox the value, the type (and size in memory) of what the object holds is therefore known, so you must cast the object to the original value type.

In your example you don't even need to cast from int to long, because it's an implicit conversion.

Boxing doesn't move the value from the stack to the heap; it creates a copy of it on the heap and stores a reference to that copy in a new stack slot. Your original value-type variable stays on the stack unchanged. The boxed copy carries its own type information, and at unboxing time the runtime compares that type with the type you are casting to; if they don't match, it throws. So you have to unbox to the same type that you boxed.

Every reference object has a bunch of metadata associated with it. This includes the exact type of the given object (which is why you can have type safety at all).

So while the int is a plain value, this information is actually missing (not that it matters), but once you box it, a new object is created with all the necessary metadata. This also means that while an int is just 4 bytes, a boxed int is much more than that - you've got a reference now (4-8 bytes), the value itself (4) and the metadata (which includes the specific type handle). This is very different from e.g. C++, which allows you to cast any pointer to a pointer of any type (and leaves you to deal with the errors when you cast it wrong).

Again, all the by-reference objects have this metadata. This is quite an important part of the cost of reference types, but it is also the means by which you can be sure of the type safety. This also nicely shows how expensive ArrayList of int can really be, and why int[] or List<int> is much more efficient - even ignoring the costs of allocating (and more importantly collecting) heap objects and the boxing and unboxing itself, the 4 byte int could suddenly be 20 bytes, just because you're storing a reference to it :)
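A small sketch of the difference in code, assuming nothing beyond the standard ArrayList and List<int> types; CollectionBoxingDemo and its methods are illustrative names only. It shows where the boxing and unboxing happen, not the memory figures, which depend on the runtime and platform.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionBoxingDemo
{
    // ArrayList stores object: every Add boxes the int into a new heap
    // object, and reading it back needs an unboxing cast.
    public static int SumArrayList(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++) list.Add(i);   // boxing
        int sum = 0;
        foreach (object item in list) sum += (int)item; // unboxing
        return sum;
    }

    // List<int> keeps the ints in an int[] internally: no boxing, no casts.
    public static int SumGenericList(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);
        int sum = 0;
        foreach (int item in list) sum += item;
        return sum;
    }
}
```

Both methods compute the same sum; only the ArrayList version allocates one heap object per element.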

Licensed under: CC-BY-SA with attribution
