Question

Just playing around with casting. Assume we have two classes:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Instantiate both of them

        Base b1 = new Base {a = 1};
        Inh i1 = new Inh {a = 2, b = 2};

Now, let's try an upcast:

        // Upcast
        Base b2 = i1;

It seems that b2 still holds field b, which is present only in the Inh class. Let's check it by downcasting.

        // Downcast
        var b3 = b2;
        var i2 = b2 as Inh;
        var i3 = b3 as Inh;

        bool check = (i2 == i3);

check is true here (I guess because i2 and i3 both reference the same instance, i1). OK, let's see how they would be stored in a list.

        var list = new List<Base>();

        list.Add(new Base {a = 5});
        list.Add(new Inh {a = 10, b = 5});

        int sum = 0;
        foreach (var item in list)
        {
            sum += item.a;
        }

Everything is okay, as sum is 15. But when I try to serialize the list using XmlSerializer (just to see what's inside), it throws an InvalidOperationException: "The type ConsoleApplication1.Inh was not expected". Well, fair enough, because it's a list of Bases.

So, what actually is b2? Can I serialize a list of Bases and Inhs? Can I get the Inh fields by downcasting items from the deserialized list?


Solution 2

Actually, the question is about what happens in memory

So; not serialization, then. K.

Let's take it from the top, then:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Here we have two reference types (classes); the fact that they are reference-type is very important, because that directly influences what is actually stored in arrays / variables.

Base b1 = new Base {a = 1};
Inh i1 = new Inh {a = 2, b = 2};

Here we create 2 objects; one of type Base, and one of type Inh. The reference to each object is stored in b1 / i1 respectively. I've italicized the word reference for a reason: it is not the object that is there. The object is somewhere arbitrary on the managed heap. Essentially b1 and i1 are just holding the memory address to the actual object. Side note: there are minor technical differences between "reference", "address" and "pointer", but they serve the same purpose here.

Base b2 = i1;

This copies the reference, and assigns that reference to b2. Note that we haven't copied the object. We still only have 2 objects. All we have copied is the number that happens to represent a memory address.
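A minimal sketch of that claim, assuming nothing beyond the two classes above (the ReferenceCopyDemo class and its SameObject method are names made up for this illustration): a write made through b2 is visible through i1, because both variables hold the same reference.

```csharp
using System;

public class Base { public int a; }
public class Inh : Base { public int b; }

// Hypothetical helper class, used only for this illustration.
public static class ReferenceCopyDemo
{
    // Returns true when i1 and b2 refer to the very same object.
    public static bool SameObject()
    {
        Inh i1 = new Inh { a = 2, b = 2 };
        Base b2 = i1;   // copies the reference, not the object
        b2.a = 42;      // write through the Base-typed reference
        return ReferenceEquals(i1, b2) && i1.a == 42;
    }

    public static void Main()
    {
        Console.WriteLine(SameObject()); // True
    }
}
```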

var b3 = b2;
var i2 = b2 as Inh;
var i3 = b3 as Inh;
bool check = (i2 == i3);

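Here we do the same thing in reverse: b3 receives a copy of the reference held in b2, and each as type-tests the object and hands back that same reference, now typed as Inh. Since i2 and i3 both refer to the one original Inh object, check is true.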

var list = new List<Base>();

list.Add(new Base {a = 5});
list.Add(new Inh {a = 10, b = 5});

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
}

The list here is a list of references. The objects are still somewhere arbitrary on the managed heap. So yes, we can iterate through them. Because all Inh are also Base, there is no issue whatsoever here. So finally, we get to the question (from the comments):

Then, another question (more detailed): how Inh would be stored in array of Bases? Would b be just dropped?

Absolutely not. Because they are reference-types, the list never actually contains any Inh or Base objects - it only contains the references. A reference is just a number - 120934813940 for example. A memory address, basically. It doesn't matter at all whether we think 120934813940 points to a Base or an Inh - our talking about it in either terms doesn't impact the actual object located at 120934813940. All we need to do is perform a cast, which means: instead of thinking of 120934813940 as a Base, think of it as an Inh - which involves a type-test to confirm that it is what we suspect. For example:

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
    if(item is Inh)
    {
       Inh inh = (Inh)item;
       Console.WriteLine(inh.b);
    }
}

So b was there all the time! The only reason we couldn't see it is that we only assumed that item was a Base. To get access to b we need to cast the value. There are three important operations commonly used here:

  • obj is Foo - performs a type test returning true if the value is non-null and is trivially assignable as that type, else false
  • obj as Foo - performs a type test, returning the reference typed as Foo if it is non-null and is a match, or null otherwise
  • (Foo)obj - performs a type test, returning null if it is null, the reference typed as Foo if it is a match, or throws an exception otherwise

So that loop could also be written as:

int sum = 0;
foreach (var item in list)
{
    sum += item.a;
    Inh inh = item as Inh;
    if(inh != null)
    {
       Console.WriteLine(inh.b);
    }
}
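To compare the three operations listed above side by side, here is a small sketch (CastDemo and Describe are hypothetical names for this illustration) that applies each of them to an Inh, a plain Base, and null:

```csharp
using System;

public class Base { public int a; }
public class Inh : Base { public int b; }

// Hypothetical helper class, used only for this illustration.
public static class CastDemo
{
    // Reports what "is", "as" and a direct cast do for the given reference.
    public static string Describe(Base item)
    {
        bool isInh = item is Inh;   // false for null and for a plain Base
        Inh viaAs = item as Inh;    // null for null and for a plain Base
        string asResult = viaAs == null ? "null" : "Inh";

        string castResult;
        try
        {
            Inh viaCast = (Inh)item; // null stays null; a mismatch throws
            castResult = viaCast == null ? "null" : "Inh";
        }
        catch (InvalidCastException)
        {
            castResult = "throws";
        }
        return "is=" + isInh + ", as=" + asResult + ", cast=" + castResult;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new Inh { a = 10, b = 5 })); // is=True, as=Inh, cast=Inh
        Console.WriteLine(Describe(new Base { a = 5 }));        // is=False, as=null, cast=throws
        Console.WriteLine(Describe(null));                      // is=False, as=null, cast=null
    }
}
```

On C# 7.0 or later, the test and the cast can also be combined with a type pattern: if (item is Inh inh) { Console.WriteLine(inh.b); }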

OTHER TIPS

If you want it to work with serialization, you'll need to tell the serializer about the inheritance. In the case of XmlSerializer, this is:

[XmlInclude(typeof(Inh))]
public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

Then the following works fine:

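// Needs: using System; using System.Collections.Generic; using System.IO;
//        using System.Text; using System.Xml; using System.Xml.Serialization;
var list = new List<Base>();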

list.Add(new Base { a = 5 });
list.Add(new Inh { a = 10, b = 5 });

var ser = new XmlSerializer(list.GetType());
var sb = new StringBuilder();
using (var xw = XmlWriter.Create(sb))
{
    ser.Serialize(xw, list);
}
string xml = sb.ToString();
Console.WriteLine(xml);
using (var xr = XmlReader.Create(new StringReader(xml)))
{
    var clone = (List<Base>)ser.Deserialize(xr);
}

with clone having the expected 2 objects of different types. The xml is (reformatted for readability):

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfBase
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Base><a>5</a></Base>
    <Base xsi:type="Inh"><a>10</a><b>5</b></Base>
</ArrayOfBase>
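If the base class cannot be decorated with an attribute, one alternative (not part of the answer above) is the XmlSerializer constructor overload that takes an array of extra types. The sketch below assumes plain Base and Inh classes without [XmlInclude]; ExtraTypesDemo and RoundTrip are hypothetical names. It also shows that the b field survives the round trip and is reachable again by downcasting.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Base { public int a; }
public class Inh : Base { public int b; }

// Hypothetical helper class, used only for this illustration.
public static class ExtraTypesDemo
{
    // Serializes the list to XML and reads it back into a new list.
    public static List<Base> RoundTrip(List<Base> list)
    {
        // The second argument lists derived types the serializer may meet.
        var ser = new XmlSerializer(typeof(List<Base>), new[] { typeof(Inh) });
        using (var sw = new StringWriter())
        {
            ser.Serialize(sw, list);
            using (var sr = new StringReader(sw.ToString()))
            {
                return (List<Base>)ser.Deserialize(sr);
            }
        }
    }

    public static void Main()
    {
        var clone = RoundTrip(new List<Base>
        {
            new Base { a = 5 },
            new Inh { a = 10, b = 5 }
        });
        Inh inh = clone[1] as Inh; // downcast the deserialized item
        Console.WriteLine(inh != null ? inh.b.ToString() : "not an Inh");
    }
}
```

As far as I know, serializers created through this overload are not cached by the framework, so in real code it is probably worth creating one instance and reusing it.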

To clarify what actually happens when you cast from one type to another, it may be helpful to mention some information about how instances of reference types are stored in the CLR.

First of all, there are value types (structs).

  • they are typically stored on the stack when used as local variables (this is an "implementation detail", and a struct that is a field of a class or an element of an array lives inline inside that object on the heap),
  • they don't support inheritance (no virtual methods),
  • instances of value types contain only the values of their fields.

This means all methods and properties in a struct are basically static methods, with a reference to the struct (this) being passed implicitly as a parameter (again, there are one or two exceptions, like ToString, but they are mostly irrelevant).

So, when you do this:

struct SomeStruct 
{
    public int Value;
    public void DoSomething()
    {
        Console.WriteLine(this.Value);
    }
}

SomeStruct c = new SomeStruct(); // a local variable; typically placed on the stack
c.DoSomething();

It will be logically the same as having a static method and passing the reference to the SomeStruct instance (the reference part is important because it allows the method to mutate the struct contents by writing to that stack memory area directly, without the need to box it):

struct SomeStruct 
{
    public int Value;
    public static void DoSomething(ref SomeStruct instance)
    {
        Console.WriteLine(instance.Value);
    }
}

SomeStruct c = new SomeStruct(); // a local variable; typically placed on the stack
SomeStruct.DoSomething(ref c); // passes the address of c, so the method can read and write c in place

If you call DoSomething on a struct, there is no different (overridden) method that might have to be invoked instead, so the compiler knows the actual function statically.

Reference types (classes) are a bit more complex.

  • instances of reference types are stored on the heap, and all variables or fields of a certain reference type merely hold a reference to the object on the heap. Assigning a value of a variable to another, as well as casting, simply copies the reference around, leaving the instance unchanged.
  • they support inheritance (virtual methods)
  • instances of reference types contain values of their fields, and some additional luggage related to GC, Synchronization, AppDomain identity and Type.
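The difference between copying a value and copying a reference can be sketched like this (PointStruct, PointClass and CopySemanticsDemo are hypothetical names, chosen only for the illustration):

```csharp
using System;

public struct PointStruct { public int X; }
public class PointClass { public int X; }

// Hypothetical helper class, used only for this illustration.
public static class CopySemanticsDemo
{
    // Assigning a struct copies all of its fields into a new value.
    public static int StructCopy()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;   // s2 is an independent copy
        s2.X = 99;
        return s1.X;   // s1 is untouched
    }

    // Assigning a class variable copies only the reference.
    public static int ClassCopy()
    {
        var c1 = new PointClass { X = 1 };
        var c2 = c1;   // c2 refers to the same heap object as c1
        c2.X = 99;
        return c1.X;   // the change is visible through c1
    }

    public static void Main()
    {
        Console.WriteLine(StructCopy()); // 1
        Console.WriteLine(ClassCopy());  // 99
    }
}
```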

If a class method is non-virtual, then it basically behaves like a struct method: it's known at compile time and it's not going to change, so the compiler can emit a direct function call, passing the object reference just like it did with a struct.

So, what happens when you cast to a different type? As far as the memory layout is concerned, nothing much.

If you have your object defined like you mentioned:

public class Base
{
    public int a;
}

public class Inh : Base
{
    public int b;
}

And you instantiate an Inh, and then cast it to a Base:

Inh i1 = new Inh() { a = 2, b = 5 };
Base b2 = i1;    

The heap memory will contain a single object instance (at, say, address 0x20000000):

// simplified memory layout of an `Inh` instance
[0x20000000]: Some synchronization stuff
[0x20000004]: Pointer to RTTI (runtime type info) for Inh
[0x20000008]: Int32 field (a = 2)
[0x2000000C]: Int32 field (b = 5)

Now, all variables of a reference type point to the location of the RTTI pointer (the actual object's memory area starts 4 bytes earlier, but that's not so important).

Both i1 and b2 contain a single pointer (0x20000004 in this example), and the only difference is that the compiler will allow a Base variable to reach only the first field in that memory area (the a field), with no way to go further through the instance.

For the Inh instance i1, that same field is located at exactly the same offset, but it also has access to the next field b located 4 bytes after the first one (at 8 byte offset from the RTTI pointer).

So if you write this:

Console.WriteLine(i1.a);
Console.WriteLine(b2.a);

Compiled code will in both cases be the same (simplified, no type checks, just addressing):

  1. For i1:

    a. Get the address of i1 (0x20000004)

    b. Add offset of 4 bytes to get the address of a (0x20000008)

    c. Fetch the value at that address (2)

  2. For b2:

    a. Get the address of b2 (0x20000004)

    b. Add offset of 4 bytes to get the address of a (0x20000008)

    c. Fetch the value at that address (2)

So the one and only instance of Inh stays in memory, unmodified, and by casting you are simply telling the compiler how to view the data found at that memory location. Unlike plain C, C# will fail at runtime (with an InvalidCastException) if you try to cast an object to a type that is not in its inheritance hierarchy, whereas a plain C program would happily return whatever is at the known fixed offset of a certain field. The only difference is that C# checks that what you are doing makes sense; otherwise the type of the variable only determines which parts of the same object instance you can reach.

You can even cast it to an Object:

Object o1 = i1; // <-- this still points to `0x20000004`    
// Hm. Ok, that worked, but now what? 

Again, the memory instance is unmodified, but there is nothing much you can do with a variable of Object, except downcast it again.
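As a small sketch of that point (ObjectCastDemo and its method names are hypothetical), the object seen through an Object variable still reports its real type, and downcasting gives the b field back:

```csharp
using System;

public class Base { public int a; }
public class Inh : Base { public int b; }

// Hypothetical helper class, used only for this illustration.
public static class ObjectCastDemo
{
    // The run-time type does not depend on the type of the variable.
    public static string TypeNameSeenThroughObject()
    {
        Inh i1 = new Inh { a = 2, b = 5 };
        object o1 = i1;
        return o1.GetType().Name;
    }

    // Downcasting restores access to the fields declared on Inh.
    public static int ReadBThroughObject()
    {
        object o1 = new Inh { a = 2, b = 5 };
        Inh back = (Inh)o1;
        return back.b;
    }

    public static void Main()
    {
        Console.WriteLine(TypeNameSeenThroughObject()); // Inh
        Console.WriteLine(ReadBThroughObject());        // 5
    }
}
```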

Virtual methods are even more interesting, because calling one means following the mentioned RTTI pointer to get to the virtual method table for that type (this is what allows a type to override methods of a base type). The compiled code still uses a fixed offset for a particular method, but the table of the derived type holds the appropriate method implementation at that slot.
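The observable effect of this can be sketched with a pair of hypothetical classes (Animal and Dog are not from the question): a virtual call is resolved by the run-time type of the object, while a non-virtual call is resolved by the compile-time type of the variable.

```csharp
using System;

// Hypothetical classes, used only for this illustration.
public class Animal
{
    public virtual string Speak() { return "..."; }
    public string Kind() { return "Animal"; }       // non-virtual
}

public class Dog : Animal
{
    public override string Speak() { return "Woof"; }
    public new string Kind() { return "Dog"; }      // hides, does not override
}

public static class VirtualDispatchDemo
{
    public static string SpeakThroughBase()
    {
        Animal a = new Dog();
        return a.Speak();   // resolved through the object's method table
    }

    public static string KindThroughBase()
    {
        Animal a = new Dog();
        return a.Kind();    // resolved from the variable's declared type
    }

    public static void Main()
    {
        Console.WriteLine(SpeakThroughBase()); // Woof
        Console.WriteLine(KindThroughBase());  // Animal
    }
}
```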

b2 is an Inh, but to the compiler it is a Base because you declared it as such.

Still, if you do (b2 as Inh).b = 2, it will work. The compiler then knows to treat it as an Inh and the CLR knows it's really an Inh already.

As Marc pointed out, if you use XML Serialization you will need to decorate the base class with a declaration per inheriting type.

Licensed under: CC-BY-SA with attribution