The standard places no constraints on this. A compiler writer with a really twisted mind could, for example, generate a loop which does nothing at the start of every function, with the number of times through the loop depending on the number of letters in the function name. Fully conforming, but... I rather doubt he'd have many users for his compiler.
In practice, it's just (barely) conceivable that a compiler would work out the address of each sub-object step by step at run time; e.g. on an Intel processor, it might do something like:
D::direct:
mov eax, [ecx + offset m_D]
ret
D::ind1:
lea ebx, [ecx + offset instance_A]
mov eax, [ebx + offset m_D]
ret
D::ind2:
lea ebx, [ecx + offset instance_B]
lea ebx, [ebx + offset m_A_of_B]
mov eax, [ebx + offset m_D]
ret
In fact, every compiler I've ever seen works out the complete layout of the directly contained objects at compile time, and would generate something like:
D::direct:
mov eax, [ecx + offset m_D]
ret
D::ind1:
mov eax, [ecx + offset instance_A + offset m_D]
ret
D::ind2:
mov eax, [ecx + offset instance_B + offset m_A_of_B + offset m_D]
ret
(The addition of the offsets in the square brackets is done by the assembler; each expression corresponds to a single constant within the instruction in the actual executable.)
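The same point can be checked from C++ itself. What follows is only a sketch: the class definitions are hypothetical stand-ins (only the member names come from the listings above, and the `other` member is made up so that the offsets are non-zero), and `kNestedOffset`, `measured_offset` and `check_layout` are names invented for the illustration. It shows that the offset of the innermost member relative to the enclosing object is a single compile-time constant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the classes in the question; only the member
// names are taken from the listings, the types and `other` are assumed.
struct A { int m_A; };
struct B { int other; A m_A_of_B; };
struct D {
    int m_D;
    A instance_A;
    B instance_B;
};

// Each offsetof is a compile-time constant, so their sum is one as well.
constexpr std::size_t kNestedOffset =
    offsetof(D, instance_B) + offsetof(B, m_A_of_B) + offsetof(A, m_A);

// Distance in bytes from the start of d to the innermost member.
std::size_t measured_offset(const D& d) {
    const char* base = reinterpret_cast<const char*>(&d);
    const char* member =
        reinterpret_cast<const char*>(&d.instance_B.m_A_of_B.m_A);
    return static_cast<std::size_t>(member - base);
}

// The measured distance agrees with the constant computed at compile time.
void check_layout() {
    D d{};
    assert(measured_offset(d) == kNestedOffset);
}
```

Because `kNestedOffset` is `constexpr`, the compiler already has the full displacement available when it generates the access, which is why a single `mov` with one constant is enough.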
So in answer to your questions: for 1, it's completely compiler-dependent, and for 2, in actual practice there will be absolutely no difference.
Finally, all of your functions are inline, and they are simple enough that every compiler will inline them, at least with any degree of optimization activated. Once they are inlined, the optimizer may find additional optimizations: it may be able to detect that you initialized D::instance_B::m_A_of_B::m_A with a constant, for example, in which case it will just use the constant, and there won't be any access whatsoever. In fact, you're wrong to worry about this level of optimization, because the compiler will take care of it for you, better than you can.
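As a rough illustration of that last point, here is a sketch with assumed class definitions (again, only the member names come from the answer; the initial values and `read_nested` are made up). Marking the accessor `constexpr` lets a `static_assert` demonstrate that the compiler can evaluate the whole nested access on its own. An optimizer will usually do the same for the ordinary call in `read_nested`, although the standard does not require it.

```cpp
// Hypothetical stand-ins for the classes in the question.
struct A { int m_A; };
struct B { A m_A_of_B; };
struct D {
    int m_D;
    A instance_A;
    B instance_B;
    // Simple accessor defined in the class, hence implicitly inline.
    constexpr int ind2() const { return instance_B.m_A_of_B.m_A; }
};

// Evaluated entirely by the compiler: no memory access exists at run time.
constexpr D kExample{1, {2}, {{3}}};
static_assert(kExample.ind2() == 3, "nested access folds to a constant");

// With optimization enabled, this is typically reduced to "return 3".
int read_nested() {
    D d{1, {2}, {{3}}};
    return d.ind2();
}
```

To see what a particular compiler actually does with `read_nested`, look at the generated assembly (for example with the `-S` option of GCC or Clang) rather than guessing.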