If Derived adds no new members to Base (and is POD), then what kind of pointer casts, and dereferencing, can be safely done?

StackOverflow https://stackoverflow.com/questions/19741843

Question

(This is another question about undefined behaviour (UB). If this code 'works' on some compiler, then that means nothing in the land of UB. That is understood. But exactly at what line below do we cross into UB?)

(There are a number of very similar questions on SO already, e.g. (1) but I'm curious what can be safely done with the pointers before dereferencing them.)

Start off with a very simple Base class. No virtual methods. No inheritance. (Maybe this can be extended to anything that's POD?)

struct Base {
        int first;
        double second;
};

And then a simple extension that adds (non-virtual) methods and doesn't add any members. No virtual inheritance.

struct Derived : public Base {
        int foo() { return first; }
        int bar() { return second; }
};

Then, consider the following lines. If there is some deviation from defined behaviour, I'd be curious to know which lines exactly. My guess is that we can safely perform much of the calculations on the pointers. Is it possible that some of these pointer calculations, if not fully defined, at least give us some sort of 'indeterminate/unspecified/implementation-defined' value that isn't entirely useless?

void foo () {
    Base b;
    void * vp = &b;     // (1) Defined behaviour?
    cout << vp << endl; // (2) I hope this isn't a 'trap value'
    cout << &b << endl; // (3a) Prints the same as the last line?
                        // (3b) It has the 'same value' in some sense?
    Derived *dp = (Derived*)(vp);
                        // (4) Maybe this is an 'indeterminate value',
                        // but not fully UB?
    cout << dp << endl; // (5)  Defined behaviour also?  Should print the same value as &b

Edit: If the program ended here, would it be UB? Note that, at this stage, I have not attempted to do anything with dp, other than print the pointer itself to the output. If simply casting is UB, then I guess the question ends here.

                        // I hope the dp pointer still has a value,
                        // even if we can't dereference it
    if(dp == &b) {      // (6) True?
            cout << "They have the same value. (Whatever that means!)" << endl;
    }

    cout << &(b.second) << endl; // (7) this is definitely OK
    cout << &(dp->second) << endl; // (8)  Just taking the address. Is this OK?
    if( &(dp->second) == &(b.second) ) {      // (9) True?
            cout << "The members are stored in the same place?" << endl;
    }
}

I'm slightly nervous about (4) above. But I assume that it's always safe to cast to and from void pointers. Maybe the value of such a pointer can be discussed. But, is it defined to do the cast, and to print the pointer to cout?

(6) is important also. Will this evaluate to true?

In (8), we have the first time this pointer is being dereferenced (correct term?). But note that this line doesn't read from dp->second. It's still just an lvalue and we take its address. This calculation of the address is, I assume, defined by simple pointer arithmetic rules that we have from the C language?

If all of the above is OK, maybe we can prove that static_cast<Derived&>(b) is OK, and will lead to a perfectly usable object.


Solution

  1. Casting from a data pointer to void * is always guaranteed to work, and the pointer is guaranteed to survive the roundtrip Base * -> void * -> Base * (C++11 §5.2.9 ¶13);
  2. vp is a valid pointer, so there shouldn't be any problem.
  3. a. Although printing pointers is implementation-defined [1], the printed values should be the same: in fact, operator<< is by default overloaded only for const void *, so when you write cout<<&b you are converting to const void * anyway, i.e. what operator<< sees in both cases is &b cast to const void *.

    b. Yes, if we take the only sensible definition of "has the same value", i.e. it compares equal with the == operator; in fact, if you compare vp and &b with ==, the result is true, both if you convert vp to Base * (due to what we said in 1) and if you convert &b to void *.

    Both these conclusions follow from §4.10 ¶2, where it's specified that any pointer can be converted to void * (modulo the usual cv-qualified stuff), and the result «points to the start of the storage location where the object [...] resides» [2].

  4. This is tricky; that C-style cast is equivalent to a static_cast, which will happily allow casting a «"pointer to cv1 B" [...] to [...] "pointer to cv2 D", where D is a class derived from B» (§5.2.9, ¶11; there are some additional constraints, but they are satisfied here); but:

    If the prvalue of type “pointer to cv1 B” points to a B that is actually a subobject of an object of type D, the resulting pointer points to the enclosing object of type D. Otherwise, the result of the cast is undefined.

    (emphasis added)

    So, here your cast is allowed, but the result is undefined...

  5. ... which leads us to printing its value; since the result of the cast is undefined, you may get anything. Pointers are probably allowed to have trap representations (they are at least in C99; I could find only sparse references to "traps" in the C++11 standard, but I think this behavior is probably inherited from C89), so you may even get a crash just by reading this pointer to print it via operator<<.

It follows that 6, 8 and 9 aren't meaningful either, because you are using an undefined result.

Also, even if the cast were valid, strict aliasing (§3.10, ¶10) would prevent you from doing anything meaningful with the pointed-to objects, since aliasing a Base object via a Derived pointer is only allowed when the dynamic type of the Base object is actually Derived; anything that deviates from the exceptions specified at §3.10 ¶10 results in undefined behavior.
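
For contrast, here is a minimal sketch of the operations that are defined: the round trip through void * from point 1, and a downcast where the Base really is a subobject of a Derived. The function names are made up for illustration, and Derived is trimmed to a single method.

```cpp
#include <cassert>

struct Base {
    int first;
    double second;
};

struct Derived : public Base {
    int foo() { return first; }
};

// Base* -> void* -> Base* must yield the original pointer value.
bool round_trip_preserves_pointer(Base& b) {
    void* vp = &b;
    Base* back = static_cast<Base*>(vp);
    return back == &b;
}

// The downcast is well defined here because the Base being pointed to
// really is a subobject of a complete Derived object.
int downcast_of_real_derived() {
    Derived d{};
    d.first = 7;
    Base* bp = &d;                            // implicit upcast
    Derived* dp = static_cast<Derived*>(bp);  // valid: *bp lives inside d
    assert(dp == &d);
    return dp->foo();
}
```

The only difference from the code in the question is the dynamic type of the object: there it was a plain Base, so the same static_cast had an undefined result.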


Notes:

  1. operator<< delegates to num_put which conceptually delegates to printf with %p, whose description boils down to "implementation defined".

  2. This rules out my fear that an evil implementation could in theory return different but equivalent values when casting to void *.
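
Point 3a can be checked without looking at the (implementation-defined) text itself. A small sketch, assuming we capture the output in a std::ostringstream instead of cout; format_pointer and prints_identically are hypothetical helper names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Base {
    int first;
    double second;
};

// Formats a pointer exactly as `stream << p` would, but into a string.
std::string format_pointer(const void* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Streams &b directly and via a void*, then compares the two texts.
bool prints_identically(Base& b) {
    void* vp = &b;
    std::ostringstream direct;
    direct << &b;  // Base* is implicitly converted to const void*
    assert(vp == &b);
    return direct.str() == format_pointer(vp);
}
```

What the text looks like is up to the implementation, so the sketch only compares the two strings with each other.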

Other tips

(Attempting to answer my own question, from the point of view of strict aliasing. A good optimizer is entitled to do some unexpected things, which effectively give us UB. But I'm not an expert, by any means!)

In this function,

 void foo(Base &b_ref) {
     Base b;
     ....
 }

it is obvious that b and b_ref can't refer to each other. This particular example doesn't involve analysis of compatible types, it's just the simple observation that a newly constructed local variable is guaranteed to be the only reference to itself. This allows the optimizer to do some tricks. It can store b in a register and it can then execute code, such as b_ref.modify(), that modifies b_ref, safe in the knowledge that b is not affected. (Perhaps only really smart optimizers will notice this, but it is allowed.)

Next, consider this,

void foo(Base &b_ref, Derived&d_ref);

Within the implementation of this function, the optimizer cannot assume that b_ref and d_ref refer to different objects. Therefore, if the code calls d_ref.modify(), then the next time the code accesses b_ref it must look again at the memory that stores the b_ref object. If there is a copy of the b_ref data in the CPU registers, then it is possibly out-of-date data.
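
A small sketch of why the reload is needed, assuming a trimmed Derived and a made-up function read_after_write: passing the same Derived object for both parameters is perfectly legal, and then the write through d_ref must be visible through b_ref.

```cpp
#include <cassert>

struct Base {
    int first;
    double second;
};

struct Derived : public Base {
    int foo() { return first; }
};

// Reads through b_ref, writes through d_ref, then reads through b_ref again.
// b_ref may be the Base subobject of d_ref, so the second read cannot
// simply reuse the value loaded by the first one.
int read_after_write(Base& b_ref, Derived& d_ref) {
    int before = b_ref.first;
    d_ref.first = before + 1;
    return b_ref.first;
}

// Hypothetical driver covering the aliasing and the non-aliasing case.
void demo_aliasing() {
    Derived d{};
    d.first = 10;
    assert(read_after_write(d, d) == 11);      // same object: write is seen
    Base other{3, 0.0};
    assert(read_after_write(other, d) == 3);   // distinct objects: unchanged
}
```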

But if the types have nothing to do with each other, then such optimizations would be allowed. e.g.

struct Base1 { int i; };  struct Base2 { int i; };
void foo(Base1 & b1_ref, Base2 &b2_ref);

These can be assumed to point to different objects and therefore the compiler is allowed to make certain assumptions. b2_ref.i=5; cannot change b1_ref.i, therefore the compiler can make some assumptions. (Actually, there might be other threads, or even POSIX signals, making changes behind the scenes, and I must admit I'm not too clear on threads!)
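
As a rough illustration (write_both is a made-up name), a function like the following may be compiled as if it ended in return 1;, because under the reasoning above the store through b2_ref is assumed not to touch a Base1 object.

```cpp
#include <cassert>

struct Base1 { int i; };
struct Base2 { int i; };

// Writes through both references, then re-reads the first.
// Under the reasoning above, the optimizer may assume the two references
// do not alias and reuse the value 1 without reloading b1_ref.i.
int write_both(Base1& b1_ref, Base2& b2_ref) {
    b1_ref.i = 1;
    b2_ref.i = 5;
    return b1_ref.i;
}

// Hypothetical driver: with two genuinely distinct objects the result is 1
// whether or not the optimization is applied.
void demo_unrelated_types() {
    Base1 a{0};
    Base2 c{0};
    assert(write_both(a, c) == 1);
    assert(c.i == 5);
}
```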

So, there are assumptions the compiler is allowed to make for optimization. Consider this:

Base b; // a global variable
void foo() {
    Derived &d_ref = some_function();
    int x1 = b.i;
    d_ref.i = 5;
    int x2 = b.i;
}

With this, the optimizer knows the dynamic type of b: it's a Base. The two consecutive reads of b.i should give the same value (excepting other threads or whatever), and therefore the compiler is allowed to optimize the latter to int x2 = x1. If some_function returned a Base&, i.e. Base &d_ref = some_function(); then the compiler would not be allowed to make such an optimization.

So, given an object where the compiler knows its dynamic type is Base, and a reference to a derived type Derived&, the compiler is entitled to assume they refer to different objects. The compiler is allowed to rewrite the code a little, making the assumption that the two objects don't refer to each other. This can, at the very least, lead to unpredictable behaviour. And anything you do that violates assumptions the optimizer is allowed to make is undefined behaviour.
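
Here is a sketch of the same situation in a program that stays within defined behaviour, assuming some_function returns a reference to a genuine Derived object (real_derived and reads_agree are names invented for this example). In such a program the two reads of b.i really do agree, which is why the compiler may assume so.

```cpp
#include <cassert>

struct Base { int i; };
struct Derived : public Base {
    int get() { return i; }
};

Base b{42};              // global whose dynamic type is known to be Base
Derived real_derived{};  // hypothetical object behind some_function()

Derived& some_function() { return real_derived; }

// Same shape as foo() above, but returning whether the two reads agree.
bool reads_agree() {
    Derived& d_ref = some_function();
    int x1 = b.i;
    d_ref.i = 5;   // writes to real_derived, never to b
    int x2 = b.i;
    return x1 == x2;
}

// Hypothetical driver.
void demo_known_dynamic_type() {
    assert(reads_agree());
    assert(b.i == 42);
    assert(real_derived.get() == 5);
}
```

If some_function instead produced its Derived& by casting a reference to b, the program would already be outside defined behaviour, and the optimized and unoptimized versions could disagree.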

License: CC-BY-SA with attribution
Not affiliated with StackOverflow