Why is writing to a non-const object after casting away const of pointer to that object not UB?

https://stackoverflow.com/questions/8530611

18-03-2021

Question

According to the C++ Standard, it's okay to cast away const from a pointer and write to the object if the object itself was not originally const. So this:

 const Type* object = new Type();
 const_cast<Type*>( object )->Modify();

is okay, but this:

 const Type object;
 const_cast<Type*>( &object )->Modify();

is UB.

The reasoning is that when the object itself is const the compiler is allowed to optimize accesses to it, for example, not perform repeated reads because repeated reads make no sense on an object that doesn't change.

The question is how would the compiler know which objects are actually const? For example, I have a function:

void function( const Type* object )
{
    const_cast<Type*>( object )->Modify();
}

and it is compiled into a static lib and the compiler has no idea for which objects it will be called.

Now the calling code can do this:

Type* object = new Type();
function( object );

and it will be fine, or it can do this:

const Type object;
function( &object );

and it will be undefined behavior.

How is the compiler supposed to adhere to such requirements? How is it supposed to make the former work without making the latter work?

Solution

When you ask "How is it supposed to make the former work without making the latter work?", the answer is that an implementation is only required to make the former work. Unless it wants to help the programmer, it needn't make any extra effort to make the latter fail in some particular way. Undefined behavior gives freedom to the implementation, not an obligation.

Take a more concrete example. In f() below, the compiler may set up the return value as 10 before it calls EvilMutate, because cobj.member is const once cobj's constructor has completed and may not be written to afterwards. It cannot make the same assumption in g(), even though only a const member function is called. If EvilMutate attempts to modify member when called on cobj in f(), undefined behavior occurs, and the implementation need not make any subsequent action have any particular effect.

The compiler's ability to assume that a genuinely const object won't change is protected by the fact that changing it would cause undefined behavior. That rule imposes no additional requirements on the compiler, only on the programmer.

struct Type {
    int member;
    void Mutate();
    void EvilMutate() const;
    Type() : member(10) {}
};

int f()
{
    const Type cobj;
    cobj.EvilMutate();
    return cobj.member;
}

int g()
{
    Type obj;
    obj.EvilMutate();
    return obj.member;
}
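
To make the defined half of this concrete, here is a minimal sketch that gives EvilMutate a body. That body (casting away const from this and writing 42 to member) is an assumption for illustration only; the answer does not say what EvilMutate actually does. Only g() is included, because running the f() variant with this body would be undefined behavior.

```cpp
#include <cassert>

struct Type {
    int member;
    void EvilMutate() const;
    Type() : member(10) {}
};

// Hypothetical body: the answer above leaves EvilMutate undefined.
void Type::EvilMutate() const
{
    // Well defined only when *this is an object that was not declared const.
    const_cast<Type*>(this)->member = 42;
}

int g()
{
    Type obj;            // not a const object
    obj.EvilMutate();    // the write through the cast pointer is defined here
    return obj.member;   // the compiler has to read the member again after the call
}
```

With this body, g() returns the value written by EvilMutate rather than the 10 set in the constructor, which is exactly the assumption the compiler is not allowed to make for a non-const object.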

OTHER TIPS

The compiler can perform this optimization only on const objects, not on references or pointers to const objects (see this question). In your example, there is no way the compiler can optimize function, but it can optimize the code that uses a const Type. Since the compiler assumes this object is constant, modifying it (by calling function) can do anything, including crashing your program (for example, if the object is stored in read-only memory) or working like the non-const version (if the modification does not interfere with the optimizations).

The non-const version has no problem and is perfectly defined: you just modify a non-const object, so everything is fine.

If an object is declared const, an implementation is allowed to store it in such a way that attempts to modify it could cause hardware traps, without having any obligation to ensure any particular behavior for those traps. If one constructs a pointer-to-const to such an object, recipients of that pointer will not generally be allowed to write through it, and would thus be in no danger of triggering those hardware traps. If code casts away the const-ness and writes through the pointer, a compiler would be under no obligation to protect the programmer against any hardware oddities that might occur.

Further, in the event that a compiler can tell that a const object is always going to contain a particular sequence of bytes, it could inform the linker of that, and allow the linker to see if that sequence of bytes occurs anywhere in the code and, if so, regard the address of the const object as being the location of that sequence of bytes (complying with various restrictions about different objects having unique addresses might be a little tricky, but it would be permissible). If the compiler told the linker that a const char[4] was always supposed to contain a sequence of bytes that happened to appear within the compiled code for some function, a linker could assign to that variable the address within the code where that byte sequence appears. If the const was never written, such behavior would save four bytes, but writing to the const would arbitrarily change the meaning of the other code.

If writing to an object after casting away const were always UB, the ability to cast away const-ness wouldn't be very useful. As it is, the ability often plays a role in situations where a piece of code holds onto pointers (some of which are const and some of which will need to be written) for the benefit of other code. If casting away the const-ness of const pointers to non-const objects weren't defined behavior, the code which is holding the pointers would need to know which pointers are const and which ones will need to be written. Because const-casting is allowed, however, it is sufficient for the code holding the pointers to declare them all as const, and for code which knows that a pointer identifies a non-const object and wants to write it, to cast it to a non-const pointer.
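
A minimal sketch of that pattern follows. Registry, bump_first and demo are hypothetical names invented for this illustration, and the convention that index 0 holds the writable object is an assumption of the example, not something taken from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Type {
    int value = 0;
};

// Hypothetical holder: it never writes through the pointers it keeps,
// so it stores every one of them as pointer-to-const.
struct Registry {
    std::vector<const Type*> items;
    void add(const Type* p) { items.push_back(p); }
    const Type* get(std::size_t i) const { return items[i]; }
};

// The caller knows that slot 0 refers to an object that was created
// non-const, so casting the stored pointer back is well defined.
int bump_first(Registry& registry)
{
    Type* p = const_cast<Type*>(registry.get(0));
    p->value += 1;
    return p->value;
}

int demo()
{
    Type writable;               // non-const object: may be written
    const Type read_only{};      // const object: must only ever be read
    Registry registry;
    registry.add(&writable);     // slot 0
    registry.add(&read_only);    // slot 1
    bump_first(registry);        // writes to `writable` only
    assert(registry.get(1)->value == 0);
    return writable.value;
}
```

The holder needs no knowledge of which entries are writable; that knowledge stays with the code that created the objects, which is also the only code entitled to apply the cast.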

It might be helpful if C++ had forms of const (and volatile) qualifiers which could be used on pointers to instruct the compiler that it may (or, in the case of volatile, should) regard the pointer as identifying a const and/or volatile object, even if the compiler knows what the object is and knows that it isn't const and/or isn't declared volatile. The former would allow a compiler to assume that the object identified by a pointer wouldn't change during the pointer's lifetime, and cache data based upon that; the latter would allow for cases where a variable may need to support volatile accesses in some rare situations (typically at program startup) but where the compiler should be able to cache its value after that. I know of no proposals to add such features, though.

Undefined behavior means undefined behavior. The specification makes no guarantees what will happen.

That doesn't mean it won't do what you intend. Just that you're outside of the boundary of behavior that the specification states should work. The specification is there to say what will happen when you do certain things. Outside of the protection of the spec, all bets are off.

But just because you're off the edge of the map does not mean that you will encounter a dragon. Maybe it'll be a fluffy bunny.

Think of it like this:

class BaseClass {};
class Derived : public BaseClass {};

BaseClass *pDerived = new Derived();
BaseClass *pBase = new BaseClass();

Derived *pLegal = static_cast<Derived*>(pDerived);
Derived *pIllegal = static_cast<Derived*>(pBase);

C++ defines one of these casts to be perfectly valid. The other yields undefined behavior. Does that mean that a C++ compiler actually checks the type and flips the "undefined behavior" switch? No.

It means that the C++ compiler will more than likely assume that pBase is actually a Derived and therefore perform the pointer arithmetic needed to convert pBase into a Derived*. If it isn't actually a Derived, then you get undefined results.

That pointer arithmetic may in fact be a no-op; it may do nothing. Or it may actually do something. It doesn't matter; you are now outside of the realm of behavior defined by the specification. If the pointer arithmetic is a no-op, then everything may appear to work perfectly.

It's not that the compiler "knows" that in one instance it's undefined and in another it's defined. It's that the specification does not say what will happen. It may appear to work. It may not. The only times that it will work are when it is done properly in accord with the specification.
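
The pointer adjustment described above can be observed with multiple inheritance, where at least one base subobject cannot sit at the start of the derived object. The sketch below only performs the valid direction of the cast; the types First, Second and Both and the helper functions are hypothetical names made up for this illustration.

```cpp
#include <cassert>

struct First  { int a = 1; };
struct Second { int b = 2; };
struct Both : First, Second { int c = 3; };

// Valid downcast: the caller guarantees that `p` points at the Second
// subobject of a Both. The compiler emits whatever adjustment is needed
// without checking that guarantee.
Both* to_both(Second* p)
{
    return static_cast<Both*>(p);
}

bool round_trip_ok()
{
    Both object;
    Second* asSecond = &object;            // implicit upcast, may adjust the address
    return to_both(asSecond) == &object;   // the downcast undoes that adjustment
}

bool some_base_is_offset()
{
    Both object;
    const void* whole  = &object;
    const void* first  = static_cast<First*>(&object);
    const void* second = static_cast<Second*>(&object);
    // First and Second each hold an int, so they cannot share an address:
    // at least one of them lies at a nonzero offset inside Both.
    return first != whole || second != whole;
}
```

If to_both were handed a pointer to a standalone Second, the same adjustment would be applied blindly and the result would point at memory that is not a Both, which is the undefined case.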

The same goes for const casts. If you cast away const from a pointer to an object that was not originally const and then write to it, the spec says that it will work. If the object was originally const, the spec says that anything can happen.

In theory, const objects are allowed to be stored in read-only memory in some cases, which would cause obvious problems if you try to modify the object. A more likely case is that the definition of the object is visible at some point, so the compiler can actually see that the object is defined as const and can optimise based on the assumption that members of that object do not change. If you cast away const and call a non-const function on a const object to set a member, and then read that member, the compiler could bypass the read of that member if it already knows the value. After all, you defined the object as const: you promised that that value wouldn't change.

Undefined behaviour is tricky in that it often seems to work as you expect, until you make one slight modification.

Licensed under: CC-BY-SA with attribution
