Question

I can't seem to wrap my head around certain parts of the C standard, so I'm coming here to clear up the foggy, anxious uncertainty I get whenever I have to think about which of these tricks are defined behaviour and which are undefined or violate the standard. I don't care whether or not it will WORK; I care whether the C standard considers it legal, defined behaviour.

Such as this, which I am fairly certain is UB:

struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
} One;

struct Two
{
        int Hurr;
        char Durr[2];
        float Nrrr;
        double Wibble;
} Two;

One = *(struct One*)&Two;

This is not all I am asking about. There are other cases too, such as casting a pointer to One to int* and dereferencing it. I want a good understanding of which such things are defined so I can sleep at night. Cite places in the standard if you can, but be sure to specify whether it's C89 or C99. C11 is too new to be trusted with such questions IMHO.


Solution 2

C99 6.7.2.1 says:

Para 5

As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence

Para 12

Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.

Para 13

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning

That last paragraph covers your second question (casting the pointer to One to int*, and dereferencing it).
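As a minimal sketch of what Para 13 permits (the function names here are made up for illustration and are not from the standard), converting a pointer to struct One into a pointer to its initial member and back looks like this:

```c
#include <assert.h>

struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
};

/* Reads the initial member through a converted struct pointer. */
int first_member_via_cast(struct One *p)
{
        int *ip = (int *)p;     /* C99 6.7.2.1 para 13: points to p->Hurr */
        return *ip;
}

/* The "vice versa" direction. Only meaningful if ip really points at
   the Hurr member of a struct One object. */
struct One *struct_from_first_member(int *ip)
{
        return (struct One *)ip;
}

/* Hypothetical demo: checks both directions and returns the value read. */
int demo_first_member(void)
{
        struct One one = { 42, { 'a', 'b' }, 1.5f };

        /* No padding at the beginning, so both addresses are the same. */
        assert((void *)&one == (void *)&one.Hurr);
        assert(struct_from_first_member(&one.Hurr) == &one);

        return first_member_via_cast(&one);
}
```

Note that this only covers the initial member. Para 13 says nothing about reaching Durr or Nrrr through a converted pointer.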

The first point - whether it is valid to "Downcast" a Two* to a One* - I could not find specifically addressed. It boils down to whether the other rules ensure that the memory layout of the fields of One and the initial fields of Two are identical in all cases.

The members have to be packed in ordered sequence, no padding is allowed at the beginning, and they have to be aligned according to type, but the standard does not actually say that the layout needs to be the same (even though in most compilers I am sure it is).
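If you want to see what your own compiler does, a check with offsetof is one way to do it. This is only a sketch: prefix_layout_matches is a made-up name, and a result of 1 tells you about that one implementation, not about what the standard requires.

```c
#include <stddef.h>

struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
};

struct Two
{
        int Hurr;
        char Durr[2];
        float Nrrr;
        double Wibble;
};

/* Returns 1 if every member of struct One sits at the same offset as
   the member of the same name in struct Two, 0 otherwise. */
int prefix_layout_matches(void)
{
        return offsetof(struct One, Hurr) == offsetof(struct Two, Hurr)
            && offsetof(struct One, Durr) == offsetof(struct Two, Durr)
            && offsetof(struct One, Nrrr) == offsetof(struct Two, Nrrr);
}
```

On the mainstream compilers I would expect this to return 1, but that is an observation about those compilers and not a guarantee.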

There is, however, a better way to define these structures so that you can guarantee it:

struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
} One;

struct Two
{
        struct One one;
        double Wibble;
} Two;

You might think you can now safely cast a Two* to a One*, since Para 13 says so. However, strict aliasing might still bite you somewhere unpleasant. With the example above you don't need the cast anyway:

One = Two.one;
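A small sketch of the embedded-struct approach, with hypothetical helper names (copy_prefix, prefix_ptr and demo_nested are not from the question). Neither helper needs a cast, because the member is named explicitly:

```c
struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
};

struct Two
{
        struct One one;         /* initial member, so no padding before it */
        double Wibble;
};

/* Copies the embedded struct One out by plain member access. */
struct One copy_prefix(const struct Two *two)
{
        return two->one;
}

/* Yields a pointer to the embedded struct One without any cast. */
const struct One *prefix_ptr(const struct Two *two)
{
        return &two->one;
}

/* Hypothetical demo: returns the Hurr value seen through both helpers,
   or -1 if they disagree. */
int demo_nested(void)
{
        struct Two two = { { 7, { 'x', 'y' }, 2.0f }, 3.0 };
        struct One copy = copy_prefix(&two);

        if (copy.Hurr != prefix_ptr(&two)->Hurr)
                return -1;
        return copy.Hurr;
}
```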

OTHER TIPS

I think that technically that example is UB, too. But it will almost certainly work, and neither gcc nor clang complain about it with -pedantic.

To start with, the following is well-defined in C99 (§6.5.2.3/5; the same text is paragraph 6 in C11): [1]

union OneTwo {
  struct One one;
  struct Two two;
};

union OneTwo tmp = {.two = {3, {'a', 'b'}, 3.14f, 3.14159} };
struct One one = tmp.one;

The fact that accessing the "punned" struct One through the union must work implies that the layout of the prefix of struct Two is identical to struct One. This cannot be contingent on the existence of a union because a given composite type can only have one storage layout, and its layout cannot be contingent on its use in a union because the union does not need to be visible to every translation unit in which the struct is used.
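For reference, here is a complete version of the union access as a sketch. It uses the two separate struct definitions from the question (not the nested one), and hurr_via_union is a made-up name:

```c
struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
};

struct Two
{
        int Hurr;
        char Durr[2];
        float Nrrr;
        double Wibble;
};

union OneTwo
{
        struct One one;
        struct Two two;
};

/* Stores a struct Two in the union, then inspects the common initial
   sequence through the struct One member. The union declaration is
   visible at the point of access, as the quoted rule requires. */
int hurr_via_union(void)
{
        union OneTwo tmp = { .two = { 3, { 'a', 'b' }, 3.14f, 3.14159 } };
        return tmp.one.Hurr;
}
```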

Furthermore, in C all types are no more than a sequence of bytes (unlike, for example, C++) (§6.2.6.1/4) [2]. Consequently, the following is also guaranteed to work:

struct One one;
struct Two two = ...;
unsigned char tmp[sizeof one];
memcpy(tmp, &two, sizeof one);
memcpy(&one, tmp, sizeof one);

Given the above and the convertibility of any pointer type to a void*, I think it is reasonable to conclude that the temporary storage above is unnecessary, and it could have been written directly as:

struct One one;
struct Two two = ...;
memcpy(&one, &two, sizeof one);
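Spelled out as something you can compile, the direct copy might look like the sketch below. The function names and the initial values are mine, chosen only for illustration, and the code leans on the layout argument made above: that the prefix of struct Two is laid out like struct One.

```c
#include <string.h>

struct One
{
        int Hurr;
        char Durr[2];
        float Nrrr;
};

struct Two
{
        int Hurr;
        char Durr[2];
        float Nrrr;
        double Wibble;
};

/* Copies the leading sizeof(struct One) bytes of *two into a struct One. */
struct One prefix_by_memcpy(const struct Two *two)
{
        struct One one;
        memcpy(&one, two, sizeof one);
        return one;
}

/* Hypothetical self-check: returns 1 if every copied member matches
   the corresponding member of the source, 0 otherwise. */
int demo_memcpy_prefix(void)
{
        struct Two two = { 3, { 'a', 'b' }, 3.14f, 3.14159 };
        struct One one = prefix_by_memcpy(&two);

        return one.Hurr == two.Hurr
            && one.Durr[0] == two.Durr[0]
            && one.Durr[1] == two.Durr[1]
            && one.Nrrr == two.Nrrr;
}
```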

From there to the direct assignment through an aliased pointer as in the OP is not a very big leap, but there is an additional problem for the aliased pointer: it is theoretically possible for the pointer conversion to create an invalid pointer, because it's possible that the bit format of a struct Two* differs from a struct One*. Although it is legal to cast one pointer type to another pointer type with looser alignment (§6.3.2.3/7) [3] and then convert it back again, it is not guaranteed that the converted pointer is actually usable, unless the conversion is to a character type. In particular, it is possible that the alignment of struct Two is different from (more strict than) the alignment of struct One, and that the bit format of the more strongly-aligned pointer is not directly usable as a pointer to the less strongly-aligned struct. However, it is hard to see an argument against the almost equivalent:

one = *(struct One*)(void*)&two;

although this may not be explicitly guaranteed by the standard.

In comments, various people have raised the spectre of aliasing optimizations. The above discussion does not touch on aliasing at all because I believe that it is irrelevant to a simple assignment. The assignment must be sequenced after any preceding expressions and before any succeeding ones; it clearly modifies one and almost as clearly references two. An optimization which made a preceding legal mutation of two invisible to the assignment would be highly suspect.

But aliasing optimizations are, in general, possible. Consequently, even though all of the above pointer casts should be acceptable in the context of a single assignment expression, it would certainly not be legal behaviour to retain the converted pointer of type struct One* which actually points into an object of type struct Two and expect it to be usable either to mutate a member of its target or to access a member of its target which has otherwise been mutated. The only context in which you could get away with using a pointer to struct One as though it were a pointer to the prefix of struct Two is when the two objects are overlaid in a union.

--- Standard references:

[1] "if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible."

[2] "Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy)…"

[3] "A pointer to an object type may be converted to a pointer to a different object type… When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object."

A1. Undefined behaviour, because of Wibble. A2. Defined.

§9.2 in N3337 (a C++ working draft):

Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types

Your structs would be layout compatible and thus interchangeable but for Wibble. There is a good reason too: Wibble might cause different padding in struct Two.

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.

I think that guarantees that you can dereference the initial int.

Licensed under: CC-BY-SA with attribution