Question

Essentially, if I have

typedef struct {
    int x;
    int y;
} A;

typedef struct {
    int h;
    int k;
} B;

and I have A a, does the C standard guarantee that ((B*)&a)->k is the same as a.y?


Solution

Are C structs with the same member types guaranteed to have the same layout in memory?

Almost yes. Close enough for me.

From n1516, Section 6.5.2.3, paragraph 6:

... if a union contains several structures that share a common initial sequence ..., and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

This means that if you have the following code:

struct a {
    int x;
    int y;
};

struct b {
    int h;
    int k;
};

union {
    struct a a;
    struct b b;
} u;

If you assign to u.a, the standard says that you can read the corresponding values from u.b. It stretches the bounds of plausibility to suggest that struct a and struct b can have different layout, given this requirement. Such a system would be pathological in the extreme.
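As a minimal sketch of the union case just described (the names `union ab` and `read_k_through_union` are made up for illustration and are not from the standard), the write goes through one member and the read through the other:

```c
struct a { int x; int y; };
struct b { int h; int k; };

/* Both structs are members of one union whose complete type is visible here. */
union ab {
    struct a a;
    struct b b;
};

/* Hypothetical helper: stores through the struct a member, then reads the
   common initial sequence through the struct b member. */
int read_k_through_union(int x, int y)
{
    union ab u;
    u.a.x = x;
    u.a.y = y;
    return u.b.k;   /* covered by 6.5.2.3/6: same value as u.a.y */
}
```

Here the common initial sequence is both members, since x/h and y/k have compatible types.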

Remember that the standard also guarantees that:

  • The value of a structure object is never a trap representation.

  • Addresses of fields in a structure increase (a.x is always before a.y).

  • The offset of the first field is always zero.
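If you would rather have the compiler confirm the layout than rely on plausibility, something like the following sketch could work. It assumes a C11 compiler (for `_Static_assert`); the function name `layouts_match` is hypothetical. The first two assertions restate guarantees from the list above, while the last two check the matching layout on your particular implementation.

```c
#include <stddef.h>   /* offsetof */

struct a { int x; int y; };
struct b { int h; int k; };

/* Guaranteed by the standard for any struct. */
_Static_assert(offsetof(struct a, x) == 0,
               "the first member is at offset zero");
_Static_assert(offsetof(struct a, y) > offsetof(struct a, x),
               "members are laid out in declaration order");

/* Implementation checks: compilation fails if the layouts ever differ. */
_Static_assert(offsetof(struct a, y) == offsetof(struct b, k),
               "second members are at the same offset");
_Static_assert(sizeof(struct a) == sizeof(struct b),
               "both structs have the same size");

/* Hypothetical runtime version of the same checks; returns 1 on a match. */
int layouts_match(void)
{
    return offsetof(struct a, y) == offsetof(struct b, k)
        && sizeof(struct a) == sizeof(struct b);
}
```

Note that matching layout says nothing about whether a pointer cast between the two types is allowed.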

However, and this is important!

You rephrased the question,

does the C standard guarantee that ((B*)&a)->k is the same as a.y?

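No! Under the standard, accessing an object of one struct type through an lvalue of a different struct type is undefined behavior, so there is no guarantee that the two expressions yield the same value: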

struct a { int x; };
struct b { int x; };
int test(int value)
{
    struct a a;
    a.x = value;
    return ((struct b *) &a)->x;
}

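This is an aliasing violation: struct a and struct b are distinct, incompatible types, so reading a through a struct b lvalue breaks the effective-type rules of 6.5/7.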
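One way to sidestep the cast is to copy the bytes into an object of the other type. This is only a sketch under assumptions: the helper name `k_from_a` is hypothetical, and the static assertions check on your implementation that the two layouts really match.

```c
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

typedef struct { int x; int y; } A;
typedef struct { int h; int k; } B;

/* Refuse to compile if the layouts differ on this implementation. */
_Static_assert(sizeof(A) == sizeof(B), "A and B must have the same size");
_Static_assert(offsetof(A, y) == offsetof(B, k), "y and k must share an offset");

/* Hypothetical helper: copies the representation of *a into a real B object
   and reads k from it, so no A object is ever accessed through a B lvalue. */
int k_from_a(const A *a)
{
    B b;
    memcpy(&b, a, sizeof b);
    return b.k;
}
```

Compilers typically optimize a small fixed-size memcpy like this into plain loads and stores, so the copy need not cost anything at run time.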

OTHER TIPS

Piggybacking on the other replies with a warning about section 6.5.2.3. Apparently there is some debate about the exact meaning of the phrase "anywhere that a declaration of the completed type of the union is visible", and at least GCC doesn't implement it as written. There are a few tangential C WG defect reports here and here with follow-up comments from the committee.

Recently I tried to find out how other compilers (specifically GCC 4.8.2, ICC 14, and clang 3.4) interpret this using the following code from the standard:

// Undefined, result could (realistically) be either -1 or 1
struct t1 { int m; } s1;
struct t2 { int m; } s2;
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    union {
        struct t1 s1;
        struct t2 s2;
    } u;
    u.s1.m = -1;
    return f(&u.s1,&u.s2);
}

GCC: -1, clang: -1, ICC: 1 and warns about the aliasing violation

// Global union declaration, result should be 1 according to a literal reading of 6.5.2.3/6
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
    struct t1 s1;
    struct t2 s2;
};
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    union u u;
    u.s1.m = -1;
    return f(&u.s1,&u.s2);
}

GCC: -1, clang: -1, ICC: 1 but warns about aliasing violation

// Global union definition, result should be 1 as well.
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
    struct t1 s1;
    struct t2 s2;
} u;
int f(struct t1 *p1, struct t2 *p2) {
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}
int g() {
    u.s1.m = -1;
    return f(&u.s1,&u.s2);
}

GCC: -1, clang: -1, ICC: 1, no warning

Of course, without strict aliasing optimizations all three compilers return the expected result every time. Since clang and GCC return the same result in all three cases, the only real information comes from ICC's lack of a diagnostic on the last one. This also aligns with the example given by the standards committee in the first defect report mentioned above.

In other words, this aspect of C is a real minefield, and you'll have to be wary that your compiler is doing the right thing even if you follow the standard to the letter. All the worse since it's intuitive that such a pair of structs ought to be compatible in memory.

This sort of aliasing specifically requires a union type. C11 §6.5.2.3/6:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

This example follows:

The following is not a valid fragment (because the union type is not visible within function f):

struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
    if (p1->m < 0)
          p2->m = -p2->m;
    return p1->m;
}

int g() {
    union {
          struct t1 s1;
          struct t2 s2;
    } u;
    /* ... */
    return f(&u.s1, &u.s2);
}

The requirements appear to be that (1) the object being aliased is stored inside a union and (2) the definition of that union type is in scope at the point of access.
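Given the two requirements above, a more conservative variant of the standard's example passes a pointer to the union itself, so every access is visibly a union member access. This is a sketch, not the standard's own code; the tag `union u` and the changed signature of f are my assumptions.

```c
struct t1 { int m; };
struct t2 { int m; };

/* The union type is declared at file scope, so it is visible inside f. */
union u {
    struct t1 s1;
    struct t2 s2;
};

/* All accesses go through the union object rather than through separate
   struct t1 * and struct t2 * parameters. */
int f(union u *pu)
{
    if (pu->s1.m < 0)
        pu->s2.m = -pu->s2.m;
    return pu->s1.m;
}

int g(void)
{
    union u u;
    u.s1.m = -1;
    return f(&u);   /* returns 1: the store through s2 is seen through s1 */
}
```

Because the compiler sees the union in every member access expression, it cannot assume that s1.m and s2.m are unrelated objects.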

For what it's worth, the corresponding initial-subsequence relationship in C++ does not require a union. And in general, such union dependence would be an extremely pathological behavior for a compiler. If there's some way the existence of a union type could affect a concrete memory model, it's probably better not to try to picture it.

I suppose the intent is that a memory access verifier (think Valgrind on steroids) can check a potential aliasing error against these "strict" rules.

Licensed under: CC-BY-SA with attribution