Question

For example, is this code valid, or does it invoke undefined behavior by violating the aliasing rules?

int x;
struct s { int i; } y;
x = 1;
y = *(struct s *)&x;
printf("%d\n", y.i);

My interest is in using a technique based on this to develop a portable method for performing aliased reads.

Update: here is the intended use case, which is a little different but should be valid if and only if the above is valid:

static inline uint32_t read32(const unsigned char *p)
{
    struct a { char r[4]; };
    union b { struct a r; uint32_t x; } tmp;
    tmp.r = *(struct a *)p;
    return tmp.x;
}

GCC, as desired, compiles this to a single 32-bit load, and it seems to avoid the aliasing issues that could happen if p actually points to a type other than char. In other words, it seems to act as a portable replacement for the GNU C __attribute__((__may_alias__)) attribute. But I'm uncertain whether it's really well-defined...


Solution

I believe this still violates the effective type rules: you access, through an lvalue expression of type struct a, a memory location that was not declared as containing a struct a (either explicitly, or implicitly via a store in the case of dynamically allocated memory).

None of the sections that have been quoted in other answers can be used to escape this basic restriction.

However, I believe there's a solution to your problem: Use __builtin_memcpy(), which is available even in freestanding environments (see the manual entry on -fno-builtin).
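
A minimal sketch of that suggestion, assuming a hosted compiler: it uses the standard memcpy from <string.h> so that it builds anywhere, whereas a GCC freestanding build would call __builtin_memcpy instead. The name read32 comes from the question. Optimizing compilers commonly reduce a fixed-size copy like this to a single load, but that is an observation about typical code generation, not a guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes starting at p into a uint32_t in native byte order.
 * memcpy copies the object representation byte by byte, so neither the
 * alignment of p nor the effective type of the pointed-to object matters. */
static inline uint32_t read32(const unsigned char *p)
{
    uint32_t x;
    memcpy(&x, p, sizeof x);
    return x;
}
```

Because the copy goes through character-type semantics, p may point into an object of any type, including at an unaligned offset.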


Note that the issue is a bit less clear-cut than I make it sound. C11 section 6.5 §7 tells us that it's fine to access an object through an lvalue expression that has an aggregate or union type that includes one of the aforementioned types among its members.

The C99 rationale makes it clear that this restriction is there so a pointer to an aggregate and a pointer to one of its members may alias.

I believe the ability to use this loophole in the way of the first example (but not the second one, assuming p doesn't happen to point to an actual char [4]) is an unintended consequence, which the standard only fails to disallow because of imprecise wording.

Also note that if the first example were valid, we'd basically be able to sneak in structural typing into an otherwise nominally typed language. Structures in a union with common initial subsequence aside (and even then, member names do matter), an identical memory layout is not enough to make types compatible. I believe the same reasoning applies here.
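
For contrast, here is a sketch of the one case mentioned above where layout-based access is explicitly permitted: structures that share a common initial sequence and live in the same union (C11 6.5.2.3 §6). All type and function names are hypothetical and chosen only for illustration.

```c
/* Two distinct struct types whose first member has the same type. */
struct tagged_int   { int tag; int   value; };
struct tagged_float { int tag; float value; };

/* The union declaration must be visible where the inspection happens. */
union tagged {
    struct tagged_int   i;
    struct tagged_float f;
};

/* Inspects the common initial member through the 'i' view, which is
 * permitted even when the 'f' member is the one currently stored. */
static int get_tag(const union tagged *u)
{
    return u->i.tag;
}
```

Only the shared leading members may be inspected this way; reading u->i.value while the f member is stored would reinterpret the float's bytes as an int.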

OTHER TIPS

My reading of aliasing rules (C99, 6.5p7) with the presence of this sentence:

"an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or"

leads me to think it does not violate the C aliasing rules.

But the fact that it does not violate the aliasing rules is not enough for this code snippet to be valid. It may invoke undefined behavior for other reasons.

(struct s *) &x

is not guaranteed to point to a valid struct s object. Even if we assume the alignment of x is suitable for an object of type struct s, the pointer resulting from the cast may not point to a region large enough to hold the structure object (as struct s may have padding after its last member).
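
The size and alignment concern can at least be checked on a given implementation. The sketch below uses a hypothetical helper name and the C11 _Alignof operator; it reports whether struct s has the same size and alignment requirement as int. A positive result only rules out the padding and alignment problems described here and says nothing about the aliasing question.

```c
#include <stddef.h>

struct s { int i; };

/* Returns 1 if struct s is exactly as large as int and has the same
 * alignment requirement, 0 otherwise (for example, trailing padding). */
static int struct_s_layout_matches_int(void)
{
    return sizeof(struct s) == sizeof(int)
        && _Alignof(struct s) == _Alignof(int);
}
```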

EDIT: the answer has been completely reworked from its initial version

Not sure it's a proper answer, but what could happen (in your second example) is this:

  1. The compiler defines struct a as an 8-byte object, with padding after the 4 bytes in the array (why? because it can).
  2. You then use tmp.r = *(struct a *)p; which treats p as an address of a struct a (namely, an 8 byte object). It tries to copy the contents of this object into tmp.r, that is, 8 bytes from the address that p is holding. But you're only allowed to read 4 bytes from there.

Implementations do not have to copy padding bytes, but they're allowed to do so.
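
One way to guard against the scenario above is to refuse to compile when the struct is larger than the array it wraps. This is only a sketch using the C11 _Static_assert keyword, with a hypothetical helper added for inspection; it does not address the effective type question.

```c
struct a { char r[4]; };

/* Compilation fails on an implementation that pads struct a beyond the
 * 4 bytes of its array member. */
_Static_assert(sizeof(struct a) == 4, "struct a has trailing padding");

/* Reports the size the implementation actually chose for struct a. */
static unsigned struct_a_size(void)
{
    return (unsigned)sizeof(struct a);
}
```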

In your second example

struct a { char r[4]; };

this structure type might have some alignment restrictions. The compiler might decide that struct a is always 4-byte aligned, e.g., so that it can always use a 4-byte aligned read instruction without looking at the actual address. The pointer p that you receive as an argument to read32 has no such restriction, so

*(struct a*)p;

might cause a bus error.

I notice that this type of argument is a "practical" one.

From the point of view of the standard, this is UB as soon as (struct a*)p is a conversion to a type with more restrictive alignment requirements.
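
A sketch of how an address can be tested against an alignment requirement before any such conversion is attempted. The helper name is hypothetical, and the result of converting a pointer to uintptr_t is implementation-defined, so this assumes the usual flat address space in which the integer value is the address.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address held in p is a multiple of alignment.
 * Assumes uintptr_t exists and maps addresses linearly, which the
 * standard does not require. alignment must be nonzero. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Calling is_aligned(p, _Alignof(struct a)) inside read32 would show whether the cast in the question is even allowed for a particular p.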

From the C standard:

"A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined."

The resulting pointer in this case is guaranteed to be correctly aligned (because the first member of a struct must be coincident with the struct), so this limitation doesn't apply here. What does apply are the additional restrictions on pointer use, which require that an object be accessed only via lvalues compatible with its "effective type". In this case, the effective type of x is int, so it cannot be accessed via a struct pointer.

Note that, contrary to some claims, the conversion between pointer types is not limited to round trip use. The standard says that the pointer can be converted, with a proviso as to when such conversions result in undefined behavior. Elsewhere it gives the semantics of the use of pointers of the resulting type. The round-trip guarantees in the standard are additional specifications ... things that you can count on that you could not if not explicitly stated:

"Otherwise, when converted back again, the result shall compare equal to the original pointer."

This specifies a guarantee about the round trip; it is not a limitation to a round trip.

However, as noted, the "effective type" language is a limitation on the use of the pointer resulting from a conversion.
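
To illustrate that it is the use of the converted pointer that matters, here is a sketch of one use the effective type rules do allow: accessing any object through a character type. The function name is hypothetical.

```c
#include <stddef.h>

/* Sums the bytes of an arbitrary object. Reading through unsigned char
 * is permitted regardless of the object's effective type. */
static unsigned long sum_bytes(const void *obj, size_t n)
{
    const unsigned char *bytes = obj;
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += bytes[k];
    return sum;
}
```

Passing &x with sizeof x for the int from the question is fine here, whereas dereferencing (struct s *)&x runs into the restriction described above.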

Licensed under: CC-BY-SA with attribution