Question

While trying to debug a problem I'm having using Speex, I noticed that it (well, not just Speex, but some example code as well) does the following:

  • Return a pointer to EncState from an initialization function
  • Cast that pointer to a void pointer
  • Store the void pointer
  • (elsewhere)
  • Cast the void pointer to a pointer to pointer to SpeexMode
  • Dereference the pointer

It so happens that the definition of EncState starts with a field of type SpeexMode *, and so the integer values of a pointer to the first field and a pointer to the struct happen to be the same. The dereference happens to work at runtime.

But... does the language actually allow this? Is the compiler free to do whatever it wants if it compiles this? Is casting a struct T* to a struct C* undefined behavior, if T's first field is a C?

Solution

From the C11 standard:

(C11 §6.7.2.1 ¶15: "A pointer to a structure object, suitably converted, points to its initial member ... and vice versa. There may be unnamed padding within a structure object, but not at its beginning.")

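This means the behavior you see is allowed and guaranteed: converting a pointer to the struct into a pointer to the type of its first member, and then dereferencing it, is well defined.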
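
As a minimal sketch of that guarantee, the pattern from the question can be reduced to the following. The names Mode, Encoder, mode_of and initial_member_demo are hypothetical stand-ins chosen for illustration; they are not the real Speex definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real Speex types. */
typedef struct Mode { int id; } Mode;

typedef struct Encoder {
    const Mode *mode;   /* initial member */
    int frame_size;
} Encoder;

/* Recover the mode from an opaque handle, relying only on the fact that
   the pointed-to object begins with a member of type `const Mode *`. */
static const Mode *mode_of(void *state)
{
    return *(const Mode **)state;
}

int initial_member_demo(void)
{
    static const Mode wideband = { 7 };
    Encoder enc = { &wideband, 320 };
    void *handle = &enc;                 /* struct pointer stored as void * */

    /* No padding may precede the initial member. */
    assert(offsetof(Encoder, mode) == 0);
    assert((void *)&enc == (void *)&enc.mode);

    return mode_of(handle)->id;          /* reads enc.mode through the cast */
}
```

The cast in mode_of is only valid because mode is the first member; moving it anywhere else in Encoder would break the guarantee.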

Other tips

Every version of the Standard has treated support for many aliasing constructs as a Quality of Implementation issue, since it would have been essentially impossible to write rules which supported all useful constructs, didn't block any useful optimizations, and could be supported by all compilers without significant rework. Consider the following function:

struct foo {int length; int *dat; };

int test1(struct foo *p)
{
  int *ip = &p->length;
  *ip = 2;
  return p->length;      
}

I think it's rather clear that any quality compiler should be expected to handle the possibility that an object of type struct foo might be affected by the assignment to *ip. On the other hand, consider the function:

void test2(struct foo *p)
{
    int i;
    for (i=0; i < p->length; i++)
        p->dat[i] = 0;
}

Should a compiler be required to make allowances for the possibility that writing to p->dat[i] might affect the value of p->length, e.g. by reloading the value of p->length after at least the first iteration of the loop?

I think some members of the Committee may have intended to require that compilers make such allowance, but I don't think they all did, and the rules as written wouldn't require it since they list the types of lvalue that may be used to access an object of type struct foo, and int is not among them. Some people may think the omission was accidental, but I think it was based on an expectation that compilers would interpret the rule as requiring that objects which are accessed as some particular type in some context be accessed by lvalues which have a visible association with an object of one of the listed types, within that context. The question of what constitutes a "visible association" was left as a QoI issue outside the Standard's jurisdiction, but compiler writers were expected to make reasonable efforts to recognize associations when practical.

Within a function like test1, an lvalue of type struct foo (namely *p) is used to derive ip, and p is not used in any other fashion to access p->length between the formation of ip and its last usage. Thus, compilers should have no difficulty recognizing that a store to *ip cannot be reordered across the later read of p->length, even without a general rule giving blanket permission to use pointers of type int* to access int members of unrelated structures. Within test2, however, there is no visible means by which the address of p->length could have been used in the computation of pointer p->dat, and thus it would be reasonable for optimizing compilers intended for most common purposes to hoist the read of p->length before the loop in the expectation that its value won't change.
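
To make the hoisting concrete, here is a hand-written sketch of the transformation an optimizer might apply to test2. The names test2_hoisted and hoisting_demo are hypothetical, and the two loops are only equivalent under the assumption that no p->dat[i] aliases p->length; the demo deliberately uses arrays that do not alias it.

```c
#include <assert.h>

struct foo { int length; int *dat; };

/* As in the answer: p->length is re-read on every iteration. */
void test2(struct foo *p)
{
    int i;
    for (i = 0; i < p->length; i++)
        p->dat[i] = 0;
}

/* Hand-hoisted variant (hypothetical name): p->length is read once.
   Equivalent to test2 only if no p->dat[i] aliases p->length. */
void test2_hoisted(struct foo *p)
{
    int i;
    int n = p->length;
    for (i = 0; i < n; i++)
        p->dat[i] = 0;
}

/* Returns 1 if both versions agree on non-aliasing input. */
int hoisting_demo(void)
{
    int a[4] = { 1, 2, 3, 4 };
    int b[4] = { 1, 2, 3, 4 };
    struct foo x = { 3, a };
    struct foo y = { 3, b };
    int i;

    test2(&x);
    test2_hoisted(&y);

    for (i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    /* The first three elements are cleared, the fourth is untouched. */
    return a[0] == 0 && a[2] == 0 && a[3] == 4;
}
```

If p->dat were set to point at p->length itself, the two versions would no longer agree, which is exactly the case the aliasing rules have to settle.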

Rather than making any effort to recognize the types of object from which a pointer is derived, clang and gcc instead opt to behave as though the Standard gives general permission to access struct (but not union!) members using pointers of their types. This is allowable but not required by the Standard (a conforming but garbage-quality implementation could process test1 in an arbitrary, meaningless fashion), but the blindness to pointer derivation needlessly restricts the range of constructs available to programmers, and makes it necessary to forgo what should be useful optimizations such as those exemplified by test2().

Overall, the correct answer to almost any question related to aliasing in C is "that's a quality-of-implementation issue". Observations about what clang and gcc do may be useful for people who need to appease the -fstrict-aliasing mode of those compilers, but have little to do with what the Standard actually says.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow