How is it legal to reference an undefined type inside a structure?

https://stackoverflow.com/questions/2895209

04-10-2019
|

Question

As part of answering another question, I came across a piece of code like this, which gcc compiles without complaint.

typedef struct {
    struct xyz *z;
} xyz;
int main (void) {
    return 0;
}

This is the means I've always used to construct types that point to themselves (e.g., linked lists) but I've always thought you had to name the struct so you could use self-reference. In other words, you couldn't use xyz *z within the structure because the typedef is not yet complete at that point.

But this particular sample does not name the structure and it still compiles. I thought originally there was some black magic going on in the compiler that automatically translated the above code because the structure and typedef names were the same.

But this little beauty works as well:

typedef struct {
    struct NOTHING_LIKE_xyz *z;
} xyz;

What am I missing here? This seems a clear violation since there is no struct NOTHING_LIKE_xyz type defined anywhere.

When I change it from a pointer to an actual type, I get the expected error:

typedef struct {
    struct NOTHING_LIKE_xyz z;
} xyz;

qqq.c:2: error: field `z' has incomplete type

Also, when I remove the struct, I get an error (parse error before "NOTHING ...).

Is this allowed in ISO C?

Update: A struct NOSUCHTYPE *variable; also compiles so it's not just inside structures where it seems to be valid. I can't find anything in the c99 standard that allows this leniency for structure pointers.

Solution

The parts of the C99 standard you are after are 6.7.2.3, paragraph 7:

If a type specifier of the form struct-or-union identifier occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible, then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.

...and 6.2.5 paragraph 22:

A structure or union type of unknown content (as described in 6.7.2.3) is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.

OTHER TIPS

As the warning says in the second case, struct NOTHING_LIKE_xyz is an incomplete type, like void or arrays of unknown size. An incomplete type can only appear as a type pointed to, with an exception for arrays of unknown size that are allowed as the last member of a struct, making the struct itself an incomplete type in this case. The code that follows cannot dereference any pointer to an incomplete type (for good reason).

Incomplete types can offer some datatype encapsulation of sorts in C... The corresponding paragraph in http://www.ibm.com/developerworks/library/pa-ctypes1/ seems like a good explanation.

The 1st and 2nd cases are well-defined, because the size and alignment of a pointer is known. The C compiler only needs the size and alignment info to define a struct.

The 3rd case is invalid because the size of that actual struct is unknown.

But beware that for the 1st case to be logical, you need to give a name to the struct:

//             vvv
typedef struct xyz {
    struct xyz *z;
} xyz;

otherwise the outer struct and the *z will be considered two different structs.

The 2nd case has a popular use case known as "opaque pointer" (pimpl). For example, you could define a wrapper struct as

 typedef struct {
    struct X_impl* impl;
 } X;
 // usually just: typedef struct X_impl* X;
 int baz(X x);

in the header, and then in one of the .c,

 #include "header.h"
 struct X_impl {
    int foo;
    int bar[123];
    ...
 };
 int baz(X x) {
    return x.impl->foo;
 }

the advantage is out of that .c, you cannot mess with the internals of the object. It is a kind of encapsulation.

You do have to name it. In this:

typedef struct {
    struct xyz *z;
} xyz;

will not be able to point to itself as z refers to some complete other type, not to the unnamed struct you just defined. Try this:

int main()
{
    xyz me1;
    xyz me2;
    me1.z = &me2;   // this will not compile
}

You'll get an error about incompatible types.

Well... All I can say is that your previous assumption was incorrect. Every time you use a struct X construct (by itself, or as a part of larger declaration), it is interpreted as a declaration of a struct type with a struct tag X. It could be a re-declaration of a previously declared struct type. Or, it can be a very first declaration of a new struct type. The new tag is declared in scope in which it appears. In your specific example it happens to be a file scope (since C language has no "class scope", as it would be in C++).

The more interesting example of this behavior is when the declaration appears in function prototype:

void foo(struct X *p); // assuming `struct X` has not been declared before

In this case the new struct X declaration has function-prototype scope, which ends at the end of the prototype. If you declare a file-scope struct X later

struct X;

and try to pass a pointer of struct X type to the above function, the compiler will give you a diagnostics about non-matching pointer type

struct X *p = 0;
foo(p); // different pointer types for argument and parameter

This also immediately means that in the following declarations

void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);

each struct X declaration is a declaration of a different type, each local to its own function prototype scope.

But if you pre-declare struct X as in

struct X;
void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);

all struct X references in all function prototype will refer to the same previosly declared struct X type.

I was wondering about this too. Turns out that the struct NOTHING_LIKE_xyz * z is forward declaring struct NOTHING_LIKE_xyz. As a convoluted example,

typedef struct {
    struct foo * bar;
    int j;
} foo;

struct foo {
    int i;
};

void foobar(foo * f)
{
    f->bar->i;
    f->bar->j;
}

Here f->bar refers to the type struct foo, not typedef struct { ... } foo. The first line will compile fine, but the second will give an error. Not much use for a linked list implementation then.

When a variable or field of a structure type is declared, the compiler has to allocate enough bytes to hold that structure. Since the structure may require one byte, or it may require thousands, there's no way for the compiler to know how much space it needs to allocate. Some languages use multi-pass compilers which would be able find out the size of the structure on one pass and allocate the space for it on a later pass; since C was designed to allow for single-pass compilation, however, that isn't possible. Thus, C forbids the declaration of variables or fields of incomplete structure types.

On the other hand, when a variable or field of a pointer-to-structure type is declared, the compiler has to allocate enough bytes to hold a pointer to the structure. Regardless of whether the structure takes one byte or a million, the pointer will always require the same amount of space. Effectively, the compiler can tread the pointer to the incomplete type as a void* until it gets more information about its type, and then treat it as a pointer to the appropriate type once it finds out more about it. The incomplete-type pointer isn't quite analogous to void*, in that one can do things with void* that one can't do with incomplete types (e.g. if p1 is a pointer to struct s1, and p2 is a pointer to struct s2, one cannot assign p1 to p2) but one can't do anything with a pointer to an incomplete type that one could not do to void*. Basically, from the compiler's perspective, a pointer to an incomplete type is a pointer-sized blob of bytes. It can be copied to or from other similar pointer-sized blobs of bytes, but that's it. the compiler can generate code to do that without having to know what anything else is going to do with the pointer-sized blobs of bytes.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow