First let me say that this is horrible way of coding.
That said, let's see how it works:
extern system, puts;
This says that system
and puts
are defined somewhere else, and the linker would take care of giving you their address. The compiler doesn't know what type they have, so, gcc at least assumes it's an int
and issues a warning. I can't find the clause in the standard that says whether this is well-defined, implementation-defined, or undefined behavior, but for sure it's bad. It's especially bad because those symbols are not int
s. I did find this though (C11, Annex J.2 - Undefined behavior):
Two declarations of the same object or function specify types that are not compatible
(6.2.7).
There is also the possibility that int
is too small to hold a pointer. There is no end to how much this is bad.
So, let's see in memory how this looks:
system (as 4-byte int) system (as function)
BYTE1 INSTRUCTION1
BYTE2 INSTRUCTION2
BYTE3 INSTRUCTION3
BYTE4 INSTRUCTION4
INSTRUCTION5
INSTRUCTION6
INSTRUCTION7
INSTRUCTION8
...
So what the linker does is, to make the address of system
the same on each compiled object (effectively linking them together). The contents of the system
on the other hand depends on its type which the linker doesn't know or care about.
So in your file, if right after:
extern system, puts;
you write:
printf("%d\n", system);
you will get the first couple instructions of the system
function as if it were an int
and print it. Which is bad, and don't do it.
On the other hand, as I said, the address of system
as defined as extern
in your program is the same as the real system
function. So if you take the address:
&system
and cast it to its correct type:
(void(*)(char*))&system
you get a function pointer, pointing to the real system
function. Note that system
was just now an int
, and we are about to call it as a function, this is by itself undefined behavior:
C11 Annex J-2 (undefined behavior):
An object has its stored value accessed other than by an lvalue of an allowable type
(6.5).
C11 6.5 (emphasis mine):
6. The effective type of an object for an access to its stored value is the declared type of the object, if any. ...
7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the
object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Getting passed all of this, you now have a function pointer containing the real address of system
. You then call it to your heart's content. The same is done with puts
.
That was bad. This is how this is actually supposed to be done:
#include <stdlib.h>
#include <stdio.h>
int main(int argv,char **argc)
{
char buf[256];
/* system was unused */
strcpy(buf,argc[1]);
puts(argc[2]);
return 0; /* 0 means no error */
}
Besides the goodness of using the implementation-provided header files, let's see how this one instead actually works.
Inside the header files, you have many lines with many declarations. The one for puts
could look like this:
int puts(const char *s);
First, note how the function pointer in your code actually got the function signature wrong? That is bad. It's bad because calling a void
function, the caller doesn't need to provide a return value space for it, while puts
expects one to be there, so puts
will write the return value where it expects to find it, but in reality that space belongs to something else. This is called stack corruption.
Anyway, when the compiler sees a line like this:
int puts(const char *s);
it will know that puts
is an external definition:
C11 6.2.2 (emphasis mine):
5. If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern. If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
it will know it's a function, and what its signature is.
So just by calling fputs(...)
the compiler does everything correctly.