What happens if I cast a function pointer, changing the number of parameters

https://stackoverflow.com/questions/2118889

22-09-2019

Question

I'm just beginning to wrap my head around function pointers in C. To understand how casting of function pointers works, I wrote the following program. It basically creates a function pointer to a function that takes one parameter, casts it to a function pointer with three parameters, and calls the function, supplying three parameters. I was curious what would happen:

#include <stdio.h>

int square(int val){
  return val*val;
}

void printit(void* ptr){
  int (*fptr)(int,int,int) = (int (*)(int,int,int)) (ptr);
  printf("Call function with parameters 2,4,8.\n");
  printf("Result: %d\n", fptr(2,4,8));
}


int main(void)
{
    printit(square);
    return 0;
}

This compiles and runs without errors or warnings (gcc -Wall on Linux / x86). The output on my system is:

Call function with parameters 2,4,8.
Result: 4

So apparently the superfluous arguments are simply silently discarded.

Now I'd like to understand what is really happening here.

As to legality: if I understand the answer to "Casting a function pointer to another type" correctly, this is simply undefined behaviour. So the fact that this runs and produces a reasonable result is just pure luck, correct? (Or niceness on the part of the compiler writers?)
Why will gcc not warn me of this, even with -Wall? Is this something the compiler just cannot detect? Why?

I'm coming from Java, where typechecking is a lot stricter, so this behaviour confused me a bit. Maybe I'm experiencing a cultural shock :-).

Solution

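The extra parameters are not discarded. They are properly placed on the stack, as if the call were made to a function that expects three parameters. However, since your function cares about only one parameter, it looks only at the top of the stack and does not touch the other parameters. (This describes a stack-based convention such as 32-bit x86 cdecl; on x86-64 the first integer arguments are passed in registers instead, with the same net effect: the callee reads only the first one.)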

The fact that this call worked is pure luck, based on two facts:

The type of the first parameter is the same for the function and the cast pointer. If you change the function to take a pointer to a string and try to print that string, you will get a nice crash, since the code will try to dereference a pointer to memory address 2.
The calling convention used by default is for the caller to clean up the stack. If you change the calling convention so that the callee cleans up the stack, you will end up with the caller pushing three parameters on the stack and then the callee cleaning up (or rather attempting to clean up) only one parameter. This would likely lead to stack corruption.

There is no way the compiler can warn you about potential problems like this, for one simple reason: in the general case, it does not know the value of the pointer at compile time, so it can't evaluate what it points to. Imagine that the function pointer points to a method in a class's virtual table that is created at runtime. So if you tell the compiler it is a pointer to a function with three parameters, the compiler will believe you.

OTHER TIPS

If you take a car and cast it as a hammer, the compiler will take you at your word that the car is a hammer, but this does not turn the car into a hammer. The compiler may succeed in using the car to drive a nail, but that is implementation-dependent good fortune. It is still an unwise thing to do.

Yes, it's undefined behaviour - anything could happen, including it appearing to "work".
The cast prevents the compiler from issuing a warning. Also, compilers are under no requirement to diagnose possible causes of undefined behaviour. The reason is that doing so is either impossible, or too difficult and/or too costly.

The worst offence of your cast is converting a data pointer to a function pointer. It's worse than the signature change because there is no guarantee that function pointers and data pointers have the same size. And unlike many theoretical undefined behaviours, this one can be encountered in the wild, even on advanced machines (not only on embedded systems).

You can easily encounter pointers of different sizes on embedded platforms. There are even processors where data pointers and function pointers address different memories (RAM for one, ROM for the other): the so-called Harvard architecture. On x86 in real mode you can have 16-bit and 32-bit pointers mixed. Watcom-C had a special mode for DOS extenders where data pointers were 48 bits wide. Especially with C, one should know that not everything is POSIX, as C might be the only language available on exotic hardware.

Some compilers allow mixed memory models where code is guaranteed to lie within a 32-bit address range while data is addressed with 64-bit pointers, or the converse.

Edit: In conclusion, never cast a data pointer to a function pointer.

In practice, the behavior is determined by the calling convention. If you use a calling convention where the caller both pushes the arguments and pops them afterwards, this case works fine, since it just means there are a few extra bytes on the stack during the call. I don't have gcc handy at the moment, but consider this code compiled with the Microsoft compiler:

int ( __cdecl * fptr)(int,int,int) = (int (__cdecl * ) (int,int,int)) (ptr);

The following assembly is generated for the call:

push        8
push        4
push        2
call        dword ptr [ebp-4]
add         esp,0Ch

Note the 12 bytes (0Ch) added to the stack pointer after the call, which pops the three 4-byte arguments. After this, the stack is balanced again (assuming the callee is __cdecl, so it does not also try to clean up the stack). But with the following code:

int ( __stdcall * fptr)(int,int,int) = (int (__stdcall * ) (int,int,int)) (ptr);

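The add esp,0Ch instruction is not generated, because a __stdcall callee is expected to remove its own arguments. If the callee is actually __cdecl, as in this case, nobody pops the arguments and the stack is corrupted.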

Admittedly, I don't know for sure, but you definitely don't want to rely on this behavior, whether it's luck or compiler-specific.
It doesn't merit a warning because the cast is explicit. By casting, you're informing the compiler that you know better. In particular, you're casting a void*, and in doing so you're saying "take the address represented by this pointer, and treat it as this other pointer type": the cast simply tells the compiler that you're sure what's at the target address really is a function of that type. Here, though, we know that's incorrect.

I should refresh my memory of the binary layout of the C calling convention at some point, but I'm pretty sure this is what is happening:

1: It is not pure luck. The platform's C calling convention is well defined, and extra data on the stack does not matter to the call site, although the callee may overwrite it, since the callee doesn't know about it.
2: A "hard" cast, using parentheses, tells the compiler that you know what you're doing. Since all of the needed data is in one compilation unit, the compiler could be smart enough to figure out that this is clearly illegal, but C's designer(s) didn't focus on catching corner-case, verifiable incorrectness. Put simply, the compiler trusts that you know what you're doing (perhaps unwisely in the case of many C/C++ programmers!)

To answer your questions:

Pure luck: you could easily trample the stack and overwrite the return address. Since you declared the function pointer with 3 parameters and called through it, while the function itself uses only one, the remaining two parameters were 'discarded', and the behavior is undefined. Imagine if that 2nd or 3rd parameter contained a binary instruction and was popped off the call stack...
There is no warning because you were passing a void * pointer and casting it. That is perfectly legitimate code in the eyes of the compiler, even if you explicitly specify the -Wall switch. The compiler assumes you know what you are doing! That is the secret.

Hope this helps, Best regards, Tom.

Licensed under: CC-BY-SA with attribution
