How does machine code access parameters to a subroutine call?
28-10-2019
Question
When running a program, you can pass parameters, e.g.
$ myProgram par1 par2 par3
In C you can access these parameters by looking at argv:
int main(int argc, char *argv[])
{
    char *aParameter = argv[1]; // Not sure if this is 100% right but you get the idea...
}
How would this translate into x86 assembly / machine code? How would you access the variables given to you? How would the system give you these variables?
I'm very new to assembly; it seems you can only access registers and absolute addresses. I am puzzled about how you could access parameters. Does the system preload the parameters into a special register for you?
Solution
Function calls
In 32-bit x86 code, parameters are usually passed on the stack, a region of memory whose current top is pointed to by the esp register. The operating system is responsible for reserving some memory for the stack and then setting up esp properly before passing control to your program.
A normal function call could look something like this:
main:
    push 456
    push 123
    call MyFunction
    add esp, 8
    ret

MyFunction:
    ; [esp+0] will hold the return address
    ; [esp+4] will hold the first parameter (123)
    ; [esp+8] will hold the second parameter (456)
    ;
    ; To return from here, we usually execute a 'ret' instruction,
    ; which is roughly equivalent to the following (except that
    ; 'ret' leaves the flags untouched, while 'add' modifies them):
    ;
    ;     add esp, 4
    ;     jmp [esp-4]
    ret
The calling function and the called function split certain responsibilities between them: where the arguments are placed, who removes them from the stack afterwards, and which registers each side promises to preserve. These rules are referred to as calling conventions.
The example above uses the cdecl calling convention, which means that parameters are pushed onto the stack in reverse order, and the calling function is responsible for restoring esp back to where it pointed before those parameters were pushed to the stack. That's what add esp, 8 does.
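To make the cdecl bookkeeping concrete, here is a toy simulation in C (not real machine code): a small array stands in for memory, and an `esp` field holds the byte offset of the top of the stack, which grows downward. The names `SimStack`, `sim_push`, `sim_load` and `simulate_cdecl_call`, and the made-up return address, are hypothetical; the sketch just replays the main/MyFunction example step by step.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

/* Toy model of a 32-bit, downward-growing stack (illustration only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* backing memory, one 4-byte slot each */
    uint32_t esp;                  /* byte offset of the current top of stack */
} SimStack;

static void sim_init(SimStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;          /* empty stack: esp points just past the end */
}

/* push: move esp down by 4 bytes, then store the value at the new top */
static void sim_push(SimStack *s, uint32_t value) {
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the 4-byte word at [esp + offset], as the callee would. */
static uint32_t sim_load(const SimStack *s, uint32_t offset) {
    return s->mem[(s->esp + offset) / 4];
}

/* Replays: push second; push first; call MyFunction; ...; ret; add esp, 8 */
static int simulate_cdecl_call(uint32_t first, uint32_t second,
                               uint32_t *seen_first, uint32_t *seen_second) {
    SimStack s;
    sim_init(&s);
    uint32_t esp_before = s.esp;

    sim_push(&s, second);            /* push 456: last argument goes first */
    sim_push(&s, first);             /* push 123 */
    sim_push(&s, 0xCAFEu);           /* call: pushes a (made-up) return address */

    /* Inside MyFunction: */
    *seen_first = sim_load(&s, 4);   /* [esp+4] = first parameter */
    *seen_second = sim_load(&s, 8);  /* [esp+8] = second parameter */
    s.esp += 4;                      /* ret: pops the return address */

    /* Back in the caller (cdecl: the caller cleans up): */
    s.esp += 8;                      /* add esp, 8 */

    return s.esp == esp_before;      /* 1 if the stack is balanced again */
}
```

Calling `simulate_cdecl_call(123, 456, &a, &b)` leaves `a == 123` and `b == 456` and returns 1, mirroring the [esp+4]/[esp+8] comments in MyFunction and showing why the caller's `add esp, 8` is needed to rebalance the stack.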
Main function
Typically, you write a main function in assembly and assemble it into an object file. You then pass this object file to a linker to produce an executable. The linker adds startup code (usually supplied by the C runtime library) that sets up the stack properly before control is passed to your main function, so that your function can act as if it were called with two arguments (argc/argv). That is, your main function is not the real entry point; the startup code calls it after it has set up the argc/argv arguments.
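On the C side, by the time main runs, argc and argv are just ordinary parameters. The two helpers below (`count_until_null` and `print_args` are hypothetical names) walk argv; the first relies on the C standard's guarantee that argv[argc] is a null pointer, so the count it finds should equal argc.

```c
#include <stddef.h>
#include <stdio.h>

/* Count entries up to the terminating NULL; for a real argv this equals argc. */
static int count_until_null(char *argv[]) {
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Print each argument with its index, using argc as the bound. */
static void print_args(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
}
```

From `int main(int argc, char *argv[])` you might call `print_args(argc, argv);`. After `$ myProgram par1 par2 par3`, argv[0] is typically the program name, so four entries would be listed.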
Startup code
So what does this "startup code" look like? The linker will add it for us, but it's always interesting to know how stuff works.
This is platform specific, but I'll describe a typical case on Linux. This article, while dated, explains the stack layout on Linux when an i386 program starts. The stack will look like this:
esp+00h: argc
esp+04h: argv[0]
esp+08h: argv[1]
esp+0Ch: argv[2]
...
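The offsets in that layout follow a simple rule on i386: argc takes the first 4 bytes, then each argv pointer takes another 4, so argv[i] sits at esp + 4 + 4*i, and the pointer array ends with a NULL. The sketch below checks that rule and models the same stack on the host as an array of pointer-sized slots. `i386_argv_offset` and `decode_initial_stack` are hypothetical helpers, and storing argc in a pointer slot is only a modelling convenience.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset from esp of argv[i] on the i386 initial stack:
   4 bytes for argc, then one 4-byte pointer per argument. */
static unsigned i386_argv_offset(unsigned i) {
    return 4u + 4u * i;
}

/* Host-side model: slot 0 holds argc (cast to a pointer), the argv
   pointers follow, then a NULL. One slot here = 4 bytes on i386. */
static void decode_initial_stack(char **sp, int *argc, char ***argv) {
    *argc = (int)(uintptr_t)sp[0]; /* like mov eax, [esp]               */
    *argv = sp + 1;                /* like lea edx, [esp+4]: an address */
}
```

Note that argv is an address (where the pointer array starts), not a value loaded from the stack, which is why the assembly uses lea for it and mov for argc.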
So the startup code can get the argc/argv values from the stack and then call main(...) with two parameters:
; This is very incomplete startup code, but it illustrates the point
mov eax, [esp] ; eax = argc
lea edx, [esp+0x04] ; edx = argv (the address of argv[0], hence lea rather than mov)
; push argv, then argc onto the stack (note the reverse order)
push edx
push eax
call main
;
; When main returns, use its return value (eax)
; to set an exit status
;
...
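The same startup sequence can be sketched in C, again as a model rather than real startup code: `run_startup` reads argc and argv out of a simulated initial stack (an array of pointer-sized slots with argc in slot 0) and calls a main-like function through a pointer. `run_startup`, `main_func` and `demo_main` are hypothetical names; a real startup routine would go on to use main's return value as the process exit status.

```c
#include <stdint.h>
#include <string.h>

typedef int (*main_func)(int argc, char *argv[]);

/* Made-up stand-in for a program's main: returns 1 if "--fail" was passed. */
static int demo_main(int argc, char *argv[]) {
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "--fail") == 0)
            return 1;
    return 0;
}

/* Sketch of the startup steps: fetch argc/argv, call main, keep its result. */
static int run_startup(char **sp, main_func entry) {
    int argc = (int)(uintptr_t)sp[0]; /* mov eax, [esp]                     */
    char **argv = sp + 1;             /* lea edx, [esp+0x04]                */
    int status = entry(argc, argv);   /* push edx; push eax; call main      */
    return status;                    /* real code: becomes the exit status */
}
```

Here the C compiler generates the argument pushes for `entry(argc, argv)` according to the platform's calling convention, which is exactly the work the hand-written assembly does explicitly.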
OTHER TIPS
The C runtime is doing some work for you here: it fetches the program arguments from the OS and parses them if necessary before invoking your main function. (On Windows, for example, a process receives its command line as a single string, which the runtime splits into argv.) In assembler, you'll have to fetch the command arguments and parse them yourself. How you get the program arguments is OS-specific.
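As an illustration of the "parses them if necessary" step, here is a deliberately simplified splitter that turns one command-line string into an argv-style array in place. It only separates on spaces and tabs; real runtimes (the Windows CRT, for instance) also apply quoting and backslash rules, which this hypothetical `split_command_line` ignores.

```c
#include <stddef.h>

/* Split `line` in place on spaces/tabs, storing pointers in argv and
   NULL-terminating the array. Returns argc. Requires max_args >= 1;
   arguments beyond max_args - 1 are dropped. No quote handling. */
static int split_command_line(char *line, char *argv[], int max_args) {
    int argc = 0;
    char *p = line;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')   /* skip separators */
            p++;
        if (*p == '\0' || argc == max_args - 1)
            break;                        /* end of input, or array full */
        argv[argc++] = p;                 /* start of the next argument */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the argument in place */
    }
    argv[argc] = NULL;                    /* same convention as C's argv */
    return argc;
}
```

With a buffer holding `"myProgram par1  par2 par3"`, this yields four arguments and a trailing NULL, i.e. the same shape the startup code hands to main.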
The same way your C program does; you just have to do it manually.
Arguments to functions are placed in registers or in memory (typically the stack) before the function is called. When you call a function in assembly, you have to set up the stack (or registers) manually before the call. The calling convention decides where these arguments go, how they are ordered, and how they are accessed.
For example, argc and argv would be created and pushed onto the stack. The data they point to would have already been created as well. When the function is called, it knows that arguments 1..n will have been placed in some section of memory according to the calling convention.
Here is a quick rundown on calling conventions, with some examples of how the stack would be set up before calling a function.
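As a rough sketch of how much the convention matters, the functions below report where the callee finds its i-th (0-based) integer or pointer argument on entry, under two common conventions: 32-bit cdecl, where everything is on the stack, and the x86-64 System V ABI used on Linux, where the first six go in rdi, rsi, rdx, rcx, r8 and r9 and the rest go on the stack above the return address. Floating-point and struct arguments follow other rules and are ignored; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* 32-bit cdecl: on entry [esp] is the return address, so argument i is at
   esp + 4 + 4*i (assuming 4-byte integer/pointer arguments). */
static int cdecl_arg_offset(int i) {
    return 4 + 4 * i;
}

/* x86-64 System V: first six integer/pointer arguments in registers, later
   ones on the stack starting at [rsp+8]. Writes e.g. "rdi" or "[rsp+8]"
   into buf. Assumes i >= 0. */
static void sysv64_arg_location(int i, char *buf, size_t len) {
    static const char *const regs[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    if (i < 6)
        snprintf(buf, len, "%s", regs[i]);
    else
        snprintf(buf, len, "[rsp+%d]", 8 + 8 * (i - 6));
}
```

For the main/MyFunction example, `cdecl_arg_offset(0)` and `cdecl_arg_offset(1)` give 4 and 8, matching the [esp+4] and [esp+8] comments; under the x86-64 System V convention the same two int arguments would instead arrive in edi and esi (the low halves of rdi and rsi).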
On a side note, some amount of work has to be done before main is called, and this is hidden from you. This is a good thing; we don't want to write a bunch of bootstrap code every time we begin a new project.