Question

Suppose that I have a dynamic library (.so) on Linux. I also have an existing application that uses the library. The library is stripped. I would like to produce some (approximate) header file for the library, so that I can write another program that uses it.

It is easy enough to use objdump to see what functions are in the library, and ltrace to see every call as it is being made.

How do I figure out what the function arguments are?

Some ideas: I can probably use LD_PRELOAD or dlsym type trick to load a shim library which looks at the stack whenever any function is called in the original library. I can also probably do something in the shim that dumps the registers (this is on ARM, so it would be r0-r3 I suppose). With a bunch more work (by looking at the disassembly), it may be possible to also figure out whether a register contains a pointer that would be dereferenced, and then have the shim function dump what is at that pointer.

It seems like a big step from there to "this function takes as its first argument a pointer to struct with the following fields..." Are there any automated tools for this kind of thing?

Note: I am not at all interested in how the functions work, just how to feed them the right data.

Solution

A good start is a disassembler such as objdump, Hopper or IDA Pro. The last of these detects function parameters automatically in straightforward cases.

If you want to understand for yourself how this works, look up the different "calling conventions" (Wikipedia is a good start).
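As a rough illustration of what a calling convention pins down, the sketch below models where the i-th word-sized integer or pointer argument is found on entry to a function, for 32-bit x86 cdecl and for 32-bit ARM (AAPCS). The names `conv`, `arg_loc` and `arg_location` are made up for this example, and floats, 64-bit values and structs passed by value are deliberately ignored.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum conv { CONV_X86_CDECL, CONV_ARM_AAPCS };

struct arg_loc {
    bool in_register;  /* true: the value is in register r<reg>             */
    int  reg;          /* register number, valid only if in_register        */
    int  stack_offset; /* byte offset from the stack pointer at entry,
                          valid only if !in_register                        */
};

/*
 * Where is the index-th (0-based) word-sized integer or pointer argument
 * when the callee starts executing? Simplified model: one 4-byte word per
 * argument, no floating point, no 64-bit values, no by-value structs.
 */
struct arg_loc arg_location(enum conv c, int index)
{
    struct arg_loc loc = { false, -1, -1 };

    if (c == CONV_X86_CDECL) {
        /* [esp] holds the return address, so arguments start at [esp+4]. */
        loc.stack_offset = 4 + 4 * index;
    } else if (index < 4) {
        /* AAPCS passes the first four argument words in r0-r3. */
        loc.in_register = true;
        loc.reg = index;
    } else {
        /* Further words are on the stack, starting at [sp]. */
        loc.stack_offset = 4 * (index - 4);
    }
    return loc;
}
```

For instance, `arg_location(CONV_ARM_AAPCS, 1)` reports register r1, which is where a shim on ARM would look for the second argument under these assumptions.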

Example for a stack-based x86 convention such as cdecl (the usual default on 32-bit x86 Linux) or __stdcall: say you have a 32-bit x86 .so library and something like this happens in the binary:

push 3
push 2
push 1
call func ; void func(int a, int b, int c) where a=1, b=2 and c=3

The arguments are pushed onto the stack in reverse order. EAX, ECX and EDX may be used freely inside the function (they are caller-saved); any other register the function uses has to be explicitly saved and restored by the function itself (callee-saved). On ARM, as in your case, the first four word-sized arguments are passed in r0-r3 instead, with any further ones on the stack. None of this tells you anything about the data type behind each argument; that often takes more reversing to resolve.
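To see why the pushed values alone do not reveal a type, here is a small sketch that reads one and the same 32-bit argument word as an integer and as a float. The helper names are hypothetical, and the code assumes that `float` is a 32-bit IEEE-754 value, which holds on common x86 and ARM toolchains.

```c
#include <stdint.h>
#include <string.h>

/* One 32-bit argument word, as found on the stack or in a register. */
typedef uint32_t raw_word;

/* Read the bits as a signed 32-bit integer. */
int32_t word_as_int(raw_word w)
{
    int32_t i;
    memcpy(&i, &w, sizeof i);   /* bit copy, no value conversion */
    return i;
}

/* Read the very same bits as a single-precision float
   (assumes float is 32-bit IEEE-754). */
float word_as_float(raw_word w)
{
    float f;
    memcpy(&f, &w, sizeof f);   /* bit copy, avoids aliasing problems */
    return f;
}
```

The word `0x3F800000` is the integer 1065353216 when read with `word_as_int` and the float 1.0 when read with `word_as_float`; on a 32-bit target it could equally be an address. Only the way the callee uses the value settles which reading is intended.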

Even IDA Pro does not recover all of this information automatically, because it depends on a lot of factors and can be very hard :)
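Whatever a tool or manual reversing recovers still has to be written down as the approximate header the question asks for. A minimal sketch of what that could look like is below; every name, field and offset in it (`guessed_ctx`, `lib_process`, the 0x10 offset) is an invented placeholder, not something taken from a real library. Unknown regions are kept as explicit padding, and compile-time checks make sure the guessed offsets stay where they were observed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Approximate layout of a structure passed by pointer as the first
 * argument. All names and offsets are hypothetical placeholders for
 * whatever the disassembly and runtime dumps actually suggest.
 */
struct guessed_ctx {
    uint32_t    flags;          /* 0x00: e.g. compared against small constants */
    uint8_t     unknown_04[12]; /* 0x04: purpose unknown, kept as padding      */
    const char *name;           /* 0x10: e.g. dereferenced as a C string       */
};

/* Fail the build if an edit to the struct moves a guessed field. */
_Static_assert(offsetof(struct guessed_ctx, flags) == 0x00, "flags moved");
_Static_assert(offsetof(struct guessed_ctx, name)  == 0x10, "name moved");

/* Guessed prototype (hypothetical): the return type and second
   parameter are as uncertain as the struct layout above. */
int lib_process(struct guessed_ctx *ctx, uint32_t length);
```

The explicit padding array keeps later fields at their observed offsets even while the fields in between are still unidentified, so the header can be refined one field at a time.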

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow