Question

I am implementing some limited remote debugging functionality for an application written in C running on a Linux box. The goal is to communicate with the application and look up the value of an arbitrary variable or run an arbitrary function.

I am able to look up symbols through dlsym() calls, but I am unable to determine whether the address returned refers to a function or a variable. Is there a way to determine type information via this symbol table?

Solution 2

You can read the file /proc/self/maps and parse the start of each line, which holds the begin address, the end address, and the permissions of one mapping:

<begin-addr>-<end-addr> rwxp ...

Then you search for the line whose range contains the address you are looking for and check the permissions:

  • r-x: it is code;
  • rw-: it is writable data;
  • r--: it is read-only data;
  • any other combination: something weird (rwxp: generated code, ...).

For example the following program:

#include <stdio.h>

void foo() {}
int x;

int main()
{
    int y;
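    /* %p expects void *, so cast each pointer explicitly. */
    printf("%p\n%p\n%p\n", (void *)foo, (void *)&x, (void *)&y);
    /* Wait for input so that /proc/<pid>/maps can be inspected while the process is alive. */
    scanf("%*s");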
    return 0;
}

...on my system gives this output:

0x400570
0x6009e4
0x7fff4c9b4e2c

...and these are the relevant lines from /proc/<pid>/maps:

00400000-00401000 r-xp 00000000 00:1d 641656       /tmp/a.out
00600000-00601000 rw-p 00000000 00:1d 641656       /tmp/a.out
....
7fff4c996000-7fff4c9b7000 rw-p 00000000 00:00 0    [stack]
....

So the addresses are: code (foo), data (the global x, in the writable data mapping) and data (the local y, on the stack).
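
The lookup described above can be automated from inside the process. The following is a minimal sketch, not a definitive implementation: the function name classify_address and its single-character return codes are made up for this illustration, and it assumes each line of /proc/self/maps fits in the buffer.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 'c' for code (r-x), 'w' for writable data (rw-),
   'r' for read-only data (r--), and '?' for any other permissions or if the
   address is not found in any mapping. */
char classify_address(const void *addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return '?';

    uintptr_t target = (uintptr_t)addr;
    char line[1024];
    char result = '?';

    while (fgets(line, sizeof line, maps)) {
        uintmax_t begin, end;
        char perms[5];

        /* Each line starts with "<begin>-<end> <perms>", addresses in hex. */
        if (sscanf(line, "%jx-%jx %4s", &begin, &end, perms) != 3)
            continue;
        if (target < begin || target >= end)
            continue;

        if (perms[0] == 'r' && perms[1] == '-' && perms[2] == 'x')
            result = 'c';
        else if (perms[0] == 'r' && perms[1] == 'w' && perms[2] == '-')
            result = 'w';
        else if (perms[0] == 'r' && perms[1] == '-' && perms[2] == '-')
            result = 'r';
        break; /* mappings do not overlap, so the first match is the only one */
    }

    fclose(maps);
    return result;
}
```

Calling classify_address() on the address returned by dlsym() would then give a first guess at whether the symbol is a function or a variable. Note that read-only data can end up in the same mapping as code with some linkers, so an 'c' result is a hint rather than a proof.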

OTHER TIPS

On x86 platforms, you can check for the instructions used to set up the stack frame of a function, provided you can read its address space. The sequence is typically:

push ebp
mov ebp, esp

I'm not positive about x64 platforms, but I think it is similar:

push rbp
mov rbp, rsp

This is the frame-pointer setup (the function prologue) that compilers commonly emit for C functions.

Keep in mind, however, that compiler optimizations may omit these instructions. If you want this to work, you may have to add a flag to disable that optimization. I believe that for GCC, -fno-omit-frame-pointer will do the trick.
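
As a rough sketch of that check on x86-64, the function below compares the first bytes at an address against the encoding GCC typically produces for push rbp / mov rbp, rsp (55 48 89 e5). The function name is invented for this example, and skipping a leading endbr64 instruction (f3 0f 1e fa, emitted when control-flow protection is enabled) is an extra assumption that is not part of the answer above. Other compilers may encode the same mov differently, so a miss proves nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the len readable bytes at p begin with the
   frame-pointer prologue "push rbp; mov rbp, rsp", optionally preceded by
   endbr64. The caller must make sure the bytes are readable. */
bool looks_like_x86_64_prologue(const unsigned char *p, size_t len)
{
    static const unsigned char endbr64[]  = { 0xf3, 0x0f, 0x1e, 0xfa };
    static const unsigned char prologue[] = { 0x55, 0x48, 0x89, 0xe5 };

    /* Skip endbr64 if present (assumption: CET-enabled builds start with it). */
    if (len >= sizeof endbr64 && memcmp(p, endbr64, sizeof endbr64) == 0) {
        p += sizeof endbr64;
        len -= sizeof endbr64;
    }

    return len >= sizeof prologue && memcmp(p, prologue, sizeof prologue) == 0;
}
```

The address returned by dlsym() could be passed in as the byte pointer. A data object can of course start with the same bytes by coincidence, so this is only a heuristic.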

One possible solution is to extract a symbol table for the application by parsing the output of the nm utility. nm includes information on symbol type: symbols with the T (global text) type are functions, and a lowercase t marks local (static) functions.

The trouble with this solution is that you have to ensure that your symbol table matches the target (especially if you are going to use it to extract the addresses, although using it in combination with dlsym() would be safer). The method I have used to ensure that is to make the symbol table generation part of the build process as a post-processing step.
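
A minimal sketch of parsing one line of default nm output ("address type name") might look like the following. The struct and function names are invented for this illustration, and treating only T and t as functions is a simplification: weak symbols (W, w), for instance, can be either functions or objects.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record for one parsed line of nm output. */
struct nm_symbol {
    unsigned long long addr;
    char type;
    char name[256];
};

/* Parses a line such as "0000000000401136 T main". Returns false for lines
   without an address, e.g. undefined symbols ("                 U printf"). */
bool parse_nm_line(const char *line, struct nm_symbol *out)
{
    return sscanf(line, "%llx %c %255s", &out->addr, &out->type, out->name) == 3;
}

/* T = global text symbol, t = local text symbol (simplified, see lead-in). */
bool nm_type_is_function(char type)
{
    return type == 'T' || type == 't';
}
```

The build step could feed each line of the nm output through parse_nm_line() and store the result in a table that the debugging code consults by name.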

I guess this is not a very reliable method, but it might work:

Take the address of a well-known function, such as main(), and the address of a well-known global variable.

Now take the address of the unknown symbol and compute the absolute value of the difference between this address and each of the other two. The smaller difference indicates whether the unknown address is closer to a function or to a global variable, meaning that it is probably another function or another global variable.

This method works under the assumption that the compiler/linker packs all global variables into one memory block and all functions into another. The Microsoft compiler, for example, puts all global variables before functions (at lower addresses in virtual memory).

I'm assuming you won't need to check for local variables, as their addresses cannot usefully be returned by a function (once the function ends, the local variable is lost).
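
The proximity heuristic described above can be sketched in a few lines. The enum and function names are invented for this example, and a tie is arbitrarily resolved in favor of "function".

```c
#include <stdint.h>

enum symbol_guess { GUESS_FUNCTION, GUESS_VARIABLE };

/* Absolute distance between two addresses. */
static uintptr_t addr_distance(uintptr_t a, uintptr_t b)
{
    return a > b ? a - b : b - a;
}

/* Hypothetical helper: guesses the kind of `unknown` from whichever of the
   two reference addresses it lies closer to. */
enum symbol_guess guess_by_proximity(uintptr_t unknown,
                                     uintptr_t known_func,
                                     uintptr_t known_var)
{
    return addr_distance(unknown, known_func) <= addr_distance(unknown, known_var)
               ? GUESS_FUNCTION
               : GUESS_VARIABLE;
}
```

For example, with a reference function at 0x400570 and a reference global at 0x6009e4, an unknown address of 0x400600 would be guessed to be a function. Symbols that live in a different shared library than the two references can easily be misclassified.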

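It can be done by combining dlsym() and dladdr1(). dladdr1() is a GNU extension, hence the _GNU_SOURCE define. Symbols defined in the main executable are only visible to dlsym() if they are exported to the dynamic symbol table (for example by linking with -rdynamic), and older glibc versions also need -ldl.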

#define _GNU_SOURCE

#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

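/* Returns the ELF symbol type (STT_FUNC, STT_OBJECT, ...) of the symbol at
   address sym, or STT_NOTYPE (0) if no matching symbol is found. */
int symbolType(void *sym) {
    ElfW(Sym) *pElfSym = NULL;
    Dl_info i;

    /* dladdr1() can match a shared object without matching a symbol,
       so check the returned symbol pointer as well. */
    if (dladdr1(sym, &i, (void **)&pElfSym, RTLD_DL_SYMENT) && pElfSym) {
        /* ELF32_ST_TYPE and ELF64_ST_TYPE extract the same bits. */
        return ELF32_ST_TYPE(pElfSym->st_info);
    }

    return STT_NOTYPE;
}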

int main(int argc, char *argv[]) {
    for (int i=1; i < argc; ++i) {
        printf("Symbol [%s]: ", argv[i]);

        void *mySym = dlsym(RTLD_DEFAULT, argv[i]);

        // This will not work with symbols that have a 0 value, but that's not going to be very common
        if (!mySym)
            puts("not found!");
        else {
            int type = symbolType(mySym);
            switch (type) {
                case STT_FUNC: puts("Function"); break;
                case STT_OBJECT: puts("Data"); break;
                case STT_COMMON: puts("Common data"); break;
                /* get all the other types from the elf.h header file */
                default: printf("Dunno! [%d]\n", type);
            }
        }
    }

    return 0;
}
Licensed under: CC-BY-SA with attribution