Question

I would like to know whether strace can cause anomalous behavior in the program it is tracing. I am currently trying to track down a random segmentation fault that occurs on a line where I call pthread_cond_wait(), but the program never seems to crash that way when I run it under strace.

When I run my program directly (it is a mix of C and C++), it sometimes works as intended, but, as mentioned above, it sometimes crashes at pthread_cond_wait() (by the way, if anyone wants to help me with that problem, see here; any help would be much appreciated).

If I directly run my program and attach strace to the process like this:

strace -ttTD -o strace_today.txt -p PROCESS_ID

The output is a single line saying that it is waiting on a futex (effectively like this):

futex(x,FUTEX_WAIT_PRIVATE,x)

If I run my program from strace like this:

strace -ttTD -o strace_today.txt example_program

Then at some point in the output file, specifically when I call pthread_cond_wait(), it keeps printing lines like these (and each time, the value the futex() call is waiting on is higher than before; here it is 15):

12:46:15.636366 semop(11599962, {{0, -1, 0}}, 1) = 0 <0.000031>
12:46:15.636512 futex(0x8053838, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000033>
12:46:15.636637 futex(0x8053864, FUTEX_WAIT_PRIVATE, 15, NULL) = ? ERESTARTSYS (To be restarted) <0.002034>
12:46:15.638832 futex(0x8053864, FUTEX_WAIT_PRIVATE, 15, NULL) = 0 <0.001449>
12:46:15.640436 clone(child_stack=0xb6cd0484, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb6cd0bd8, {entry_number:6, base_addr:0xb6cd0b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb6cd0bd8) = 25403 <0.000045>
12:46:15.640598 semop(11599962, {{0, -1, 0}}, 1) = 0 <0.000015>

I also tried running strace as a child rather than as the parent of the process, hoping it would make a difference. Even though I was trying to catch the random segmentation fault, it never occurred under strace.

Now my question is whether this is common and intended, or whether my strace invocation is bogus. If the invocation is fine, are there any syscalls I need to be aware of because they might not work under strace, or does this strange behavior affect a whole group of syscalls? Is there any way around this?

I am using Debian squeeze, in case that is relevant.

Update 1

I totally forgot to mention that I am running multiple threads (POSIX threads) and a few child processes. The pthread_cond_wait() call should not encounter any race, though, since it is definitely the first call after a pthread_mutex_lock() that accesses the pthread_cond_t and pthread_mutex_t which I am passing as arguments. But I do not know whether there might be race conditions inside pthread_cond_wait() itself. I will provide program code if necessary.


Solution

The most likely cause of problems like this is that strace changes the timing of your application, which can expose (or hide) locking bugs.

OTHER TIPS

Nearly all C++ and C segfaults include undefined behaviour at some point. Because of this, the compiler is free to implement a crash system that doesn't trigger when strace is run.

In all seriousness, could there be timing issues somewhere in the program that do not occur under strace? Such timing issues are particularly nasty in multithreaded code (e.g. deadlocks that only occur in release mode).

Licensed under: CC-BY-SA with attribution