Question

The main question:

How can a puts statement (of a hardcoded string, no less) affect program flow?


I'm going to dive right into incomplete pieces of the code, for reasons to be explained later.

proc bgerror { error } {
    puts "Background error: $error"
    exit 1
}

Running my Tcl program, I get:

Background error: can't read "YPE(PROCESS)": no such variable

Fair enough, I thought; must've mucked up a $ or bracket somewhere. To find the issue, I put puts statements everywhere and re-ran the program.

This time, however, my program crashed:

9526:   tclsh testProgram
 fffffd7ffeafebba waitid   (0, 253d, fffffd7fffdf4950, 3)
 fffffd7ffeaeff9d waitpid () + 7d
 fffffd7ffe635132 __1cPGetPstackOutput6Fiiipci_v_ () + b2
 fffffd7ffe634bfb App_CoreSignalHandler () + 76b
 fffffd7ffeafb7b6 __sighndlr () + 6
 fffffd7ffeaf0b82 call_user_handler () + 252
 fffffd7ffeaf0d68 sigacthandler (b, fffffd7fffdfa500, fffffd7fffdfa1a0) + a8
 --- called from signal handler with signal 11 (SIGSEGV) ---

Ouch.

Eventually, I noticed something extremely odd:

proc OnNewState { } {
    puts "foobar"
    # ...
}

With the puts statement, I get the crash. Without it, I get the original error. (Huh?!) I flipped back and forth many times to make sure it was deterministic—and it was.

Now, the reason I dove right into incomplete pieces of the code is that I wanted to focus the attention on the abstract, not the particular.

(The complete code is complex and opaque, mostly utilizing my company's infrastructure libraries, so it wasn't practical to simplify it to be understandable anyway. Furthermore, I already know that the issue stems from one of the infrastructure libraries, because the issue disappears when I remove a bit of code associated with a TCP publisher/subscriber stack library.)

How can a puts statement (of a hardcoded string, no less) affect program flow?

Even if I were to begin digging into the C source underlying the library in question, I wouldn't know what to look for.

Hoping for some experienced Tcl'ers to shed some light...


Solution

First of all, the evidence you provide tells us that you're running with a SIGSEGV handler installed by custom code (App_CoreSignalHandler in your stack trace); Tcl itself doesn't set a SIGSEGV handler.

Tcl's puts command can only affect control flow by raising an error. It does that if you pass the wrong number of arguments, pass invalid arguments, or if it hits a problem in the I/O layer (e.g., if you closed the stdout channel). A simple puts "foobar" is definitely a valid call, so if the command is the standard one, the problem is coming out of the channel layer. The other possibility is a non-standard puts: if your custom code replaced the standard version, pretty much anything could happen.
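The two possibilities above can be checked from inside the script. This is only a rough sketch: the helper names putsIsProc, stdoutIsOpen and checkedPuts are made up for illustration, and the checks use nothing beyond the standard info procs, file channels and catch commands. Note that a puts replaced by an extension's C code will not show up in info procs, and catch can only trap a Tcl error, not a segfault inside the channel layer.

```tcl
# Hypothetical helper: true if puts has been redefined as a Tcl proc.
# A replacement implemented in C by an extension is NOT detected here.
proc putsIsProc {} {
    return [expr {[llength [info procs ::puts]] > 0}]
}

# Hypothetical helper: true if the stdout channel is still open
# in this interpreter.
proc stdoutIsOpen {} {
    return [expr {[lsearch -exact [file channels] stdout] >= 0}]
}

# Hypothetical helper: run puts under catch. Returns the error message
# if puts raised a Tcl error, or an empty string if it succeeded.
proc checkedPuts {text} {
    if {[catch {puts $text} msg]} {
        return $msg
    }
    return ""
}

# Example use at the spot where the plain puts "foobar" used to be.
set problem [checkedPuts "foobar"]
if {[putsIsProc] || ![stdoutIsOpen] || $problem ne ""} {
    # Report on stderr, guarded by catch in case that channel is broken too.
    catch {
        puts stderr "puts check: isProc=[putsIsProc] stdoutOpen=[stdoutIsOpen] error='$problem'"
    }
}
```

If all three checks come back clean and the process still crashes at that line, that is consistent with the crash originating below the Tcl level rather than in the script itself.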

So what might be happening? Well, at this stage my initial suspicion is that you've got memory corruption elsewhere in your program, and that this somehow has hit the internals of the stdout channel. If I'm right, you'll find it very difficult to track this one down; non-local crashes are always difficult to hunt. I suggest that you use some combination of turning off that signal handler, attaching gdb so you can see where the crash really happened, and using valgrind to ensure that memory is handled correctly (if you're lucky, valgrind will point to where the problem is pretty much straight off).

Licensed under: CC-BY-SA with attribution