Question

I am currently trying to figure out why a shell script fails at concurrent logging every once in a while.

I have a shell function like the following:

log()
{
   local l_text=$1
   local l_file="/path/to/logs/$(date +%Y%m%d)_script.log"

   local l_line="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text}"

   echo ${l_line} >> ${l_file}
}

Now every once in a while this fails with a syntax error:

/path/to/script.sh: command substitution: line 163: syntax error near unexpected token `)'
/path/to/script.sh: command substitution: line 163: `hostname -s) ${l_text}'

The problem is, that I have multiple sub-processes, which each want to log as well as send traps (during which logging is performed as well). I haved debugged the problem and found out, that this happens, when the function is entered three times simultaneously. First the main process enters, then the child. After the date part of l_text is executed, main get's interrupted by a trap which is caused by child and in this trap tries to log something. The child and the trap finish their logging nicely, but then main is resumed after the trap and tries to execute the hostname part (presumedly) and fails with this error.

So it seems like main does not like being put to sleep while it is producing the $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text} part of the log statement and cannot resume nicely. I was assuming this should work fine, because I am just using local variables and thread safe output methods.

Is this a general concurrency problem I am running into here? Or is this very specific for the trap mechanism in bash scripts? I know about the commodities of SIGNAL handling in C, so I am aware that only certain operations are allowed in SIGNAL handlers. However I am not aware if the same precautions also apply when handling SIGNALs in a bash script. I tried to find documentation on this, but none of the documents I could find gave any indications of problems with SIGNAL handling in scripts.

EDIT:

Here is an actuall simple script that can be used to replicate the problem:

#!/bin/bash

log() {
  local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
  echo $text >> /dev/null
}

sub_process() {
  while true; do
    log "Thread is running"
    kill -ALRM $$
    sleep 1
  done
}

trap "log 'received ALRM'" ALRM 

sub_process &
sub_process_pid=$!
trap "kill ${sub_process_pid}; exit 0" INT TERM


while true; do
  log "Main is running"
  sleep 1
done

Every once in a while this script will get killed because of a syntax error in line 5. Line 5 is echo $text >> /dev/null, but since the syntax error also mentiones the hostname command, similar to the one I posted above, I am assuming there is an of-by-one error as well and the actual error is in line 4, which is local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1".

Does anybody know what to do with the above script to correct it? I alread tried moving out the construction of the string into some temporary variables:

log() {
  local thedate=$(date +'%Y-%m-%d %H:%M:%S')
  local thehostname=$(hostname -s)
  local text="${thedate} ${thehostname} $1"
  echo $text >> /dev/null
}

This way the error appears less frequently, but it still is present, so this is not a real fix.

Was it helpful?

Solution

I would say that this is definitely a bug in bash and I would encourage you to report it to the bash developers. At the very least, you should never get a syntax error for what is syntactically correct code.

For the record, I get the same results as you with GNU bash, version 4.2.10(1)-release (x86_64-pc-linux-gnu).

I found that you can workaround the problem by not calling a function in your trap handler. E.g. replacing

trap "log 'received ALRM'" ALRM 

with

trap "echo $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) received ALRM" ALRM

makes the script stable for me.

I know about the commodities of SIGNAL handling in C, so I am aware that only certain operations are allowed in SIGNAL handlers. However I am not aware if the same precautions also apply when handling SIGNALs in a bash script.

I guess you shouldn't have to take special precautions but apparently in practice you do. Given that the problem seem to go away without the function call, I'm guessing that something in bash either isn't re-entrant where it should be or fails prevent re-entry in the first place.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top