Problem

I am using the following script which I pieced together from other ideas found on the web to suit my purposes of monitoring a log file and catching errors that occur.

tail -f log_file | while read LOGLINE
do
  echo -e "${LOGLINE}"
  if [[ "${LOGLINE}" == *ERROR* ]] ; then
    echo -e "ERROR FOUND : ${LOGLINE}\n"

    # handle the error here

  fi
done

If the program that produces the log file crashes or is terminated and is no longer writing to the log, how would I detect that? As it is, tail will just wait indefinitely until it gets another line.


Solution

In general, if you don't necessarily know the process name, or just want to see if any process is accessing a file, you could try fuser. For example, on Linux if I have a server process writing to server.log, then fuser server.log gives something like:

/var/log/server.log: 28977

where 28977 is the process id of the server process. When the server process exits, fuser returns nothing. Obviously in your case you also have the tail process reading the file, so you'd expect more than one PID, e.g.:

/var/log/server.log: 28977 28990

Note from the man page: "fuser outputs only the PIDs to stdout, everything else is sent to stderr". So, for example, you could pipe to wc -w and check that you get at least 2. Getting 1 would mean only your tail -f was accessing the file.
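As a minimal sketch of that check (count_openers is a helper name introduced here, not part of fuser; fuser itself must be installed, and the /tmp path is illustrative):

```shell
#!/bin/bash
# Count how many processes currently have the given file open.
# fuser prints only the PIDs to stdout, so wc -w counts the processes;
# the filename header goes to stderr, which we discard.
count_openers() {
  fuser "$1" 2>/dev/null | wc -w
}

logfile=/tmp/openers_demo.log
touch "$logfile"

exec 3>>"$logfile"                 # hold the file open on fd 3
echo "openers while held: $(count_openers "$logfile")"
exec 3>&-                          # close it again
echo "openers after close: $(count_openers "$logfile")"
```

While fd 3 is open the count is at least 1 (our own shell); after closing it the count drops back to 0.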

The snag with integrating this in your loop is that read will block until it can read a line, so any check you perform inside the loop will never be run once the file isn't being written to any longer. You'd need to use read -t and specify a timeout. Something like this:

tail -f log_file | while true
do
  read -t 10 LOGLINE
  READ_STATUS=$?
  if [[ -z "${LOGLINE}" && ${READ_STATUS} -ne 0 ]] ; then
    # Variable is empty and read timed out
    if [[ $(fuser log_file 2> /dev/null | wc -w) -lt 2 ]] ; then
      # Nothing else is using the log file
      pkill -f "tail -f log_file"  # Specific, so we don't kill tails of other files
      break
    fi
  else
    echo -e "${LOGLINE}"

    # Do your stuff...
  fi
done

(thanks anubhava for reminding me of pkill! Also, note you could do a pgrep instead of the fuser check if you know enough identifying info about the process which is writing to the log.)
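For example, the fuser test could be swapped for a direct pgrep probe; this is a sketch where "my-log-writer" is a placeholder for whatever actually identifies your process:

```shell
#!/bin/bash
# Placeholder: replace with the real name of the process writing the log.
WRITER_NAME="my-log-writer"

# pgrep exits 0 if a matching process exists and 1 otherwise;
# -x matches the exact process name, and we only care about the
# exit status, so the PID list itself is discarded.
if ! pgrep -x "$WRITER_NAME" > /dev/null; then
  echo "log writer is no longer running"
fi
```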

Another approach, which is much simpler if you're on Linux, is tail's --pid option, which stops tail once the process with that PID has exited. Then you could just do:

LOG_WRITER=$(pgrep something-identifying-the-process-writing-the-log)
tail -f log_file --pid=$LOG_WRITER | while read -r LOGLINE
…the rest of your script

However, it seems there may be cases in which this doesn't notice the process has stopped. It might be sufficient for you though.

Other tips

You can call pgrep to check whether the original process is still running:

while read -r LOGLINE
do
  echo -e "${LOGLINE}"
  if [[ "${LOGLINE}" == *ERROR* ]] ; then
    echo -e "ERROR FOUND : ${LOGLINE}\n"

    # handle the error here
    if ! pgrep process > /dev/null; then
      echo "process has exited/crashed"
      # kill tail process and break out of loop
      pkill -f tail
      break
    fi

  fi
done < <(tail -f log_file)

To obtain the process IDs of all processes which have your log file open, use lsof:

lsof -Fp /path/to/your/logfile

Note that this will only show processes which actually have the file open. You may miss programs that keep the file closed except the brief instants when they actually need to write to it.

lsof is script-friendly and has many options. See man lsof.

Here is a script which will write a message to the screen once there are no processes with your log file open:

while lsof -Fp /tmp/mylogfile  >/dev/null
do
    sleep 1
done
echo "No processes have the log file open"

There is a catch to the above: your tail -f process will have the file open. You may want instead to show the message when the number of processes with that file open drops below two:

while [ "$(lsof -Fp /tmp/mylogfile | wc -l)" -ge 2 ]
do
    sleep 1
done
echo "There are fewer than two processes with the log file open"

[I originally answered this question on the Unix site. Since that question is in the process of being closed, I am preserving the answer here.]

There is a general problem with your approach:

The code inside the while ... construct only runs when the logfile changes, i.e. when the process in question writes something to the logfile.

Now, if that process crashes (an event that you want to detect) without writing to the logfile, then you will never enter the body of the while-loop again.

Therefore, it is not useful to check for the process being alive inside the while-loop -- unless the process leaves a useful message in the log when it crashes, then you can of course check for such a message.

You will have to run a second monitoring process that will detect when your process dies. One simple solution for that problem is mon, which you can find here: https://github.com/visionmedia/mon

I advise against indirect checks ("is the logfile still opened for writing?"); just check for the process directly, and use a specialized tool to do that.
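A direct liveness check can be as simple as kill -0, which sends no signal at all and only reports whether the PID still exists; this is a sketch where the background sleep stands in for your real process:

```shell
#!/bin/bash
# kill -0 delivers no signal; it only tests whether the PID is still
# valid (and whether we are permitted to signal it).
is_alive() {
  kill -0 "$1" 2>/dev/null
}

sleep 1 &          # stand-in for the monitored process
pid=$!

is_alive "$pid" && echo "process $pid is running"
wait "$pid"        # wait for it to exit and be reaped
is_alive "$pid" || echo "process $pid has exited"
```

A monitoring loop would simply call is_alive periodically and act once it fails; that is essentially what tools like mon do for you, with restart logic on top.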

License: CC-BY-SA with attribution
Not affiliated with Stack Overflow