Question

I've run into a curious problem on Unix systems (tested on SunOS and AIX). I run a script that lists itself using ps. Sometimes ps shows two additional child processes of the script, sometimes only one, and most of the time the output correctly shows a single process. I found the thread "Multiple processes with the same name", but my case is different.

Consider a script named test.sh as the one below:

#!/bin/ksh
echo before $$
ps -f | grep test.sh | grep -v grep
echo after $$

It's very simple: it prints its PID, then looks for itself (and anything with the same name) in the process list, filtering out only the grep commands (just in case). Now I run a very simple loop in the shell:

while [ 1 -eq 1 ]; do test.sh; done

This is just an infinite loop that runs test.sh over and over. Here is the output I get:

before 20990
  user   20990 14993   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
after 20990
before 20994
  user   20994 14993   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
after 20994
before 20998
  user   21001 20998   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
  user   21000 20998   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
  user   20998 14993   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
after 20998
before 21002
  user   21002 14993   0 08:54:06 pts/5       0:00 /bin/ksh test.sh
after 21002
before 21006
  user   21006 14993   0 08:54:07 pts/5       0:00 /bin/ksh test.sh
after 21006

Can anyone explain what processes 21001 and 21000 are? They don't seem to be forked copies of the script, since there are no "before/after" lines for them. This happens only occasionally.

This is not much of a problem for me but I'm curious to know what happens here and what to expect in more complex cases.

Let's say I want my script to run only if no other instance of it is already running. I would use ps, filter for "test.sh", and then drop every line containing my own PID. That removes the script itself and its children, which is good, but it is only a workaround for an issue I don't really understand. Hence this thread :)
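A minimal sketch of that check, assuming a ps that accepts the POSIX format options -o pid= -o ppid= -o args= (worth verifying on SunOS and AIX). The function name other_instances and its two arguments are hypothetical. It reads a process snapshot on stdin and prints the PIDs of matching processes that are neither the given PID nor its direct children:

```shell
#!/bin/sh
# Hypothetical helper. Usage: other_instances SELF_PID SCRIPT_NAME
# Expects "PID PPID ARGS..." lines on stdin, one process per line.
other_instances() {
    awk -v self="$1" -v name="$2" '
        index($0, name) == 0 { next }       # line does not mention the script
        $1 == self || $2 == self { next }   # this shell or a child forked from it
        { print $1 }                        # anything left is another instance
    '
}
```

One way to use it is to capture the snapshot first, so the filtering commands themselves cannot appear in the listing: snapshot=$(ps -e -o pid= -o ppid= -o args=), then others=$(printf '%s\n' "$snapshot" | other_instances $$ test.sh). An empty result suggests no other instance was running at that moment. The check is still racy, since two instances can start at the same time.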

I haven't tried reading the data in /proc directly, since I don't know that filesystem's structure on Sun or AIX.


Solution

These extra processes are the subshells forked to run the two grep components of the pipeline, caught by ps before they have called exec.

This is confirmed by running a dtrace script that logs all exec calls. Script output for one run:

before 3929
    root  3929  1630   0 10:36:03 pts/3       0:00 /bin/ksh ./test.sh
    root  3932  3929   0 10:36:03 pts/3       0:00 /bin/ksh ./test.sh
    root  3931  3929   0 10:36:03 pts/3       0:00 /bin/ksh ./test.sh
after 3929

dtrace output for these processes:

2013 Sep  5 10:36:02 3929 /bin/ksh ./test.sh
2013 Sep  5 10:36:02 3931 grep test.sh
2013 Sep  5 10:36:02 3932 grep -v grep

/bin/ksh ./test.sh is displayed instead of the actual command because the process's argument list has not been updated yet. It is replaced only after the exec call has completed.

Just after a fork, the parent and the child process share the same argument list. They differ in their process ID (and parent process ID). This is what you are observing.
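The effect can be reproduced on purpose. The sketch below is an illustration rather than part of the original diagnosis, and the function name forked_args is hypothetical. It asks ps for the command line of a forked subshell that never calls exec, assuming a ps that supports -o args= and -p:

```shell
#!/bin/sh
# Hypothetical helper: print the command line, as ps reports it, of a
# subshell of the current shell that has not called exec.
forked_args() {
    (
        # The sh -c child sees the subshell as its parent, so $PPID is the
        # subshell's PID (or the main shell's PID in shells that avoid forking).
        sh -c 'ps -o args= -p "$PPID"'
        :   # trailing no-op so the subshell is not replaced by exec
    )
}
```

Calling forked_args and then ps -o args= -p $$ should print the same line twice: the subshell has its own PID, but until it calls exec it carries the argument list it inherited from the parent.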

Licensed under: CC-BY-SA with attribution