Understanding how to connect pipes and wait for them in a custom C shell

https://stackoverflow.com/questions/23066929

03-07-2023

Question

EDIT: Changed the title, as the problem is no longer just how to connect the pipes but also how to wait for them. Update: I solved the problem and have updated my wait-handling code below to reflect what now works. I needed to close all the pipes before waiting for the last sub-command; previously I was doing that afterwards.

I'm writing a CLI as an assignment in Linux GNU99 C, and implementing pipes at the moment. Initially I thought my problem had to do with the way I had connected the pipes, because I wasn't getting the desired result. Now I've realised that it also has to do with how I wait for the sub-commands that are being chained.

As a template, I'm using the following command: ls|grep "hello"|sort -r. LS outputs to GREP which outputs to SORT which outputs to stdout. (A common command sequence).

In reference to the diagram below:

In the respective child processes,
For LS file descriptors (FD) 3,5,6 are not used
For GREP file descriptors (FD) 4 and 5 are not used
For SORT file descriptors (FD) 3,4 and 6 are not used

For LS dup2(4 , STDOUT_FILENO) (binds its stdout to fd 4)
For GREP dup2(3 , STDIN_FILENO) and dup2(6, STDOUT_FILENO) (binds both stdin/stdout to their respective fds)
For SORT dup2(5 , STDIN_FILENO) (binds stdin to fd 5)

In each child, once I've done the dup2() calls, I close all the file descriptors (3-6) before passing control to the actual command through execvp().
~~In the parent process, I close all the file descriptors (3-6) after I've launched all the children.~~(Moved this into the launcher, see code below.)

//                                              
//  fd#           ls---\
//                     |
//  3        /-----R   |
//           |     |   |
//  4        |     W --/
//           |
//           |
//           \----grep--\
//                      |  
//  5        /-----R    |
//           |     |    |
//  6        |     W ---/
//           |
//           \----sort
//
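
The wiring in the diagram can be sketched as a single function. This is a minimal sketch under assumptions, not the launch() code from the question: run_pipeline and MAX_CMDS are hypothetical names, each cmds[i] is assumed to be a NULL-terminated argv array, and error handling is kept to the essentials (waitpid() is not retried on EINTR). It shows each child duplicating the pipe ends it needs and then closing every pipe descriptor, and the parent closing all of them before it waits.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS 16 /* hypothetical limit for this sketch */

/* Hypothetical helper: run cmds[0] | cmds[1] | ... | cmds[n-1] in the
 * foreground. Returns the exit status of the last command, or -1 on error. */
int run_pipeline(char **cmds[], int n)
{
    int fds[MAX_CMDS - 1][2];
    pid_t pids[MAX_CMDS];
    int started = 0, last = -1;

    if (n < 1 || n > MAX_CMDS)
        return -1;

    /* Create every pipe up front, as in the diagram (fds 3..6 when n == 3). */
    for (int i = 0; i < n - 1; i++) {
        if (pipe(fds[i]) == -1) {
            for (int j = 0; j < i; j++) {
                close(fds[j][0]);
                close(fds[j][1]);
            }
            return -1;
        }
    }

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            /* Child: connect stdin/stdout to the neighbouring pipes... */
            if (i > 0 && dup2(fds[i - 1][0], STDIN_FILENO) == -1)
                _exit(126);
            if (i < n - 1 && dup2(fds[i][1], STDOUT_FILENO) == -1)
                _exit(126);
            /* ...then close ALL pipe descriptors, including the ones just
             * duplicated, before handing over to the command. */
            for (int j = 0; j < n - 1; j++) {
                close(fds[j][0]);
                close(fds[j][1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);
            _exit(127);
        }
        pids[started++] = pid;
    }

    /* Parent: close every pipe end BEFORE waiting, otherwise the readers
     * never see EOF and the wait below can hang. */
    for (int j = 0; j < n - 1; j++) {
        close(fds[j][0]);
        close(fds[j][1]);
    }

    /* Wait for exactly the children started here. */
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] && i == n - 1)
            last = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    return last;
}
```

For the template command, a caller would build three argv arrays ({"ls", NULL}, {"grep", "hello", NULL}, {"sort", "-r", NULL}) and pass them with n == 3.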

EDIT
Thanks 'mah' for the early confidence boost, and 'Jon' for the detailed explanation that came a little later.

I actually thought I got it all working at one point. But, as it turned out, only when all the sub-commands were executed as background processes. That was nice, but not quite what I want, since background processes require & at the end of the command line and the final output is not synchronised with the prompt.

As it currently stands, I seem to have commands with one pipe, e.g. ls|sort, working consistently in the foreground, but when I introduce a second pipe, e.g. ls|grep|sort, my prompt sometimes gets printed while the compound command is still outputting, which means it's running in the background rather than in the foreground as it's supposed to.

Here is an explanation of my code:

The shell allows the user to type in more than one command, delimited by ;. Single and multiple commands which don't use pipes work fine, both as foreground and as background processes. I've also implemented a 'source' command which is able to recurse when the script calls another script.

So the only remaining problem I have is with compound commands that use pipes.

As per standard parsing, I've broken up the user's input into tokens delimited by NUL characters. I keep an array of pointers to each token (which represent commands and parameters), and a parallel array which keeps track of commands. A fairly standard approach, I think.

My strategy for dealing with compound commands using pipes has been to treat them as a single command as long as possible. This makes it easier to connect the sub-commands with the pipes (as I don't have to pass around extra information through my program) when I need to. So I designed the parser to give the pipe character a separate token of its own. Thus, in my launchControl function, which calls my launch function (where fork() and execvp() are), I do a final preparation of the sub-commands.

The final preparation involves a few steps:
(1) replacement of the pipe tokens with NULL tokens (thus splitting the sequence into sub-commands compatible with execvp()),
(2) determining which tokens are the sub-commands (as opposed to parameters for the sub-commands),
(3) determining which sub-command reads from (or writes to) which pipe.
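
Steps (1) and (2) can be sketched as one pass over the token list. This is an illustration under assumptions, not the parser from the question: split_at_pipes is a hypothetical name, the token array is assumed to be NULL-terminated, and the pipe is assumed to arrive as a separate "|" token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every "|" token with NULL and record where
 * each sub-command starts, so that cmds[k] can be passed to execvp() as
 * argv. Returns the number of sub-commands, or -1 if there are more than
 * max. Empty sub-commands (e.g. "ls | | sort") are skipped, not reported. */
int split_at_pipes(char **tokens, char ***cmds, int max)
{
    int n = 0;
    int at_start = 1; /* the next non-pipe token begins a sub-command */

    for (int i = 0; tokens[i] != NULL; i++) {
        if (strcmp(tokens[i], "|") == 0) {
            tokens[i] = NULL; /* terminates the previous argv */
            at_start = 1;
        } else if (at_start) {
            if (n == max)
                return -1;
            cmds[n++] = &tokens[i];
            at_start = 0;
        }
    }
    return n;
}
```

For ls | grep hello | sort -r this yields three argv arrays that all point into the original token list, so nothing is copied.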

Having done these steps, I enter a loop that passes the necessary info for each sub-command to the launch function itself. After I finish this loop, I close all the pipes created. Here is the signature of my launch function:
int launch (char **tokenList, enum ioTypes procType, int pipeIn, int pipeOut, int *allPipes, enum processType pType)
tokenList is the sub-command token (followed by its parameters),
procType is one of none, out, in, or both, and describes how the sub-command uses pipes,
pipeIn is the sub-command's input file descriptor (0 if not used),
pipeOut is the sub-command's output file descriptor (0 if not used),
allPipes is a list of all the pipes used in the compound command,
pType indicates whether the command is to run foreground/background.

(I am using a signal handler to allow background tasks to report when they are done, the same as in bash.)

The launch function (for commands that involve pipes) does the following:

(1) Block SIGCHLD, to delay it until I'm in the last sub-command.
(2) Call fork(), then, using a switch statement:
  IN THE CHILD (case process == 0):
    (a) Depending on procType, call dup2 to connect the sub-command's stdin/stdout to the appropriate file descriptors (see diagram above).
    (b) Close ALL the pipes, as per allPipes (including those used in the dup2 calls).
    (c) Perform redirection if necessary.
    (d) Call execvp() with the sub-command/arguments in tokenList.
  IN THE PARENT (default case):
    (a) If the current sub-command is the last in the sequence, unblock SIGCHLD.
    And this is where I have my problem.
The code below is WIP; it works to some degree but is not quite right. It is my current attempt.

    //allPipes = NULL for a command that doesn't use pipes.
    // procType == in, only occurs for the last sub-command in the sequence.
    if ( (allPipes == NULL) || ( (allPipes != NULL) && (procType == in) )) {
        if (allPipes != NULL) {
            for (int i=2; i<allPipes[0]; i++) { // Parent closes all pipes.
                close(allPipes[i]);
            }
        }
        int status; // int where child status will be recorded
        pid_t pid;
        do {
            pid = waitpid(WAIT_ANY, &status,0);
//                      fprintf(stderr,"Got a PID = %d\n",pid);
        } while (pid >0);
        if (pid == -1 && !(errno == ECHILD)) {
            perror(NULL);
            exit(errno);
        }
    }

This version seems to work fine with ls|sort for as many repeated commands as I have the patience to test.
However, when I make the command ls|sort|grep it becomes unreliable. It usually works fine the first two times, but after that, my prompt starts to appear in the middle of my output, which means that it's running in the background.

@mah:
Here is my code for tracking commands and pipes, and how I connect them:

    struct pipefdRecord {
        int pos;        //  Position of the pipe in the token list
        int aPipe[2];   //  pipe file descriptor [0] read / [1] write
    }   pipefdRecord;

    struct cmdRecord {
        char **command;     // Pointer to the sub-command token
        int ndxCommand;     // Position of command token in the token list
        enum ioTypes mode;  // none (0), output(1), input(2), or both(3)
        int pipeIn;         // pipe fd assigned to this process' input
        int pipeOut;        // pipe fd assigned to this process' output
    }   cmdRecord;


    struct pipefdRecord *pipesAt = malloc(sizeof(struct pipefdRecord));
    struct cmdRecord * cmdList = (struct cmdRecord *)malloc(sizeof(struct cmdRecord));

    for (int i=0; i<noCommands; i++) { // writing side of pipes
        for (int j=0; j<noPipes; j++) {
            if ((cmdList[i].ndxCommand < pipesAt[j].pos) && (pipesAt[j].aPipe[1] !=0)) {
                cmdList[i].pipeOut = pipesAt[j].aPipe[1];  // assign writing
                pipesAt[j].aPipe[1]=0;
                cmdList[i].mode = out;
                break;
            }
        }
    }
    for (int i=noCommands-1; i>=0; i--) { // reading side of pipes
        for (int j=noPipes-1; j>=0; j--) {
            if (cmdList[i].ndxCommand > pipesAt[j].pos && (pipesAt[j].aPipe[0] !=0)) {
                cmdList[i].pipeIn = pipesAt[j].aPipe[0];  // assign reading
                pipesAt[j].aPipe[0]=0;
                cmdList[i].mode = cmdList[i].mode | in;
                break;
            }
        }
    }

With the above code, my pipe allocations are always correct for an arbitrary number of pipes.
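
If the pipes are stored in the order they appear on the command line, the same assignment can be expressed by index alone: sub-command i reads from pipe i-1 and writes to pipe i. The following is a sketch of that alternative, not a drop-in replacement for the loops above: struct stage and assign_pipe_ends are hypothetical names, and -1 is used to mean "no pipe" because 0 is itself a valid file descriptor (standard input).

```c
/* Hypothetical record for this sketch: the pipe ends one sub-command uses. */
struct stage {
    int in_fd;  /* read end to become stdin, or -1 if none  */
    int out_fd; /* write end to become stdout, or -1 if none */
};

/* pipes[k] is the pipe between sub-command k and sub-command k + 1,
 * so n_cmds sub-commands use n_cmds - 1 pipes. */
void assign_pipe_ends(int pipes[][2], int n_cmds, struct stage *stages)
{
    for (int i = 0; i < n_cmds; i++) {
        stages[i].in_fd  = (i > 0)          ? pipes[i - 1][0] : -1;
        stages[i].out_fd = (i < n_cmds - 1) ? pipes[i][1]     : -1;
    }
}
```

With the descriptors from the diagram (pipes {3, 4} and {5, 6}), this gives ls an output of 4, grep an input of 3 and an output of 6, and sort an input of 5.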

Cheers, Nap

Solution

I think you're missing some closes, but you are lucky that the missing closes shouldn't prevent your code from working.

It appears from your description that you create two pipes, and that the descriptors returned are 3, 4, 5, 6.

What you should be doing is this (where I'm dropping the _FILENO from the file descriptor names):

In ls: dup2(4, STDOUT); close each of file descriptors 3, 4, 5, 6.
In sort: dup2(3, STDIN); dup2(6, STDOUT); close each of file descriptors 3, 4, 5, 6.
In grep: dup2(5, STDIN); close each of file descriptors 3, 4, 5, 6.
In the parent, you should close all the file descriptors, and you do. Good!

Note the common theme: close all the pipe file descriptors!

What happens if you don't?

In ls: you've left file descriptor 4 (write end of the pipe from ls to sort) open; this doesn't matter much as ls does its thing and exits without further ado, closing this end of the pipe.
In sort: you've left file descriptors 3 (read end of the pipe from ls to sort) and 6 (write end of the pipe from sort to grep) open. File descriptor 3 will report EOF when ls exits. When sort completes, it will close 6. Since every copy of the write end of the first pipe (file descriptors 1 and 4 in ls, and 4 in sort) is closed, sort should get a clean EOF after reading the last of the output from ls. Note that sort reads all its input before generating any output.
In grep: you've left file descriptor 5 (read end of the pipe from sort to grep) open. In due course, when sort writes its data, grep will be able to read it. When sort completes, grep will get EOF on its standard input.

So, in this example, you've managed to make a pipeline that should work cleanly. However, in general, you should still be closing more file descriptors because it is otherwise easy to end up with an open write end of a pipe that prevents the programs from completing. For example, if grep had not closed 4, then sort would never see EOF on its input, because grep would still hold a write end of that pipe, while grep would be waiting for input from sort. Neither would budge until the other was complete: deadlock.
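
The rule behind this is that a reader sees EOF only once every descriptor referring to the write end has been closed, in every process. A small single-process sketch can show it; pipe_at_eof is a hypothetical helper, and it assumes the pipe is empty when it is called.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: probe an EMPTY pipe's read end without blocking.
 * Returns 1 if read() reports EOF (no write end is open any more),
 * 0 if read() would block (some write end is still open),
 * and -1 on any other outcome. */
int pipe_at_eof(int rfd)
{
    char c;
    int flags = fcntl(rfd, F_GETFL);

    if (flags == -1 || fcntl(rfd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = read(rfd, &c, 1);
    int saved = errno;
    fcntl(rfd, F_SETFL, flags); /* restore the original flags */

    if (n == 0)
        return 1;
    if (n == -1 && (saved == EAGAIN || saved == EWOULDBLOCK))
        return 0;
    return -1;
}
```

If a second descriptor for the write end is made with dup() and only the original is closed, pipe_at_eof() on the read end returns 0; it returns 1 only after the duplicate is closed as well. A stray write end left open in another child has the same effect on a blocking read, which then simply never returns.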

Corrigenda to statements in the question

In your description, you say:

For LS dup2(4 , STDOUT_FILENO) (instead of ouputing to STDIO, it goes to fd 4)

For SORT dup2(3 , STDIN_FILENO) and dup2(6, STDOUT_FILENO) (does not use std(in/out))

For GREP dup2(5 , STDIN_FILENO) (instead of reading STDIO, it reads from fd 5)

You've not described what happens correctly. The dup2(4, STDOUT) function ensures that standard output (file descriptor 1) points to the same open file description as file descriptor 4. (Read open() and dup2() very carefully to distinguish between open file descriptors and open file descriptions!) This means that when the child that becomes ls writes to standard output, it is writing to the write end of the first pipe, which means it goes to the next command in the pipeline. The ls program continues as it always does, writing to standard output; it is just that standard output is the same as file descriptor 4.

Similar comments apply to each of the other statements. The sort reads from standard input and writes to standard output; the grep reads from standard input and writes to standard output. The dup2() calls ensure that these are references to the relevant pipes, that's all.

Note that the duplicated descriptors can be closed independently without affecting the other.
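
A short sketch of that last point, assuming a hypothetical helper named write_via_duplicate: after dup2(), the original descriptor can be closed and the duplicate still reaches the pipe, which is why a child can close fds 3 to 6 and keep using its standard input and output.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make target refer to the same open file description
 * as wfd, close the original descriptor, then write through the duplicate.
 * Returns the result of write(), or -1 if dup2() fails. */
ssize_t write_via_duplicate(int wfd, int target, const char *msg, size_t len)
{
    if (dup2(wfd, target) == -1)
        return -1;
    if (wfd != target)
        close(wfd); /* closing the original leaves target usable */
    return write(target, msg, len);
}
```

In a shell child, target would be STDOUT_FILENO and wfd the write end of the pipe. The reader sees EOF only after target is closed too, since both descriptors refer to one open file description.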

Licensed under: CC-BY-SA with attribution
