Question

I have heard the story of how Douglas McIlroy came up with the concept and how Ken Thompson implemented it in one night.

As far as I understand, pipe is a system call which shares a piece of memory between two processes, where one process writes and the other reads.

As someone who is not familiar with OS internals or concepts, I was wondering what exactly is the "genius" in the story? Is it the idea of two processes sharing memory? Or is it the implementation? Or both?

PS: I am aware of the utility of the pipe and how to use it in the shell. The question is about the concept and implementation of the |.


Solution

As far as I understand, pipe is a system call which shares a piece of memory between two processes, where one process writes and the other reads.

Actually, there is no shared memory involved. The reader and writer are NOT sharing any part of their address space, and they are not using any explicit synchronization.

The reading and writing processes are making read and write system calls exactly as they would if they were reading from / writing to a file. THAT is the genius... the innovation: the notion that (simple) interprocess communication and file I/O can be handled the same way... from the perspective of the application programmer and the user.

Once the pipe has been set up, the OS (not application code, or libraries in user-space) takes care of the buffering and the coordination. Transparently.
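A minimal sketch of that point (in Python rather than the original C, purely as an illustration): a pipe is just a pair of file descriptors, and the data moves through the very same read/write calls you would use on an ordinary file, with the kernel doing the buffering in between.

```python
import os

# Create a pipe: the kernel hands back two file descriptors.
r, w = os.pipe()

# The "writer" uses the same write call it would use on a file...
os.write(w, b"hello through the kernel buffer\n")
os.close(w)              # closing the write end signals EOF to the reader

# ...and the "reader" uses the same read call it would use on a file.
data = os.read(r, 1024)  # no shared memory, no explicit synchronization
os.close(r)

print(data.decode(), end="")
```

Neither side knows (or cares) whether the descriptor it holds refers to a file, a terminal, or a pipe; that uniformity is the point.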


By contrast, before the invention of the pipe concept, if you needed to do "pipeline" processing, you would typically have one application write its output to a file and then, when it finished, run the second application to read from that file.

Alternatively, if you wanted a true pipeline you could code both applications to set up a (real) shared memory segment and use semaphores (or something) to coordinate the reading / writing. Complicated... and as a consequence not often done.
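The contrast can be sketched roughly as follows (using sort and tr as stand-ins for arbitrary tools; the code is illustrative, not from the original answer):

```python
import os
import subprocess
import tempfile

text = "banana\napple\n"

# Pre-pipe style: stage 1 runs to completion and writes a temporary
# file; only then can stage 2 start reading it.
with tempfile.NamedTemporaryFile("w+", delete=False) as tmp:
    subprocess.run(["sort"], input=text, text=True, stdout=tmp, check=True)
with open(tmp.name) as f:
    staged = subprocess.run(["tr", "a-z", "A-Z"], stdin=f, text=True,
                            capture_output=True, check=True).stdout
os.unlink(tmp.name)

# Pipe style: both stages run concurrently and no intermediate file
# ever exists; the OS buffers between them.
p1 = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE, text=True)
p2 = subprocess.Popen(["tr", "a-z", "A-Z"], stdin=p1.stdout,
                      stdout=subprocess.PIPE, text=True)
p1.stdin.write(text)
p1.stdin.close()
p1.stdout.close()          # so p2 sees EOF when sort finishes
piped = p2.communicate()[0]

print(staged == piped)     # same result, very different plumbing
```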

Other tips

In my opinion, the genius of the idea of "pipes" is the simplicity of use.

You don't have to make any system calls or allocate memory; there is nothing complicated at all. In the shell, you use a single character: |. This gives you extraordinary power to combine simple (or complex) tools to accomplish a given task.

Take some common everyday tasks like sorting text neatly. You may have a command that lists a whole bunch of names. (For my example I'll use a file that contains a bunch of names, courtesy of listofrandomnames.com.) Using pipes you can do something like the following:

$ cat names.txt
Sally Weikel
Dana Penaflor
Christine Hook
Shaneka Flythe
Almeda Crook
Freddie Lindley
Hester Kersh
Wanda Ruse
Megan Mauzy
Samuel Mancha
Paris Phipps
Annika Accardo
Elena Nabors
Caroline Foti
Jude Nesby
Chase Gordy
Carmela Driggers
Marlin Ostendorf
Harrison Dauber
$ cat names.txt | awk '{print $2 ", " $1}' | sort | uniq | column -c 100
Accardo, Annika     Hook, Christine     Ostendorf, Marlin
Crook, Almeda       Kersh, Hester       Penaflor, Dana
Dauber, Harrison    Lindley, Freddie    Phipps, Paris
Driggers, Carmela   Mancha, Samuel      Ruse, Wanda
Flythe, Shaneka     Mauzy, Megan        Weikel, Sally
Foti, Caroline      Nabors, Elena
Gordy, Chase        Nesby, Jude

This is just one example; there are thousands. For a few other specific tasks that are made remarkably easier by use of pipes, see the section "The Unix Philosophy" on this page.


To underscore this answer, see slides 4 through 9 of the presentation, "Why Zsh is Cooler than Your Shell."


I am aware that the above command includes a UUOC. I let it stand because it is a placeholder for an arbitrary command that generates text.

So I tried to do a bit of research on this by looking for PDP-10 / TOPS-10 manuals, in order to find out what the state of the art was before pipes. I found this, but TOPS-10 is remarkably hard to google. There are a few good references on the invention of the pipe: an interview with McIlroy, and a history of the development and impact of UNIX.

You have to put this into historical context. Few of the modern tools and conveniences we take for granted existed.

"At the start, Thompson did not even program on the PDP itself, but instead used a set of macros for the GEMAP assembler on a GE-635 machine."(29) A paper tape was generated on the GE 635 and then tested on the PDP-7 until, according to Ritchie, "a primitive Unix kernel, an editor, an assembler, a simple shell (command interpreter), and a few utilities (like the Unix rm, cat, cp commands) were completed. At this point, the operating system was self-supporting, programs could be written and tested without resort to paper tape, and development continued on the PDP-7 itself."

Picture a PDP-7: no interactive display, no hard disk. The "filesystem" was stored on magnetic tape, and there was up to 64 kB of memory for programs and data.

In that environment, programmers tended to address the hardware directly, such as by issuing commands to spin up the tape and process characters one at a time read directly from the tape interface. UNIX provided abstractions over this, so that rather than "read from teletype" and "read from tape" being separate interfaces they were combined into one, with the crucial pipe addition of "read from output of other program without storing a temporary copy on disk or tape".

Here is McIlroy on the invention of grep. I think this does a good job of summing up the amount of work required in the pre-UNIX environment.

"Grep was invented for me. I was making a program to read text aloud through a voice synthesizer. As I invented phonetic rules I would check Webster's dictionary for words on which they might fail. For example, how do you cope with the digraph 'ui', which is pronounced many different ways: 'fruit', 'guile', 'guilty', 'anguish', 'intuit', 'beguine'? I would break the dictionary up into pieces that fit in ed's limited buffer and use a global command to select a list. I would whittle this list down by repeated scannings with ed to see how each proposed rule worked."

"The process was tedious, and terribly wasteful, since the dictionary had to be split (one couldn't afford to leave a split copy on line). Then ed copied each part into /tmp, scanned it twice to accomplish the g command, and finally threw it away, which takes time too."

"One afternoon I asked Ken Thompson if he could lift the regular expression recognizer out of the editor and make a one-pass program to do it. He said yes. The next morning I found a note in my mail announcing a program named grep. It worked like a charm. When asked what that funny name meant, Ken said it was obvious. It stood for the editor command that it simulated, g/re/p (global regular expression print)."

Compare the first part of that to the cat names.txt | awk '{print $2 ", " $1}' | sort | uniq | column -c 100 example. If your options are "build a command line" versus "write a program specifically for the purpose, by hand, in assembler", then it's worth building the command line. Even if it takes a few hours of reading the (paper) manuals to do it. You can then write it down for future reference.

The genius of Pipes is that it combines three important ideas.

First, pipes are a practical implementation of 'co-routines', a term coined by Conway in 1958, which was promising but saw little practical use before pipes.
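A toy illustration of the co-routine idea (mine, not from the original text), using Python generators: each stage lazily pulls items from the stage before it, much as each process in a pipeline reads from the previous one's output as it is produced.

```python
# One "stage" of a pipeline: consumes lines from an upstream producer,
# yields transformed lines downstream, one at a time.
def swap_names(stream):
    for line in stream:
        first, last = line.split()
        yield f"{last}, {first}"

names = ["Sally Weikel", "Dana Penaflor", "Christine Hook"]

# Roughly: cat names.txt | awk '{print $2 ", " $1}' | sort
print(sorted(swap_names(iter(names))))
```

Neither stage runs to completion before the next begins; control bounces between producer and consumer, which is exactly the co-routine pattern that pipes gave to ordinary processes.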

Secondly, by implementing pipes in the shell language, Thompson et al. invented the first real 'glue language'.

These two points allow reusable software components to be developed efficiently in a low-level, optimized language, and then glued together to form much larger, more complex functionality. They called this 'Programming in the Large'.

Thirdly, implementing pipes using the same system calls that were used for file access allowed programs to be written with universal interfaces. This allowed for truly universal solutions to software problems, that can be used interactively, using data from files, and as part of larger software systems, all without a single change to the software components. No compiling, no configuration, just a few simple shell commands.
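A sketch of such a "universal interface" (an illustration of mine, not code from the original): a filter that touches only stdin and stdout works identically at a terminal, on a redirected file, or in the middle of a pipeline, without any change or recompilation.

```python
import io
import sys

# The filter proper: a pure stream transformer with no idea where its
# input comes from or where its output goes.
def upcase(stream):
    for line in stream:
        yield line.upper()

# In real use you would write: sys.stdout.writelines(upcase(sys.stdin))
# Simulated here with an in-memory stream so the sketch is self-contained.
sys.stdout.writelines(upcase(io.StringIO("jude nesby\n")))
```

Because the program names no device and no file, the shell is free to wire its descriptors to anything: `prog < file`, `prog | other`, or interactive typing all look the same from inside.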

If you care to go through the learning curve, the UNIX software is just as useful today as it was 40 years ago. We are constantly re-inventing things they already knew and built solutions for. And the key breakthrough was the simple Pipe. The only real innovation after that was the creation of the internet in the 80's. Dramatically, UNIX botched its implementation of that by creating a separate API. We still suffer the consequences... Oh, yeah, there was something with video displays and mice that became popular in the late 80's. But that is for WIMPs.

License: CC-BY-SA with attribution
Not affiliated with softwareengineering.stackexchange