[comp.unix.questions] pipes in unix

dcc@hpopd.pwd.hp.com (Daniel Creswell) (05/02/91)

I'm trying to suss out how pipes between processes work (hey, I don't get to play
with UNIX all day so I'm a bit dumb!) and came across the following example:

The following example uses pipe to implement the command
string "ls | sort":

     #include <sys/types.h>
     pid_t pid;
     int pipefd[2];

     /*  Assumes file descriptor 0 and 1 are open  */
     pipe (pipefd);

     if ((pid = fork()) == (pid_t)0)
     {
          close(1);      /* close stdout */
          dup (pipefd[1]);
          execlp ("ls", "ls", (char *)0);
     }
     else
     if (pid > (pid_t)0)
     {
          close(0);      /* close stdin  */
          dup (pipefd[0]);
          execlp ("sort", "sort", (char *)0);
     }

What I wanna know is how this works. I've looked up 'dup' and discovered
that it duplicates a descriptor which will be carried across an 'exec', but I
don't understand how the recipient program knows it's got a pipe. Does it know?
Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
seem to do that in the above code.

I know I'm missing something here...would some kind person please tell me what
it is?

Cheers,
	Dan.

P.S. Could you post responses to notes? It'll be easier than emailing me!

Thanks for your ears...

subbarao@SunOS.Princeton.EDU (Kartik Subbarao) (05/05/91)

In article <37580001@hpopd.pwd.hp.com> dcc@hpopd.pwd.hp.com (Daniel Creswell) writes:
>I'm trying to suss out how pipes between processes work (hey, I don't get to play
>with UNIX all day so I'm a bit dumb!) and came across the following example:
>
>The following example uses pipe to implement the command
>string "ls | sort":
>
>     #include <sys/types.h>
>     pid_t pid;
>     int pipefd[2];
>
>     /*  Assumes file descriptor 0 and 1 are open  */
>     pipe (pipefd);
>
>     if ((pid = fork()) == (pid_t)0)
>     {
>          close(1);      /* close stdout */
>          dup (pipefd[1]);
>          execlp ("ls", "ls", (char *)0);
>     }
>     else
>     if (pid > (pid_t)0)
>     {
>          close(0);      /* close stdin  */
>          dup (pipefd[0]);
>          execlp ("sort", "sort", (char *)0);
>     }
>
>What I wanna know is how this works. I've looked up 'dup' and discovered
>that it duplicates a descriptor which will be carried across an 'exec', but I
>don't understand how the recipient program knows it's got a pipe. Does it know?

No, it doesn't know. Rather, it doesn't care. The typical unix pipe-tools
just take standard input, perform some operation on it (sort, in this case),
and print their results to stdout. So, what's happening here is that you're
dup'ing the read end of the pipe onto sort's stdin, so when it reads from 0,
it reads from the pipe.

>Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
>seem to do that in the above code.
>
>I know I'm missing something here...would some kind person please tell me what
>it is?

I hope you have a main() that's surrounding this code. Otherwise, I can see
no problem.


			-Kartik

--
internet# rm `df | tail +2 | awk '{ printf "%s/quotas\n",$6}'`

subbarao@phoenix.Princeton.EDU -| Internet
kartik@silvertone.Princeton.EDU (NeXT mail)  
SUBBARAO@PUCC.BITNET			          - Bitnet

mike@bria.UUCP (mike.stefanik) (05/05/91)

In an article, dcc@hpopd.pwd.hp.com (Daniel Creswell) writes:
|I'm trying to suss out how pipes between processes work (hey, I don't get to play
|with UNIX all day so I'm a bit dumb!) and came across the following example:

[ code example deleted ]

|What I wanna know is how this works. I've looked up 'dup' and discovered
|that it duplicates a descriptor which will be carried across an 'exec', but I
|don't understand how the recipient program knows it's got a pipe. Does it know?
|Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
|seem to do that in the above code.

Since dup() will duplicate a descriptor *always using the first descriptor
available* you are using (in my opinion: abusing) the feature of dup to
replace stdout and stdin.

Think of your file descriptor table as an array like this:

        +---+
        | 0 |	/dev/tty (standard input)
        +---+
        | 1 |	/dev/tty (standard output)
        +---+
        | 2 |	/dev/tty (standard error)
        +---+
        | 3 |	pipefd[0], pipe input
        +---+
        | 4 |	pipefd[1], pipe output
        +---+

When you close standard output with close(1), the table is like this:

        +---+
        | 0 |	/dev/tty (standard input)
        +---+
        | 1 |
        +---+
        | 2 |	/dev/tty (standard error)
        +---+
        | 3 |	pipefd[0], pipe input
        +---+
        | 4 |	pipefd[1], pipe output
        +---+

When you subsequently call dup(), it starts walking forward through
the table, using the first available slot that you have.  So, after
the dup(pipefd[1]) call, things look like this:

        +---+
        | 0 |	/dev/tty (standard input)
        +---+
 +----> | 1 |   pipe (standard output)
 |      +---+
 |      | 2 |	/dev/tty (standard error)
 |      +---+
 |      | 3 |	pipefd[0], pipe input
 |      +---+
 +---->	| 4 |	pipefd[1], pipe output
        +---+

When you then execlp() the program, it will start writing to descriptor 1,
which is now the pipe.  The other side of the fork (the parent, which holds
the same pipe descriptors) does a similar thing, replacing its standard input
with the read end of the pipe -- thus it can read what is being written.

-- 
Michael Stefanik, MGI Inc, Los Angeles | Opinions stated are never realistic
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
If MS-DOS didn't exist, who would UNIX programmers have to make fun of?

dcc@hpopd.pwd.hp.com (Daniel Creswell) (05/09/91)

Thanks for your input... I should point out that this code was 'lifted' from
the manual entry explaining 'pipe'. I couldn't understand how it worked hence
the reason why I posted it. I wouldn't even consider doing it this way I just
wanted to know how it worked...

Thanks,
	Dan.

dougy@hpsciz.sc.hp.com (Doug Yip) (05/11/91)

>What I wanna know is how this works. I've looked up 'dup' and discovered
>that it duplicates a descriptor which will be carried across an 'exec', but I
>don't understand how the recipient program knows it's got a pipe. Does it know?
>Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
>seem to do that in the above code.

>I know I'm missing something here...would some kind person please tell me what
>it is?


To understand how your sample program works, you first need to understand
how fork(), pipe() and dup() work.

When fork() is invoked, the UNIX kernel will "clone" the parent process to create
the child process. I am not going to bore you with the details of cloning.
But an important data structure that will be copied to the child process is
the file descriptor table. Each process in UNIX has a file descriptor table
associated with it. In UNIX V, the file descriptor table can have up to 25 entries.
In BSD UNIX, the file descriptor table can have up to 99 entries. Therefore, in UNIX
V, file descriptor table entries can be a "precious" commodity and you must
use them carefully in order to avoid running out of file descriptors.

This cloning allows all the file descriptor table entries established in the
parent process to be inherited by the child process. When the parent process
invokes pipe() and then fork(), the child process also inherits the file
descriptors returned from the pipe() system call. One misconception about
pipe() is that it allows bidirectional communication between two processes.
That is simply untrue. One process must be a "reader" and the other must be a
"writer", with both communicating through the pair of file descriptors returned
from the pipe() system call. Say you invoke pipe() with the parameter pfd, where
pfd is declared as an integer array of 2 elements. The writer process will
write to pfd[1] while the reader process will read from pfd[0]. Now, you may
ask what happens to the pfd[0] belonging to the writer. Well, it is simply not
used and will usually be released through close(). Ditto for the pfd[1]
belonging to the reader. Closing that unused write end matters: the reader
only sees end-of-file once every copy of pfd[1] has been closed. If you want
bidirectional communication, you need to use another pair of file descriptors
returned from a second pipe() system call. But who wants to do that? Sockets
have a much better implementation of bidirectional communication between
processes. That's another topic.

Each process will typically use up the first 3 entries of the file descriptor
table: 0 for stdin, 1 for stdout and 2 for stderr. If you can manipulate these
entries, you can set up a communication channel between the parent and the
child process. That is where the dup() system call comes in. The dup() system
call duplicates a file descriptor and returns to you another file descriptor
that points to the same file. It is very similar to opening the same file
twice, except that the two descriptors share a single file offset. However,
there is a twist to it. dup() will return the lowest-numbered file descriptor
that is NOT in use. If you close stdin, the UNIX kernel will free up file
descriptor entry 0. By doing a dup() right after a close(0), you can be sure
that the file descriptor returned by dup() will be 0. (UNIX Version 7 has a
dup2() system call which takes care of both close() and dup() in one system
call.)


A revised copy of your sample program, with comments, is listed below to show you
"the proper way" of linking two processes so they can communicate with each other.
I am not going to get into error checking and all that jazz. I assume you
know how to do that.


#include <sys/types.h>
#include <unistd.h>

int main(void)
{
     pid_t pid;
     int pipefd[2];

     /* invoke pipe() to get file descriptors */

     pipe (pipefd);

     if ((pid = fork()) == (pid_t)0)
     {
          /* I am the child process */
          /* close stdout to free up file descriptor 1 */

          close(1);

          /* invoke dup() to associate file descriptor 1 (stdout) */
          /* with the write end of the pipe.                      */

          dup (pipefd[1]);

          /* free up the rest of the unnecessary file descriptors */
          /* (ls does not read from stdin)                        */

          close(pipefd[0]);
          close(pipefd[1]);
          close(0);

          execlp ("ls", "ls", (char *)0);
     }
     else
     if (pid > (pid_t)0)
     {
          /* I am the parent */
          /* close stdin to free up file descriptor 0 */

          close(0);

          /* invoke dup() to associate file descriptor 0 */
          /* with the read end of the pipe */

          dup (pipefd[0]);

          /* close all the unnecessary file descriptors */
          /* you don't want to close stdout & stderr.   */
          /* Otherwise, the output has nowhere to go    */

          close(pipefd[0]);
          close(pipefd[1]);

          execlp ("sort", "sort", (char *)0);
     }

     /* reached only if fork() or execlp() failed */
     return 1;
}

An excellent book to get you started with UNIX system programming is
"Advanced UNIX Programming" by Marc J. Rochkind. The publisher is
Prentice-Hall. It concentrates on the standard UNIX System V stuff.

I hope I have answered most of your questions.


P.S. Does anyone out there know a good BSD system programming book other than the
"Design & Implementation of 4.3 BSD UNIX"?



------------------------------------------------------------------------------
 Doug Yip             Hewlett Packard - Manufacturing Productivity Operation
 Santa Clara, CA      Internet: dougy@hpsciz.hp.com
 (408) 553-3622  
 Mailstop: 51U-91
------------------------------------------------------------------------------

guy@auspex.auspex.com (Guy Harris) (05/19/91)

>In UNIX V, the file descriptor table can have up to 25 entries.

Actually, that depends on what *flavor* of System V you have.  It's a
configurable parameter in S5R3, or at least in the 3B2 reference port of
S5R3.1.  I think it's also configurable in S5R4, and, in addition, S5R4
has separate "soft" and "hard" limits for the size of the file
descriptor table (any resemblance between that notion of "soft" and
"hard" limits, and the notion of "soft" and "hard" resource limits in
BSD, is purely intentional).

The default value of the parameter in S5R3.1 appears to be 20, not 25. 
It appears that you can crank it up to 100 or so.

S5R4 will, hopefully, choose SunOS 4.1-ish default values, i.e. 64 for
the soft limit and 256 for the hard limit.

>In BSD UNIX, the file descriptor table can have up to 99 entries.

In 4.3BSD, 4.3-tahoe, and 4.3-reno, as distributed by Berkeley, it's 64,
not 99.  It can be cranked up, if you have source and don't mind
recompiling the kernel as well as any programs that depend on the layout
of the U area.

>(UNIX Version 7 has a dup2() system call which takes care of both close() and
>dup() in one system call.)

It's also in POSIX, and S5R3 has an implementation of it atop
"fcntl(F_DUPFD)".