dcc@hpopd.pwd.hp.com (Daniel Creswell) (05/02/91)
I'm trying to suss how pipes between processes work (hey, I don't get to play
with UNIX all day so I'm a bit dumb!) and came across the following example:

The following example uses pipe to implement the command
string "ls | sort":

    #include <sys/types.h>
    pid_t pid;
    int pipefd[2];

    /* Assumes file descriptor 0 and 1 are open */
    pipe (pipefd);

    if ((pid = fork()) == (pid_t)0)
    {
        close(1);      /* close stdout */
        dup (pipefd[1]);
        execlp ("ls", "ls", (char *)0);
    }
    else
    if (pid > (pid_t)0)
    {
        close(0);      /* close stdin */
        dup (pipefd[0]);
        execlp ("sort", "sort", (char *)0);
    }

What I wanna know is how does this work. I've looked up 'dup' and discovered
that it duplicates a descriptor which will be carried across an 'exec' but
don't understand how the recipient program knows it's got a pipe. Does it know?
Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
seem to do that in the above code.

I know I'm missing something here...would some kind person please tell me what
it is?

Cheers,
Dan.

P.S. Could you post responses to notes, it'll be easier than emailing me!
Thanks for your ears...
subbarao@SunOS.Princeton.EDU (Kartik Subbarao) (05/05/91)
In article <37580001@hpopd.pwd.hp.com> dcc@hpopd.pwd.hp.com (Daniel Creswell) writes:
>I'm trying to suss how pipes between processes work (hey I don't get to play
>with UNIX all day so I'm a bit dumb!) and came across the following example:
>
>The following example uses pipe to implement the command
>string "ls | sort":
>
>    #include <sys/types.h>
>    pid_t pid;
>    int pipefd[2];
>
>    /* Assumes file descriptor 0 and 1 are open */
>    pipe (pipefd);
>
>    if ((pid = fork()) == (pid_t)0)
>    {
>        close(1);      /* close stdout */
>        dup (pipefd[1]);
>        execlp ("ls", "ls", (char *)0);
>    }
>    else
>    if (pid > (pid_t)0)
>    {
>        close(0);      /* close stdin */
>        dup (pipefd[0]);
>        execlp ("sort", "sort", (char *)0);
>    }
>
>What I wanna know is how does this work. I've looked up 'dup' and discovered
>that it duplicates a descriptor which will be carried across an 'exec' but
>don't understand how the recipient program knows it's got a pipe. Does it know?

No, it doesn't know. Rather, it doesn't care. The typical UNIX pipe tools just
take standard input, perform some operation on it (sort, in this case), and
print their results to stdout. So, what's happening here is that you're
dup'ing the read end of the pipe onto sort's stdin, so when it reads from
descriptor 0, it reads from the pipe.

>Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
>seem to do that in the above code?
>
>I know I'm missing something here...would some kind person please tell me what
>it is?

I hope you have a main() surrounding this code. Otherwise, I can see no
problem.

-Kartik
--
internet# rm `df | tail +2 | awk '{ printf "%s/quotas\n",$6}'`
subbarao@phoenix.Princeton.EDU -| Internet
kartik@silvertone.Princeton.EDU (NeXT mail)
SUBBARAO@PUCC.BITNET - Bitnet
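To make the "it doesn't know, it doesn't care" point concrete, here is a minimal sketch in C. The names copy_fd() and fd_is_pipe() are hypothetical helpers written for this illustration, not anything from sort or ls. The first is the kind of loop a filter runs: it only calls read() and write() on descriptors and never asks what is behind them. The second shows that a program could find out if it wanted to, assuming a system where fstat() reports pipes with S_ISFIFO.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from descriptor `in` to
 * descriptor `out`.  Nothing here depends on whether either one is a
 * terminal, a file, or a pipe.  Returns 0 at end of input, -1 on error. */
int copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;

        /* write() may accept fewer bytes than asked for, so loop */
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1)
                return -1;
            done += w;
        }
    }
    return n == 0 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if fd refers to a pipe or FIFO,
 * 0 if it refers to something else, -1 if fstat() fails. */
int fd_is_pipe(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    return S_ISFIFO(st.st_mode) ? 1 : 0;
}
```

A filter built around copy_fd(0, 1) behaves the same whether descriptor 0 is the terminal or the read end of a pipe set up by its parent before the exec.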
mike@bria.UUCP (mike.stefanik) (05/05/91)
In an article, dcc@hpopd.pwd.hp.com (Daniel Creswell) writes:
|I'm trying to suss how pipes between processes work (hey I don't get to play
|with UNIX all day so I'm a bit dumb!) and came across the following example:

[ code example deleted ]

|What I wanna know is how does this work. I've looked up 'dup' and discovered
|that it duplicates a descriptor which will be carried across an 'exec' but
|don't understand how the recipient program knows it's got a pipe. Does it know?
|Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
|seem to do that in the above code?

Since dup() will duplicate a descriptor *always using the first descriptor
available*, you are using (in my opinion: abusing) that feature of dup to
replace stdout and stdin. Think of your file descriptor table as an array
like this:

          +---+
          | 0 |  /dev/tty (standard input)
          +---+
          | 1 |  /dev/tty (standard output)
          +---+
          | 2 |  /dev/tty (standard error)
          +---+
          | 3 |  pipefd[0], pipe input (read end)
          +---+
          | 4 |  pipefd[1], pipe output (write end)
          +---+

When you close standard output with close(1), the table is like this:

          +---+
          | 0 |  /dev/tty (standard input)
          +---+
          | 1 |
          +---+
          | 2 |  /dev/tty (standard error)
          +---+
          | 3 |  pipefd[0], pipe input (read end)
          +---+
          | 4 |  pipefd[1], pipe output (write end)
          +---+

When you subsequently call dup(), it starts walking forward through the
table, using the first available slot. So, after the dup(pipefd[1]) call,
things look like this:

          +---+
          | 0 |  /dev/tty (standard input)
          +---+
   +----> | 1 |  pipe (standard output)
   |      +---+
   |      | 2 |  /dev/tty (standard error)
   |      +---+
   |      | 3 |  pipefd[0], pipe input (read end)
   |      +---+
   +----> | 4 |  pipefd[1], pipe output (write end)
          +---+

When you then execlp() the program, it will start writing to descriptor 1,
which is now the pipe. On the other side of the fork (both processes inherit
the pipe descriptors), your program did a similar thing, replacing its
standard input with the input side of the pipe -- thus it can read what is
being written.
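The "first available slot" rule can be checked directly. The following is a small sketch in C; close_then_dup() is a hypothetical name for the close()-then-dup() idiom discussed in this thread, not a library call. It hands back whatever descriptor dup() chose, which matches the one just closed only when every lower-numbered slot is still occupied.

```c
#include <unistd.h>

/* Hypothetical helper: make `target` refer to whatever `src` refers to
 * by closing `target` and then calling dup(src).
 *
 * Returns the descriptor dup() picked, or -1 on error.  The result
 * equals `target` only if every descriptor below `target` is open at
 * the time of the call; otherwise dup() fills a lower free slot. */
int close_then_dup(int src, int target)
{
    if (close(target) == -1)
        return -1;
    return dup(src);    /* lowest-numbered free descriptor */
}
```

For example, close_then_dup(pipefd[1], 1) yields 1 in the usual case where descriptor 0 is open. If descriptor 0 had been closed beforehand, the same call would return 0 and standard output would be left closed, which is why the manual's example carries the comment about descriptors 0 and 1 being open.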
--
Michael Stefanik, MGI Inc, Los Angeles | Opinions stated are never realistic
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
If MS-DOS didn't exist, who would UNIX programmers have to make fun of?
dcc@hpopd.pwd.hp.com (Daniel Creswell) (05/09/91)
Thanks for your input...

I should point out that this code was 'lifted' from the manual entry
explaining 'pipe'. I couldn't understand how it worked, which is why I
posted it. I wouldn't even consider doing it this way; I just wanted to know
how it worked...

Thanks,
Dan.
dougy@hpsciz.sc.hp.com (Doug Yip) (05/11/91)
>What I wanna know is how does this work. I've looked up 'dup' and discovered
>that it duplicates a descriptor which will be carried across an 'exec' but
>don't understand how the recipient program knows it's got a pipe. Does it know?
>Is it simply that dup replaces stdin and stdout? If so, why? Because it doesn't
>seem to do that in the above code?

>I know I'm missing something here...would some kind person please tell me what
>it is?

To understand how your sample program works, you first need to understand how
fork(), pipe() and dup() work.

When fork() is invoked, the UNIX kernel "clones" the parent process to create
the child process. I am not going to bore you with the details of cloning, but
an important data structure that is copied to the child process is the file
descriptor table. Each process in UNIX has a file descriptor table associated
with it. In UNIX System V, the file descriptor table can have up to 25
entries. In BSD UNIX, it can have up to 99 entries. Therefore, in System V,
file descriptor table entries can be a "precious" commodity, and you must use
them carefully to avoid running out of file descriptors.

This cloning allows all the file descriptor table entries established in the
parent process to be inherited by the child process. When the parent process
invokes pipe() and then fork(), the child process also inherits the file
descriptors returned from the pipe() system call.

One misconception about pipe() is that it allows bidirectional communication
between two processes. That is simply untrue. One process must be the
"reader" and the other the "writer", with both communicating through the pair
of file descriptors returned from the pipe() system call. Say you invoke
pipe() with the parameter pfd, where pfd is declared as an integer array of 2
elements. The writer process writes to pfd[1] while the reader process reads
from pfd[0]. Now, you may ask what happens to the pfd[0] belonging to the
writer.
Well, it is simply not used, and is usually released through close(). Ditto
for the pfd[1] belonging to the reader. If you want bidirectional
communication, you need another pair of file descriptors returned from a
second pipe() system call. But who wants to do that? Sockets are a much
better implementation of bidirectional communication between processes.
That's another topic.

Each process typically uses the first 3 entries of the file descriptor table:
0 for stdin, 1 for stdout and 2 for stderr. If you can manipulate these
entries, you can set up a communication channel between the parent and the
child process. That is where the dup() system call comes in.

The dup() system call duplicates a file descriptor and returns another file
descriptor that points to the same file. It is similar to opening the same
file twice (except that the two descriptors share a single file offset).
However, there is a twist to it. dup() returns the lowest file descriptor
number that is NOT in use. If you close stdin, the UNIX kernel frees up file
descriptor entry 0. By doing a dup() right after a close(0), you can be sure
that the file descriptor returned by dup() will be 0. (UNIX Version 7 has a
dup2() system call which takes care of both close() and dup() in one system
call.)

A revised copy of your sample program with comments is listed below to show
you "the proper way" of linking two processes to communicate with each other.
I am not going to get into error checking and all that jazz. I assume you
know how to do that.

    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid;
        int pipefd[2];

        /* invoke pipe() to get file descriptors */
        pipe (pipefd);

        if ((pid = fork()) == (pid_t)0)
        {
            /* I am a child process */

            /* close stdout to free up file descriptor 1 */
            close(1);

            /* invoke dup() to associate file descriptor 1 (stdout) */
            /* with the write end of the pipe.                      */
            dup (pipefd[1]);

            /* free up the rest of unnecessary file descriptors */
            close(pipefd[0]);
            close(pipefd[1]);
            close(0);

            execlp ("ls", "ls", (char *)0);
        }
        else if (pid > (pid_t)0)
        {
            /* I am the parent */

            /* close stdin to free up file descriptor 0 */
            close(0);

            /* invoke dup() to associate file descriptor 0 */
            /* with the read end of the pipe               */
            dup (pipefd[0]);

            /* close all the unnecessary file descriptors */
            /* you don't want to close stdout & stderr.   */
            /* Otherwise, the output has nowhere to go    */
            close(pipefd[0]);
            close(pipefd[1]);

            execlp ("sort", "sort", (char *)0);
        }

        /* reached only if fork() or execlp() failed */
        return 1;
    }

An excellent book to get you started with UNIX system programming is
"Advanced UNIX Programming" by Marc J. Rochkind. The publisher is
Prentice-Hall. It puts more emphasis on the standard UNIX System V stuff.

I hope I have answered most of your questions.

P.S. Anyone out there know a good BSD system programming book other than
the "Design & Implementation of 4.3 BSD UNIX"?

------------------------------------------------------------------------------
Doug Yip
Hewlett Packard - Manufacturing Productivity Operation
Santa Clara, CA
Internet: dougy@hpsciz.hp.com    (408) 553-3622    Mailstop: 51U-91
------------------------------------------------------------------------------
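For readers who do want the error checking, here is one way the same pipeline could be written with dup2() in place of the close()/dup() pair. This is only a sketch under stated assumptions: run_pipeline() is a hypothetical name, descriptors 0, 1 and 2 are assumed to be open on entry, and forking both commands from a waiting parent is a design choice of this sketch rather than what the program above does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right", wait for both commands, and
 * return the exit status of `right`, or -1 on failure.
 * Assumes descriptors 0, 1 and 2 are open when called. */
int run_pipeline(char *const left[], char *const right[])
{
    int pipefd[2];
    int status;
    pid_t lpid, rpid;

    if (pipe(pipefd) == -1) {
        perror("pipe");
        return -1;
    }

    lpid = fork();
    if (lpid == 0) {
        /* writer: stdout becomes the write end of the pipe */
        if (dup2(pipefd[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(left[0], left);
        _exit(127);                 /* only reached if exec failed */
    }

    rpid = (lpid == -1) ? (pid_t)-1 : fork();
    if (rpid == 0) {
        /* reader: stdin becomes the read end of the pipe */
        if (dup2(pipefd[0], STDIN_FILENO) == -1)
            _exit(127);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(right[0], right);
        _exit(127);                 /* only reached if exec failed */
    }

    /* The parent must close both ends; while any copy of the write end
     * stays open, the reader never sees end-of-file. */
    close(pipefd[0]);
    close(pipefd[1]);

    if (lpid == -1 || rpid == -1) {
        perror("fork");
        if (lpid > 0)
            waitpid(lpid, NULL, 0);
        return -1;
    }

    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `char *ls[] = { "ls", NULL };` and `char *srt[] = { "sort", NULL };`, calling run_pipeline(ls, srt) behaves like "ls | sort" and returns sort's exit status. The value 127 for a failed exec follows the shell's convention and is a choice made for this sketch.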
guy@auspex.auspex.com (Guy Harris) (05/19/91)
>In UNIX V, file descriptor table can have up to 25 entries.

Actually, that depends on what *flavor* of System V you have. It's a
configurable parameter in S5R3, or at least in the 3B2 reference port of
S5R3.1. I think it's also configurable in S5R4, and, in addition, S5R4 has
separate "soft" and "hard" limits for the size of the file descriptor table
(any resemblance between that notion of "soft" and "hard" limits, and the
notion of "soft" and "hard" resource limits in BSD, is purely intentional).

The default value of the parameter in S5R3.1 appears to be 20, not 25. It
appears that you can crank it up to 100 or so. S5R4 will, hopefully, choose
SunOS 4.1-ish default values, i.e. 64 for the soft limit and 256 for the hard
limit.

>In BSD UNIX, file descriptor table can have up to 99 entries.

In 4.3BSD, 4.3-tahoe, and 4.3-reno, as distributed by Berkeley, it's 64, not
99. It can be cranked up, if you have source and don't mind recompiling the
kernel as well as any programs that depend on the layout of the U area.

>(UNIX Version 7 has a dup2() system call which takes care both close() and
>dup() in one system call.)

It's also in POSIX, and S5R3 has an implementation of it atop
"fcntl(F_DUPFD)".
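Two of the points above lend themselves to a short sketch in C. Both function names are hypothetical. dup2_via_fcntl() shows how a dup2() could be layered on "fcntl(F_DUPFD)"; it is not the S5R3 source, only an illustration of the idea, and unlike a real dup2() the close and the duplicate are two separate steps. fd_limits() reads the soft and hard descriptor limits, assuming a system that provides getrlimit() with RLIMIT_NOFILE.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of dup2() built on fcntl(F_DUPFD).
 * Returns newfd on success, -1 on error. */
int dup2_via_fcntl(int oldfd, int newfd)
{
    /* Reject a bad oldfd before touching newfd. */
    if (fcntl(oldfd, F_GETFD) == -1)
        return -1;
    if (oldfd == newfd)
        return newfd;

    /* newfd may or may not be open, so the result is ignored. */
    (void)close(newfd);

    /* F_DUPFD returns the lowest free descriptor >= newfd, which is
     * newfd itself because it was just closed. */
    return fcntl(oldfd, F_DUPFD, newfd);
}

/* Hypothetical helper: fetch the soft and hard limits on the number of
 * open descriptors for this process.  Returns 0 on success, -1 on error. */
int fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}
```

Asking the system at run time avoids hard-coding any of the figures argued over in this thread, since the values differ between releases and can be reconfigured.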