adh@mva.cs.liv.ac.uk (02/20/90)
Does anyone know how a parent process can determine the status of one of its children if it *hasn't* executed a wait ? It could arrange to catch a SIGCLD signal, but if the parent had several children it wouldn't know which one had sent it the SIGCLD ... would it ? My reason for asking is as follows: I need to write a program which starts several children and reads from their respective stdout's via pipes. The children are executing simultaneously, so the parent uses non-blocking reads, polling each pipe to see if anything has arrived. Unfortunately, a call to 'read' returns zero if the child hasn't sent any new data *OR* if the child has terminated so the parent cannot distinguish between EOF on a pipe and a pipe that temporarily has no data in it. Any advice and suggestions would be appreciated. Thanks David Harper University of Liverpool Computer Laboratory Liverpool, U.K. Preferred path for email replies: qq68@liverpool.ac.uk
les@chinet.chi.il.us (Leslie Mikesell) (02/22/90)
In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>Does anyone know how a parent process can determine the status of one
>of its children if it *hasn't* executed a wait ? It could arrange to
>catch a SIGCLD signal, but if the parent had several children it
>wouldn't know which one had sent it the SIGCLD ... would it ?

Just do a wait() inside the SIGCLD handler to pick up the PID of the exiting child. I think getting this right is unix-version specific. SysV pretends to queue the SIGCLD's but in fact only delivers pending SIGCLD's (after the first) in response to signal() being called to re-enable SIGCLD. Thus you must wait(), then signal(SIGCLD,handler) inside the handler.

>My reason for asking is as follows: I need to write a program which
>starts several children and reads from their respective stdout's via
>pipes. The children are executing simultaneously, so the parent uses
>non-blocking reads, polling each pipe to see if anything has arrived.
>Unfortunately, a call to 'read' returns zero if the child hasn't
>sent any new data *OR* if the child has terminated so the parent cannot
>distinguish between EOF on a pipe and a pipe that temporarily has no
>data in it.

If the children are your own programs or you can stick another process in the middle, you could "packetize" the data to indicate the source and write it to a single pipe, allowing the reader to block instead of polling. Using a fixed-length header consisting of <pid><length> followed by a variable amount of data should work for most purposes. The header and following data must be written in a single write() and must be no longer than PIPE_BUF (generally 5 or 10K) to ensure that the various writers keep their packet boundaries intact. A <length> field of 0 could indicate that the process is finished (i.e. EOF on its stream).
The reader can either read the headers followed by a read() of the appropriate length for the data (which has to already be in the pipe since it was written in the same write() as the header), or more efficiently, attempt to read() in large chunks and parse out the results from the buffer. If you have FIFO's (named pipes) this arrangement can be set up without having a common parent. Les Mikesell les@chinet.chi.il.us
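The packet scheme described above could be sketched roughly as follows. This is only an illustration, not code from the post: `struct pkt_hdr`, `send_packet()` and `recv_packet()` are made-up names, and the sketch assumes a POSIX system where `<limits.h>` supplies `PIPE_BUF`, the largest size for which a pipe write is guaranteed not to interleave with other writers.

```c
#include <errno.h>
#include <limits.h>     /* PIPE_BUF */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512    /* assumption: fall back to the POSIX minimum */
#endif

/* Hypothetical fixed-length header: <pid><length>. */
struct pkt_hdr {
    pid_t  pid;
    size_t len;         /* 0 means "this writer is finished" */
};

/* Send header and data in ONE write() so packets from different
 * writers cannot be interleaved.  Returns 0 on success, -1 otherwise. */
int send_packet(int fd, const void *data, size_t len)
{
    char buf[PIPE_BUF];
    struct pkt_hdr hdr;
    size_t total = sizeof hdr + len;

    if (len > sizeof buf - sizeof hdr)
        return -1;                      /* too big to be written atomically */
    memset(&hdr, 0, sizeof hdr);
    hdr.pid = getpid();
    hdr.len = len;
    memcpy(buf, &hdr, sizeof hdr);
    if (len > 0)
        memcpy(buf + sizeof hdr, data, len);
    return write(fd, buf, total) == (ssize_t)total ? 0 : -1;
}

/* Read exactly n bytes, retrying after short reads and EINTR. */
static int read_full(int fd, void *p, size_t n)
{
    char *cp = p;

    while (n > 0) {
        ssize_t r = read(fd, cp, n);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;                  /* error, or all writers are gone */
        cp += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Receive one packet.  Returns the data length (0 is the sender's
 * end-of-stream marker) or -1 on error. */
ssize_t recv_packet(int fd, pid_t *from, void *data, size_t cap)
{
    struct pkt_hdr hdr;

    if (read_full(fd, &hdr, sizeof hdr) != 0)
        return -1;
    if (hdr.len > cap)
        return -1;
    if (hdr.len > 0 && read_full(fd, data, hdr.len) != 0)
        return -1;
    *from = hdr.pid;
    return (ssize_t)hdr.len;
}
```

Each child would call `send_packet()` on the shared pipe and finish with a zero-length packet; the parent can then block in `recv_packet()` and sort the data by the `from` pid.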
thomas@uplog.se (Thomas Tornblom) (02/22/90)
In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:

   Does anyone know how a parent process can determine the status of one
   of its children if it *hasn't* executed a wait ? It could arrange to
   catch a SIGCLD signal, but if the parent had several children it
   wouldn't know which one had sent it the SIGCLD ... would it ?

Yes it would. Arrange to catch SIGCLD and do a wait in the catcher:

main()
{
	.
	.
	signal(SIGCLD, reaper);
	.
	.
}

reaper()
{
	int pid;
	int status;

	pid = wait(&status);
	/* pid is now the pid of the child that has died/changed status */

	signal(SIGCLD, reaper);	/* have to re-initialize in sysV */
}

Hope this helps.

Thomas
--
Real life:      Thomas Tornblom             Email:  thomas@uplog.se
Snail mail:     TeleLOGIC Uppsala AB        Phone:  +46 18 189406
                Box 1218                    Fax:    +46 18 132039
                S - 751 42 Uppsala, Sweden
ray@ctbilbo.UUCP (Ray Ward) (02/23/90)
In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes: >Does anyone know how a parent process can determine the status of one >of its children if it *hasn't* executed a wait ? It could arrange to >catch a SIGCLD signal, but if the parent had several children it >wouldn't know which one had sent it the SIGCLD ... would it ? > >My reason for asking is as follows: I need to write a program which >starts several children and reads from their respective stdout's via >pipes. The children are executing simultaneously, so the parent uses >non-blocking reads, polling each pipe to see if anything has arrived. >Unfortunately, a call to 'read' returns zero if the child hasn't >sent any new data *OR* if the child has terminated so the parent cannot >distinguish between EOF on a pipe and a pipe that temporarily has no >data in it. You have pointed out one of the areas that needs improvement for real time applications, among others. How does one get complete status information on a process, running or not, etc., without awkward programming contortions? There is no easy way. Some information can be obtained quickly. If you want to know if the pid is still valid ( which hopefully means the child process is still active ), use "kill( pid, 0 );". The "0" is the null signal. When kill is called with the null signal, only error checking is performed, and the error checking boils down to seeing if the pid is valid. To find out which child died, you must keep a record of the children's pids as they are forked. The fork call will return the child's pid to the parent. Keep these in an array or list with something that will identify the individual child (maybe only the array index). Set up a handler for SIGCLD. When SIGCLD is received, issue calls to kill() with the null signal and see how many children are dead. The reason for the list and the calls to kill() is that SIGCLD is a bit, not a queue. 
If two or more children were to die almost at once, before your signal handler was invoked by the kernel, you might miss one. And you don't want to call wait() twice if only one child has died.

Update the list of child pids. Call wait(). Since at least one child has died, wait() will return immediately with the pid of the child that died and either the code the child passed to exit() or the number of the signal that killed it. Call wait() once for each child that has died since the last SIGCLD. The dead child is a zombie until you call wait() and takes up space in the kernel's process tables, so be sure to call wait().

If you have the luxury of time, the ps command can be issued to the shell with a system() call, a la: system("ps -ef > uniquefile"); This file can be read and searched for the pid, and the status information parsed. Why this isn't simply returned to the calling process in a process status structure I don't know.

Good luck!
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ray Ward                                      Email:  uunet!ctbilbo!ray
Voice:  (214) 991-8338x226, (800) 331-7032    Fax  :  (214) 991-8968
=-=-=-=- There _are_ simple answers, just no _easy_ ones. -- R.R. -=-=-=-=
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/23/90)
In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>Does anyone know how a parent process can determine the status of one
>of its children if it *hasn't* executed a wait ?

Try using kill to send a "signal" number 0 to the process. It's supposed to report success if the process is still alive.
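A minimal sketch of the null-signal probe, assuming a POSIX `kill()`; the function name `process_exists()` is made up for illustration. One caveat worth keeping in mind: a child that has exited but has not yet been waited for is a zombie, still owns its pid, and is therefore still reported as existing.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Probe a pid with signal 0: no signal is delivered, only the
 * existence and permission checks are performed.
 * Returns 1 if the pid exists, 0 if it does not, -1 on other errors. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;       /* exists, but we are not allowed to signal it */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

Because pids are eventually reused, a positive answer only says that some process currently holds that pid; for one's own children the wait() family remains the reliable source of status.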
chris@mimsy.umd.edu (Chris Torek) (02/23/90)
In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
>The reason for the list and the calls to kill() is that SIGCLD is
>a bit, not a queue.

Although SIGCLD is indeed a bit, and not a queue, despite documentation to the contrary in the many different and sometimes incompatible versions of System V [as opposed to the many different and sometimes incompatible versions of `BSD', in particular things derived from 4BSD that are not 4.2BSD, 4.3BSD, or 4.3BSD-tahoe, and which should not be called `BSD']....

Oops, seem to have lost the thread of that sentence. :-)

Although SIGCLD is not queued, these calls to kill() remain unnecessary. SIGCLD is a special case hack. It operates very much unlike all other signals. The following code will work:

	signal_type	/* either `int' or `void' depending on your version */
	catchcld()	/* achoo! */
	{
		int status, w;

		w = wait(&status);
		<sort through list of processes; note that pid `w' died>
		signal(SIGCLD, catchcld);
	}

This code, however, will *not* work, as it will recurse infinitely (until it runs out of stack space):

	signal_type
	catchcld()
	{
		int status, w;

		signal(SIGCLD, catchcld);
		w = wait(&status);
		<sort through list...>
	}

The trick is that whenever SIGCLD is set to go to a user function (`catch-a-cold' above), if there are any child processes ready to be wait()ed for, System V Releases 1, 2, and 3 (at least) send a *new* SIGCLD signal. Thus, in the first example, for every exited child process, catchcld() gets called recursively just after the call to signal().

Under BSD, SIGCHLD (the moral equivalent of SIGCLD) acts just like any other signal, and you must loop in the signal handler to collect all exited children. This is why 4BSD has a `wait3' call (or `wait4' in 4.4BSD; wait3 is now a C library compatibility function).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
martin@mwtech.UUCP (Martin Weitzel) (02/24/90)
In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
[many correct things about getting the status of a child process]
>
>If you have the luxury of time, the ps command can be issued to the
>shell with a system() call. A la: "system("ps -ef > uniquefile");

Please, don't use "system" but "popen" and avoid the temporary file. (It seems to me that "popen" is often overlooked in the manuals, because when learning C and UNIX, programmers do not see why it's useful, and later they do not look again.)

>This file can be read and searched for the pid and the status information
>parsed. Why this isn't simply returned to the calling process in a
>process status structure I don't know.

Maybe in earlier days (when the source was available) UNIX programmers could "steal" the algorithms from the "ps" command and incorporate them directly into their code. Of course, this would only work for programs with access to /dev/mem (and public access to /dev/mem would be disastrous for system security). Really, a system call seems to be missing here ...
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/25/90)
In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
-You have pointed out one of the areas that needs improvement for
-real time applications, among others. How does one get complete
-status information on a process, running or not, etc., without
-awkward programming contortions? There is no easy way.
Sure there is. You can access that information through the /proc
filesystem. If you don't have this, complain to your vendor.
ray@ctbilbo.UUCP (Ray Ward) (03/03/90)
In article <12229@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes: >-You have pointed out one of the areas that needs improvement for >-real time applications, among others. How does one get complete >-status information on a process, running or not, etc., without >-awkward programming contortions? There is no easy way. > >Sure there is. You can access that information through the /proc >filesystem. If you don't have this, complain to your vendor. You are entirely correct about the /proc filesystem. My thought, however, was that since the posting was to c.u.questions, and since the poster was obviously new to the more subtle details of handling child processes, a more basic and more generally applicable response was indicated. If you have SVR4 UNIX, which was released to the general public last fall, and have been into the /proc filesystem, and know how to use ioctl() to extract ps-type information (undocumented in both the SVR4 Migration Guide and my copy of the SVID89) or how to extract the information directly from the image of the process through the /proc filesystem, then you have a way to access process information directly. (Of course, with this level of knowledge, one might be posting answers in c.u.wizards instead of posting questions about the status of child processes in c.u.questions...) I have noticed from your other postings that you are very knowledgeable as well as helpful. Have you slipped into one of my habits -- giving a more technical response than is indicated -- or have I misjudged, on the low side, the technical experience of the poster? Or, indeed, of c.u.questions? -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Ray Ward Email: uunet!ctbilbo!ray Voice: (214) 991-8338x226, (800) 331-7032 Fax : (214) 991-8968 =-=-=-=- There _are_ simple answers, just no _easy_ ones. -- R.R. -=-=-=-=
guy@auspex.auspex.com (Guy Harris) (03/07/90)
>If you have SVR4 UNIX, which was released to the general public >last fall, and have been into the /proc filesystem, and know how >to use ioctl() to extract ps-type information (undocumented in >both the SVR4 Migration Guide and my copy of the SVID89) Not surprising, since: 1) I think the S5R4 Migration Guide is for the benefit of people migrating code from earlier S5 releases, SunOS, BSD, Xenix, etc. to S5R4, and most of those systems don't have "/proc"; 2) the SVID is for stuff AT&T decided to put into the SVID, and apparently they decided not to put "/proc" into the SVID. (On the other hand, it *does* have "ptrace()", sigh....) I suspect (and sincerely hope!) that "/proc" will be documented in the actual S5R4 manual pages.
mb@rex.cs.tulane.edu (Mark Benard) (03/07/90)
In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>...... The children are executing simultaneously, so the parent uses
>non-blocking reads, polling each pipe to see if anything has arrived.
>Unfortunately, a call to 'read' returns zero if the child hasn't
>sent any new data *OR* if the child has terminated so the parent cannot
>distinguish between EOF on a pipe and a pipe that temporarily has no
>data in it.

If you have BSD, you can use select instead of polling. Then do a blocking read on the selected channels, which will either have data or an EOF condition.

-Mark
--
Mark Benard
Department of Computer Science     INTERNET & BITNET: mb@cs.tulane.edu
Tulane University                  USENET: rex!mb
New Orleans, LA 70118
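A rough sketch of the select-then-read idea, assuming a system that provides `select()` in `<sys/select.h>`; the helper `read_any()` and its convention of marking closed slots with -1 are made up for this example.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until one of fds[0..n-1] is readable, then read from the first
 * ready descriptor.  Slots holding -1 are skipped.  *which receives the
 * index that was read.  Returns read()'s result: >0 bytes of data,
 * 0 for EOF on that pipe, -1 on error or if nothing is left to watch. */
ssize_t read_any(const int *fds, int n, int *which, void *buf, size_t cap)
{
    fd_set rset;
    int i, maxfd = -1;

    FD_ZERO(&rset);
    for (i = 0; i < n; i++) {
        if (fds[i] < 0)
            continue;
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (maxfd < 0)
        return -1;
    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (fds[i] >= 0 && FD_ISSET(fds[i], &rset)) {
            *which = i;
            /* Will not block: select() reported data or EOF here. */
            return read(fds[i], buf, cap);
        }
    }
    return -1;
}
```

When `read_any()` returns 0, the caller would close that descriptor and set its slot to -1, which separates "child finished" from "no data yet" without any polling.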
chris@mimsy.umd.edu (Chris Torek) (03/07/90)
>In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>>Unfortunately, a [nonblocking] call to 'read' returns zero if the child
>>hasn't sent any new data *OR* [for EOF]....

In article <2377@rex.cs.tulane.edu> mb@rex.cs.tulane.edu (Mark Benard) writes:
>If you have BSD, you can use select instead of polling.

If you have BSD, a non-blocking read() will return -1 with errno set to EWOULDBLOCK; an EOF returns 0 with errno unchanged.

Anyway, the original question (about detecting exited children) has already been answered.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
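The distinction drawn above can be sketched as follows, assuming the POSIX `O_NONBLOCK` flag (which reports "no data yet" as -1 with `EAGAIN`, the same value as `EWOULDBLOCK` on many systems). The names `set_nonblocking()`, `poll_pipe()` and `enum pipe_state` are invented for the example. As far as I know, the older System V `O_NDELAY` flag is what produces the ambiguous zero return the original poster ran into.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum pipe_state { PIPE_HAS_DATA, PIPE_NO_DATA, PIPE_AT_EOF, PIPE_FAILED };

/* Switch a descriptor to non-blocking mode.  Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* One non-blocking read attempt, classified by its outcome.
 * *got is set only when PIPE_HAS_DATA is returned. */
enum pipe_state poll_pipe(int fd, void *buf, size_t cap, ssize_t *got)
{
    ssize_t n = read(fd, buf, cap);

    if (n > 0) {
        *got = n;
        return PIPE_HAS_DATA;
    }
    if (n == 0)
        return PIPE_AT_EOF;             /* every writer has closed the pipe */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PIPE_NO_DATA;            /* writers exist, nothing sent yet */
    return PIPE_FAILED;
}
```

A polling parent would call `poll_pipe()` on each child's pipe in turn and retire a pipe only when it sees `PIPE_AT_EOF`.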
gwyn@smoke.BRL.MIL (Doug Gwyn) (03/08/90)
In article <30@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes: >In article <12229@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >>Sure there is. You can access that information through the /proc >>filesystem. If you don't have this, complain to your vendor. >I have noticed from your other postings that you are very knowledgeable >as well as helpful. Have you slipped into one of my habits -- >giving a more technical response than is indicated -- or have I >misjudged, on the low side, the technical experience of the poster? >Or, indeed, of c.u.questions? Probably it just reflected my general frustration at how long it takes to get vendors (including UCB) to pick up good ideas and incorporate them in what they distribute. I still recall with amazement and some degree of anger how Larry Brown had never heard of Dennis Ritchie's work on "stackable line disciplines" that finally (perhaps I provided a small amount of initial impetus) got adopted by AT&T as "STREAMS". To their credit, AT&T (UNIX Operation, or whoever the developers are these days) has been much, much better about improving UNIX recently than they were just a few years ago. Another point, however, is that detailed examination and control of processes is simply not a novice topic. Until we get practically all vendors to get with the program, as opposed to going off and building incompatible UNIX-like variants, programmers simply don't have simple solutions at hand for hard problems. At best, they have to implement several solutions to cover the different flavors of UNIX (assuming they're concerned with porting their applications, as for the most part they should be). At worst, they cannot reasonably solve their technical problems on some platforms. 
What I think programmers could do to help is to get their computer acquisition staff to inform vendors that they will be specifying conformance to XPG3, SVID3, X3.159, and 1003.1, and by golly failure to provide ALL the specified services will result in no sale.