[comp.unix.questions] Can a parent process determine its child's status ?

adh@mva.cs.liv.ac.uk (02/20/90)

Does anyone know how a parent process can determine the status of one
of its children if it *hasn't* executed a wait ? It could arrange to 
catch a SIGCLD signal, but if the parent had several children it
wouldn't know which one had sent it the SIGCLD ... would it ?

My reason for asking is as follows: I need to write a program which
starts several children and reads from their respective stdout's via
pipes. The children are executing simultaneously, so the parent uses
non-blocking reads, polling each pipe to see if anything has arrived.
Unfortunately, a call to 'read' returns zero if the child hasn't
sent any new data *OR* if the child has terminated so the parent cannot
distinguish between EOF on a pipe and a pipe that temporarily has no
data in it.

Any advice and suggestions would be appreciated. 

Thanks

  David Harper
  University of Liverpool Computer Laboratory
  Liverpool, U.K.

  Preferred path for email replies: qq68@liverpool.ac.uk

les@chinet.chi.il.us (Leslie Mikesell) (02/22/90)

In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:

>Does anyone know how a parent process can determine the status of one
>of its children if it *hasn't* executed a wait ? It could arrange to 
>catch a SIGCLD signal, but if the parent had several children it
>wouldn't know which one had sent it the SIGCLD ... would it ?

Just do a wait() inside the SIGCLD handler to pick up the PID of
the exiting child.  I think getting this right is unix-version specific.
SysV pretends to queue the SIGCLD's but in fact only delivers pending
SIGCLD's (after the first) in response to signal() being called to
re-enable SIGCLD.  Thus you must wait(), then signal(SIGCLD,handler)
inside the handler.

>My reason for asking is as follows: I need to write a program which
>starts several children and reads from their respective stdout's via
>pipes. The children are executing simultaneously, so the parent uses
>non-blocking reads, polling each pipe to see if anything has arrived.
>Unfortunately, a call to 'read' returns zero if the child hasn't
>sent any new data *OR* if the child has terminated so the parent cannot
>distinguish between EOF on a pipe and a pipe that temporarily has no
>data in it.

If the children are your own programs or you can stick another process
in the middle, you could "packetize" the data to indicate the source
and write it to a single pipe, allowing the reader to block instead
of polling.  Using a fixed-length header consisting of <pid><length>
followed by a variable amount of data should work for most purposes.
The header and following data must be written in a single write() and
must be no larger than PIPE_MAX (PIPE_BUF in POSIX; generally 5 or 10K) to ensure
that the various writers keep their packet boundaries intact.  A <length>
field of 0 could indicate that the process is finished (i.e. EOF on its
stream).  The reader can either read the headers followed by a read()
of the appropriate length for the data (which has to already be in the
pipe since it was written in the same write() as the header), or
more efficiently, attempt to read() in large chunks and parse out the
results from the buffer.
If you have FIFO's (named pipes) this arrangement can be set up without
having a common parent.
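A minimal sketch of the packet scheme in C. It assumes a hypothetical binary header (a 32-bit pid followed by a 32-bit length) and uses the illustrative names write_packet() and read_packet(). It uses PIPE_BUF, the POSIX name for the atomic-write limit. The header is raw binary, which works only because the reader and the writers run on the same machine.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>     /* PIPE_BUF: largest write() guaranteed atomic on a pipe */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length <pid><length> header. */
struct pkt_header {
    int32_t  pid;     /* process id of the writer */
    uint32_t length;  /* bytes of data that follow; 0 means "writer finished" */
};

/* Send header and data in ONE write() so concurrent writers sharing the pipe
 * cannot interleave.  Returns 0 on success, -1 if too large or on error. */
int write_packet(int fd, const void *data, uint32_t length)
{
    char buf[PIPE_BUF];
    struct pkt_header h;

    if (sizeof h + length > sizeof buf)
        return -1;
    h.pid = (int32_t)getpid();
    h.length = length;
    memcpy(buf, &h, sizeof h);
    if (length > 0)
        memcpy(buf + sizeof h, data, length);
    return write(fd, buf, sizeof h + length) == (ssize_t)(sizeof h + length) ? 0 : -1;
}

/* Read exactly n bytes, retrying short reads and EINTR; -1 on EOF or error. */
static int read_full(int fd, void *p, size_t n)
{
    char *c = p;

    while (n > 0) {
        ssize_t r = read(fd, c, n);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;
        c += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Block until a whole packet arrives.  Stores the sender in *pid and the data
 * in buf (at least PIPE_BUF bytes).  Returns the data length (0 = that writer
 * is finished), or -1 on EOF (all writers closed) or a malformed header. */
ssize_t read_packet(int fd, pid_t *pid, char *buf)
{
    struct pkt_header h;

    if (read_full(fd, &h, sizeof h) < 0)
        return -1;
    if (h.length > PIPE_BUF - sizeof h)
        return -1;
    if (h.length > 0 && read_full(fd, buf, h.length) < 0)
        return -1;
    *pid = (pid_t)h.pid;
    return (ssize_t)h.length;
}
```

In use, each child calls write_packet() on the shared pipe and finishes with a zero-length packet. The parent loops on read_packet(), blocking rather than polling, and counts down as each pid reports length 0.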

Les Mikesell
 les@chinet.chi.il.us

thomas@uplog.se (Thomas Tornblom) (02/22/90)

In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:

   Does anyone know how a parent process can determine the status of one
   of its children if it *hasn't* executed a wait ? It could arrange to 
   catch a SIGCLD signal, but if the parent had several children it
   wouldn't know which one had sent it the SIGCLD ... would it ?


Yes it would.
Arrange to catch SIGCLD and do a wait in the catcher:

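#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void reaper(int sig);

int main(void)
{
	/* ... */
	signal(SIGCLD, reaper);
	/* ... */
	return 0;
}

void reaper(int sig)
{
	pid_t pid;
	int status;

	pid = wait(&status);

	/* pid is now the pid of the child that has died/changed status */

	signal(SIGCLD, reaper);		/* have to re-initialize in sysV */
}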

Hope this helps.
Thomas
-- 
Real life:	Thomas Tornblom		Email:	thomas@uplog.se
Snail mail:	TeleLOGIC Uppsala AB		Phone:	+46 18 189406
		Box 1218			Fax:	+46 18 132039
		S - 751 42 Uppsala, Sweden

ray@ctbilbo.UUCP (Ray Ward) (02/23/90)

In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>Does anyone know how a parent process can determine the status of one
>of its children if it *hasn't* executed a wait ? It could arrange to 
>catch a SIGCLD signal, but if the parent had several children it
>wouldn't know which one had sent it the SIGCLD ... would it ?
>
>My reason for asking is as follows: I need to write a program which
>starts several children and reads from their respective stdout's via
>pipes. The children are executing simultaneously, so the parent uses
>non-blocking reads, polling each pipe to see if anything has arrived.
>Unfortunately, a call to 'read' returns zero if the child hasn't
>sent any new data *OR* if the child has terminated so the parent cannot
>distinguish between EOF on a pipe and a pipe that temporarily has no
>data in it.

You have pointed out one of the areas that needs improvement for
real time applications, among others.  How does one get complete
status information on a process, running or not, etc., without
awkward programming contortions?  There is no easy way.

Some information can be obtained quickly.  If you want to know if
the pid is still valid (which hopefully means the child process is
still active), use "kill(pid, 0);".  The "0" is the null signal.
When kill is called with the null signal, only error checking is
performed, and the error checking boils down to seeing if the pid
is valid.
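Here is a small sketch of the null-signal test. The helper name pid_exists() is hypothetical. Two caveats apply. First, kill() fails with EPERM for a live process you are not allowed to signal. Second, a child that has exited but has not yet been wait()ed for is a zombie and still exists, so kill(pid, 0) keeps succeeding for it until it is reaped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a process with this pid exists (running OR zombie),
 * 0 if there is no such process, -1 on an unexpected error. */
int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)     /* it exists, we just may not signal it */
        return 1;
    if (errno == ESRCH)     /* no such process */
        return 0;
    return -1;
}
```

For example, after fork() a child that calls _exit() still makes pid_exists() return 1 until the parent calls waitpid() on it. After that, it returns 0.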

To find out which child died, you must keep a record of the children's
pids as they are forked.  The fork call will return the child's pid
to the parent.  Keep these in an array or list with something that
will identify the individual child (maybe only the array index).
Set up a handler for SIGCLD.  When SIGCLD is received, issue calls
to kill() with the null signal and see how many children are dead.

The reason for the list and the calls to kill() is that SIGCLD is
a bit, not a queue.  If two or more children were to die almost at
once, before your signal handler was invoked by the kernel, you might
miss one.  And you don't want to call wait() twice if only one child
has died.  Update the list of child pids.

Call wait().  Since at least one child has died, wait() will return
immediately with the pid of the child that died and either the code
the child passed to exit() or the number of the signal that killed it.
Call wait() once for each child that has died since the last SIGCLD.
The dead child is a zombie until you call wait() and takes up space
in the kernel's process tables, so be sure to call wait().
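This sketch covers the bookkeeping described above, with one substitution. Instead of probing each pid with kill(), it uses waitpid() with WNOHANG (POSIX). waitpid() returns 0 when no further child has exited, so the loop collects exactly the dead children and never blocks. The table size and the names MAXKIDS, add_child(), reap_children() and describe() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXKIDS 16      /* hypothetical fixed table size */

struct child {
    pid_t pid;          /* 0 = slot unused */
    int   alive;        /* 1 until we have wait()ed for it */
    int   status;       /* raw status from waitpid() once reaped */
};

static struct child kids[MAXKIDS];

/* Record a freshly forked child; returns its slot index, or -1 if full. */
int add_child(pid_t pid)
{
    for (int i = 0; i < MAXKIDS; i++)
        if (kids[i].pid == 0) {
            kids[i].pid = pid;
            kids[i].alive = 1;
            kids[i].status = 0;
            return i;
        }
    return -1;
}

/* Reap every child that has already exited, without blocking, and mark it
 * in the table.  Note waitpid(-1, ...) also reaps children not in the table.
 * Returns the number of children reaped on this call. */
int reap_children(void)
{
    int n = 0, status;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        for (int i = 0; i < MAXKIDS; i++)
            if (kids[i].pid == pid) {
                kids[i].alive = 0;
                kids[i].status = status;
            }
        n++;
    }
    return n;
}

/* Report how a reaped child ended: its exit() code or the killing signal. */
void describe(const struct child *k)
{
    if (WIFEXITED(k->status))
        printf("pid %ld exited with code %d\n", (long)k->pid, WEXITSTATUS(k->status));
    else if (WIFSIGNALED(k->status))
        printf("pid %ld killed by signal %d\n", (long)k->pid, WTERMSIG(k->status));
}
```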

If you have the luxury of time, the ps command can be issued to the
shell with a system() call.  A la:  system("ps -ef > uniquefile");
This file can be read and searched for the pid and the status information
parsed.  Why this isn't simply returned to the calling process in a 
process status structure I don't know.  

Good luck!

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ray Ward                                          Email:  uunet!ctbilbo!ray  
Voice:  (214) 991-8338x226, (800) 331-7032        Fax  :  (214) 991-8968     
=-=-=-=-  There _are_ simple answers, just no _easy_ ones. -- R.R. -=-=-=-=

gwyn@smoke.BRL.MIL (Doug Gwyn) (02/23/90)

In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>Does anyone know how a parent process can determine the status of one
>of its children if it *hasn't* executed a wait ?

Try using kill to send a "signal" number 0 to the process.
It's supposed to report success if the process is still alive.

chris@mimsy.umd.edu (Chris Torek) (02/23/90)

In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
>The reason for the list and the calls to kill() is that SIGCLD is
>a bit, not a queue.

Although SIGCLD is indeed a bit, and not a queue, despite documentation
to the contrary in the many different and sometimes incompatible
versions of System V [as opposed to the many different and sometimes
incompatible versions of `BSD', in particular things derived from 4BSD
that are not 4.2BSD, 4.3BSD, or 4.3BSD-tahoe, and which should not be
called `BSD']....  Oops, seem to have lost the thread of that sentence.
:-)  Although SIGCLD is not queued, these calls to kill() remain
unnecessary.  SIGCLD is a special case hack.  It operates very much
unlike all other signals.

The following code will work:

	signal_type	/* either `int' or `void' depending on your version */
	catchcld()	/* achoo! */
	{
		int status, w;

		w = wait(&status);
		/* sort through list of processes; note that pid `w' died */
		signal(SIGCLD, catchcld);
	}

This code, however, will *not* work, as it will recurse infinitely
(until it runs out of stack space):

	signal_type catchcld() {
		int status, w;

		signal(SIGCLD, catchcld);
		w = wait(&status);
		/* sort through list... */
	}

The trick is that whenever SIGCLD is set to go to a user function
(`catch-a-cold' above), if there are any child processes ready to
be wait()ed for, System V Releases 1, 2, and 3 (at least) send a
*new* SIGCLD signal.  Thus, in the first example, for every exited
child process, catchcld() gets called recursively just after the
call to signal().

Under BSD, SIGCHLD (the moral equivalent of SIGCLD) acts just like any
other signal, and you must loop in the signal handler to collect all
exited children.  This is why 4BSD has a `wait3' call (or `wait4' in
4.4BSD; wait3 is now a C library compatibility function).
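Under reliable-signal semantics (4BSD, and POSIX sigaction()), the loop looks roughly like this minimal sketch. It uses waitpid() with WNOHANG in place of wait3(). The handler, installer and counter names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;    /* children collected so far */

/* One SIGCHLD may stand for several exited children, so keep calling
 * waitpid() until it reports that no more are waiting (0) or none exist (-1). */
static void on_sigchld(int sig)
{
    int saved_errno = errno;            /* don't disturb the interrupted code */
    int status;

    (void)sig;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Install the handler once; unlike System V signal(), sigaction() does not
 * reset it to the default after each delivery. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* ignore merely stopped children */
    return sigaction(SIGCHLD, &sa, NULL);
}
```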
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

martin@mwtech.UUCP (Martin Weitzel) (02/24/90)

In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
[many correct things about getting the status of a child process]
>
>If you have the luxury of time, the ps command can be issued to the
>shell with a system() call.  A la:  system("ps -ef > uniquefile");

Please don't use "system"; use "popen" and avoid the temporary file.
(It seems to me that "popen" is often overlooked in the manuals:
when learning C and UNIX, programmers do not see why it's
useful, and later they do not look again.)
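Here is a sketch of the popen() approach. It assumes System V style "ps -ef" output, where the PID is the second column; BSD ps takes different options and prints a different layout. The helper name find_ps_line() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Run `cmd` (e.g. "ps -ef") through popen() and copy into `line` the first
 * output line whose second field equals `pid`.  No temporary file is needed.
 * Returns 1 if found, 0 if not, -1 if the command could not be started. */
int find_ps_line(const char *cmd, pid_t pid, char *line, size_t size)
{
    FILE *fp = popen(cmd, "r");
    char buf[1024];
    int found = 0;

    if (fp == NULL)
        return -1;
    while (!found && fgets(buf, sizeof buf, fp) != NULL) {
        char user[256];
        long p;

        /* The header line ("UID PID ...") fails the %ld conversion. */
        if (sscanf(buf, "%255s %ld", user, &p) == 2 && p == (long)pid) {
            snprintf(line, size, "%s", buf);
            found = 1;
        }
    }
    pclose(fp);         /* also waits for the ps process */
    return found;
}
```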

>This file can be read and searched for the pid and the status information
>parsed.  Why this isn't simply returned to the calling process in a 
>process status structure I don't know.  

Maybe in earlier days (when the source was available) UNIX programmers
could "steal" the algorithms from the "ps" command and incorporate
them directly into their code. Of course, this would only work for
programs with access to /dev/mem (and public access to /dev/mem would
be disastrous for system security). Really, a system call seems to
be missing here ...
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

gwyn@smoke.BRL.MIL (Doug Gwyn) (02/25/90)

In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
-You have pointed out one of the areas that needs improvement for
-real time applications, among others.  How does one get complete
-status information on a process, running or not, etc., without
-awkward programming contortions?  There is no easy way.

Sure there is.  You can access that information through the /proc
filesystem.  If you don't have this, complain to your vendor.
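The SVR4 /proc interface under discussion, which uses ioctl() calls on per-process files, is not sketched here. The example below is a loosely related, Linux-specific illustration only. Present-day Linux also has a /proc, but it is text-based: the third field of /proc/<pid>/stat is a one-letter process state. The helper name proc_state() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux only: return the state letter from /proc/<pid>/stat ('R' running,
 * 'S' sleeping, 'Z' zombie, ...), or 0 if it cannot be read.  The command
 * name (field 2) is in parentheses and may contain spaces, so the state is
 * taken from just after the LAST ')'. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    FILE *fp;
    size_t n;
    char *rp;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    rp = strrchr(buf, ')');
    if (rp == NULL || rp[1] != ' ' || rp[2] == '\0')
        return 0;
    return rp[2];
}
```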

ray@ctbilbo.UUCP (Ray Ward) (03/03/90)

In article <12229@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <22@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
>-You have pointed out one of the areas that needs improvement for
>-real time applications, among others.  How does one get complete
>-status information on a process, running or not, etc., without
>-awkward programming contortions?  There is no easy way.
>
>Sure there is.  You can access that information through the /proc
>filesystem.  If you don't have this, complain to your vendor.

You are entirely correct about the /proc filesystem.  My thought,
however, was that since the posting was to c.u.questions, and since
the poster was obviously new to the more subtle details of 
handling child processes, a more basic and more generally applicable
response was indicated.

If you have SVR4 UNIX, which was released to the general public
last fall, and have been into the /proc filesystem, and know how
to use ioctl() to extract ps-type information (undocumented in
both the SVR4 Migration Guide and my copy of the SVID89) or how
to extract the information directly from the image of the process
through the /proc filesystem, then you have a way to access
process information directly.   (Of course, with this level of
knowledge, one might be posting answers in c.u.wizards instead
of posting questions about the status of child processes in
c.u.questions...)

I have noticed from your other postings that you are very knowledgeable
as well as helpful.  Have you slipped into one of my habits --
giving a more technical response than is indicated -- or have I
misjudged, on the low side, the technical experience of the poster?
Or, indeed, of c.u.questions?

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ray Ward                                          Email:  uunet!ctbilbo!ray  
Voice:  (214) 991-8338x226, (800) 331-7032        Fax  :  (214) 991-8968     
=-=-=-=-  There _are_ simple answers, just no _easy_ ones. -- R.R. -=-=-=-=

guy@auspex.auspex.com (Guy Harris) (03/07/90)

>If you have SVR4 UNIX, which was released to the general public
>last fall, and have been into the /proc filesystem, and know how
>to use ioctl() to extract ps-type information (undocumented in
>both the SVR4 Migration Guide and my copy of the SVID89)

Not surprising, since:

	1) I think the S5R4 Migration Guide is for the benefit of people
	   migrating code from earlier S5 releases, SunOS, BSD, Xenix,
	   etc. to S5R4, and most of those systems don't have "/proc";

	2) the SVID is for stuff AT&T decided to put into the SVID, and
	   apparently they decided not to put "/proc" into the SVID. 
	   (On the other hand, it *does* have "ptrace()", sigh....)

I suspect (and sincerely hope!) that "/proc" will be documented in the
actual S5R4 manual pages.

mb@rex.cs.tulane.edu (Mark Benard) (03/07/90)

In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>...... The children are executing simultaneously, so the parent uses
>non-blocking reads, polling each pipe to see if anything has arrived.
>Unfortunately, a call to 'read' returns zero if the child hasn't
>sent any new data *OR* if the child has terminated so the parent cannot
>distinguish between EOF on a pipe and a pipe that temporarily has no
>data in it.

If you have BSD, you can use select instead of polling.  Then do a
blocking read on the selected channels, which will either have data or
an EOF condition.
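Here is a minimal sketch of the select() approach for several child pipes. Called with no timeout, select() sleeps until at least one pipe is readable. A read() on a readable pipe then returns either data or 0 for EOF, so the two cases no longer look alike. The function name drain_pipes() and the per-pipe byte counters are hypothetical stand-ins for real processing.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait on all child pipes at once instead of polling.  fds[] holds the read
 * ends (-1 = already finished).  Bytes read from fds[i] are added to
 * counts[i]; real code would process the data there.  When read() returns 0
 * (EOF: that child closed its end or exited) the pipe is closed and its slot
 * set to -1.  Returns 0 once every pipe is at EOF, -1 on a select() error. */
int drain_pipes(int *fds, int n, size_t *counts)
{
    char buf[4096];

    for (;;) {
        fd_set rset;
        int maxfd = -1;

        FD_ZERO(&rset);
        for (int i = 0; i < n; i++) {
            if (fds[i] < 0)
                continue;
            FD_SET(fds[i], &rset);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        if (maxfd < 0)
            return 0;                       /* every pipe reached EOF */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;                   /* e.g. interrupted by SIGCHLD */
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &rset))
                continue;
            ssize_t r = read(fds[i], buf, sizeof buf);  /* readable: won't block */
            if (r > 0) {
                counts[i] += (size_t)r;
            } else if (r == 0 || errno != EINTR) {
                close(fds[i]);              /* EOF (or read error): stop watching */
                fds[i] = -1;
            }
        }
    }
}
```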

-Mark
-- 
Mark Benard
Department of Computer Science     INTERNET & BITNET: mb@cs.tulane.edu
Tulane University                  USENET:   rex!mb
New Orleans, LA 70118

chris@mimsy.umd.edu (Chris Torek) (03/07/90)

>In article <5090.25e135aa@mva.cs.liv.ac.uk> adh@mva.cs.liv.ac.uk writes:
>>Unfortunately, a [nonblocking] call to 'read' returns zero if the child
>>hasn't sent any new data *OR* [for EOF]....

In article <2377@rex.cs.tulane.edu> mb@rex.cs.tulane.edu (Mark Benard) writes:
>If you have BSD, you can use select instead of polling.

If you have BSD, a non-blocking read() will return -1 with errno set to
EWOULDBLOCK; an EOF returns 0 with errno unchanged.
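The same distinction can be made with POSIX O_NONBLOCK, as sketched below; the enum and helper names are hypothetical. The original poster was probably using System V's O_NDELAY, under which read() on an empty pipe also returns 0. That produces exactly the ambiguity he describes. With O_NONBLOCK, an empty pipe instead gives -1 with errno set to EAGAIN (EWOULDBLOCK on BSD).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum read_result { GOT_DATA, NO_DATA_YET, AT_EOF, READ_ERROR };

/* Put a descriptor into POSIX non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* One non-blocking read that tells "empty for now" apart from EOF.
 * On GOT_DATA, *nread receives the byte count. */
enum read_result try_read(int fd, char *buf, size_t size, ssize_t *nread)
{
    ssize_t r = read(fd, buf, size);

    if (r > 0) {
        *nread = r;
        return GOT_DATA;
    }
    if (r == 0)
        return AT_EOF;              /* every writer has closed its end */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return NO_DATA_YET;         /* pipe is empty (or we were interrupted) */
    return READ_ERROR;
}
```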

Anyway, the original question (about detecting exited children) has already
been answered.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (03/08/90)

In article <30@ctbilbo.UUCP> ray@ctbilbo.UUCP (Ray Ward) writes:
>In article <12229@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>Sure there is.  You can access that information through the /proc
>>filesystem.  If you don't have this, complain to your vendor.
>I have noticed from your other postings that you are very knowledgeable
>as well as helpful.  Have you slipped into one of my habits --
>giving a more technical response than is indicated -- or have I
>misjudged, on the low side, the technical experience of the poster?
>Or, indeed, of c.u.questions?

Probably it just reflected my general frustration at how long it
takes to get vendors (including UCB) to pick up good ideas and
incorporate them in what they distribute.  I still recall with
amazement and some degree of anger how Larry Brown had never heard
of Dennis Ritchie's work on "stackable line disciplines" that finally
(perhaps I provided a small amount of initial impetus) got adopted by
AT&T as "STREAMS".  To their credit, AT&T (UNIX Operation, or whoever
the developers are these days) has been much, much better about
improving UNIX recently than they were just a few years ago.

Another point, however, is that detailed examination and control of
processes is simply not a novice topic.  Until we get practically
all vendors to get with the program, as opposed to going off and
building incompatible UNIX-like variants, programmers simply don't
have simple solutions at hand for hard problems.  At best, they have
to implement several solutions to cover the different flavors of UNIX
(assuming they're concerned with porting their applications, as for
the most part they should be).  At worst, they cannot reasonably
solve their technical problems on some platforms.

What I think programmers could do to help is to get their computer
acquisition staff to inform vendors that they will be specifying
conformance to XPG3, SVID3, X3.159, and 1003.1, and by golly failure
to provide ALL the specified services will result in no sale.