[net.unix-wizards] System V and SIGCLD

lindsay@cheviot.uucp (Lindsay F. Marshall) (05/07/86)

The following code goes into an infinite loop on System V :-

	trap(sig)
	int sig;
	{
		printf("trapped SIGCLD\n");
		signal(SIGCLD, trap);	/* reset handler */
	}

	main()
	{
		signal(SIGCLD, trap);
		switch ( fork() )
		{
		case 0 : /* child */
			sleep(5);
			exit(0);
		case -1 :
			printf("error\n");
			exit(1);
		default :
			pause();
		}
		exit(0);
	}

The problem is that resetting the SIGCLD trap inside the handler causes the
signal to be raised again and the handler to be re-entered...... This
is not documented in the manual page and seems to me to be a bug as if you
do not reset the handler the system seems to set it to SIG_DFL, meaning that
you will loose any SIGCLD signals between the handler's exit and your getting
a chance to call signal again. Anyone have any thoughts, information etc. on
this problem??

------------------------------------------------------------------------------
Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
  ARPA  : lindsay%cheviot.newcastle.ac.uk@ucl-cs.arpa
  JANET : lindsay@uk.ac.newcastle.cheviot
  UUCP  : <UK>!ukc!cheviot!lindsay
-------------------------------------------------------------------------------

nwh@hrc63.UUCP (Nigel Holder Marconi) (05/09/86)

   The problem with resetting SIGCLD is that the signal is still valid since
the child process is waiting for the parent to perform a wait.  The following
implements this and of course works!



	trap(sig)

	int	sig;

	{
		int	c;

		printf("trapped SIGCLD\n");
		wait(&c);
		signal(SIGCLD, trap);	/* reset handler */
	}


   Now that brings me to wait.  4.2 at least provides two flavours of wait :
wait and wait3.  Now wait3 is new and is free to do what it wants in its own
way.  Wait, however, does not require an int pointer; it requires a pointer
to a union which happens to start with an int.  Whether this affects
programs written in Sys V flavour or not is probably well defined at the
moment, but it could change.  Just another example of a transparent
difference between flavours that is easily overlooked.

keith@enmasse.UUCP (Keith Crews) (05/09/86)

In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered...... This
>is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again. Anyone have any thoughts, information etc. on
>this problem??

The signal is raised again because the child still exists.  To do what you want
you have to do a wait in the signal handler before resetting the signal.
This explanation is due to a fellow employee - any errors in conveying it
are no doubt due to me.  In my system V manual there is a discussion of
what happens to SIGCLD while the signal catcher is executing, but it
does not seem to imply this behavior.

		Keith Crews

dave@inset.UUCP (Dave Lukes) (05/09/86)

In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>	trap(sig)
>	int sig;
>	{
>		printf("trapped SIGCLD\n");
>		signal(SIGCLD, trap);	/* reset handler */
>	}
>
>	...
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered......

Yes, this is because you still have an unwait()ed for child!!
What you have to do is wait() for the child in the SIGCLD handler,
THEN reset the handler: this works fine.

>This is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again.

WRONG!!! (``It's not a bug: it's a feature'')
If you catch SIGCLD you will get sent SIGCLD whenever you have ANY
zombie children around (whether newly zombified or not):
the same thing happens when you re-catch it.
Yes, the manual is wrong (as well as totally unclear): it should say that
any pending SIGCLD signals are queued until you call signal(SIGCLD, ...) again
				      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
it should also remind you that you still MUST call wait() to dispose of the
children.

Still, in defence of SIGCLD:
it IS safe (you NEVER lose any children), AND usable (if you know how!).

Hope this helps.
-- 
		Dave Lukes. (...!inset!dave)

``Fox hunting: the unspeakable chasing the inedible'' -- Oscar Wilde

andy@altos86.UUCP (Andy Hatcher) (05/09/86)

#
# sorry can't seem to mail this
#
You will probably get lots of replies to this one.

The problem is that you have not destroyed the dead child.
You should be doing a wait system call inside your signal routine,
otherwise when you leave the child is still there and you get the
same signal again.

This is the way it is deliberately implemented: if you have more
than one child that dies at the same time, then you will continue
to reenter the signal handler until they are all gone.

	Andy Hatcher
	seismo!lll-crg!lll-lcc!vecpyr!altos86!andy


P.S.  I've always been told that it is a bad idea to put printfs in
a signal handling routine.   The signal handler is called asynchronously
and if you use stdio both inside and outside the signal handler you
could make it very confused.
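
A common way to follow that advice is to keep stdio out of the handler altogether: set a flag and, if a message is wanted, emit it with write().  The sketch below assumes a POSIX system; SIGUSR1 is used only so the example can signal itself, and note_signal and demo_flag_handler are made-up names.

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* No stdio here: set a flag and emit a fixed message with write(). */
static void note_signal(int sig)
{
	static const char msg[] = "trapped signal\n";

	(void)sig;
	got_signal = 1;
	(void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Install the handler, signal ourselves, and report whether the flag was set.
 * Returns 1 on success, 0 if the handler did not run, -1 on error. */
int demo_flag_handler(void)
{
	got_signal = 0;
	if (signal(SIGUSR1, note_signal) == SIG_ERR)
		return -1;
	raise(SIGUSR1);		/* handler has run by the time raise() returns */
	return got_signal;
}
```

The main program can then test the flag at a point of its own choosing and do any printf() there.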

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/10/86)

In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>	trap(sig)
>	int sig;
>	{
>		printf("trapped SIGCLD\n");
>		signal(SIGCLD, trap);	/* reset handler */
>	}
>
>	main()
>	{
>		signal(SIGCLD, trap);
>		switch ( fork() )
>		{
>		case 0 : /* child */
>			sleep(5);
>			exit(0);
>		case -1 :
>			printf("error\n");
>			exit(1);
>		default :
>			pause();
>		}
>		exit(0);
>	}
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered...... This
>is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again. Anyone have any thoughts, information etc. on
>this problem??

The reason SIGCLD keeps recurring is that you continue to have an
unwaited-for terminated child process.  A wait() must be done to
lay the zombie to rest.

As to the window of vulnerability:  Yes, all generally-available
UNIXes except 4.2BSD have this problem.  AT&T has said that they
plan to change to Berkeley-like "reliable signals" in some future
release of UNIX System V.

chris@umcp-cs.UUCP (Chris Torek) (05/10/86)

In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk
(Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>	trap(sig)
>	int sig;
>	{
>		printf("trapped SIGCLD\n");
>		signal(SIGCLD, trap);	/* reset handler */
>	}
>
>	main()
>	{
>		signal(SIGCLD, trap);
	[...]

>[...] if you do not reset the handler the system seems to set it to
>SIG_DFL, meaning that you will loose [sic] any SIGCLD signals between
>the handler's exit and your getting a chance to call signal again.

Your loop behaves in accordance with my formulation of the System
V internals for SIGCLD.  I posted them some time ago, and received
no comments, which I (tentatively) take to mean that I was completely
correct.  Given that particular implementation, *any* SIGCLD trap
routine which does not do at least one `wait' system call before
doing another `signal(SIGCLD, trap)' will recurse until it runs
out of stack space.

Here is what System V really does (by my analysis):

1.  Any time a child exits, the kernel examines its parent's SIGCLD
    disposition, and takes one of the following actions:

	SIG_DFL
		The child is left as a zombie (`<exiting>').  No
		other action taken.
	SIG_IGN
		The child is silently discarded; no <exiting>
		process left behind, and the parent cannot collect
		the child's exit status.
	other
		The kernel sets the bit for SIGCLD in the parent's
		pending signals mask.  When the parent is scheduled,
		the kernel arranges to run the trap routine (and
		the kernel will then change the parent's SIGCLD
		disposition to SIG_DFL).

2.  In the kernel signal system call code, if the user is altering the
    action for SIGCLD, again the kernel examines the new disposition:

	SIG_DFL
		No action taken.
	SIG_IGN
		All currently exited children consumed.
	other
		If there are any exited children, the kernel sets
		the bit for SIGCLD in the parent's pending signals
		mask.

This does not match the manuals; but it does seem to fit the actual
behaviour, and has a clear and `efficient' (but not necessarily
`clean') implementation.
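
The two rules above can be played out in a toy model.  The sketch below is not kernel code; it is a user-level simulation of the analysis as stated, with invented names (struct parent, child_exits, set_sigcld, handler_entries), and it flattens the real recursion into a loop with an entry limit.

```c
enum disp { DFL, IGN, CAUGHT };

struct parent {
	enum disp sigcld;	/* current SIGCLD disposition */
	int zombies;		/* exited children not yet waited for */
	int pending;		/* SIGCLD bit in the pending signals mask */
};

/* Rule 1: a child of p exits. */
static void child_exits(struct parent *p)
{
	switch (p->sigcld) {
	case DFL:
		p->zombies++;		/* left as a zombie */
		break;
	case IGN:
		break;			/* silently discarded */
	case CAUGHT:
		p->zombies++;
		p->pending = 1;
		break;
	}
}

/* Rule 2: the parent calls signal(SIGCLD, d). */
static void set_sigcld(struct parent *p, enum disp d)
{
	p->sigcld = d;
	if (d == IGN)
		p->zombies = 0;		/* exited children consumed */
	else if (d == CAUGHT && p->zombies > 0)
		p->pending = 1;		/* signal raised again */
}

/* Model of wait(): consume one zombie if there is one. */
static void do_wait(struct parent *p)
{
	if (p->zombies > 0)
		p->zombies--;
}

/* Let `children` children exit, then run the trap routine for as long as
 * SIGCLD is pending, giving up after `limit` entries.  Returns the number
 * of times the trap routine was entered. */
int handler_entries(int children, int wait_first, int limit)
{
	struct parent p = { CAUGHT, 0, 0 };
	int entries = 0;
	int i;

	for (i = 0; i < children; i++)
		child_exits(&p);

	while (p.pending && entries < limit) {
		p.pending = 0;
		p.sigcld = DFL;		/* disposition reset on delivery */
		entries++;
		if (wait_first)
			do_wait(&p);
		set_sigcld(&p, CAUGHT);	/* trap routine re-arms itself */
	}
	return entries;
}
```

In this model handler_entries(1, 1, 1000) returns 1, handler_entries(3, 1, 1000) returns 3, and handler_entries(1, 0, 1000) runs into the limit, which corresponds to the runaway recursion described above.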
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

karl@osu-eddie.UUCP (Karl Kleinpaste) (05/10/86)

lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>	trap(sig)
>	int sig;
>	{
>		printf("trapped SIGCLD\n");
>		signal(SIGCLD, trap);	/* reset handler */
>	}
>[followed by main() which forks and then pauses if it's the parent]
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered...... This
>is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again. Anyone have any thoughts, information etc. on
>this problem??

You're almost right, but not quite.  It's not a bug.  The problem your
code demonstrates is an inappropriate way to deal with SIGCLD.  What
you need in the above trap() code is a wait(2) call before the reset
of SIGCLD in signal(2), in order to clean up the zombie child.  SIGCLD
signals queue in SysV - you have to clean up your zombie children _as
they occur_ when you want to use SIGCLD on them.  Be aware that if you
get a SIGCLD for one dead child, call trap() to take care of it, and
then a second child dies while still in trap(), you will immediately
get run through trap() again when signal(2) is called.  And so on for
any <n> zombie children.  This is correctly documented in the manual
page.

I know it works, because I use it heavily in my job-control SysV csh.
-- 
Karl Kleinpaste

lindsay@cheviot.uucp (Lindsay F. Marshall) (05/12/86)

In article <344@hrc63.UUCP> nwh@hrc63.UUCP (Nigel Holder Marconi) writes:
>
>   The problem with resetting SIGCLD is that the signal is still valid since
>the child process is waiting for the parent to perform a wait.  The following
>implements this and of course works !
>......
>		wait(&c);

This is, of course, perfectly obvious, but DOESN'T ANSWER MY QUESTION!!
In the application I have I MUST not do a wait inside the signal handler.
The solution of adding wait has been suggested by many people, but it
simply is no good. If you want to save status information you then have
to implement a stack of wait return data, and then a new version of wait that
looks at the stack to see if anything has terminated etc. etc. The bottom
line is that SIGCLD is very broken and ought to be fixed!! One way round
this problem, if you are only expecting SIGCLDs to come in ones, is to put

	signal(SIGCLD, SIG_IGN);

before you reset the signal. This causes any outstanding SIGCLDs to be
junked (hence it only works when there is a single child) but does allow
you to reset the signal for future parent/child interactions without causing
an infinite loop.

karl@osu-eddie.UUCP (Karl Kleinpaste) (05/12/86)

In article <283@enmasse.UUCP> keith@enmasse.UUCP (Keith Crews) writes:
>The signal is raised again because the child still exists.  To do what you
>want you have to do a wait in the signal handler before resetting the signal.
>This explanation is due to a fellow employee - any errors in conveying it
>are no doubt due to me.  In my system V manual there is a discussion of
>what happens to SIGCLD while the signal catcher is executing, but it
>does not seem to imply this behavior.

Yes, it does imply that behavior.  Not having a manual in front of me
this instant, I can't quote directly; but I distinctly recall that the
description includes the comment that the handler will be continually
re-entered until all the dead children have been cleaned up.
-- 
Karl Kleinpaste

simon@cstvax.UUCP (Simon Brown) (05/12/86)

In article <344@hrc63.UUCP> nwh@hrc63.UUCP (Nigel Holder Marconi) writes:
>   Now that brings me to wait.  4.2 at least provides two flavours of wait :
>wait and wait3.  Now wait3 is new and is free to do what it wants in its own
>way.  Wait, however, does not require an int pointer; it requires a pointer
>to a union which happens to start with an int.  Whether this affects
>programs written in Sys V flavour or not is probably well defined at the
>moment, but it could change.  Just another example of a transparent
>difference between flavours that is easily overlooked.


Actually, the union doesn't just "happen" to begin with an int - it
*has* to for compatibility with the Version-7 wait(), which was just
like:
	int procid, status;
	procid = wait(&status);
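
A complete version of that call, with the status decoded, might look like the sketch below.  It assumes a POSIX system and uses the WIFEXITED and WEXITSTATUS macros from <sys/wait.h> rather than picking the int apart by hand; child_exit_code is a name made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` and return the exit code that the
 * parent sees through a plain int status, or -1 on failure. */
int child_exit_code(int code)
{
	int procid, status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(code);	/* child */

	/* Same shape as the Version 7 call: an int pointer, nothing more. */
	while ((procid = wait(&status)) != pid)
		if (procid == -1)
			return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```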

-- 

			-------------------------------------------------
			| Simon Brown,	Dept. of Computer Science,	|
			|		Edinburgh University		|
			| ...!mcvax!ukc!cstvax!simon			|
			-------------------------------------------------

nwh@hrc63.UUCP (Nigel Holder Marconi) (05/14/86)

In article 3994 (Simon Brown @ Comp. Sc., Edinburgh Univ., Scotland)	

>Actually, the union doesn't just "happen" to begin with an int - it
>*has* to for compatibility with the Version-7 wait(), which was just
>like:  ...

The problem with relying on it being compatible with version 7 is
that one day someone may just remove the int part of the union since
it has no comment to state why it is there !
It's probably the 'cat -v considered dangerous' syndrome I'm trying to
put across.

I'm sorry about naff mailshot layouts but it's rather a tortuous route
to get to usenet for me!

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/15/86)

In article <211@altos86.altos86.UUCP> andy@altos86.UUCP (Andy Hatcher) writes:
>P.S.  I've always been told that it is a bad idea to put printfs in
>a signal handling routine.   The signal handler is called asynchronously
>and if you use stdio both inside and outside the signal handler you
>could make it very confused.

UNIX System V Release 2 tries very hard to make stdio support
usage from inside signal catchers.  I think this was a big
mistake, as it turned Dennis's clean stdio source code into
a complicated mess.

mjs@sfsup.UUCP (M.J.Shannon) (05/19/86)

In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>	trap(sig)
>	int sig;
>	{

		/* add something like this: */
		int pid = wait(0);
		/* and you won't get the signal until the next child exits */

>		printf("trapped SIGCLD\n");
>		signal(SIGCLD, trap);	/* reset handler */
>	}
>
>Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
-- 
	Marty Shannon
UUCP:	ihnp4!attunix!mjs
Phone:	+1 (201) 522 6063

Disclaimer: I speak for no one.

"If I never loved, I never would have cried." -- Simon & Garfunkel

jimr@hcrvx2.UUCP (Jim Robinson) (09/16/86)

*
Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
manual there is a warning "strongly" discouraging its use in
new programs, and there is no mention of it anywhere in the System V
Interface Definition (at least I couldn't find any).

Seems to me this is a handy signal to have as it provides a reasonably 
elegant means of cleaning up after a process. And, needless to say, 
more than a few programs will have to be changed, including 
*shell layers*, when it disappears. [Since the master layer in shell 
layers cannot remain blocked indefinitely during a 'wait' I would 
imagine that some kind of polling would be necessary. Gag.]

The only other possibility I can think of is that 5.3 has some new
and nifty feature that disallows the need for SIGCLD.

Comments?

J.B. Robinson

PS Thanks to all those who answered my query re the IEEE proposal on 
   System V compatible BSD style job control.

guy@sun.uucp (Guy Harris) (09/18/86)

> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
> manual there is a warning "strongly" discouraging its use in
> new programs, and there is no mention of it anywhere in the System V
> Interface Definition (at least I couldn't find any).

Geez, youngsters these days have no sense of history; they probably think
"AT&T UNIX" started with System V.  Mutter, mutter.  :-)

The System III documentation has much the same warning; it came out in 1980,
so if they haven't dropped it by now, I suspect they're not going to
(especially since things like "init" use it as well).

A little history here.  The notion that AT&T is one big happy family when it
comes to UNIX is mistaken; there are lots of groups developing applications
to run under UNIX, and, like any other bunch of UNIX programmers, they all
have their own ideas about what they need to have UNIX do - or, like any
other bunch of UNIX programmers, they all have their own ideas about what
they *think* they need to have UNIX do.

As such, there were at one point probably more variant versions of UNIX
inside the Bell System than outside.  S5 is the product of an attempt to
merge them all into one version.  S3 was one step along this path; it picked
up a number of features from other versions of UNIX inside Bell, and SIGCLD
was probably one of them.

The people maintaining S3 may have thought that you could do something
better than SIGCLD (the notion that ignoring SIGCLD has the side effect of
discarding existing zombies and preventing the creation of new ones is
certainly a hack), and wanted to warn that it was only in there for
compatibility with other versions of UNIX.  At the time, they probably
figured there was a good chance that they would get rid of it in favor of
something better.  Either they still thought so at the time they put out the
5.2 documentation, or nobody had ever bothered to change the documentation.

> The only other possibility I can think of is that 5.3 has some new
> and nifty feature that disallows the need for SIGCLD.

Either you mean "*eliminates* the need for SIGCLD", or "disallows the *use
of* SIGCLD."  Neither, as far as I know, is true; SIGCLD is still in 5.3.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

chris@pixutl.UUCP (chris) (09/20/86)

In article <2389@hcrvx2.UUCP>, jimr@hcrvx2.UUCP (Jim Robinson) writes:
> *
> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
> manual there is a warning "strongly" discouraging its use in
> new programs, and there is no mention of it anywhere in the System V
> Interface Definition (at least I couldn't find any).

If SIGCLD is gone, does that mean shl is gone too? or, if not, how
do the shell layers know a job has terminated? Just wondering...

Chris
-- 

 Chris Bertin       :  (603) 881-8791 x218
 xePIX Inc.         :
 51 Lake St         :  {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\
 Nashua, NH 03060   :     !wjh12!pixel!pixutl!chris

mjp@sfmag.UUCP (M.J.Purdome) (09/21/86)

System V Release 3 has not eliminated SIGCLD.  As a matter of fact, the
WARNING section that existed in previous versions of the documentation
has been removed from the SVR3 man page for signal.

The SVID defines 13 signals that are "standard", and it states that specific
implementations may provide implementation-dependent signals.  I suppose
this includes SIGCLD as well as SIGPOLL (used with STREAMS) and others
that are not listed in the SVID.

-- 
    Mark Purdome  --  AT&T, 190 River Road A-130, Summit, NJ 07901
                      [ihnp4 | allegra]!attunix!mjp

    disclaimer:  my opinions != AT&T opinions

jimr@hcrvx2.UUCP (Jim Robinson) (09/23/86)

In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
>> manual there is a warning "strongly" discouraging its use in
>> new programs, and there is no mention of it anywhere in the System V
                                                               ^^^^^^^^	
>> Interface Definition (at least I couldn't find any).
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>Geez, youngsters these days have no sense of history; they probably think
>"AT&T UNIX" started with System V.  Mutter, mutter.  :-)
>
>The System III documentation has much the same warning; it came out in 1980,
>so if they haven't dropped it by now, I suspect they're not going to
>(especially since things like "init" use it as well).

I guess I'll rephrase the question since it hasn't generated quite the
response I had hoped for.

1) I could not find any mention of SIGCLD in the System V Interface 
   Definition. Is this because I missed it, or is it because it just
   ain't there? (It certainly is not mentioned with the other signals
   in the section dealing with the 'signal' service routine)

2) Assuming the latter, does this not mean that there is no requirement
   for a SVID adhering UNIX to include SIGCLD?

3) If so, what gives? As has been pointed out, at least a couple of
   important programs are going to break?

It would be especially pleasant if someone from AT&T could take the 
time to fire in a quick response since they are in the best position
of knowing what the story is wrt the SVID and SIGCLD.

J.B. Robinson

guy@sun.uucp (Guy Harris) (09/24/86)

> I guess I'll rephrase the question since it hasn't generated quite the
> response I had hoped for.

The response you had hoped for was an explanation of why SIGCLD disappeared.
Since it *didn't* disappear, there is no chance of getting quite that
response.

> 1) I could not find any mention of SIGCLD in the System V Interface 
>    Definition. Is this because I missed it, or is it because it just
>    ain't there? (It certainly is not mentioned with the other signals
>    in the section dealing with the 'signal' service routine)

It is not there.  It is in the S5 documentation (and, as pointed out, the
"this may disappear eventually" note disappeared in S5R3), but it's not in
the SVID.  The SVID != the System V documentation.

> 2) Assuming the latter, does this not mean that there is no requirement
>    for a SVID adhering UNIX to include SIGCLD?

Yes.

> 3) If so, what gives? As has been pointed out, at least a couple of
>    important programs are going to break?

So?  Just don't run those programs on a SVID-compliant system unless you've
verified that that system also supports SIGCLD.  There is also no
requirement that a SVID-compliant system implement the routines in the
"-lPW" library, either, and this may break some programs.

A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM EVER
WRITTEN FOR SYSTEM V.  It is not even required to be able to run every
program whose source is shipped with System V.  That's why it's called an
"interface definition"; a SVID-compliant system is required to be able to
run every valid program written using the SVID.  The SVID defines an
interface, and people write programs to use that interface.

Some programs that come with System V are not written strictly for that
interface.  As such, they may not run on all SVID-compatible systems.

Consider SIGCLD to be an extension to UNIX, provided by certain systems,
rather than as part of the core of UNIX.  There's nothing wrong with that
system also providing an "init", or "shl", or whatever, that uses that
extension.  If another system doesn't have that extension, it'll have to do
things differently.
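
In source code, treating the signal as an optional extension usually comes down to a compile-time test.  The sketch below is one way to write it; CHILD_SIGNAL and install_child_handler are names invented for the example, and it assumes the signal names are preprocessor macros with integer values, as they are on the systems I know of.

```c
#include <signal.h>

/* Pick whichever child-death signal this system provides, if any. */
#if defined(SIGCHLD)
#define CHILD_SIGNAL SIGCHLD	/* Berkeley / POSIX name */
#elif defined(SIGCLD)
#define CHILD_SIGNAL SIGCLD	/* System V name */
#else
#define CHILD_SIGNAL 0		/* none: fall back to a plain wait() */
#endif

/* Returns 1 if asynchronous child notification was set up, 0 if the
 * system has no such signal or the handler could not be installed. */
int install_child_handler(void (*handler)(int))
{
#if CHILD_SIGNAL != 0
	return signal(CHILD_SIGNAL, handler) != SIG_ERR;
#else
	(void)handler;
	return 0;
#endif
}
```

A program that gets 0 back has to do things differently, for example by calling wait() at points where it can afford to block.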
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

karl@cbrma.UUCP (Karl Kleinpaste) (09/24/86)

jimr@hcrvx2.UUCP (Jim Robinson) writes:
>In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>>> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
>>> manual there is a warning "strongly" discouraging its use in
>>> new programs, and there is no mention of it anywhere in the System V
>>> Interface Definition (at least I couldn't find any).
>>
>>The System III documentation has much the same warning; it came out in 1980,
>>so if they haven't dropped it by now, I suspect they're not going to
>>(especially since things like "init" use it as well).
>
>1) I could not find any mention of SIGCLD in the System V Interface 
>   Definition. Is this because I missed it, or is it because it just
>   ain't there? (It certainly is not mentioned with the other signals
>   in the section dealing with the 'signal' service routine)

It's not there.  Not in my copy, anyway, from Spring 1985.  That fact
notwithstanding, notice that neither are SIGIOT, SIGEMT, SIGBUS, or
SIGSEGV.  I have my doubts that they'll go away any time soon.  What
would application development in a UNIX environment be like without
the ever-entertaining comment, "Segmentation violation - core dumped"?

Of course "init" could be hacked so that it no longer utilized SIGCLD.
But then "init" wouldn't have had new code put into it to handle
SIGCLD if it weren't considered important, especially with that
warning present.

>2) Assuming the latter, does this not mean that there is no requirement
>   for a SVID adhering UNIX to include SIGCLD?

Um..."requirement" in a technical or political sense?  Technically,
SIGCLD could be missing from Sys5.N (N>3) just because somebody's mood
was bad the day such a decision had to be made.  Politically, there
would be hell to pay if it were taken out without a darn good
replacement strategy for asynchronous notification of child death.

>3) If so, what gives? As has been pointed out, at least a couple of
>   important programs are going to break?

Guy's right - it's going to stay.  It would break a non-trivial amount
of code (nobody really *wants* to hack up "init" again), and it's a
useful feature; I use it quite a lot, in a job control emulation.

>It would be especially pleasant if someone from AT&T could take the 
>time to fire in a quick response since they are in the best position
>of knowing what the story is wrt the SVID and SIGCLD.

Right, here's a disclaimer: I work for AT&T-BL, but I have no
work-related connections with the folks who make decisions like that.
-- 
Karl Kleinpaste

SofPasuk@imagen.UUCP (09/24/86)

> In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes:
> >> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
> >> manual there is a warning "strongly" discouraging its use in
> >> new programs, and there is no mention of it anywhere in the System V
>                                                                ^^^^^^^^	
> >> Interface Definition (at least I couldn't find any).
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >Geez, youngsters these days have no sense of history; they probably think
> >"AT&T UNIX" started with System V.  Mutter, mutter.  :-)
> >
> >The System III documentation has much the same warning; it came out in 1980,
> >so if they haven't dropped it by now, I suspect they're not going to
> >(especially since things like "init" use it as well).
> 
> I guess I'll rephrase the question since it hasn't generated quite the
> response I had hoped for.
> 
> 1) I could not find any mention of SIGCLD in the System V Interface 
>    Definition. Is this because I missed it, or is it because it just
>    ain't there? (It certainly is not mentioned with the other signals
>    in the section dealing with the 'signal' service routine)
> 
> 2) Assuming the latter, does this not mean that there is no requirement
>    for a SVID adhering UNIX to include SIGCLD?
> 
> 3) If so, what gives? As has been pointed out, at least a couple of
>    important programs are going to break?
> 
> It would be especially pleasant if someone from AT&T could take the 
> time to fire in a quick response since they are in the best position
> of knowing what the story is wrt the SVID and SIGCLD.

I couldn't find SIGCLD in SVID either.  The only means in SVID to detect the
completion of a child process seems to be via WAIT, i.e. a planned, synchronous
activity on the part of a program as opposed to an interrupt.

I second the request that some RESPONSIBLE party from the American Telephone &
Telegraph Corporation who is DIRECTLY INVOLVED with SVID directly respond to
this issue.  (Please no flames about whose UNIX is better or whose long distance
service is better or who makes better switchboards!)

jas@rtech.UUCP (Jim Shankland) (09/24/86)

Guy Harris writes:

    Just don't run programs [needing the SIGCLD signal] on a SVID-compliant
    system unless you've verified that that system also supports SIGCLD.

    A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM
    EVER WRITTEN FOR SYSTEM V.  It is not even required to be able to run
    every program whose source is shipped with System V.  That's why it's
    called an "interface definition"; a SVID-compliant system is required
    to be able to run every valid program written using the SVID.  The SVID
    defines an interface, and people write programs to use that interface.

    Consider SIGCLD to be an extension to UNIX, provided by certain systems,
    rather than as part of the core of UNIX.

All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
of SVID, especially when there is no persuasive reason to leave it out
(unlike shared memory, for example, which is hard to implement on
a loosely coupled multiprocessor such as the CT Megaframe).  If the
interface definition is unnecessarily restrictive, it loses some of
its usefulness, since it is likely to be extended in non-standard ways
(Pascal comes to mind).
-- 
Jim Shankland
 ..!ihnp4!cpsc6a!\
                  rtech!jas
..!ucbvax!mtxinu!/

brett@wjvax.UUCP (Brett Galloway) (09/26/86)

In article <453@rtech.UUCP> jas@rtech.UUCP (Jim Shankland) writes:
>Guy Harris writes:
>
>    Just don't run programs [needing the SIGCLD signal] on a SVID-compliant
>    system unless you've verified that that system also supports SIGCLD.
>
>    A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM
>    EVER WRITTEN FOR SYSTEM V.  It is not even required to be able to run
>    every program whose source is shipped with System V.  That's why it's
>    called an "interface definition"; a SVID-compliant system is required
>    to be able to run every valid program written using the SVID.  The SVID
>    defines an interface, and people write programs to use that interface.
>
>    Consider SIGCLD to be an extension to UNIX, provided by certain systems,
>    rather than as part of the core of UNIX.
>
>All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
>of SVID, especially when there is no persuasive reason to leave it out
>(unlike shared memory, for example, which is hard to implement on
>a loosely coupled multiprocessor such as the CT Megaframe).  If the
>interface definition is unnecessarily restrictive, it loses some of
>its usefulness, since it is likely to be extended in non-standard ways
>(Pascal comes to mind).

Hear, hear!  Standards definitions can fail in one of two ways.  The first is
making the standard unnecessarily generous in features (e.g. Ada).  This makes
applications difficult to port because the intended environment to port to may
not have implemented a feature needed by the application.  The second failure
is making the standard unnecessarily miserly in features (e.g. the SVID with
respect to SIGCLD).  This makes applications difficult to port because each
implementation of the standard is likely to extend it in its own way to
provide useful functionality.  To be useful and portable, the standard must
strike the golden mean.  I have not read the SVID, but the omission of
SIGCLD leads me to believe that the authors of SVID inclined to the latter
error.
-- 
-------------
Brett Galloway
{pesnta,twg,ios,qubix,turtlevax,tymix,vecpyr,certes,isi}!wjvax!brett

rml@hpfcdc.HP.COM (Bob Lenk) (09/26/86)

> All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
> of SVID, especially when there is no persuasive reason to leave it out
> (unlike shared memory, for example, which is hard to implement on
> a loosely coupled multiprocessor such as the CT Megaframe).

I would speculate that it was left out because of a desire not to
standardize some of the specific semantics SIGCLD has in System V
implementations.  In particular, many people are not fond of the
side-effect that setting SIGCLD to SIG_IGN has on wait(2).  Also, the
precise semantics of how SIGCLD is "queued" do not agree between System
V documentation and implementation, so there could be disagreement on
what to standardize.

>                                                              If the
> interface definition is unnecessarily restrictive, it loses some of
> its usefulness, since it is likely to be extended in non-standard ways

That's certainly a valid point which needs to be traded off against
the risk of standardizing the "wrong" feature, thus either perpetuating
that feature or reducing acceptance of the standard.  I make no
judgement as to whether AT&T made the correct tradeoff in this case.

		Bob Lenk
		{ihnp4, hplabs}!hpfcla!rml

naim@nucsrl.UUCP (Naim Abdullah) (10/02/86)

Bob Lenk writes:
>Also, the
>precise semantics of how SIGCLD is "queued" do not agree between System
>V documentation and implementation, so there could be disagreement on
>what to standardize.

Could you explain this a little bit further? In what way does the
implementation differ from the documentation?

I have a stake in SIGCLD because one fairly large program that I have
written depends upon SIGCLD.  (I was aware of the warning, but I didn't
see how I could duplicate the asynchronous notification by any other
reasonable means.  Any ideas out there?)

        Naim Abdullah,
	Dept. of EECS,
	Northwestern University,
	ihnp4!nucsrl!naim

rml@hpfcdc.HP.COM (Bob Lenk) (10/03/86)

> >Also, the
> >precise semantics of how SIGCLD is "queued" do not agree between System
> >V documentation and implementation, so there could be disagreement on
> >what to standardize.
> 
> Could you explain this a little bit further ? In what way, does the
> implementation differ from the documentation ?

The System V Release 2 manual says, "... while the process is executing
the signal-catching function, any received SIGCLD signals will be queued
and the signal-catching function will be continually reentered until the
queue is empty."

A more accurate description would be something like:

If _signal_ is called to catch SIGCLD in a process which currently has
terminated (zombie) children, a SIGCLD signal is delivered to the
process immediately.  Thus if the signal-catching function re-installs
itself, the apparent effect is that any SIGCLD signals received due to
the death of children while the function is executing are queued and the
signal-catching function is continually reentered until the queue is
empty.  Note that the function must re-install itself after it has
called _wait_(2).  Otherwise the presence of the child which caused the
original signal will cause another signal immediately, resulting in
infinite recursion.
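
A minimal sketch of that ordering, written in present-day C rather than
the C of this thread.  The names on_child and run_children are made up
for illustration, and the sigprocmask/sigsuspend calls are POSIX
additions assumed here only so that the parent cannot miss a signal;
nothing below is taken from System V sources.  Children are started one
at a time because a system without the System V rescan may report
several simultaneous deaths with a single signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SIGCLD
#define SIGCLD SIGCHLD	/* assumption: no SIGCLD here, use the POSIX name */
#endif

static volatile sig_atomic_t reaped;	/* children collected by the handler */

/* Collect the child first, and only then re-install the handler. */
static void on_child(int sig)
{
	int status;

	(void) sig;
	if (wait(&status) > 0)
		reaped++;
	(void) signal(SIGCLD, on_child);	/* no zombie is left to re-raise */
}

/*
 * Fork n short-lived children one at a time.  Returns the number the
 * handler collected, or -1 if fork() fails.
 */
int run_children(int n)
{
	sigset_t block, old, wait_mask;
	int i;

	assert(n >= 0);
	reaped = 0;
	(void) signal(SIGCLD, on_child);

	/* Hold SIGCLD everywhere except inside sigsuspend(). */
	sigemptyset(&block);
	sigaddset(&block, SIGCLD);
	sigprocmask(SIG_BLOCK, &block, &old);
	wait_mask = old;
	sigdelset(&wait_mask, SIGCLD);

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == -1) {
			sigprocmask(SIG_SETMASK, &old, NULL);
			return -1;
		}
		if (pid == 0)
			_exit(0);		/* child: die at once */
		while (reaped <= i)		/* parent: wait for the handler */
			sigsuspend(&wait_mask);
	}
	sigprocmask(SIG_SETMASK, &old, NULL);
	(void) signal(SIGCLD, SIG_DFL);
	return reaped;
}
```

Calling run_children(3) should return 3.  Moving the signal() call above
the wait() in on_child gives the arrangement that recurses on System V.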

> I have a stake in SIGCLD because one fairly large program that I have
> written depends upon SIGCLD

If your program works, it probably uses SIGCLD as I've described, and
the behavior agrees with the documentation.  Problems occur when programs
don't use it in this way (e.g. re-install the handler before calling
wait).
			Bob Lenk
			{ihnp4, hplabs}!hpfcla!rml

chris@umcp-cs.UUCP (Chris Torek) (10/06/86)

>Bob Lenk writes:
>>Also, the precise semantics of how SIGCLD is "queued" do not agree
>>between System V documentation and implementation, so there could be
>>disagreement on what to standardize.

In article <2410003@nucsrl.UUCP> naim@nucsrl.UUCP (Naim Abdullah) writes:
>Could you explain this a little bit further?  In what way, does the
>implementation differ from the documentation?

We have been through this one before.  I have neither SysV source
nor SysV documentation, but I believe my understanding to be correct.
At any rate, no one has proved this wrong:

The SysV documentation claims that SIGCLD is queued---that is, if
two children of a given process die `simultaneously', that
process will receive two separate SIGCLDs.  The documentation is
wrong.  The signal is not queued.  However, any program handling
SIGCLD via the `recommended method' will indeed receive two SIGCLDs.

Three details are important to understanding this.  First, whenever
a signal is delivered in a SysV kernel, the signal disposition is
changed to SIG_DFL.  (This means that holding down one's interrupt
key will, unless the machine is stupendously fast, eventually kill
a program no matter how hard it tries to avoid this.  This and
other similar arguments are what is behind the Berkeley Reliable
Signals, which are, alas, thoroughly incompatible with previous
systems.)  Second, when the SIGCLD disposition is SIG_DFL, a SysV
kernel does nothing special: an exiting child remains exiting.
Third, and this is the key, whenever the SIGCLD disposition is
altered to SIG_CATCH---that is, to a catching routine---a SysV kernel
scans for exiting children.  If there are any, it sends exactly one
SIGCLD signal.  This, of course, alters the disposition back to
SIG_DFL, and the loop runs until there are no more children.
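
The three details can be put into a toy user-space model.  This is only
an illustration of the behaviour described above, not kernel code: every
name in it (model_signal, model_wait, model_run, MODEL_DFL, MAX_DEPTH)
is hypothetical, and MAX_DEPTH exists solely to stop the model at the
point where a real process would go on to exhaust its stack.

```c
#include <assert.h>

typedef void (*handler_t)(void);

#define MODEL_DFL ((handler_t) 0)	/* stands in for SIG_DFL */
#define MAX_DEPTH 100			/* cap on nested handler entries */

static int zombies;		/* exited children not yet waited for */
static handler_t disposition;	/* current SIGCLD disposition */
static int depth;		/* handler invocations now on the stack */
static int calls;		/* total handler invocations */

/* Model of signal(SIGCLD, fn): store fn, then scan for exited children. */
void model_signal(handler_t fn)
{
	disposition = fn;
	if (fn != MODEL_DFL && zombies > 0 && depth < MAX_DEPTH) {
		/* Send exactly one SIGCLD; delivery resets the disposition. */
		disposition = MODEL_DFL;
		depth++;
		calls++;
		fn();
		depth--;
	}
}

/* Model of wait(): collect one exited child, if there is one. */
void model_wait(void)
{
	if (zombies > 0)
		zombies--;
}

/* The working order: collect the child, then re-install. */
void wait_then_signal(void)
{
	model_wait();
	model_signal(wait_then_signal);
}

/* The failing order: the zombie is still there when signal() rescans. */
void signal_then_wait(void)
{
	model_signal(signal_then_wait);
	model_wait();
}

/* Start with n zombies, install fn, and return how often it was entered. */
int model_run(handler_t fn, int n)
{
	assert(n >= 0);
	zombies = n;
	depth = 0;
	calls = 0;
	model_signal(fn);
	return calls;
}
```

With three zombies, model_run(wait_then_signal, 3) enters the handler
three times and stops.  model_run(signal_then_wait, 1) keeps re-entering
for a single zombie until it reaches MAX_DEPTH.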

That this is indeed the implementation may be demonstrated by
running a small test program:

	#include <signal.h>

	catch()
	{
		int status;

		/* if the wait() is done before the signal(), this works. */
		(void) signal(SIGCLD, catch);
		(void) wait(&status);
	}

	main()
	{

		(void) signal(SIGCLD, catch);
		if (fork() == 0)
			_exit(1);	/* child: exit at once */
		pause();		/* parent: stay alive to catch SIGCLD */
		exit(0);
	}

This program will eventually run out of stack space.

(It is true that there are other potential SIGCLD implementations
that might show the same behaviour.  But the one I outlined above
is a trivial change to a V7 kernel, and I do not doubt that those
who wrote the code followed the path of least resistance.)

I believe that signal queueing would in fact be a better solution
than either Berkeley Reliable Signals, which model machine interrupts
rather closely, or the SysV style SIGCLD signal.  Both work for
this specific case, though Berkeley did have to add a three-argument
`wait' syscall.  Berkeley's solution is more general than SysV's,
and I think it is therefore better, but it does seem to have `kludge
for acceptable efficiency' stamped all over it.
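
For comparison, a sketch of the Berkeley-style arrangement in present-day
C.  It is an assumption-laden illustration, not 4.2BSD code: sigaction()
stands in for the reliable-signal interface, and the POSIX call
waitpid(-1, &status, WNOHANG) stands in for the three-argument wait.
The names reap_all and spawn_and_reap are invented for the example.  The
handler stays installed and loops until no exited child remains, so one
signal covering several deaths loses nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t collected;	/* children reaped so far */

/* Reap every child that has exited; never block if none has. */
static void reap_all(int sig)
{
	int status;
	int saved = errno;	/* waitpid() may clobber errno */

	(void) sig;
	while (waitpid(-1, &status, WNOHANG) > 0)
		collected++;
	errno = saved;
}

/* Start n children at once and return how many the handler reaped. */
int spawn_and_reap(int n)
{
	struct sigaction sa;
	sigset_t block, old, wait_mask;
	int i, started = 0;

	assert(n >= 0);
	collected = 0;
	sa.sa_handler = reap_all;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGCHLD, &sa, NULL);

	/* Hold SIGCHLD everywhere except inside sigsuspend(). */
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	sigprocmask(SIG_BLOCK, &block, &old);
	wait_mask = old;
	sigdelset(&wait_mask, SIGCHLD);

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(0);	/* child: die at once */
		if (pid > 0)
			started++;
	}
	while (collected < started)
		sigsuspend(&wait_mask);

	sigprocmask(SIG_SETMASK, &old, NULL);
	(void) signal(SIGCHLD, SIG_DFL);
	return collected;
}
```

spawn_and_reap(5) should return 5 even if all five deaths arrive as a
single SIGCHLD, since the loop in reap_all collects whatever has exited
each time it runs.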
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu