[net.unix] UNIX signal question

chris@umcp-cs.UUCP (Chris Torek) (12/25/85)

In article <106@humming.UUCP> arturo@humming.UUCP (Arturo Perez) writes:

>> The signal(2) system call [when used on SIGCLD] checks to see if
>> any zombie child(ren) are present and sends the calling process
>> another SIGCLD if there are. ...  Note that the reinstallation
>> of the handler must follow the call to wait, or infinite recursion
>> results.
>>			Bob Lenk
>>			{hplabs, ihnp4}!hpfcla!rml

> This isn't correct. The problem is that the implicit 'signal(SIGCLD,
> SIG_DFL)' is done AFTER the signal trapping function returns. Thus,
> if you call signal from within the trapping function it doesn't do
> you any good. At least, this is the way it works on our SYSV/BSD
> hybrids.  

Judging from the replies I have received, I would say that Bob Lenk
is correct.  I suspect your system follows 4.2BSD signal semantics:
what you described does not make sense as stated, but with a little
reinterpretation it matches 4.2BSD-style signal handling.
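Why the order matters can be checked with a toy user-space model in C. This is a sketch, not real kernel code: every name below (sim_signal, deliver, sim_wait, good_handler, bad_handler, nesting, DEPTH_CAP) is hypothetical. It assumes the SysV rule quoted above, namely that signal(SIGCLD, h) re-posts SIGCLD whenever an un-waited-for child exists, and that a posted signal is delivered on return from the call. A handler that waits first nests once per zombie; one that reinstalls first re-enters itself until an artificial depth cap stops it, which is the model's stand-in for infinite recursion.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code. All names are hypothetical. */
#define DEPTH_CAP 100                 /* stands in for running out of stack */
typedef void (*handler_t)(void);

static handler_t sigcld;              /* SIGCLD disposition; NULL = SIG_DFL */
static int pending;                   /* one bit: signals are not queued */
static int nzombie;                   /* exited children not yet waited for */
static int depth, max_depth;          /* handler nesting, current and deepest */

/* Deliver a pending SIGCLD on return to user mode (one-shot semantics). */
static void deliver(void)
{
    if (!pending || sigcld == NULL || depth >= DEPTH_CAP)
        return;
    handler_t run = sigcld;
    pending = 0;
    sigcld = NULL;                    /* reset to SIG_DFL before the handler */
    if (++depth > max_depth)
        max_depth = depth;
    run();
    depth--;
}

/* signal(SIGCLD, h): SysV re-posts SIGCLD while any zombie exists. */
static void sim_signal(handler_t h)
{
    sigcld = h;
    if (h != NULL && nzombie > 0)
        pending = 1;
    deliver();                        /* delivered on return from the call */
}

/* wait(): reap one zombie, if there is one. */
static void sim_wait(void)
{
    if (nzombie > 0)
        nzombie--;
}

static void good_handler(void)        /* wait() first, then reinstall */
{
    sim_wait();
    sim_signal(good_handler);
}

static void bad_handler(void)         /* reinstall first, then wait() */
{
    sim_signal(bad_handler);          /* zombie still present: re-entered at once */
    sim_wait();
}

/* Post one SIGCLD to handler h with `zombies` un-waited-for children,
 * and return the deepest handler nesting that results. */
int nesting(handler_t h, int zombies)
{
    sigcld = h;
    pending = 1;
    nzombie = zombies;
    depth = max_depth = 0;
    deliver();
    return max_depth;
}
```

For example, nesting(good_handler, 2) returns 2, while nesting(bad_handler, 1) runs all the way to DEPTH_CAP.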

[Since I originally raised the question, I figured I should try to
point out the correct answer.]

			------------------

To give everyone a better `feel' for the AT&T code, here is what
happens.  Note that this is my interpretation of code I have not
seen; the implementation seems obvious enough that I think this
description is correct, but do not count on it.

	Assume process 1000 is parent, 1001 and 1002 are children;
	1000 has done a `signal(SIGCLD, getchild);'; and 1001 has
	just exited.  (1002 is about to exit.)  The AT&T kernel
	notes the exit of 1001, finds its parent (1000), and
	discovers that SIGCLD is being caught.  It therefore resets
	1000's SIGCLD behaviour to SIG_DFL, and sets the bit for
	SIGCLD delivery in 1000's signal delivery mask.  1001 is
	left as an ordinary zombie, awaiting a wait() by 1000.

	Now process 1002 is run; it exits.  The kernel notes this,
	finds 1000, and discovers that SIGCLD is not being caught
	(nor ignored).  1002 is therefore also left as a zombie.

	Finally 1000 is run.  SIGCLD is delivered, sending the
	program into its `getchild' routine.  This routine does
	one wait, collecting either process 1001 or 1002 (which
	one is unimportant, but let us say it collects 1002).
	The routine does whatever it wants with the information
	returned by wait, then, just before returning, calls
	`signal(SIGCLD, getchild);'.

	In the kernel's signal code, the special case for SIGCLD
	now searches for zombies.  It finds 1001, a child of 1000,
	which is exactly what it was looking for: an exited child
	of the calling process.  It therefore does *not* set
	SIGCLD to getchild, but rather to SIG_DFL; and the bit in
	the delivery mask is once again set.

	On return from the system call the bit is noticed and
	`getchild' is called once again, while the first call is
	still in progress.  This time it collects 1001.  Again
	1000 sets SIGCLD to getchild, but this time there are no
	exited children, so the handler is simply installed as
	usual.
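The walk-through can be replayed with a toy model in C. This is a sketch under stated assumptions, not AT&T source: the names (sim_signal, child_exit, sim_wait, deliver_pending, run_scenario) are hypothetical, wait() is modelled as collecting the most recently exited zombie so that it picks 1002 first as in the story, and the one-shot reset to SIG_DFL is done at delivery time, which is where the next paragraph suspects the real kernel does it. The second getchild call happens on return from the signal() call inside the first, so the two invocations nest.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of parent 1000, not AT&T source. All names are hypothetical. */
#define MAXZ 8
typedef void (*handler_t)(void);

static handler_t sigcld;             /* SIGCLD disposition; NULL = SIG_DFL */
static int pending;                  /* one bit: signals are not queued */
static int zombie[MAXZ], nzombie;    /* exited children not yet waited for */
static int reaped[MAXZ], nreaped;    /* order in which wait() collected them */
static int ncalls;                   /* how many times getchild ran */

/* Runs on every return to user mode: deliver SIGCLD if it is pending. */
static void deliver_pending(void)
{
    if (pending && sigcld != NULL) {
        handler_t h = sigcld;
        pending = 0;
        sigcld = NULL;               /* one-shot: reset to SIG_DFL */
        h();
    }
}

/* Kernel side of a child's exit: leave a zombie, post SIGCLD if caught. */
static void child_exit(int pid)
{
    zombie[nzombie++] = pid;
    if (sigcld != NULL)
        pending = 1;
}

/* wait(): collect the most recently exited zombie (an assumption). */
static int sim_wait(void)
{
    int pid = zombie[--nzombie];
    reaped[nreaped++] = pid;
    return pid;
}

/* signal(SIGCLD, h) with the SysV special case for existing zombies. */
static void sim_signal(handler_t h)
{
    sigcld = h;
    if (h != NULL && nzombie > 0)
        pending = 1;                 /* a zombie exists: post SIGCLD again */
    deliver_pending();               /* "on return from the system call" */
}

/* The handler from the text: one wait, then reinstall just before returning. */
static void getchild(void)
{
    ncalls++;
    (void)sim_wait();
    sim_signal(getchild);
}

/* Replay the scenario: 1000 installs getchild, 1001 and 1002 exit, 1000 runs. */
void run_scenario(void)
{
    sim_signal(getchild);
    child_exit(1001);
    child_exit(1002);
    deliver_pending();
}
```

After run_scenario(), getchild has run twice, reaped holds 1002 and then 1001, nothing is pending, and SIGCLD is once again set to getchild.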

This probably describes the actual kernel code, with one exception:
I suspect the SysV kernel's signal function sets SIGCLD to the
address of getchild, and that it is reset to SIG_DFL by the same
kernel code that invokes every other user signal handler; but I
think the description is clearer without this detail.  (This works
because, once SIGCLD delivery is already pending, the default action
of SIGCLD is indistinguishable from that of a caught SIGCLD; oddly
enough, this is because signals are *not* queued, despite what the
SysV manual page says.)  Either way, the kernel's behaviour looks
the same to user code, and Bob Lenk's posting is correct.
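The parenthetical claim can be checked with a very small model (hypothetical names, not kernel code). After 1001's exit one SIGCLD is pending; the question is what 1002's exit does. If the reset happens when the signal is posted, 1002's exit sees SIG_DFL and posts nothing; if it happens only at delivery, 1002's exit still sees the handler and posts again. With a single pending bit the two outcomes coincide; if signals were queued as a counter, they would not.

```c
#include <assert.h>

/* Toy model, not kernel code: 1000's SIGCLD disposition as seen by
 * 1002's exit, after 1001's exit has already posted one SIGCLD. */
enum disp { SIM_DFL, SIM_CATCH };

/* Return 1000's pending-SIGCLD state after 1002 exits.
 * queued == 0: pending is a single bit, as the text says SysV keeps it.
 * queued != 0: pending is a counter (a hypothetical queueing kernel). */
int pending_after_1002(enum disp seen_by_1002, int queued)
{
    int pending = 1;                     /* left by 1001's exit */
    if (seen_by_1002 == SIM_CATCH)       /* still caught: post SIGCLD again */
        pending = queued ? pending + 1 : 1;
    return pending;                      /* SIG_DFL: 1002 is just a zombie */
}
```

With queued == 0 both dispositions yield 1, so user code cannot tell where the reset happens; with queued != 0 they yield 1 and 2.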
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu