chris@umcp-cs.UUCP (Chris Torek) (12/25/85)
In article <106@humming.UUCP> arturo@humming.UUCP (Arturo Perez) writes:
>> The signal(2) system call [when used on SIGCLD] checks to see if
>> any zombie child(ren) are present and sends the calling process
>> another SIGCLD if there are. ... Note that the reinstallation
>> of the handler must follow the call to wait, or infinite recursion
>> results.
>> Bob Lenk
>> {hplabs, ihnp4}!hpfcla!rml
> This isn't correct. The problem is that the implicit 'signal(SIGCLD,
> SIG_DFL)' is done AFTER the signal trapping function returns. Thus,
> if you call signal from within the trapping function it doesn't do
> you any good. At least, this is the way it works on our SYSV/BSD
> hybrids.

Judging from the replies I have received, I would say that Bob Lenk is correct. I suspect your system is following 4.2BSD signal semantics: What you described does not make sense, but with a little reinterpretation it implies 4.2 style signal handling.

[I figured that as I originally raised the question, I should try to point out the correct answers.]

------------------

To try to give everyone a better `feel' for the AT&T code, here is what happens. Note that this is based on my interpretation of code I have not seen; the implementation is obvious, so I think this description is correct, but do not count on it.

Assume process 1000 is parent, 1001 and 1002 are children; 1000 has done a `signal(SIGCLD, getchild);'; and 1001 has just exited. (1002 is about to exit.)

The AT&T kernel notes the exit of 1001, finds its parent (1000), and discovers that SIGCLD is being caught. It therefore resets 1000's SIGCLD behaviour to SIG_DFL, and sets the bit for SIGCLD delivery in 1000's signal delivery mask. 1001 is left as an ordinary zombie, awaiting a wait() by 1000.

Now process 1002 is run; it exits. The kernel notes this, finds 1000, and discovers that SIGCLD is not being caught (nor ignored). 1002 is therefore also left as a zombie.

Finally 1000 is run.
SIGCLD is delivered, sending the program into its `getchild' routine. This routine does one wait, collecting either process 1001 or 1002 (just which is unimportant, but let us say it collects 1002). The routine does whatever it wants with the information returned by wait, then, just before returning, calls `signal(SIGCLD, getchild);'.

In the signal code in the kernel, the special-case code for SIGCLD now searches for zombies. It finds 1001, owned by 1000, and this is what it was looking for: an exited child owned by the calling process. It therefore does *not* set SIGCLD to getchild, but rather to SIG_DFL, and the bit in the delivery mask is once again set. On return from the system call the bit is noticed and `getchild' is called once again. This time it collects 1001. Again 1000 sets SIGCLD to getchild, but this time there are no exited children, so the signal is simply set as usual.

This probably describes the actual kernel code, with one exception: I suspect the SysV kernel sets SIGCLD to the address of getchild in the kernel signal function, and sets it to SIG_DFL in the same kernel code that invokes every other user signal handler; but I think the description is clearer without this. (The reason this works is that the default action of SIGCLD is indistinguishable from that of a caught SIGCLD when SIGCLD delivery is already pending; oddly enough, this is because signals are *not* queued, despite the SysV manual page.) The behaviour of the kernel as seen by user code is the same either way, and Bob Lenk's posting is correct.

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@mimsy.umd.edu