lindsay@cheviot.uucp (Lindsay F. Marshall) (05/07/86)
The following code goes into an infinite loop on System V :-

trap(sig)
int sig;
{
    printf("trapped SIGCLD\n");
    signal(SIGCLD, trap); /* reset handler */
}

main()
{
    signal(SIGCLD, trap);
    switch ( fork() )
    {
     case 0 : /* child */
        sleep(5);
        exit(0);
     case -1 :
        printf("error\n");
        exit(1);
     default :
        pause();
    }
    exit(0);
}

The problem is that resetting the SIGCLD trap inside the handler causes the
signal to be raised again and the handler to be re-entered...... This
is not documented in the manual page and seems to me to be a bug as if you
do not reset the handler the system seems to set it to SIG_DFL, meaning that
you will loose any SIGCLD signals between the handler's exit and your getting
a chance to call signal again. Anyone have any thoughts, information etc. on
this problem??
------------------------------------------------------------------------------
Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
ARPA : lindsay%cheviot.newcastle.ac.uk@ucl-cs.arpa
JANET : lindsay@uk.ac.newcastle.cheviot
UUCP : <UK>!ukc!cheviot!lindsay
-------------------------------------------------------------------------------
nwh@hrc63.UUCP (Nigel Holder Marconi) (05/08/86)
Relay-Version: version B 2.10.2 9/18/84; site hrc63.UUCP
Posting-Version: version B 2.10.2 9/18/84; site cheviot.uucp
Path: hrc63!ukc!cheviot!lindsay
From: lindsay@cheviot.uucp (Lindsay F. Marshall)
Newsgroups: net.unix-wizards
Subject: System V and SIGCLD
Message-ID: <709@cheviot.uucp>
Date: 7 May 86 10:33:56 GMT
Date-Received: 8 May 86 06:26:52 GMT
Reply-To: lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall)
Organization: U. of Newcastle upon Tyne, U.K.
Lines: 40
Xpath: ukc eagle
nwh@hrc63.UUCP (Nigel Holder Marconi) (05/09/86)
The problem with resetting SIGCLD is that the signal is still valid, since
the child process is waiting for the parent to perform a wait. The following
implements this and of course works!
trap(sig)
int sig;
{
    int c;

    printf("trapped SIGCLD\n");
    wait(&c);             /* collect the dead child before re-arming */
    signal(SIGCLD, trap); /* reset handler */
}
Now that brings me to wait. 4.2 at least provides two flavours of wait :
wait and wait3. Now wait3 is new and is free to do what it wants in its own
way. Wait, however, does not require an int pointer; it requires a pointer
to a union which happens to start with an int. Whether this affects
programs written in Sys V flavour or not is probably well defined at the
moment, but it could change. Just another example of a transparent
difference between flavours that is easily overlooked.
keith@enmasse.UUCP (Keith Crews) (05/09/86)
In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered...... This
>is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again. Anyone have any thoughts, information etc. on
>this problem??

The signal is raised again because the child still exists. To do what you
want you have to do a wait in the signal handler before resetting the signal.
This explanation is due to a fellow employee - any errors in conveying it
are no doubt due to me. In my System V manual there is a discussion of
what happens to SIGCLD while the signal catcher is executing, but it
does not seem to imply this behavior.

Keith Crews
dave@inset.UUCP (Dave Lukes) (05/09/86)
In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
> trap(sig)
> int sig;
> {
> printf("trapped SIGCLD\n");
> signal(SIGCLD, trap); /* reset handler */
> }
>
> ...
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered......

Yes, this is because you still have an unwait()ed for child!!
What you have to do is wait() for the child in the SIGCLD handler,
THEN reset the handler: this works fine.

>This is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again.

WRONG!!! (``It's not a bug: it's a feature'')
If you catch SIGCLD you will get sent SIGCLD whenever you have ANY
zombie children around (whether newly zombified or not):
the same thing happens when you re-catch it.

Yes, the manual is wrong (as well as totally unclear): it should say that
any pending SIGCLD signals are queued until you call signal(SIGCLD, ...) again
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
it should also remind you that you still MUST call wait() to dispose
of the children.

Still, in defence of SIGCLD: it IS safe (you NEVER lose any children),
AND usable (if you know how!).

Hope this helps.
--
Dave Lukes. (...!inset!dave)
``Fox hunting: the unspeakable chasing the inedible'' -- Oscar Wilde
andy@altos86.UUCP (Andy Hatcher) (05/09/86)
#
# sorry can't seem to mail this
#
You will probably get lots of replies to this one.

The problem is that you have not destroyed the dead child. You should be
doing a wait system call inside your signal routine; otherwise, when you
leave, the child is still there and you get the same signal again. This is
the way it is deliberately implemented: if you have more than one child that
dies at the same time then you will continue to re-enter the signal handler
until they are all gone.

Andy Hatcher
seismo!lll-crg!lll-lcc!vecpyr!altos86!andy

P.S. I've always been told that it is a bad idea to put printfs in
a signal handling routine. The signal handler is called asynchronously,
and if you use stdio both inside and outside the signal handler you
could make it very confused.
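Both of the points above (keep collecting until every dead child is gone, and stay away from stdio in the handler) can be sketched together. This is only a minimal sketch, not code from the thread: it assumes the later POSIX interfaces sigaction(), waitpid() with WNOHANG, and sigsuspend(), and the POSIX spelling SIGCHLD, none of which the posters were using. The names on_child and reap_children are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children collected so far (hypothetical name). */
static volatile sig_atomic_t reaped = 0;

/* Handler: no stdio; collect every exited child before returning. */
static void on_child(int sig)
{
    static const char msg[] = "trapped SIGCHLD\n";
    int saved_errno = errno;   /* write() and waitpid() may change errno */
    int status;

    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork n children that exit at once; return how many the handler collected. */
int reap_children(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int i;

    reaped = 0;
    sa.sa_handler = on_child;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Hold SIGCHLD while forking so no delivery slips past before we sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child */
        if (pid == -1) {        /* fork failed: wait only for those started */
            n = i;
            break;
        }
    }
    while (reaped < n)
        sigsuspend(&wait_mask); /* atomically unblock SIGCHLD and sleep */

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return reaped;
}
```

Calling reap_children(3) forks three children and returns once the handler has collected all of them, whether that took one handler run or three. The handler never has to re-install itself here, because a handler set with sigaction() stays installed; that is a property of the assumed interface, not of the System V signal() call under discussion.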
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/10/86)
In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes: >The following code goes into an infinite loop on System V :- > > trap(sig) > int sig; > { > printf("trapped SIGCLD\n"); > signal(SIGCLD, trap); /* reset handler */ > } > > main() > { > signal(SIGCLD, trap); > switch ( fork() ) > { > case 0 : /* child */ > sleep(5); > exit(0); > case -1 : > printf("error\n"); > exit(1); > default : > pause(); > } > exit(0); > } > >The problem is that resetting the SIGCLD trap inside the handler causes the >signal to be raised again and the handler to be re-entered...... This >is not documented in the manual page and seems to me to be a bug as if you >do not reset the handler the system seems to set it to SIG_DFL, meaning that >you will loose any SIGCLD signals between the handler's exit and your getting >a chance to call signal again. Anyone have any thoughts, information etc. on >this problem?? The reason SIGCLD keeps recurring is that you continue to have an unwaited-for terminated child process. A wait() must be done to lay the zombie to rest. As to the window of vulnerability: Yes, all generally-available UNIXes except 4.2BSD have this problem. AT&T has said that they plan to change to Berkeley-like "reliable signals" in some future release of UNIX System V.
chris@umcp-cs.UUCP (Chris Torek) (05/10/86)
In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
> trap(sig)
> int sig;
> {
> printf("trapped SIGCLD\n");
> signal(SIGCLD, trap); /* reset handler */
> }
>
> main()
> {
> signal(SIGCLD, trap);
[...]
>[...] if you do not reset the handler the system seems to set it to
>SIG_DFL, meaning that you will loose [sic] any SIGCLD signals between
>the handler's exit and your getting a chance to call signal again.

Your loop behaves in accordance with my formulation of the System V
internals for SIGCLD. I posted them some time ago, and received no
comments, which I (tentatively) take to mean that I was completely
correct. Given that particular implementation, *any* SIGCLD trap
routine which does not do at least one `wait' system call before
doing another `signal(SIGCLD, trap)' will recurse until it runs out
of stack space.

Here is what System V really does (by my analysis):

1. Any time a child exits, the kernel examines its parent's SIGCLD
   disposition, and takes one of the following actions:

   SIG_DFL   The child is left as a zombie (`<exiting>'). No other
             action taken.

   SIG_IGN   The child is silently discarded; no <exiting> process
             left behind, and the parent cannot collect the child's
             exit status.

   other     The kernel sets the bit for SIGCLD in the parent's
             pending signals mask. When the parent is scheduled,
             the kernel arranges to run the trap routine (and the
             kernel will then change the parent's SIGCLD disposition
             to SIG_DFL).

2. In the kernel signal system call code, if the user is altering
   the action for SIGCLD, again the kernel examines the new
   disposition:

   SIG_DFL   No action taken.

   SIG_IGN   All currently exited children consumed.

   other     If there are any exited children, the kernel sets the
             bit for SIGCLD in the parent's pending signals mask.

This does not match the manuals; but it does seem to fit the actual behaviour, and has a clear and `efficient' (but not necessarily `clean') implementation. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
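The two rules in the analysis above can be replayed as a toy state machine. This is only a model for following the argument, not kernel code: every name in it (struct proc, child_exits, set_sigcld, deliver, model_wait, count_deliveries) is hypothetical, and it assumes that a child whose parent catches SIGCLD stays a zombie until waited for, which the analysis implies but does not state outright.

```c
/* Toy model of the two rules; not kernel code. All names are hypothetical. */
enum disp { DISP_DFL, DISP_IGN, DISP_CATCH };

struct proc {
    enum disp sigcld;  /* parent's SIGCLD disposition */
    int zombies;       /* exited children not yet waited for */
    int pending;       /* SIGCLD bit in the pending-signals mask */
};

/* Rule 1: a child of p exits. */
void child_exits(struct proc *p)
{
    switch (p->sigcld) {
    case DISP_DFL:
        p->zombies++;          /* left as a zombie, nothing else happens */
        break;
    case DISP_IGN:
        break;                 /* silently discarded */
    case DISP_CATCH:
        p->zombies++;          /* assumed: stays a zombie until wait */
        p->pending = 1;
        break;
    }
}

/* Rule 2: p calls signal(SIGCLD, ...). */
void set_sigcld(struct proc *p, enum disp d)
{
    p->sigcld = d;
    if (d == DISP_IGN)
        p->zombies = 0;        /* exited children consumed */
    else if (d == DISP_CATCH && p->zombies > 0)
        p->pending = 1;        /* signal raised again */
}

/* Delivery: returns 1 if the trap routine should run now. */
int deliver(struct proc *p)
{
    if (!p->pending)
        return 0;
    p->pending = 0;
    p->sigcld = DISP_DFL;      /* disposition reset on delivery */
    return 1;
}

/* wait(): consume one zombie if there is one. */
void model_wait(struct proc *p)
{
    if (p->zombies > 0)
        p->zombies--;
}

/* Replay the original program: one child, a handler that re-arms itself.
   Returns how many times the handler ran, capped at limit. */
int count_deliveries(int wait_first, int limit)
{
    struct proc p = { DISP_DFL, 0, 0 };
    int runs = 0;

    set_sigcld(&p, DISP_CATCH);
    child_exits(&p);
    while (runs < limit && deliver(&p)) {
        runs++;
        if (wait_first)
            model_wait(&p);
        set_sigcld(&p, DISP_CATCH);
    }
    return runs;
}
```

With wait_first set the handler runs once; without it the handler runs until the cap is reached, which is the loop reported in the original article. The model flattens the re-entry into a loop, whereas on the real system each re-entry is a nested call, hence the remark about running out of stack space.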
karl@osu-eddie.UUCP (Karl Kleinpaste) (05/10/86)
lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
> trap(sig)
> int sig;
> {
> printf("trapped SIGCLD\n");
> signal(SIGCLD, trap); /* reset handler */
> }
>[followed by main() which forks and then pauses if it's the parent]
>
>The problem is that resetting the SIGCLD trap inside the handler causes the
>signal to be raised again and the handler to be re-entered...... This
>is not documented in the manual page and seems to me to be a bug as if you
>do not reset the handler the system seems to set it to SIG_DFL, meaning that
>you will loose any SIGCLD signals between the handler's exit and your getting
>a chance to call signal again. Anyone have any thoughts, information etc. on
>this problem??

You're almost right, but not quite. It's not a bug. The problem your
code demonstrates is an inappropriate way to deal with SIGCLD. What
you need in the above trap() code is a wait(2) call before the reset
of SIGCLD in signal(2), in order to clean up the zombie child. SIGCLD
signals queue in SysV - you have to clean up your zombie children
_as they occur_ when you want to use SIGCLD on them.

Be aware that if you get a SIGCLD for one dead child, call trap() to
take care of it, and then a second child dies while still in trap(),
you will immediately get run through trap() again when signal(2) is
called. And so on for any <n> zombie children.

This is correctly documented in the manual page. I know it works,
because I use it heavily in my job-control SysV csh.
--
Karl Kleinpaste
lindsay@cheviot.uucp (Lindsay F. Marshall) (05/12/86)
In article <344@hrc63.UUCP> nwh@hrc63.UUCP (Nigel Holder Marconi) writes:
>
> The problem with resetng SIGCLD is that the signal is still valid since
>the child process is waiting for the parent to perform a wait. The following
>implements this and of course works !
>......
> wait(&c);

This is, of course, perfectly obvious, but DOESN'T ANSWER MY QUESTION!!
In the application I have I MUST not do a wait inside the signal handler.
The solution of adding wait has been suggested by many people, but it
simply is no good. If you want to save status information you then have
to implement a stack of wait return data, and then a new version of wait
that looks at the stack to see if anything has terminated etc. etc.
The bottom line is that SIGCLD is very broken and ought to be fixed!!

One way round this problem, if you are only expecting SIGCLDs to come in
ones, is to put

signal(SIGCLD, SIG_IGN);

before you reset the signal. This causes any outstanding SIGCLDs to be
junked (hence it only works when there is a single child) but does allow
you to reset the signal for future parent/child interactions without
causing an infinite loop.
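For the constraint stated above (no wait inside the handler), one alternative to the SIG_IGN trick is a handler that only sets a flag, with the child collected later at a point the program chooses. The following is a minimal sketch under loud assumptions: it uses the later POSIX sigaction() interface, whose handlers stay installed and so never need resetting, rather than the System V signal() semantics discussed in this thread. The names note_child and run_deferred are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; examined only outside it (hypothetical name). */
static volatile sig_atomic_t child_died = 0;

/* Handler: records that a child has died and does nothing else. */
static void note_child(int sig)
{
    (void)sig;
    child_died = 1;
}

/* Fork a child that exits with `code`; collect its status outside the
   handler and return the exit code, or -1 on failure. */
int run_deferred(int code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;
    int status = 0;
    int result = -1;

    child_died = 0;
    sa.sa_handler = note_child;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so the flag test and the sleep cannot race. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    pid = fork();
    if (pid == 0)
        _exit(code);                /* child */
    if (pid > 0) {
        while (!child_died)
            sigsuspend(&wait_mask); /* handler runs here, sets the flag */
        /* Back in ordinary code: the zombie is still there to collect. */
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

The exit status is not lost by deferring the wait, since the child remains a zombie until the parent asks for it. Whether this fits the application described above depends on details the post does not give.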
karl@osu-eddie.UUCP (Karl Kleinpaste) (05/12/86)
In article <283@enmasse.UUCP> keith@enmasse.UUCP (Keith Crews) writes: >The signal is raised again because the child still exists. To do what you >want you have to do a wait in the signal handler before resetting the signal. >This explaination is due to a fellow employee - any errors in conveying it >are no doubt due to me. In my system V manual there is a discussion of >what happens to SIGCLD while the signal catcher is executing, but it >does not seem to imply this behavior. Yes, it does imply that behavior. Not having a manual in front of me this instant, I can't quote directly; but I distinctly recall that the description includes the comment that the handler will be continually re-entered until all the dead children have been cleaned up. -- Karl Kleinpaste
simon@cstvax.UUCP (Simon Brown) (05/12/86)
In article <344@hrc63.UUCP> nwh@hrc63.UUCP (Nigel Holder Marconi) writes:
> Now that brings me to wait. 4.2 at least provides two flavours of wait :
>wait and wait3. Now wait3 is new and is free to do what it wants in its own
>way. Wait however, does not requires an int pointer, it requires a pointer
>to a union which happens to start with an int. Whether this affects
>programs written in Sys V flavour or not is probably well defined at the
>moment, but it could change. Just another example of a transparent
>difference between flavours that is easily overlooked.

Actually, the union doesn't just "happen" to begin with an int - it
*has* to for compatibility with the Version-7 wait(), which was just
like:

    int procid, status;
    procid = wait(&status);

--
-------------------------------------------------
| Simon Brown, Dept. of Computer Science,       |
| Edinburgh University                          |
| ...!mcvax!ukc!cstvax!simon                    |
-------------------------------------------------
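The plain-int form of the call shown above is also the form POSIX later standardized. A small sketch of it in use follows; the function name child_exit_status is hypothetical, and the WIFEXITED and WEXITSTATUS macros used to decode the status are POSIX additions that postdate Version 7, so treat them as an assumption rather than part of the interface being described.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, wait for it with an int status,
   and return the decoded exit code (or -1 on failure). */
int child_exit_status(int code)
{
    int status;
    pid_t procid;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child */

    /* wait() returns the id of whichever child was collected. */
    do {
        procid = wait(&status);
    } while (procid != pid && procid != -1);

    if (procid == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A call such as child_exit_status(7) collects the child and hands back the code it exited with; only the low eight bits of the exit code survive the trip through the status word.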
nwh@hrc63.UUCP (Nigel Holder Marconi) (05/14/86)
In article 3994 (Simon Brown @ Comp. Sc., Edinburgh Univ., Scotland)
>Actually, the union doesn't just "happen" to begin with an int - it
>*has* to for compatibility with the Version-7 wait(), which was just
>like: ...

The problem with relying on it being compatible with Version 7 is that
one day someone may just remove the int part of the union, since it has no
comment to state why it is there! It's probably the 'cat -v considered
dangerous' syndrome I'm trying to put across.

I'm sorry about naff mailshot layouts, but it's rather a tortuous route to
get to usenet for me!
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/15/86)
In article <211@altos86.altos86.UUCP> andy@altos86.UUCP (Andy Hatcher) writes: >P.S. I've always been told that it is a bad idea to put printfs in >a signal handling routine. The signal handler is called asyncronously >and if you use stdio both inside and outside the signal handler you >could make it very confused. UNIX System V Release 2 tries very hard to make stdio support usage from inside signal catchers. I think this was a big mistake, as it turned Dennis's clean stdio source code into a complicated mess.
mjs@sfsup.UUCP (M.J.Shannon) (05/19/86)
In article <709@cheviot.uucp> lindsay@cheviot.newcastle.ac.uk (Lindsay F. Marshall) writes:
>The following code goes into an infinite loop on System V :-
>
> trap(sig)
> int sig;
> {
/* add something like this: */
	int pid = wait(0);
/* and you won't get the signal until the next child exits */
> printf("trapped SIGCLD\n");
> signal(SIGCLD, trap); /* reset handler */
> }
>
>Lindsay F. Marshall, Computing Lab., U of Newcastle upon Tyne, Tyne & Wear, UK
--
Marty Shannon
UUCP: ihnp4!attunix!mjs
Phone: +1 (201) 522 6063
Disclaimer: I speak for no one.
"If I never loved, I never would have cried." -- Simon & Garfunkel
jimr@hcrvx2.UUCP (Jim Robinson) (09/16/86)
* Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2 manual there is a warning "strongly" discouraging its use in new programs, and there is no mention of it anywhere in the System V Interface Definition (at least I couldn't find any). Seems to me this is a handy signal to have as it provides a reasonably elegant means of cleaning up after a process. And, needless to say, more than a few programs will have to be changed, including *shell layers*, when it disappears. [Since the master layer in shell layers cannot remain blocked indefinitely during a 'wait' I would imagine that some kind of polling would be necessary. Gag.] The only other possibility I can think of is that 5.3 has some new and nifty feature that disallows the need for SIGCLD. Comments? J.B. Robinson PS Thanks to all those who answered my query re the IEEE proposal on System V compatible BSD style job control.
guy@sun.uucp (Guy Harris) (09/18/86)
> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2
> manual there is a warning "strongly" discouraging its use in
> new programs, and there is no mention of it anywhere in the System V
> Interface Definition (at least I couldn't find any).

Geez, youngsters these days have no sense of history; they probably think
"AT&T UNIX" started with System V. Mutter, mutter. :-)

The System III documentation has much the same warning; it came out in 1980,
so if they haven't dropped it by now, I suspect they're not going to
(especially since things like "init" use it as well).

A little history here. The notion that AT&T is one big happy family when it
comes to UNIX is mistaken; there are lots of groups developing applications
to run under UNIX, and, like any other bunch of UNIX programmers, they all
have their own ideas about what they need to have UNIX do - or, like any
other bunch of UNIX programmers, they all have their own ideas about what
they *think* they need to have UNIX do. As such, there were at one point
probably more variant versions of UNIX inside the Bell System than outside.
S5 is the product of an attempt to merge them all into one version.

S3 was one step along this path; it picked up a number of features from
other versions of UNIX inside Bell, and SIGCLD was probably one of them.
The people maintaining S3 may have thought that you could do something
better than SIGCLD (the notion that ignoring SIGCLD has the side effect of
discarding existing zombies and preventing the creation of new ones is
certainly a hack), and wanted to warn that it was only in there for
compatibility with other versions of UNIX. At the time, they probably
figured there was a good chance that they would get rid of it in favor of
something better. Either they still thought so at the time they put out the
5.2 documentation, or nobody had ever bothered to change the documentation.

> The only other possibility I can think of is that 5.3 has some new > and nifty feature that disallows the need for SIGCLD. Either you mean "*eliminates* the need for SIGCLD", or "disallows the *use of* SIGCLD." Neither, as far as I know, is true; SIGCLD is still in 5.3. -- Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com (or guy@sun.arpa)
chris@pixutl.UUCP (chris) (09/20/86)
In article <2389@hcrvx2.UUCP>, jimr@hcrvx2.UUCP (Jim Robinson) writes: > * > Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2 > manual there is a warning "strongly" discouraging its use in > new programs, and there is no mention of it anywhere in the System V > Interface Definition (at least I couldn't find any). If SIGCLD is gone, does that mean shl is gone too? or, if not, how do the shell layers know a job has terminated? Just wondering... Chris -- Chris Bertin : (603) 881-8791 x218 xePIX Inc. : 51 Lake St : {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\ Nashua, NH 03060 : !wjh12!pixel!pixutl!chris
mjp@sfmag.UUCP (M.J.Purdome) (09/21/86)
System V Release 3 has not eliminated SIGCLD. As a matter of fact, the WARNING section that existed in previous versions of the documentation has been removed from the SVR3 man page for signal. The SVID defines 13 signals that are "standard", and it states that specific implementations may provide implementation-dependent signals. I suppose this includes SIGCLD as well as SIGPOLL (used with STREAMS) and others that are not listed in the SVID. -- Mark Purdome -- AT&T, 190 River Road A-130, Summit, NJ 07901 [ihnp4 | allegra]!attunix!mjp disclaimer: my opinions != AT&T opinions
jimr@hcrvx2.UUCP (Jim Robinson) (09/23/86)
In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes: >> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2 >> manual there is a warning "strongly" discouraging its use in >> new programs, and there is no mention of it anywhere in the System V ^^^^^^^^ >> Interface Definition (at least I couldn't find any). ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >Geez, youngsters these days have no sense of history; they probably think >"AT&T UNIX" started with System V. Mutter, mutter. :-) > >The System III documentation has much the same warning; it came out in 1980, >so fi they haven't dropped it by now, I suspect they're not going to >(especially since things like "init" use it as well). I guess I'll rephrase the question since it hasn't generated quite the response I had hoped for. 1) I could not find any mention of SIGCLD in the System V Interface Definition. Is this because I missed it, or is it because it just ain't there? (It certainly is not mentioned with the other signals in the section dealing with the 'signal' service routine) 2) Assuming the latter, does this not mean that there is no requirement for a SVID adhering UNIX to include SIGCLD? 3) If so, what gives? As has been pointed out, at least a couple of important programs are going to break? It would be especially pleasant if someone from AT&T could take the time to fire in a quick response since they are in the best position of knowing what the story is wrt the SVID and SIGCLD. J.B. Robinson
guy@sun.uucp (Guy Harris) (09/24/86)
> I guess I'll rephrase the question since it hasn't generated quite the
> response I had hoped for.

The response you had hoped for was an explanation of why SIGCLD disappeared.
Since it *didn't* disappear, there is no chance of getting quite that
response.

> 1) I could not find any mention of SIGCLD in the System V Interface
> Definition. Is this because I missed it, or is it because it just
> ain't there? (It certainly is not mentioned with the other signals
> in the section dealing with the 'signal' service routine)

It is not there. It is in the S5 documentation (and, as pointed out, the
"this may disappear eventually" note disappeared in S5R3), but it's not in
the SVID. The SVID != the System V documentation.

> 2) Assuming the latter, does this not mean that there is no requirement
> for a SVID adhering UNIX to include SIGCLD?

Yes.

> 3) If so, what gives? As has been pointed out, at least a couple of
> important programs are going to break?

So? Just don't run those programs on a SVID-compliant system unless you've
verified that that system also supports SIGCLD. There is also no requirement
that a SVID-compliant system implement the routines in the "-lPW" library,
either, and this may break some programs.

A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM EVER
WRITTEN FOR SYSTEM V. It is not even required to be able to run every
program whose source is shipped with System V. That's why it's called an
"interface definition"; a SVID-compliant system is required to be able to
run every valid program written using the SVID. The SVID defines an
interface, and people write programs to use that interface. Some programs
that come with System V are not written strictly for that interface. As
such, they may not run on all SVID-compatible systems.

Consider SIGCLD to be an extension to UNIX, provided by certain systems,
rather than as part of the core of UNIX.
There's nothing wrong with that system also providing an "init", or "shl", or whatever, that uses that extension. If another system doesn't have that extension, it'll have to do things differently. -- Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com (or guy@sun.arpa)
karl@cbrma.UUCP (Karl Kleinpaste) (09/24/86)
jimr@hcrvx2.UUCP (Jim Robinson) writes: >In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes: >>> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2 >>> manual there is a warning "strongly" discouraging its use in >>> new programs, and there is no mention of it anywhere in the System V >>> Interface Definition (at least I couldn't find any). >> >>The System III documentation has much the same warning; it came out in 1980, >>so fi they haven't dropped it by now, I suspect they're not going to >>(especially since things like "init" use it as well). > >1) I could not find any mention of SIGCLD in the System V Interface > Definition. Is this because I missed it, or is it because it just > ain't there? (It certainly is not mentioned with the other signals > in the section dealing with the 'signal' service routine) It's not there. Not in my copy, anyway, from Spring 1985. That fact notwithstanding, notice that neither are SIGIOT, SIGEMT, SIGBUS, or SIGSEGV. I have my doubts that they'll go away any time soon. What would application development in a UNIX environment be like without the ever-entertaining comment, "Segmentation violation - core dumped"? Of course "init" could be hacked so that it no longer utilized SIGCLD. But then "init" wouldn't have had new code put into it to handle SIGCLD if it weren't considered important, especially with that warning present. >2) Assuming the latter, does this not mean that there is no requirement > for a SVID adhering UNIX to include SIGCLD? Um..."requirement" in a technical or political sense? Technically, SIGCLD could be missing from Sys5.N (N>3) just because somebody's mood was bad the day such a decision had to be made. Politically, there would be hell to pay if it were taken out without a darn good replacement strategy for asynchronous notification of child death. >3) If so, what gives? As has been pointed out, at least a couple of > important programs are going to break? Guy's right - it's going to stay. 
It would break a non-trivial amount of code (nobody really *wants* to hack up "init" again), and it's a useful feature; I use it quite a lot, in a job control emulation. >It would be especially pleasant if someone from AT&T could take the >time to fire in a quick response since they are in the best position >of knowing what the story is wrt the SVID and SIGCLD. Right, here's a disclaimer: I work for AT&T-BL, but I have no work-related connections with the folks who make decisions like that. -- Karl Kleinpaste
SofPasuk@imagen.UUCP (09/24/86)
> In article <7396@sun.uucp> guy@sun.uucp (Guy Harris) writes: > >> Can anyone explain AT&T's rationale in dropping SIGCLD? In my 5.2 > >> manual there is a warning "strongly" discouraging its use in > >> new programs, and there is no mention of it anywhere in the System V > ^^^^^^^^ > >> Interface Definition (at least I couldn't find any). > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > >Geez, youngsters these days have no sense of history; they probably think > >"AT&T UNIX" started with System V. Mutter, mutter. :-) > > > >The System III documentation has much the same warning; it came out in 1980, > >so fi they haven't dropped it by now, I suspect they're not going to > >(especially since things like "init" use it as well). > > I guess I'll rephrase the question since it hasn't generated quite the > response I had hoped for. > > 1) I could not find any mention of SIGCLD in the System V Interface > Definition. Is this because I missed it, or is it because it just > ain't there? (It certainly is not mentioned with the other signals > in the section dealing with the 'signal' service routine) > > 2) Assuming the latter, does this not mean that there is no requirement > for a SVID adhering UNIX to include SIGCLD? > > 3) If so, what gives? As has been pointed out, at least a couple of > important programs are going to break? > > It would be especially pleasant if someone from AT&T could take the > time to fire in a quick response since they are in the best position > of knowing what the story is wrt the SVID and SIGCLD. I couldn't find SIGCLD in SVID either. The only means in SVID to detect the completion of a child process seems to be via WAIT, i.e. a planned, synchronous activity on the part of a program as opposed to an interrupt. I second the request that some RESPONSIBLE party from the American Telephone & Telegraph Corporation who is DIRECTLY INVOLVED with SVID directly respond to this issue. 
(Please no flames about whose UNIX is better or whose long distance service is better or who makes better switchboards!)
jas@rtech.UUCP (Jim Shankland) (09/24/86)
Guy Harris writes:
Just don't run programs [needing the SIGCLD signal] on a SVID-compliant
system unless you've verified that that system also supports SIGCLD.
A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM
EVER WRITTEN FOR SYSTEM V. It is not even required to be able to run
every program whose source is shipped with System V. That's why it's
called an "interface definition"; a SVID-compliant system is required
to be able to run every valid program written using the SVID. The SVID
defines an interface, and people write programs to use that interface.
Consider SIGCLD to be an extension to UNIX, provided by certain systems,
rather than as part of the core of UNIX.
All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
of SVID, especially when there is no persuasive reason to leave it out
(unlike shared memory, for example, which is hard to implement on
a loosely coupled multiprocessor such as the CT Megaframe). If the
interface definition is unnecessarily restrictive, it loses some of
its usefulness, since it is likely to be extended in non-standard ways
(Pascal comes to mind).
--
Jim Shankland
..!ihnp4!cpsc6a!\
rtech!jas
..!ucbvax!mtxinu!/
brett@wjvax.UUCP (Brett Galloway) (09/26/86)
In article <453@rtech.UUCP> jas@rtech.UUCP (Jim Shankland) writes:
>Guy Harris writes:
>
> Just don't run programs [needing the SIGCLD signal] on a SVID-compliant
> system unless you've verified that that system also supports SIGCLD.
>
> A SVID-COMPLIANT SYSTEM IS NOT REQUIRED TO BE ABLE TO RUN EVERY PROGRAM
> EVER WRITTEN FOR SYSTEM V. It is not even required to be able to run
> every program whose source is shipped with System V. That's why it's
> called an "interface definition"; a SVID-compliant system is required
> to be able to run every valid program written using the SVID. The SVID
> defines an interface, and people write programs to use that interface.
>
> Consider SIGCLD to be an extension to UNIX, provided by certain systems,
> rather than as part of the core of UNIX.
>
>All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
>of SVID, especially when there is no persuasive reason to leave it out
>(unlike shared memory, for example, which is hard to implement on
>a loosely coupled multiprocessor such as the CT Megaframe). If the
>interface definition is unnecessarily restrictive, it loses some of
>its usefulness, since it is likely to be extended in non-standard ways
>(Pascal comes to mind).

Hear, hear! Standards definitions can fail in one of two ways. The first
is making the standard unnecessarily generous in features (e.g. Ada). This
makes applications difficult to port because the intended environment to
port to may not have implemented a feature needed by the application.

The second failure is making the standard unnecessarily miserly in features
(e.g. the SVID with respect to SIGCLD). This makes applications difficult
to port because each implementation of the standard is likely to extend it
in its own way to provide useful functionality.

To be useful and portable, the standard must strike the golden mean. I have
not read the SVID, but the omission of SIGCLD leads me to believe that the
authors of SVID inclined to the latter error.
-- ------------- Brett Galloway {pesnta,twg,ios,qubix,turtlevax,tymix,vecpyr,certes,isi}!wjvax!brett
rml@hpfcdc.HP.COM (Bob Lenk) (09/26/86)
> All true, but SIGCLD is an awfully useful piece of UNIX to be leaving out
> of SVID, especially when there is no persuasive reason to leave it out
> (unlike shared memory, for example, which is hard to implement on
> a loosely coupled multiprocessor such as the CT Megaframe).

I would speculate that it was left out because of a desire not to
standardize some of the specific semantics SIGCLD has in System V
implementations. In particular, many people are not fond of the
side-effect that setting SIGCLD to SIG_IGN has on wait(2). Also, the
precise semantics of how SIGCLD is "queued" do not agree between System
V documentation and implementation, so there could be disagreement on
what to standardize.

> If the
> interface definition is unnecessarily restrictive, it loses some of
> its usefulness, since it is likely to be extended in non-standard ways

That's certainly a valid point which needs to be traded off against
the risk of standardizing the "wrong" feature, thus either perpetuating
that feature or reducing acceptance of the standard. I make no
judgement as to whether AT&T made the correct tradeoff in this case.

		Bob Lenk
		{ihnp4, hplabs}!hpfcla!rml
naim@nucsrl.UUCP (Naim Abdullah) (10/02/86)
Bob Lenk writes:
>Also, the
>precise semantics of how SIGCLD is "queued" do not agree between System
>V documentation and implementation, so there could be disagreement on
>what to standardize.

Could you explain this a little bit further? In what way does the
implementation differ from the documentation?

I have a stake in SIGCLD because one fairly large program that I have
written depends upon SIGCLD (although I was aware of the warning, I didn't
see how I could duplicate the asynchronous notification by any other
(reasonable) means; any ideas out there?)

		Naim Abdullah,
		Dept. of EECS,
		Northwestern University,
		ihnp4!nucsrl!naim
rml@hpfcdc.HP.COM (Bob Lenk) (10/03/86)
> >Also, the
> >precise semantics of how SIGCLD is "queued" do not agree between System
> >V documentation and implementation, so there could be disagreement on
> >what to standardize.
>
> Could you explain this a little bit further ? In what way, does the
> implementation differ from the documentation ?

The System V Release 2 manual says,

	"... while the process is executing the signal-catching function,
	any received SIGCLD signals will be queued and the signal-catching
	function will be continually reentered until the queue is empty."

A more accurate description would be something like:

	If _signal_ is called to catch SIGCLD in a process which currently
	has terminated (zombie) children, a SIGCLD signal is delivered to
	the process immediately. Thus if the signal-catching function
	re-installs itself, the apparent effect is that any SIGCLD signals
	received due to the death of children while the function is
	executing are queued and the signal-catching function is
	continually reentered until the queue is empty. Note that the
	function must re-install itself after it has called _wait_(2).
	Otherwise the presence of the child which caused the original
	signal will cause another signal immediately, resulting in
	infinite recursion.

> I have a stake in SIGCLD because one fairly large program that I have
> written depends upon SIGCLD

If your program works, it probably uses SIGCLD as I've described, and
the behavior agrees with the documentation. Problems occur when
programs don't use it in this way (e.g. re-install the handler before
calling wait).

		Bob Lenk
		{ihnp4, hplabs}!hpfcla!rml
chris@umcp-cs.UUCP (Chris Torek) (10/06/86)
>Bob Lenk writes:
>>Also, the precise semantics of how SIGCLD is "queued" do not agree
>>between System V documentation and implementation, so there could be
>>disagreement on what to standardize.

In article <2410003@nucsrl.UUCP> naim@nucsrl.UUCP (Naim Abdullah) writes:
>Could you explain this a little bit further? In what way, does the
>implementation differ from the documentation?

We have been through this one before. I have neither SysV source
nor SysV documentation, but I believe my understanding to be correct.
At any rate, no one has proved this wrong:

The SysV documentation claims that SIGCLD is queued---that is, if
two children of a given process die `simultaneously', that process
will receive two separate SIGCLDs. The documentation is wrong. The
signal is not queued. However, any program handling SIGCLD via the
`recommended method' will indeed receive two SIGCLDs.

Three details are important to understanding this.

First, whenever a signal is delivered in a SysV kernel, the signal
disposition is changed to SIG_DFL. (This means that holding down one's
interrupt key will, unless the machine is stupendously fast, eventually
kill a program no matter how hard it tries to avoid this. This and
other similar arguments are what is behind the Berkeley Reliable
Signals, which are, alas, thoroughly incompatible with previous
systems.)

Second, when the SIGCLD disposition is SIG_DFL, a SysV kernel does
nothing special: an exiting child remains exiting.

Third, and this is the key, whenever the SIGCLD disposition is altered
to SIG_CATCH---that is, to a catching routine---a SysV kernel scans for
exiting children. If there are any, it sends exactly one SIGCLD signal.
This, of course, alters the disposition back to SIG_DFL, and the loop
runs until there are no more children.

That this is indeed the implementation may be demonstrated by running
a small test program:

	#include <signal.h>

	catch()
	{
		int status;

		/* if the wait() is done before the signal(), this works. */
		(void) signal(SIGCLD, catch);
		(void) wait(&status);
	}

	main()
	{
		(void) signal(SIGCLD, catch);
		if (fork())
			_exit(1);
		exit(0);
	}

This program will eventually run out of stack space.

(It is true that there are other potential SIGCLD implementations that
might show the same behaviour. But the one I outlined above is a
trivial change to a V7 kernel, and I do not doubt that those who wrote
the code followed the path of least resistance.)

I believe that signal queueing would in fact be a better solution than
either Berkeley Reliable Signals, which model machine interrupts rather
closely, or the SysV style SIGCLD signal. Both work for this specific
case, though Berkeley did have to add a three-argument `wait' syscall.
Berkeley's solution is more general than SysV's, and I think it is
therefore better, but it does seem to have `kludge for acceptable
efficiency' stamped all over it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu