jmm@ecijmm.UUCP (12/20/88)
I just tripped over a bug in the pclose library routine. If you do a pclose on a file descriptor (that was opened with popen of course) in some circumstances it will hang forever. For example: pin = popen( "comm1", "r" ); pout1 = popen( "comm2", "w" ); pout2 = popen( "comm3", "w" ); while( <read pin until eof> ) <write portions to pout1 and pout2> pclose( pout1 ); pclose( pin ); /* this one hangs */ pclose( pout2 ); If the pclose( pin ) is switched with either of the other two pcloses, then there is no hang - if it is first, then all works perfectly, if it is last, then it fails with the error "no child". It is clear what is happening - the pclose( pout1 ) closes its file descriptor and then does a wait to get the status of the pipe child; but it first gets (and discards) the status of the pin pipe child. When the pclose( pin ) is done, it closes the file descriptor, and then it tries to wait for the child. But the child's status has already been discarded, and a different child still exists, so the system can't return an error, so the wait just hangs - since the pout2 child won't terminate until it get eof, and that won't happen until this process closes the file descriptor. There are many other variations possible. (I first encountered this in a perl program with only two pipes, but there were system(3) calls done in the body of the loop, which presumably were doing their own wait.) I ended up with the above version when I tried to reduce the problem to determine whether it was Perl's fault. I have isolated this behaviour on two different systems - Interactive Systems System 5/386 (I should know the release number, ask me tomorrow) and Xenix System III for 68000. (Finally the question...) Is this behaviour common to all System 5 variations? To BSD derivatives? SunOS? AIX? Your favourite here? Is there even a good general solution? I can see only one good way to handle all of the variations of some routine wanting to wait for a specific child and getting the termination info for a different child instead (which will eventually be waited for - perhaps by a totally different routine). That would be to provide some new library routines: waitfor( child, &status ) and postwait( child, status ). Waitfor would wait for a specified child (and save information internally on any other children that terminate in the meantime). Postwait would allow a routine that had done a wait call and gotten the termination for a child that it didn't know to pass that info into the mechanism for saving used by waitfor. These routines could be used internally by pclose, system, and any other library routines that have waiting for a specific child as a part of their semantics, as well as being provided to the user as a new pair of library routines for building additional capabilities that include forked children as a part of their implementation. -- -- John Macdonald
mlandau@bbn.com (Matt Landau) (12/21/88)
[Earlier article describes some race conditions that can lead to program hanging if they popen() multiple things and then close them in a certain order. . .] I've found the same sort of behavior in BSD-derived systems, with both popen and system. The basic problem is that in both cases, the library routines will wait for the process they're interested in to terminate, but will catch and discard termination information for other processes silently. The only completely reliable solution for our application turned out to be the rule "don't use popen/pclose or system for general process control functions." Instead, we wrote a set of library routines to manage multiple child processes cleanly - they know how to start, send input to, get input from, and kill any of the set of running children without affecting the others.
chris@mimsy.UUCP (Chris Torek) (12/21/88)
In article <261@ecijmm.UUCP> jmm@ecijmm.UUCP writes: >in some circumstances [pclose] will hang forever. ... >Is this behaviour common to all System 5 variations? To BSD >derivatives? SunOS? AIX? Your favourite here? It is partially fixed in 4.3BSD (and, one hopes, derivatives), where pclose() will return -1 if the process has already been reaped. >Is there even a good general solution? 4.4BSD will have a `wait4' or `waitfor' syscall, to wait for a particular process (or for any process, if the given pid is 0). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
malcolm@otc.oz (Malcolm Purvis ) (12/22/88)
In article <261@ecijmm.UUCP> jmm@ecijmm.UUCP (John Macdonald) writes: [Stuff about pclose(3) hanging when there is more than one child...] >Is this behaviour common to all System 5 variations? To BSD >derivatives? SunOS? AIX? Your favourite here? It's all of them as for as I know. See below. > >Is there even a good general solution? I can see only one good way >to handle all of the variations of some routine wanting to wait for >a specific child and getting the termination info for a different child >instead (which will eventually be waited for - perhaps by a totally >different routine). That would be to provide some new library routines: >waitfor( child, &status ) and postwait( child, status ). Waitfor would >wait for a specified child (and save information internally on any other >children that terminate in the meantime). Postwait would allow a routine >that had done a wait call and gotten the termination for a child that >it didn't know to pass that info into the mechanism for saving used by >waitfor. These routines could be used internally by pclose, system, and >any other library routines that have waiting for a specific child as a >part of their semantics, as well as being provided to the user as a new >pair of library routines for building additional capabilities that include >forked children as a part of their implementation. >-- >-- >John Macdonald I had to solve this problem recently for a C++ subprocess class and as John said it stems from the inability to wait for a specific child to die; you can only wait for the next one. The solution I came up with under BSD4.3 was to put a structure associated with each child in a linked list and use a SIGCHLD handler to do the actual waiting. When the signal arrives, the wait(2) is done in the handler and the list is searched for the pid of the dead child and, when it is found, it is marked as dead and taken off the list. The inside of the waitfor() call would then look something like: while (!child.dead) sigpause(0); /* wait for this process to die. */ *status = child.return_value; This works wonderfully in C++ because as the child structure goes out of scope the child process is automatically waited for, so stopping children not having a data structure associated with it, and also each caller gets the right return value. I don't know, however, how you could do this in C. The only problem with all this is that is falls over if the child dies before it gets inserted into the list (eg: The exec() fails because the program isn't there) so care must be taken over the order of list insertion and forking. I hope this helps. Malcolm Purvis (vacation student) |||| OTC || ACSnet: malcolm@otc.oz UUCP: {uunet,mcvax}!otc.oz!malcolm Snail: GPO Box 7000, Sydney 2001, Australia
allbery@ncoast.UUCP (Brandon S. Allbery) (12/27/88)
As quoted from <261@ecijmm.UUCP> by jmm@ecijmm.UUCP: +--------------- | I just tripped over a bug in the pclose library routine. If you do a | pclose on a file descriptor (that was opened with popen of course) in | some circumstances it will hang forever. For example: +--------------- From the manual page for popen(3) under System III: BUGS Only one stream opened by popen can be in use at once. This is less a bug than a misfeature; popen() is intended for casual use, not for complex tricks. If BSD Unix has a way to wait for a specific process ID, that could be used by pclose() (but I don't remember wait3() supporting that); the alternative is to have popen()/pclose() store the PID in a linked list, but then system() must also use that linked list -- and either new wrappers for exec?() must be made available or the use of system()/popen() and exec?() in the same program must be firmly discouraged. (And then someone will do it without reading the manual and complain about the "bug".) Moral: if you're going to play with multiple concurrent -- or even potentially concurrent -- subprocesses, do all the work manually. It takes more work to write the program, but the result is far more robust in a multi- process situation. ++Brandon -- Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X uunet!hal.cwru.edu!ncoast!allbery ncoast!allbery@hal.cwru.edu comp.sources.misc is moving off ncoast -- please do NOT send submissions direct Send comp.sources.misc submissions to comp-sources-misc@<backbone>.
decot@hpisod2.HP.COM (Dave Decot) (01/04/89)
The IEEE Standard 1003.1 (POSIX) requires conforming implementations to have a new function called waitpid() that does effectively what the previous respondant described as 4.4BSD's waitfor() or wait4() function. I hope BSD changes the name before it's too late. Dave
rml@hpfcdc.HP.COM (Bob Lenk) (01/05/89)
> The IEEE Standard 1003.1 (POSIX) requires conforming implementations to have > a new function called waitpid() that does effectively what the previous > respondant described as 4.4BSD's waitfor() or wait4() function. I hope BSD > changes the name before it's too late. Functionally (not necessarily syntactically): wait4 == the union of wait3 and waitpid == wait3 + pid == waitpid + rusage I'm confidant that waitpid() will be a library using wait4 in 4.4BSD. Bob Lenk hplabs!hpfcla!rml rml%hpfcla@hplabs.hp.com
guy@auspex.UUCP (Guy Harris) (01/05/89)
>The IEEE Standard 1003.1 (POSIX) requires conforming implementations to have >a new function called waitpid() that does effectively what the previous >respondant described as 4.4BSD's waitfor() or wait4() function. I hope BSD >changes the name before it's too late. Since "waitfor()"/"wait4()" does things that "waitpid()" doesn't (namely, optionally fill in a "struct rusage"), I see no reason why the people at Berkeley would want to rename "wait{4|for}" to "waitpid". They may want to provide "waitpid" as a procedure that calls "wait{4|for}", but that's a different matter.
egisin@mks.UUCP (Eric Gisin) (01/07/89)
In article <803@auspex.UUCP>, guy@auspex.UUCP (Guy Harris) writes: > Since "waitfor()"/"wait4()" does things that "waitpid()" doesn't > (namely, optionally fill in a "struct rusage"), I see no reason why the > people at Berkeley would want to rename "wait{4|for}" to "waitpid". > They may want to provide "waitpid" as a procedure that calls > "wait{4|for}", but that's a different matter. Declare waitpid as pid_t waitpid (pid_t pid, int *statp, int options, ...); #define WRUSAGE 0x10 The option WRUSAGE can indicate that a fourth parameter pointing to a struct rusage exists. There's no need for half-a-dozen wait-like functions in 4.4 BSD.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/08/89)
In article <633@mks.UUCP> egisin@mks.UUCP (Eric Gisin) writes: >Declare waitpid as > pid_t waitpid (pid_t pid, int *statp, int options, ...); I don't have IEEE Std 1003.1 at hand at the moment, but I'm fairly sure this declaration is incompatible with the one in the Standard. If you want a different interface, use a new name. Don't people EVER learn from experience?
guy@auspex.UUCP (Guy Harris) (01/10/89)
>Declare waitpid as > pid_t waitpid (pid_t pid, int *statp, int options, ...); It's already been defined (by POSIX), and probably not as a varargs function. Yeah, maybe it'll work on most of the machines you and I can think of (just as calling "open" with 2 arguments will), but why take chances?
ado@elsie.UUCP (Arthur David Olson) (01/17/89)
> I don't have IEEE Std 1003.1 at hand at the moment, > but I'm fairly sure this declaration is incompatible with > the one in the Standard. If you want a different interface, > use a new name. Don't people EVER learn from experience? No they don't, as a look at the changes made to function library interfaces by the dpANS shows. -- Arthur David Olson ado@ncifcrf.gov ADO is a trademark of Ampex.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/18/89)
In article <8608@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >>Don't people EVER learn from experience? >No they don't, as a look at the changes made to function library interfaces >by the dpANS shows. What changes? Having just checked over my System V "lint" library for pANS conformance, I didn't notice any actual changes.
ado@elsie.UUCP (Arthur David Olson) (01/19/89)
> > >Don't people EVER learn from experience? > >No they don't, as a look at the changes made to function library interfaces > >by the dpANS shows. > > What changes? Having just checked over my System V "lint" library for > pANS conformance, I didn't notice any actual changes. Check out the memory allocation functions (with their changed treatment of NULL arguments) and fopen (where "rb+" is given new meaning) for starters--the ground is covered in X3J11 correspondence that should be part of its records. -- Arthur David Olson ado@ncifcrf.gov ADO is a trademark of Ampex.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/19/89)
In article <9003@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >> > >Don't people EVER learn from experience? >> >No they don't, as a look at the changes made to function library interfaces >> >by the dpANS shows. >> What changes? Having just checked over my System V "lint" library for >> pANS conformance, I didn't notice any actual changes. >Check out the memory allocation functions (with their changed treatment of >NULL arguments) and fopen (where "rb+" is given new meaning) for starters The only changes regarding memory allocation and NULL arguments are that in an ANSI C Standard conforming hosted implementation, realloc() acts like malloc() when fed a null pointer, and free() of a null pointer is benign. Both those are upward compatible with previously existing practice; correct code using these functions will not be broken by the extensions. Similarly for the fopen() example. "rb+" simply HAD no correct meaning in old UNIX implementations; using it there would have been a (possibly undetected) bug. Again this is an extension. UNIX (or POSIX) implementations need not use "rb+" at all; "r+" is equivalent in those environments; thus no existing correct code has been invalidated by this (necessary!) extension. The suggestion that prompted this discussion was INCOMPATIBLE with the already-established standard usage for the function. Therefore a new name should have been used, not the standardized name.
ado@elsie.UUCP (Arthur David Olson) (01/22/89)
Thanks to Doug Gwyn for confirming that there are changes to the C language function interface in X3J11's work, and for clarifying that it it's only non-upwardly-compatible (rather than incompatible) changes that are seen as a problem. -- Arthur David Olson ado@ncifcrf.gov ADO is a trademark of Ampex.