[comp.unix.wizards] Sys V fork IS broken!

tbray@watsol.waterloo.edu (Tim Bray) (07/29/90)

gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
 jr@oglvee.UUCP (Jim Rosenberg) writes:
 -But if system calls fail simply because of a very temporary bout of activity,
 -that is *not my problem*!  It's the kernel's problem...
 Oh, good grief.  It is SILLY to say that the kernel should be redesigned
 to compensate for bugs in application programs.

I've been earning my living writing application programs on Unix for some
years.  Sometimes application programs need to fork().  (in fact, an informal
scan of my memory fails to reveal an important non-trivial application that
never does a fork() (and the semantics of fork() are just right and one of the
best things about unix (and those who talk about the need for a spawn() or a
run() call should spend a few years in the SYS$_CREPRC mines (sorry for the
digression)))).

Every application I've written, and every other one I've seen (aside from
amateurish toys that don't check return codes) forks about like this:

  if ((child = fork()) == -1)
    FatalSystemError("Serious system trouble! Can't create process!");
  else if (child == 0)
  { /* child */ }
  else
  { /* parent */ }

I think this is right and Doug Gwyn's comment is (unusually for him) wrong.  

Having write(2) fail because a disk is full is OK - there are several
strategies which a program might reasonably adopt to handle this.  But having
fork() fail because of a likely-transient OS state is a stinking crock.  If
there is a good chance that the kernel can fix this up without a gratuitous
time delay, it should do so.  If not (i.e.  process creation has become
impossible) the whole system is seriously sick and all the applications should
ideally hear about this PDQ so they can start taking disaster relief
measures.  I don't really think there's a middle ground here.  And speaking
from my experience in the application community, I think describing absence of
special-purpose backoff & retry code for handling process creation failure by
the OS as "bugs in application programs" is pretty arrogant and unrealistic.

Cheers, Tim Bray, Open Text Systems, Waterloo, Ont.

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/29/90)

In article <1990Jul28.195032.18746@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>  if ((child = fork()) == -1)
>    FatalSystemError("Serious system trouble! Can't create process!");
>... I think describing absence of special-purpose backoff & retry code for
>handling process creation failure by the OS as "bugs in application programs"
>is pretty arrogant and unrealistic.

The bug is that your application makes no attempt to recover from a known
class of error, EAGAIN in this case.  I would say the same thing about a
wait() loop that did not properly handle EINTR error returns.  (I bet you
have those, too.)  And of course many applications that directly use the
write() system call do not properly handle short write counts.  None of
these should be considered failings of the kernel; as Terry pointed out
in another recent posting, it is the responsibility of the application to
determine the policy to be used to deal with such situations, and the
best choice for the policy depends very much upon the application.  If
the programmer has not thought about these matters, then he has done an
imperfect job of program design.  Producing high-quality applications is
hard enough as it is; if the kernel were to impose somebody's arbitrary
notion of policy in such cases it would be impossible.
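
The two cases mentioned above (short write counts and EINTR from wait) can be
sketched in a few lines.  This is only an illustration under stated
assumptions, not code from anyone in the thread: write_all and wait_for are
made-up names, and waitpid() is the POSIX call, assumed here in place of the
bare wait() of the period.  What to do about any other error is still left to
the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, resuming after short writes
 * and restarting after EINTR.  Returns 0 on success, -1 with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            return -1;          /* real error: the caller picks the policy */
        }
        p += n;                 /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: wait for one specific child, retrying on EINTR.
 * Returns the child's pid, or -1 with errno set for any other error. */
pid_t wait_for(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

A caller would use write_all(fd, buf, len) wherever it now calls write() and
ignores the count, and would still have to decide for itself what ENOSPC or
EIO should mean for the application.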

rice@dg-rtp.dg.com (Brian Rice) (07/30/90)

In article <1990Jul28.195032.18746@watdragon.waterloo.edu>,
tbray@watsol.waterloo.edu (Tim Bray) writes:
|> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
|> > jr@oglvee.UUCP (Jim Rosenberg) writes:
|> > -But if system calls fail simply because of a very temporary bout of activity,
|> > -that is *not my problem*!  It's the kernel's problem...
|> > Oh, good grief.  It is SILLY to say that the kernel should be redesigned
|> > to compensate for bugs in application programs.
|> 
|> I think [...] Doug Gwyn's comment is (unusually for him) wrong.
|>
|> Having write(2) fail because a disk is full is OK - there are several
|> strategies which a program might reasonably adopt to handle this.  But having
|> fork() fail because of a likely-transient OS state is a stinking crock.

My fingers almost made me redirect followups to this post to
alt.religion.computers, because we are surely veering close to matters of
faith.  But I do think there's something to be said in defense of
traditional fork.

|> If there is a good chance that the kernel can fix this up without a gratuitous
|> time delay, it should do so.  If not (i.e.  process creation has become
|> impossible) the whole system is seriously sick and all the applications should
|> ideally hear about this PDQ so they can start taking disaster relief
|> measures.

If the kernel has to make a call to fork fail, it does so for one of exactly
two reasons: some system-imposed limit would be exceeded, or insufficient
memory is available.  That's all.  Neither of these conditions means that the
system is "seriously sick"; any process which isn't going to fork again and
(in the second case) isn't going to do anything malloc'y need never even hear
of the situation.

If the system really is "sick"--i.e., some internal data structure is
corrupted--then the system is going to panic, *now*, and rightly so.  (If the
kernel can't believe its own internal data, how can it credibly notify
processes to begin "disaster relief"?  Admittedly, there's a bit of computer
religion here: that programs should fail before they lie.  But I think that
sect has a great many adherents.)  Conversely, a system isn't sick just
because resources are under heavy contention.

And, of course, the kernel tells you why your fork failed: you get EAGAIN or
ENOMEM in errno.  All told, this means that you, the application programmer,
get to choose what happens in the event of a fork failure, and you even get
some information to help your application make the choice.  That "Put the
programmer in the driver's seat" orientation really is what UNIX means to me.

|> And speaking
|> from my experience in the application community, I think describing absence of
|> special-purpose backoff & retry code for handling process creation failure by
|> the OS as "bugs in application programs" is pretty arrogant and unrealistic.

"Special-purpose backoff and retry code"?  Can the kernel really do better
than this?

   while ((child = fork()) == -1 && ++error_count < MAX_FORK_FAILURES) {
       switch (errno) {
          case ENOMEM:
             if (theres_some_junk_I_can_free()) {
                 free(junk);
                 break;
             }
             /* fall through */
          case EAGAIN:
             sleep(MAYBE_LIFE_WILL_BE_NICER_IN_THIS_MANY_SECONDS);
             break;
          default:
             FatalError("Argh!  The man page lied!  #@!$& phone company OS!");
             exit(1);
       }
   }
   if (child == -1) {
       FatalError("Waaah!  The kernel won't let me fork!");
       exit(1);
   }

Well, maybe the kernel could queue each fork request that it was unable to
complete and then satisfy each request in order...or maybe it could satisfy
the smallest request first, with some kind of aging mechanism to keep from
starving forks of big processes, etc., etc....this would get complicated,
clearly, and might even require so much overhead as to provoke thrashing.
But maybe you could do it.  If you could, then how would you deal with the
person who said, "Wait--if the system is low on memory, I don't want my fork
retried; I want to hear about it so I can go off and do something else (maybe
just sleep), then retry"?  This is the person who liked the old fork, and
there are lots of such folk.  Looks like you'll have to add an
old-fork-behavior flag, and then you'll have two kinds of forks, some on a
queue and some not, and all wanting resources...

Clearly, this way lies VMS$MADNESS.  Let's hear it for minimal function calls
with clean interfaces, even if they necessitate a few more lines of
application code.  After all, *you* get to write that code, and you can
package it up into a library function if you don't want to type it more than
once.
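
One way the retry loop might be packaged as such a library function is
sketched below.  The name fork_retry, its parameters, and the optional
release_memory callback are all hypothetical, invented for this illustration;
the retry count and delay are left to the caller precisely because the right
values depend on the application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around fork().  Gives up after max_failures failed
 * attempts, sleeping delay_seconds between them, and retries only on
 * EAGAIN or ENOMEM.  release_memory may be NULL; if given, it is called
 * on ENOMEM and should return nonzero when it freed something.
 * Returns what fork() returns; on -1, errno is from the last attempt. */
pid_t fork_retry(int max_failures, unsigned delay_seconds,
                 int (*release_memory)(void))
{
    int failures = 0;
    pid_t child;

    assert(max_failures > 0);
    while ((child = fork()) == -1) {
        int saved = errno;

        if (saved != EAGAIN && saved != ENOMEM)
            break;              /* undocumented error: do not retry */
        if (++failures >= max_failures)
            break;              /* the caller's patience has run out */
        if (saved == ENOMEM && release_memory != NULL && release_memory())
            continue;           /* freed something: retry at once */
        sleep(delay_seconds);
    }
    return child;
}
```

The caller then tests the result exactly as it would test fork() itself:
-1 is failure (with errno still meaningful), 0 is the child, and anything
else is the parent.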

Brian Rice   rice@dg-rtp.dg.com   +1 919 248-6328
DG/UX Product Assurance Engineering
Data General Corp., Research Triangle Park, N.C.

mills@ccu.umanitoba.ca (Gary Mills) (07/31/90)

gwyn@smoke.BRL.MIL (Doug Gwyn) writes:

>In article <1990Jul28.195032.18746@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>>  if ((child = fork()) == -1)
>>    FatalSystemError("Serious system trouble! Can't create process!");

>The bug is that your application makes no attempt to recover from a known
>class of error, EAGAIN in this case.  I would say the same thing about a

My old Sys V manual says EAGAIN is also set by fork when the limits on total
number of processes would be exceeded.  I don't think you would normally
want to retry in this case.  Issuing a message and exiting would be better.
-- 
-Gary Mills-             -University of Manitoba-             -Winnipeg-

andrew@alice.UUCP (Andrew Hume) (07/31/90)

	when talking about (hardware) network interfaces, dave presotto
points out that what you really want is one of 3 answers:
	yes it worked.
	no, it didn't but it would be worthwhile trying again.
	no, it didn't.
(an example of the middle category might be a packet collision on an ethernet.)

	tim bray seems to be of the school that wants one of two answers:
yes, it worked, or no, it didn't (and it's serious!).

	as with so many of these questions, if you don't like the interface
that comes with the system call, put a wrapper around it. i guess i am agreeing
with those who have suggested this be done in user space; i don't
see how it can be done in kernel space without denying the facility for user
programs to react to EAGAIN in their own way (perhaps, asking for confirmation before sleeping).
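
a user-space wrapper that gives the three answers might look roughly like
the sketch below.  the names outcome, classify_fork_errno and try_fork are
invented for the example, and treating only EAGAIN as "worth trying again"
is an assumption; an application could just as reasonably draw the line
somewhere else.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* The three answers: it worked, try again later, or it failed. */
enum outcome { WORKED, TRY_AGAIN, FAILED };

/* Map the errno left by a failed fork() onto the last two answers.
 * Assumption for this sketch: only EAGAIN counts as retryable. */
enum outcome classify_fork_errno(int err)
{
    return err == EAGAIN ? TRY_AGAIN : FAILED;
}

/* Hypothetical wrapper: stores fork()'s result in *child and reports
 * which of the three answers applies.  In the child, *child is 0 and
 * the answer is WORKED. */
enum outcome try_fork(pid_t *child)
{
    assert(child != NULL);
    *child = fork();
    if (*child != -1)
        return WORKED;
    return classify_fork_errno(errno);
}
```

what to do on TRY_AGAIN (sleep, ask the user, give up) stays with the caller,
which is the point of doing it in user space.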

jmm@eci386.uucp (John Macdonald) (07/31/90)

In article <13447@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
|In article <1990Jul28.195032.18746@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
|>  if ((child = fork()) == -1)
|>    FatalSystemError("Serious system trouble! Can't create process!");
|>... I think describing absence of special-purpose backoff & retry code for
|>handling process creation failure by the OS as "bugs in application programs"
|>is pretty arrogant and unrealistic.
|
|The bug is that your application makes no attempt to recover from a known
|class of error, EAGAIN in this case. [...]

Well, yes, but ...

This is a "known class of error" that has been added to the meaning
of fork over the years.  (I don't know when in the various branches
of the family tree, but probably usually around the same time that
support for memory paging was being added.)  I tend to concur with
an earlier poster who suggested that returning EAGAIN even when it
is only a temporary lack of resources that is a problem would be
analogous to returning EAGAIN just because the disk buffer cache is
temporarily full instead of just putting the process to sleep while a
buffer cache entry is emptied on its behalf.

Obviously, from the intensity that is being used to suggest that this
really is an error, the situation is not that simple.  Could someone
please explain why.  Is it too difficult (or impossible) to distinguish
between a transient and a deadlocked lack of resources?  Or would
the people claiming that "this is a policy decision that should not
be in the kernel" also claim that having the kernel automatically
wait for a buffer cache is a mistake according to the same design
philosophy?  (If not and if it is not just because of practical
considerations like detectability and certainty of success, what is
the difference?)

While Doug claims that Tim's code above ignores a known class of
error, this was not always a known class of error - in earlier
versions of Unix it was not a class of error at all.

Certainly, from the perspective of someone who has been writing code
using fork since version 5 days, I can admit that I have never before
noticed the change from "error from fork is usually not recoverable"
to "error from fork is possibly recoverable if you try again in a
while" between S3 and S5.

Perhaps a document should be written for new system releases giving
changes to programming practice that should be used - it could contain
any change that has required a significant proportion of the standard
program set to be examined and fixed for the new (or newly noticed)
desired programming method.

Requiring programmers to change their normal programming practices
should not be done without justification (which I think can be provided
in this case), and without clear explanation (which is often lacking).
-- 
Algol 60 was an improvement on most          | John Macdonald
of its successors - C.A.R. Hoare             |   jmm@eci386

jr@oglvee.UUCP (Jim Rosenberg) (08/02/90)

In <1990Jul30.002642.18244@dg-rtp.dg.com> rice@dg-rtp.dg.com (Brian Rice) writes:
>And, of course, the kernel tells you why your fork failed: you get EAGAIN or 
>ENOMEM in errno.

Not clear!  In the V.3.2 man page for fork(2) it sez:

EAGAIN		Total amount of system memory available when reading via
		raw IO is temporarily insufficient.

ENOMEM		The process requires more space than the system is able to
		supply.

The scenario causing all the ruckus is memory temporarily unavailable while
doing a fork.  Apparently ENOMEM is what you get when you run out of swap
space.  (The BSD man page mentions swap space explicitly for ENOMEM.)

I agree with those posters who argue that out of swap space is a pretty grave
condition; I would want my applications to shut down as gracefully as possible
and bail the hell out.  If we're out of swap space, it's just as likely to get
worse as to get better.  I would certainly *not* want to retry in this
circumstance.  But I think retrying when you're out of process table slots is
pretty dubious too.  That and the no-kernel-memory problem seem *both* to be
covered by the same errno.  I would say this is a flat out and out defect in
which values of errno are returned by which circumstance.  And the man page
should say something more clear than that weird "raw IO" language.  Raw IO???

If the system call doesn't distinguish between problems that *are likely* to go
away in a few seconds versus those that are of unpredictable duration, then
yessiree folks, the kernel *IS DICTATING POLICY* -- and a most unfortunate
one.  If it won't let my application judge the severity of the error then the
kernel is dictating the policy of making the application plan for the worst
case -- when it may not have to.
-- 
Jim Rosenberg             #include <disclaimer.h>      --cgh!amanue!oglvee!jr
Oglevee Computer Systems                                        /      /
151 Oglevee Lane, Connellsville, PA 15425                    pitt!  ditka!
INTERNET:  cgh!amanue!oglvee!jr@dsi.com                      /      /

sar0@cbnewsl.att.com (stephen.a.rago) (08/02/90)

In article <1990Jul28.195032.18746@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> 
> Every application I've written, and every other one I've seen (aside from
> amateurish toys that don't check return codes) forks about like this:
> 
>   if ((child = fork()) == -1)
>     FatalSystemError("Serious system trouble! Can't create process!");
>   else if (child == 0)
>   { /* child */ }
>   else
>   { /* parent */ }
> 
> I think this is right and Doug Gwyn's comment is (unusually for him) wrong.  

Sorry, but your code is wrong if you want it to be robust.  Otherwise,
for the typical "I-don't-care" application, it's fine.
For example, the shell will try an exponential backoff if fork fails.
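
A rough sketch of what exponential backoff around fork could look like
follows.  It is not the shell's actual code; backoff_delay and fork_backoff
are hypothetical names, and the one-second base, the 32-second cap, and the
choice to retry only on EAGAIN are assumptions picked for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Delay in seconds before retry number `attempt` (0-based):
 * base, 2*base, 4*base, ... but never more than cap. */
unsigned backoff_delay(unsigned attempt, unsigned base, unsigned cap)
{
    unsigned delay = base;

    while (attempt-- > 0) {
        if (delay > cap / 2)
            return cap;         /* doubling would pass the cap */
        delay *= 2;
    }
    return delay < cap ? delay : cap;
}

/* Hypothetical wrapper: call fork() up to max_tries times, sleeping
 * with doubling delays between EAGAIN failures.  Returns what fork()
 * returns; on -1, errno is from the last attempt. */
pid_t fork_backoff(unsigned max_tries)
{
    unsigned attempt;
    pid_t child = -1;

    assert(max_tries > 0);
    for (attempt = 0; attempt < max_tries; attempt++) {
        child = fork();
        if (child != -1 || errno != EAGAIN)
            break;              /* success, or an error not worth retrying */
        if (attempt + 1 < max_tries)
            sleep(backoff_delay(attempt, 1, 32));
    }
    return child;
}
```

With these assumed values the waits run 1, 2, 4, 8, 16, 32, 32, ... seconds,
so a brief shortage costs little and a long one does not turn into a busy
loop.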

> Having write(2) fail because a disk is full is OK - there are several
> strategies which a program might reasonably adopt to handle this.  But having
> fork() fail because of a likely-transient OS state is a stinking crock.  If
> there is a good chance that the kernel can fix this up without a gratuitous
> time delay, it should do so.  If not (i.e.  process creation has become
> impossible) the whole system is seriously sick and all the applications should
> ideally hear about this PDQ so they can start taking disaster relief
> measures.

If the kernel can't allocate enough memory for a process to fork, then
your system will be in worse trouble than you think.  Especially in SVR4,
where most data structures are dynamically allocated, this is a situation
that requires immediate attention.  It depicts either a memory leak or a
workload that requires more physical memory than available.  Imagine not
being able to log in as root on the console because the system can't
allocate a streams message.

> I don't really think there's a middle ground here.  And speaking
> from my experience in the application community, I think describing absence of
> special-purpose backoff & retry code for handling process creation failure by
> the OS as "bugs in application programs" is pretty arrogant and unrealistic.

Welcome to the world of UNIX.  Face it, since fork can fail because all
the proc slots are in use, if you want it to be robust, your applications
will still need to retry regardless of the fact that fork can fail for
lack of memory.  One failure is just as intermittent as the other, as far
as the application is concerned.  In one case, you would be waiting for
memory to be freed, and in the other, for a process to exit.

Steve Rago
sar@attunix.att.com

chris@mimsy.umd.edu (Chris Torek) (08/02/90)

Just for fun (well, not really :-) ), here is what the 4BSD man page says
about fork:

     [EAGAIN]	    The system-imposed limit on the total number
		    of processes under execution would be
		    exceeded.  This limit is configuration-
		    dependent.

     [EAGAIN]	    The system-imposed limit MAXUPRC
		    (<sys/param.h>) on the total number of
		    processes under execution by a single user
		    would be exceeded.

     [ENOMEM]	    There is insufficient swap space for the new
		    process.

These are the only errors listed, thus the only two that will ever
be returned (and if you believe that . . . :-) ).

Now, arguably the first two EAGAINs should be different codes:
there is a distinct difference between `We're sorry, all circuits
are busy' and `We're sorry, all your personal phones are busy'.
Indeed, if EAGAIN is to be used for both `out of system-wide
process table slots' and `too many of your own processes running',
one could argue that open should not distinguish between EMFILE
(`you personally have too many files open') and ENFILE (`everyone
put together have too many files open').
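
To show why the EMFILE/ENFILE distinction is useful to a caller, here is a
small sketch of how a program might pick a different reaction for each.  The
enum and the function open_failure_policy are hypothetical, and the policy
chosen for each case is just one plausible choice, not a recommendation.

```c
#include <assert.h>
#include <errno.h>

/* Possible reactions to a failed open(); names invented for this sketch. */
enum open_policy { CLOSE_OWN_FILES, WAIT_FOR_OTHERS, GIVE_UP };

/* Choose a reaction from the errno left by open(). */
enum open_policy open_failure_policy(int err)
{
    switch (err) {
    case EMFILE:                /* this process has too many files open */
        return CLOSE_OWN_FILES; /* something the process can fix itself */
    case ENFILE:                /* the system-wide file table is full */
        return WAIT_FOR_OTHERS; /* only other processes can relieve it */
    default:
        return GIVE_UP;
    }
}
```

No such split is possible for fork when both the system-wide and the per-user
limit come back as EAGAIN.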

At any rate, the Berkeley fork will not fail just because the
system is temporarily short of physical memory.  (It can, however,
deadlock both in fork() and when growing a process....  Fork can
only deadlock when there is almost no real memory at all, though
[`almost none' means `USRPTSIZE + NPROC*HIGHPAGES < maxmem'].)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

rice@dg-rtp.dg.com (Brian Rice) (08/02/90)

In article <574@oglvee.UUCP>, jr@oglvee.UUCP (Jim Rosenberg) writes:
> In <1990Jul30.002642.18244@dg-rtp.dg.com> rice@dg-rtp.dg.com (Brian Rice) writes:
> >And, of course, the kernel tells you why your fork failed: you get EAGAIN or 
> >ENOMEM in errno.
> 
> Not clear!  In the V.3.2 man page for fork(2) it sez:
> 
> EAGAIN		Total amount of system memory available when reading via
> 		raw IO is temporarily insufficient.
> 
> ENOMEM		The process requires more space than the system is able to
> 		supply.

Yup--this is pretty murky.  The System V Interface Definition, issue 2, and
later documents are more forthcoming: EAGAIN is returned if the limits on 
total processes or processes-per-user would be exceeded, and ENOMEM is returned
if physical storage is insufficient.

I don't recall having looked carefully at the above man page section before
(I wouldn't have seen it in daily life because DG/UX's fork man page echoes 
the SVID).  It does indeed give readers the impression that their forks are 
failing because of an implementation detail (not to say "flaw") of the kernel; 
I don't know whether vanilla V.3 kernels ever actually worked that way, but 
if they did, I can certainly sympathize with those people who were unhappy 
with fork.

Anyway, if your System V conforms to SVID 2 or later, you can count on the
use of EAGAIN for configuration-limits problems and ENOMEM for physical-
storage problems.  So don't worry about that "raw IO" malarkey. 

"Be happy." -- Baba RAM Dass

Brian Rice   rice@dg-rtp.dg.com   +1 919 248-6328
DG/UX Product Assurance Engineering
Data General Corp., Research Triangle Park, N.C.