[comp.unix.wizards] UNIX-WIZARDS Digest V10#103

protin@pica.army.mil (Arthur W. Protin Jr.) (08/01/90)

There has been a lot of traffic about the desired behaviour of
fork(), and which layer of software should be responsible for retrying.
I was moved to respond, but this time I waded through the rest of the
digest before starting and found that the position I was going to propose
had already been suggested by Martin Weitzel <martin@mwtech.uucp>:

>		Who should retry if a fork fails? 
[stuff left out]
> So I think the complaints here *are* right from the view of an application
> developer, but instead of embracing all the forks in application programs
> with a retry capability, I think there's a more pragmatic (though not
> ideal) approach: Why not enhance the interface to fork in the standard
> library with a retry capability? For many of us, "library + kernel" are
> more or less a monolithic block (we can't change both easily 1/2:-)), so
> if an error from fork could be treated as the described "long-term" error
> condition, everything were fine.

My very thoughts on the matter.
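
For concreteness, a library-level retry of the kind Martin suggests might
look like the minimal sketch below. The names fork_retry_with and
fork_retry, the five attempts, and the one-second delay are assumptions
made for illustration; none of them is an existing library interface.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: any function with fork()'s signature. */
typedef pid_t (*fork_fn)(void);

/*
 * Call do_fork() up to max_tries times, sleeping delay_sec seconds
 * between attempts, but only while the failure is EAGAIN.
 * Returns what do_fork() returned: child pid or 0 on success, -1 on failure.
 */
pid_t fork_retry_with(fork_fn do_fork, int max_tries, unsigned delay_sec)
{
    int attempt;

    if (max_tries <= 0) {
        errno = EINVAL;          /* nothing to attempt */
        return -1;
    }

    for (attempt = 0; attempt < max_tries; attempt++) {
        pid_t pid = do_fork();

        if (pid >= 0 || errno != EAGAIN)
            return pid;          /* success, or an error a retry won't fix */

        if (attempt + 1 < max_tries && delay_sec > 0)
            sleep(delay_sec);    /* give the system a moment to recover */
    }

    errno = EAGAIN;              /* every attempt failed with EAGAIN */
    return -1;
}

/* Drop-in replacement for fork() with assumed defaults: 5 tries, 1 s apart. */
pid_t fork_retry(void)
{
    return fork_retry_with(fork, 5, 1);
}
```

Note that this wrapper retries on every EAGAIN, whatever the cause,
because errno alone gives it no way to tell the cases apart.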

BUT, in his intro to this solution was a detail that I overlooked, and
maybe too many others have as well:

> As I understand all the traffic here, the "real" problem is in fact
> that in case of the EAGAIN error two very different problems may
> exist: The one is more a "long-term" problem (no slots in the process
> table or user limit reached, where this could also be zombies caused
> by careless programming techniques), the other is a very short-term
> problem, which is difficult to correct in the kernel because of the complexity
> of the algorithms in that area.

TWO different situations with two different causes, requiring
two different solutions. When my puny little application program
can't fork because the process table is full (or even nearly so),
it not only shouldn't retry, it should shut itself down and
free up those resources so that the SysAdmin people have something
to work with while they try to fix the problem.  It is a design flaw
that EAGAIN is returned for both a serious system limit and a trivial
one.
    Thus IMHO, either a different error code should be used in the
new fork for the "ask me again real soon" condition, or the kernel
should undo the fork call and simulate a sleep call that returns
to the exact spot where the fork request gets reissued.  But as it
stands, the kernel is not returning enough information for the
programmer to code for the two situations correctly!
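
To illustrate the first option, here is a rough sketch of how an
application could choose its reaction if the short-term condition had its
own error code. EFORK_TRANSIENT and its value are purely hypothetical (no
such errno exists), as are the enum and function names; only EAGAIN is real.

```c
#include <errno.h>

/* HYPOTHETICAL error code for "ask me again real soon"; not a real errno. */
#define EFORK_TRANSIENT 10001

enum fork_action {
    FORK_RETRY_SOON,   /* short-term condition: reissue the fork shortly */
    FORK_SHUT_DOWN,    /* process table or user limit hit: free resources */
    FORK_REPORT        /* any other error: report it to the caller */
};

/* Map the errno left by a failed fork() to the reaction argued for above. */
enum fork_action fork_failure_action(int err)
{
    if (err == EFORK_TRANSIENT)
        return FORK_RETRY_SOON;
    if (err == EAGAIN)
        return FORK_SHUT_DOWN;
    return FORK_REPORT;
}
```

With today's single EAGAIN the first branch can never be taken, which is
exactly the missing information complained about above.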

Arthur Protin <protin@pica.army.mil>
These are my personal views and do not reflect those of my boss
or this installation.