[comp.unix.wizards] Who is responsible for a retry

martin@mwtech.UUCP (Martin Weitzel) (07/29/90)

In article <7885@tekgvs.LABS.TEK.COM> terryl@sail.LABS.TEK.COM writes:
[many wise words about KISS principle and the spirit of unix deleted]

But IMHO it's not quite appropriate here. I think the question here is:

		Who should retry if a fork fails? 

To see the problem I think that we should generalize a little. Just
consider the case of disk reads for a moment. Surely none of us
fails to appreciate the ability of the device drivers to issue
retries(%) if a read fails, so that an error from a read in an
application can be considered a permanent error.

(%: Though if I were about to write a program which tests for flaky
disk blocks, I wouldn't be so happy with kernel retries ...)

Of course, an application can choose to retry after bad reads, and I've
had cases of "ill" disks where running a program in the background
for some hours helped me to recover 100% of the "bad blocks" by patiently
retrying ... just 1 out of 100 reads or so happened to be successful.
On the other hand, I would never wrap disk reads in "normal" programs
with a retry capability - why bother: the kernel drivers generally
solve the problem well.
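
[A minimal sketch of such a patient re-reader, purely for illustration -
not from Martin's posting.  The 512-byte block size, the one-second pause,
and the caller-supplied retry budget are all assumptions.]

    #include <sys/types.h>
    #include <unistd.h>

    #define BLKSIZE 512             /* assumed block size, for illustration only */

    /*
     * Keep re-reading one block until a read finally succeeds, or until
     * the retry budget is used up.  Returns 0 on success, -1 on failure.
     */
    int
    read_block_patiently(int fd, off_t blkno, char *buf, int tries)
    {
        while (tries-- > 0) {
            if (lseek(fd, blkno * (off_t)BLKSIZE, SEEK_SET) == (off_t)-1)
                return -1;          /* can't even seek: give up now */
            if (read(fd, buf, BLKSIZE) == BLKSIZE)
                return 0;           /* this attempt got through */
            sleep(1);               /* give the drive a rest, then retry */
        }
        return -1;                  /* the block is still unreadable */
    }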

Now, why is the situation so different with "fork"?

As I understand all the traffic here, the "real" problem is in fact
that in case of the EAGAIN error two very different conditions may
exist: one is more a "long-term" problem (no slots in the process
table, or the per-user limit reached, which could also be due to zombies
left by careless programming); the other is a very short-term
problem, which is difficult to correct in the kernel because of the
complexity of the algorithms in that area.

So I think the complaints here *are* right from the view of an application
developer, but instead of wrapping all the forks in application programs
with a retry capability, I think there's a more pragmatic (though not
ideal) approach: why not enhance the interface to fork in the standard
library with a retry capability? For many of us, "library + kernel" are
more or less a monolithic block (we can't change both easily 1/2:-)), so
if any error from fork could then be treated as the described "long-term"
error condition, everything would be fine.

Well, it's only a suggestion; maybe someone will post such a piece of code
soon ...
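
[Not part of the original posting - a minimal sketch of what such a
library-level wrapper might look like.  The name fork_retrying() and
both constants are invented for illustration; the point is only that a
caller linked against something like this could treat any error it
still sees as the "long-term" kind.]

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define FORK_TRIES 5            /* assumed retry budget */
    #define FORK_PAUSE 3            /* assumed pause between attempts, in seconds */

    /*
     * A library-style wrapper: quietly retry transient EAGAIN failures,
     * so that any error the application still sees can be treated as
     * the "long-term" kind.
     */
    pid_t
    fork_retrying(void)
    {
        pid_t pid;
        int i;

        for (i = 0; i < FORK_TRIES; i++) {
            if ((pid = fork()) != -1 || errno != EAGAIN)
                return pid;         /* success, or a genuinely hard error */
            sleep(FORK_PAUSE);      /* short-term shortage: wait and retry */
        }
        return -1;                  /* still EAGAIN after all attempts */
    }
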
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

boyd@necisa.ho.necisa.oz (Boyd Roberts) (07/31/90)

When fork() fails with EAGAIN it fails for a good reason.

How many lines of user-mode code does it take to code up retries?

About 20.

It would seem that there is some consensus to change the semantics of
fork() to retry.  This would break a critical interface.  System calls
do one thing, and one thing well.  A trivial addition to user-mode programs 
is what is required, and NOT the re-definition of a well defined critical
interface.

Leave the kernel and C library alone.  Write your own re-trying fork().

    NAME

	bork() - spawn new process with exponential backoff on failure

    SYNOPSIS

	int bork(retries)
	int retries; 

    ...
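
[A sketch of one possible bork(), not Boyd's own code - roughly the
"about 20 lines" he mentions.  Retrying only on EAGAIN, the one-second
starting delay, and the 64-second cap are assumptions; the SYNOPSIS
above is old-style C, the sketch below is ANSI C.]

    #include <errno.h>
    #include <unistd.h>

    /*
     * Retry fork() with exponential backoff, giving up after `retries'
     * failed attempts.  Returns the new pid (0 in the child), or -1
     * with errno left set by the final fork().
     */
    int
    bork(int retries)
    {
        unsigned int delay = 1;     /* assumed starting delay, in seconds */
        int pid;

        for (;;) {
            if ((pid = fork()) != -1)
                return pid;         /* got the new process */
            if (errno != EAGAIN || retries-- <= 0)
                return -1;          /* hard error, or out of patience */
            sleep(delay);           /* back off before the next attempt */
            if (delay < 64)
                delay *= 2;         /* 1, 2, 4, ... capped at 64 seconds */
        }
    }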


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

jfh@rpp386.cactus.org (John F. Haugh II) (08/01/90)

In article <1809@necisa.ho.necisa.oz> boyd@necisa.ho.necisa.oz (Boyd Roberts) writes:
>When fork() fails with EAGAIN it fails for a good reason.
>
>How many lines of user-mode code does it take to code up retries?
>
>About 20.
>
>It would seem that there is some consensus to change the semantics of
>fork() to retry.  This would break a critical interface.  System calls
>do one thing, and one thing well.

fork() never failed before for lack of kernel memory - the change in
behavior was recently introduced - this is what people are complaining
about.

[ Gross simplification follows ... ]

The previous behavior was to allocate memory for the child and, if the
allocation failed, to swap the entire image out.  But the process image
was left in memory as well, which had the effect of creating two copies
of the process - one in memory and one on the swap device.

So even when there was a shortage of kernel memory, the process would
still be created; it was just swapped out.  This isn't some new
functionality people want - it's the old, pre-V.4 behavior people want -
you know, the same stuff that's been around since, say, the 6th Edition?
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org

terryl@sail.LABS.TEK.COM (08/02/90)

In article <866@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>In article <7885@tekgvs.LABS.TEK.COM> terryl@sail.LABS.TEK.COM writes:
>[many wise words about KISS principle and the spirit of unix deleted]
>
>But IMHO it's not quite appropriate here. I think the question here is:
>
>		Who should retry if a fork fails? 

     Well, I'll agree to disagree!!! (-: IMHO, the KISS principle and the
spirit of unix ARE appropriate here, and you deleted one of my main
reasons why, with respect to your question "Who should retry if a fork
fails?"

     And the reason is one of policy; as I said in my previous post, the
kernel SHOULD NOT be making policy decisions that are better left to the
application program. Having the kernel ALWAYS sleep and retry if a
fork() fails is a policy decision that may not always be appropriate.

     However, many people have brought up a point that I didn't think about
at first: the error code EAGAIN is overloaded to mean two totally different
things: one, a transient condition of not enough system resources (e.g. disk
swap space, real memory, etc.) to satisfy the request, and two, a more
permanent condition of running into some system-wide limits (e.g. no more
process slots, too many processes for this user id, etc.). I'll be happy to
agree (-: on this point that the two error conditions should be signaled
differently.
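
[A hypothetical illustration of the split Terry asks for, not anything
System V actually does: if the hard-limit case were reported with its
own error code - invented here as EPROCLIM - the caller's policy
decision would become straightforward.]

    #include <errno.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    #ifndef EPROCLIM
    #define EPROCLIM 1000           /* invented error code, for illustration only */
    #endif

    /*
     * If the two conditions were signalled differently, the caller
     * could pick its policy without guessing.
     */
    pid_t
    spawn_once(void)
    {
        pid_t pid = fork();

        if (pid != -1)
            return pid;             /* the child's pid, or 0 in the child */

        switch (errno) {
        case EAGAIN:                /* transient shortage: a bounded retry makes sense */
            fprintf(stderr, "spawn: system busy, worth retrying\n");
            break;
        case EPROCLIM:              /* hypothetical hard-limit code: retrying won't help */
            fprintf(stderr, "spawn: hard limit reached, giving up\n");
            break;
        default:
            perror("spawn: fork");
            break;
        }
        return -1;
    }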


			Terry Laskodi
			     of
			Tektronix

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/02/90)

In article <18479@rpp386.cactus.org> jfh@rpp386.cactus.org (John F. Haugh II) writes:
>fork() never failed before for lack of kernel memory - the change in
>behavior was recently introduced - this is what people are complaining
>about.

Actually, there have been several different implementations of fork(),
all of which have been able to fail because of a transient shortage
of resources.  The details are the only thing that has varied.

lls@kings.co.uk (Lady Lodge Systems) (08/02/90)

In article <1809@necisa.ho.necisa.oz> boyd@necisa.ho.necisa.oz (Boyd Roberts) writes:
>When fork() fails with EAGAIN it fails for a good reason.
>A trivial addition to user-mode programs is what is required, and NOT 
>the re-definition of a well defined critical interface.
>Leave the kernel and C library alone.  Write your own re-trying fork().

OK! Napalm dispenser armed.....

How about AT&T making the 'trivial' addition to their user mode programs?
The shell springs to mind! What are you suggesting? I make the trivial
addition of rewriting the shell?? How about the other programs that
I only have in binary form - rewrite them as well??

People are missing the whole point here.  The vast majority of Un*x
users are binary-only licensees and cannot easily change their code to
deal with EAGAIN returns in all of their (binary-only) programs.  When
fork fails through a temporary log-jam in the kernel, surely the kernel
should deal with the situation before telling the calling application
that the world has just ended?  If fork fails because the process table
is full, we can fix that by re-tuning the kernel for sensible limits -
WE CAN'T FIX THE KERNEL GETTING ITS INTERNALS SNARLED UP.

The code that I write does check for EAGAIN and retry - however, other
utilities and packages that I call (including /bin/sh) do not - how do 
I fix that?
-- 
 -------------------------------------------------------------------------
| Peter King.                          | Voice: +44 733 239445            |
| King Bros (Lady Lodge) Ltd           | EMAIL: root@kings.co.uk          |
| Lady Lodge House, Orton Longueville, | UUCP: ...mcvax!ukc!kings!root    |
| Peterborough, PE2 0HZ, England.      |                                  |
 -------------------------------------------------------------------------

jfh@rpp386.cactus.org (John F. Haugh II) (08/03/90)

In article <13468@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>Actually, there have been several different implementations of fork(),
>all of which have been able to fail because of a transient shortage
>of resources.  The details are the only thing that has varied.

I've been spared the gory details of System V fork() most of my adult
life, but in my childhood (last week, when this started) I did check
V6 and V7, and sure enough - neither of those fails for lack of
anything short of swap space.  Now, I have V.2 source lying around at
work someplace; I'd be happy to look and see what fork() does there
in the case of no physical memory.  But my recollection is that it
doesn't fail for transient shortages of physical memory.

Anyone with a fork() older than V.3 that fails for transient
shortages of memory is free to send me the sordid details.  That way
I can start a list of which UNIX products not to purchase.  Obviously
the disease starts around V.3.2 or so, and I'd rather avoid the plague.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/05/90)

In article <35@kings.co.uk> lls@kings.UUCP (Superuser) writes:
>The shell springs to mind! What are you suggesting? I make the trivial
>addition of rewriting the shell??

If you had been following the discussion, you would already have seen
that the shell DOES retry failed forks, several times with exponentially
increasing delay between them.

>People are missing the whole point here.  The vast majority of Un*x
>users are binary only licencees and cannot easily change their code to
>deal with EAGAIN returns in all of their (binary only) programs.

If you have a problem with a program you've licensed in binary form,
you should submit a bug report to the vendor of the program.

boyd@necisa.ho.necisa.oz (Boyd Roberts) (08/08/90)

In article <35@kings.co.uk> lls@kings.UUCP (Superuser) writes:
>
>The code that I write does check for EAGAIN and retry - however, other
>utilities and packages that I call (including /bin/sh) do not - how do 
>I fix that?

I've just read the System V.2.2 shell code and guess what?
It retries the fork() with backoff.  So your shell must be broken.

The deal is that when fork() fails -- all is not well.  The kernel is
not psychic and can't predict what will happen in the future.  The
caller of fork() is responsible for taking action.

When fork() does fail, what can you do?  Not much: you can give up
in disgust or retry.  BUT THERE IS NO GUARANTEE THAT ANY AMOUNT
OF RETRYING WILL GET YOU THAT NEW PROCESS.  What's happened is that a
kernel resource has been exhausted, and you have no way of predicting
whether any relevant resources now in use will be freed up in the future.

I'm with presotto & hume.  But when you're faced with retrying you have
to ask the question -- is it worth it, and if so how long should I try?


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

blm@6sceng.UUCP (Brian Matthews) (08/09/90)

In article <1820@necisa.ho.necisa.oz> boyd@necisa.ho.necisa.oz (Boyd Roberts) writes:
|When fork() does fail, what can you do?  Not much: you can give up
|in disgust or retry.  BUT THERE IS NO GUARANTEE THAT ANY AMOUNT
|OF RETRYING WILL GET YOU THAT NEW PROCESS.

On the other hand, if you don't retry, you're pretty much guaranteed not
to get a new process.
-- 
Brian L. Matthews	blm@6sceng.UUCP