protin@pica.army.mil (Arthur W. Protin Jr.) (08/01/90)
There has been a lot of traffic about the desired behaviour of fork() and about which layer of software should be responsible for retrying. I was moved to respond, but this time I waded through the rest of the digest before starting and found that the position I was going to propose had already been suggested by Martin Weitzel <martin@mwtech.uucp>:

> Who should retry if a fork fails?
> [stuff left out]
> So I think the complaints here *are* right from the view of an application
> developper, but instead of embracing all the forks in application programs
> with a retry capability, I think there's a more pragmatic (though not
> ideal) approach: Why not enhance the interface to fork in the standard
> library with a retry capability? For many of us, "library + kernal" are
> more or less a monolithic block (we can't change both easily 1/2:-)), so
> if an error from fork could be treated as the described "long-term" error
> condition, everything were fine.

My very thoughts on the matter. BUT, his introduction to this solution contained a detail that I overlooked, and perhaps too many others have as well:

> As I understand all the traffic here, the "real" problem is in fact
> that in case of the E_AGAIN-error two very different problems may
> exist: The one is more a "long-term" problem (no slots in the process
> table or user limited reached, where this could also be zombies caused
> by careless programming techniques), the other is a very short-term
> problem, which is difficult to correct in the kernal because the complexity
> of the algorithms in that area.

TWO different situations, with two different causes, requiring two different solutions. When my puny little application program can't fork because the process table is full (or even nearly so), it not only shouldn't retry, it should shut itself down and free up its resources so that the SysAdmin people have some resources with which to try to fix the problem.
It is a design flaw that EAGAIN is returned for both a serious system limit and a trivial one. Thus, IMHO, either the new fork should use a different error code for the "ask me again real soon" condition, or the kernel should undo the fork call and simulate a sleep call, returning to the exact spot to reissue the fork request. As it stands, the kernel does not return enough information for the programmer to code for these situations correctly!

Arthur Protin <protin@pica.army.mil>
These are my personal views and do not reflect those of my boss or this installation.