[comp.sources.bugs] dmake 3.7 bug: loses child on 3b1

kevin@kosman.UUCP (Kevin O'Gorman) (06/06/91)

I snarfed dmake 3.7 by ftp, and I'm working to install it on my 3b1.
The bootstrap make seemed to go fine.  The resulting object seems
normal.  I did minimal edits: just to startup.h.  I did 'make sysvr3'.

I installed the result.

Now I'm trying to use dmake to make itself again: the acid test.

It goes okay for several modules, and then at a certain point, it
reports "Error -- lost a child" and dies.  If I repeat the command,
one additional source is compiled, and dmake dies in the same way
on the second.

This is no way to make a program.

The natural questions arise: has anyone seen this before?  Has anyone
had success on a 3b1?  Does anyone have any hints?  Does anyone have
any idea what the message means? (I've tried grepping the source,
but don't see that message anywhere).

-- 
Kevin O'Gorman ( kevin@kosman.UUCP, kevin%kosman.uucp@nrc.com )
voice: 805-984-8042 Vital Computer Systems, 5115 Beachcomber, Oxnard, CA  93035
Non-Disclaimer: my boss is me, and he stands behind everything I say.

lbr@holos0.uucp (Len Reed) (06/06/91)

In article <1362@kosman.UUCP> kevin@kosman.UUCP (Kevin O'Gorman) writes:

=It goes okay for several modules, and then at a certain point, it
=reports "Error -- lost a child" and dies.  If I repeat the command,
=one additional source is compiled, and dmake dies in the same way
=on the second.

I've seen this on SCO Xenix 386.
-- 
Len Reed
Holos Software, Inc.
Voice: (404) 496-1358
UUCP: ...!gatech!holos0!lbr

dvadura@watdragon.waterloo.edu (Dennis Vadura) (06/11/91)

In article <1362@kosman.UUCP> kevin@kosman.UUCP (Kevin O'Gorman) writes:
>It goes okay for several modules, and then at a certain point, it
>reports "Error -- lost a child" and dies.  If I repeat the command,
>one additional source is compiled, and dmake dies in the same way
>on the second.

I tried to respond to you by mail but it bounced for some reason.  The error
message is printed from the sys_errlist table.  I currently don't have a fix
for this situation, and can't reproduce it on anything that I have tried.
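Dennis's point is that the wording comes from the C library's error table, not from dmake, which is why grepping the dmake sources finds nothing. Below is a small C sketch that assumes the errno involved is ECHILD from wait(), as Brian's analysis further down the thread suggests. It triggers that error in a process with no children and fetches the table entry through strerror(), the portable way to read sys_errlist. The function names are hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: call wait() in a process with no unwaited-for
   children and report the errno it leaves behind (POSIX says ECHILD). */
int childless_wait_errno(void)
{
    int status;
    if (wait(&status) != -1)
        return 0;                   /* a child was reaped after all */
    return errno;
}

/* Hypothetical helper: the C library's text for that errno.  The exact
   wording comes from each system's error table and varies between systems. */
const char *childless_wait_message(void)
{
    return strerror(childless_wait_errno());
}
```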

-dennis
-- 
-------------------------------------------------------------------------------
   "It may not be the truth, but in Baghdad,       |Dennis Vadura
    it is the News.  --Unknown Gulf Correspondent  |dvadura@dragon.uwaterloo.ca
===============================================================================

haug@almira.uucp (Brian R. Haug) (06/13/91)

In article <1362@kosman.UUCP> kevin@kosman.UUCP (Kevin O'Gorman) writes:
>I snarfed dmake 3.7 by ftp, and I'm working to install it on my 3b1.
>The bootstrap make seemed to go fine.  The resulting object seems
>normal.  I did minimal edits: just to startup.h.  I did 'make sysvr3'.
>
>I installed the result.
>
>Now I'm trying to use dmake to make itself again: the acid test.
>
>It goes okay for several modules, and then at a certain point, it
>reports "Error -- lost a child" and dies.  If I repeat the command,
>
>The natural questions arise: has anyone seen this before?  Has anyone
>had success on a 3b1?  Does anyone have any hints?  Does anyone have
>any idea what the message means? (I've tried grepping the source,
>but don't see that message anywhere).

I saw this behavior as well.  After some examination and code modifications
I found out that the wait call was failing with the error that there were
no children.  Additional debug code showed the process IDs of the forked
children and the values returned by wait.  When the error occurred, dmake
itself had never waited for the child in question.  I used adb to set a
breakpoint at wait, and found that it was being called from getcwd(3C).  It
seems that many systems implement this subroutine as a call to popen(3C)
followed by some sort of read and then a pclose(3C).  pclose has to do a
wait, and that wait may reap the child that dmake will later be searching for.

As best I can tell, this cannot be easily fixed in any System V release (until
V.4 when we get waitpid) unless you re-write the getcwd function, or the dmake
function which calls getcwd.  Best of luck.

pcg@aber.ac.uk (Piercarlo Grandi) (06/15/91)

On 13 Jun 91 01:00:35 GMT, haug@almira.uucp (Brian R. Haug) said:

haug> [ ... dmake calls getcwd(3) while it has children outstanding;
haug> since in many systems getcwd(3) just forks pwd(1), this makes
haug> for problems ... ]

haug> As best I can tell, this can not be easily fixed in any System V
haug> release (until V.4 when we get waitpid) unless you re-write the
haug> getcwd function, or the dmake function which calls getcwd.

There is a fairly clever freeware implementation of getcwd(3) going
around, one version of which was written by Doug Gwyn. It does not
call pwd(1), and so it solves the problem.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk

dvadura@watdragon.waterloo.edu (Dennis Vadura) (06/20/91)

In article <1991Jun13.010035.16040x@almira.uucp> haug@ColumbiaSC.NCR.COM (Brian R. Haug) writes:
>As best I can tell, this can not be easily fixed in any System V release (until
>V.4 when we get waitpid) unless you re-write the getcwd function, or the dmake
>function which calls getcwd.  Best of luck.
Many thanks to Brian for finding this bug.  It's really hard for me to
get to a machine that exhibits the above behaviour.

Does anyone have a getcwd for Sys V that doesn't rely on forking and invoking
pwd?  I'd like to include the fix in the next patch (which I have been
promising for a while and keep delaying due to this problem).

-dennis
-- 
-------------------------------------------------------------------------------
 Sometimes fate needs a good kick in the   |Dennis Vadura
 butt to get it going.                     |dvadura@dragon.uwaterloo.ca
===============================================================================

les@chinet.chi.il.us (Leslie Mikesell) (06/21/91)

In article <1991Jun20.133732.1559@watdragon.waterloo.edu> dvadura@watdragon.waterloo.edu (Dennis Vadura) writes:

>Does anyone have a getcwd for Sys V that doesn't rely on forking and invoking
>pwd.  I'd like to include the fix in the next patch (which I have been
>promissing for a while and keep delaying due to this problem).

The machines that have this problem generally do not have symlinks, so
there should never be any surprises from getcwd().  Can't you just
pick up your starting directory before doing any work and track your
own chdir()s relative to that?
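Les's suggestion can be sketched in C roughly as follows. The names tracked_init, tracked_chdir and tracked_cwd are hypothetical and do not come from dmake. tracked_init has to be called once, before any child is forked. The purely textual handling of ".." is only safe on the symlink-free systems Les describes.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 1024
#endif

/* Hypothetical helpers: look the directory up ONCE, before any children
   exist, then update it by hand on every chdir().  ".." is handled
   textually, which assumes there are no symbolic links. */
static char tracked[PATH_MAX];

int tracked_init(void)
{
    return getcwd(tracked, sizeof tracked) != NULL ? 0 : -1;
}

const char *tracked_cwd(void)
{
    return tracked;
}

int tracked_chdir(const char *dir)
{
    char next[PATH_MAX];
    const char *p = dir;

    if (dir[0] != '/' && tracked[0] == '\0') {
        errno = EINVAL;                 /* tracked_init() was never called */
        return -1;
    }
    /* Work out the new path first, so nothing changes if it won't fit. */
    strcpy(next, dir[0] == '/' ? "/" : tracked);
    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of this component */
        if (len == 0 || (len == 1 && p[0] == '.')) {
            /* empty or "." component: stay put */
        } else if (len == 2 && p[0] == '.' && p[1] == '.') {
            char *slash = strrchr(next, '/');
            if (slash == next)
                next[1] = '\0';         /* parent of "/x" (or of "/") is "/" */
            else
                *slash = '\0';
        } else {
            size_t used = strlen(next);
            size_t sep = next[used - 1] != '/';
            if (used + sep + len + 1 > sizeof next) {
                errno = ENAMETOOLONG;
                return -1;
            }
            if (sep)
                next[used++] = '/';
            memcpy(next + used, p, len);
            next[used + len] = '\0';
        }
        p += len;
        if (*p == '/')
            p++;
    }

    if (chdir(dir) == -1)
        return -1;                      /* tracked path left unchanged */
    strcpy(tracked, next);
    return 0;
}
```

Only tracked_init calls getcwd(), and it does so before any compile jobs exist. Every later lookup is a plain string read, so a forking getcwd never runs while children are outstanding.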

Les Mikesell
  les@chinet.chi.il.us