[comp.sys.apollo] System dying

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (10/30/90)

In article <9010292020.AA26213@richter.mit.edu> krowitz@RICHTER.MIT.EDU (David Krowitz) writes:
>Well, I'm trying to build the X11 R4 sources on our DN3000's, 3500's, and
>2500's without much luck.
>  [ ... notes about hung nested makes deleted ... ]
>After a while, the machine will eventually crash.
>Has anyone seen this sort of behaviour? Do you know how to get around it?

I have seen this sort of behaviour whenever a script or program is run that
creates lots (i.e. hundreds or thousands) of subshells very quickly; the
problem is worse when the shells are more deeply nested.
This problem was insurmountable at SR10.0, and surmountable at SR10.1 by
running such tasks immediately after rebooting and rebooting immediately
afterwards (otherwise the system would just hang shortly anyway). It is
much better under SR10.2 but not perfect: I still use the same workaround
as for SR10.1, though it is a pain in the ass to have to take down a
DN10020 and lose all the active jobs just because the Apollos can't fork a
shell properly.

We have this problem whenever I run the node protection scripts,
which run several thousand subshells nested about 3 deep, or when I
compile the NCAR graphics software, which uses makes nested 4-5 levels deep.
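A minimal POSIX sh sketch of the kind of workload described above: many
subshells started in quick succession, nested a few levels deep. The
function name spawn and its DEPTH/WIDTH parameters are hypothetical and
not from the original posts; it only illustrates the fork pattern and is
not the node protection scripts themselves.

```shell
#!/bin/sh
# Hypothetical stress test (not from the original posts): start nested
# subshells to approximate "several thousand subshells, nested about 3 deep".
#
# spawn DEPTH WIDTH
#   At each level, start WIDTH subshells one after another; each of them
#   recurses one level deeper. Prints how many subshells were started
#   below this level.
spawn() {
    depth=$1
    width=$2
    if [ "$depth" -le 0 ]; then
        echo 0
        return
    fi
    total=0
    i=0
    while [ "$i" -lt "$width" ]; do
        # $( ... ) runs spawn in a subshell (normally a forked child), so the
        # recursive call cannot clobber this level's depth/width/total/i.
        n=$(spawn "$((depth - 1))" "$width")
        total=$((total + 1 + n))
        i=$((i + 1))
    done
    echo "$total"
}
```

For example, `spawn 3 10` starts 1110 subshells with a maximum nesting
depth of 3 and prints 1110, roughly the "several thousand subshells,
about 3 deep" load mentioned above.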
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

oliveria@srvr1 (ROQUE DONIZETE DE OLIVEIRA) (10/30/90)

From article <1990Oct29.223226.9532@alchemy.chem.utoronto.ca>, by system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)):
> We have this problem whenever I run the node protection scripts,
> which run several thousand subshells nested about 3 deep, or when I
> compile the NCAR graphics software, which uses makes nested 4-5 levels deep.

We had the same problem (make crashing because of makes nested 4 or 5
levels deep) when installing NCAR graphics. The solution was to modify
some rules by adding a "wait" statement. Example:

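# Build each subdirectory in turn. "cd ... &&" keeps make from re-running
# itself in the current directory if a cd fails; the trailing "wait" is
# the workaround described above.
all::
	@for dir in $(SUBDIRS) ; do \
	(cd $$dir && echo "Making $$dir" && \
	$(MAKE) $(MFLAGS) ; wait ) ; \
	done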
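The same loop can be sketched as a plain sh function, which makes it easy
to try outside make. The helper name build_subdirs is hypothetical; in the
Makefile the command it runs would be $(MAKE) $(MFLAGS). Unlike the rule
above, this sketch stops at the first directory that cannot be entered or
whose command fails.

```shell
#!/bin/sh
# Hypothetical shell-only version of the recursive make rule above.
#
# build_subdirs CMD DIR...
#   For each DIR, in order, run CMD inside DIR in its own ( ... ) subshell,
#   then "wait" before that subshell exits. Returns 1 as soon as a DIR
#   cannot be entered or CMD fails.
build_subdirs() {
    cmd=$1
    shift
    for dir in "$@"; do
        (
            cd "$dir" || exit 1   # never run CMD in the wrong directory
            echo "Making $dir"
            $cmd || exit 1        # CMD is word-split, so "make -k" works
            wait                  # wait for any background jobs of this subshell
        ) || return 1
    done
}
```

For example, `build_subdirs "make" lib src` would print "Making lib" and
"Making src" while running make in each directory in turn.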

  Roque Oliveira
  oliveria@caen.engin.umich.edu