[mod.computers.apollo] Guard faults

holtz%cascade.carleton.cdn%ubc.CSNET@CSNET-RELAY.ARPA (Neal Holtz) (03/06/86)

In several different circumstances recently, we have received 'guard faults'
from Fortran programs.  Turns out that these are stack overflows, caused by
large (200,000+ byte) local arrays in subroutines (all allocated on the
stack, by default).  The reasons for this happening are quite understandable,
but it turns out that this is EXTREMELY hard for the typical Fortran user
to diagnose (even though they might otherwise be quite knowledgeable).
It produces diagnostics entirely outside their realm of experience, and
is not even consistently repeatable (at least apparently).
Run it under the debugger, and the problem doesn't happen (the new
cross-process debugging surely helps, but only indirectly: the fault will
still occur, but it is still not obvious why).
   
    [Side question.  One of these monsters run in a new shell will result in
     a guard fault.  Run it again immediately, and it seemingly works
     properly.   Why????]

I don't know what I'm suggesting, but I wish there were a better way.

Perhaps as little as a section in the Fortran manual on the most
common causes of guard faults, references to illegal addresses, etc.
would do.
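
To make the mechanism concrete, here is a minimal sketch in C rather than
Fortran, on the assumption that automatic arrays in C land on the stack in
much the same way as the local Fortran arrays described above.  The names
N, sum_on_stack and sum_static are made up for illustration, and the 4-byte
float is an assumption about the machine.  The first routine needs roughly
200,000 bytes of stack on every call; the second keeps the same array in
static storage, so its stack use stays small.

```c
/* Hypothetical size: 50000 floats, about 200,000 bytes if a float is 4 bytes. */
#define N 50000

/* Automatic array: its storage comes from the stack on each call,
 * so a small stack limit can be exceeded as soon as the routine is entered. */
float sum_on_stack(void)
{
    float a[N];
    float s = 0.0f;
    int i;

    for (i = 0; i < N; i++)
        a[i] = 1.0f;
    for (i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Static array: storage is reserved once, outside the stack,
 * so calling the routine adds only a few bytes of stack. */
float sum_static(void)
{
    static float a[N];
    float s = 0.0f;
    int i;

    for (i = 0; i < N; i++)
        a[i] = 1.0f;
    for (i = 0; i < N; i++)
        s += a[i];
    return s;
}
```

Both routines compute the same result; only where the array lives differs.
In Fortran the analogous move would presumably be to put the large array in
a COMMON block or to SAVE it, though whether a given compiler then takes it
off the stack is something to check in its manual.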

rees@APOLLO.UUCP (Jim Rees) (03/19/86)

    Subject: Guard faults (large local arrays in Fortran)

	[Side question.  One of these monsters run in a new shell will result in
	 a guard fault.  Run it again immediately, and it seemingly works
	 properly.   Why????]

Was this on an sr8 or an sr9 node?  We decided it was probably wrong for
guard segments to stay valid after the faulting program exits, so at sr9,
guard segments get reset whenever a program exits.

This is not a problem for those of us who use the csh, of course.
-------