[comp.sys.apollo] GUARD FAULT ?

conrad@tlc.tlc.com (Conrad Dost) (12/13/89)

Does anyone know what a GUARD FAULT is?  I got it in a deeply recursive
routine on a DN-4000, SR 0.1, bsd4.2.  The same program runs ok on a SUN.
-- 
- Conrad Dost, Total Logic Corp.
  12 South 1st Street, #808, San Jose, CA 95113 USA, (408)295-1792
  conrad@tlc.com, {apple!motcsd | key | uunet!loki}!tlc!conrad

derstad@CIM-VAX.HONEYWELL.COM ("DAVE ERSTAD") (12/13/89)

>    Does anyone know what a GUARD FAULT is?  I got it in a deeply recursive
>    routine on a DN-4000, SR 0.1, bsd4.2.  The same program runs ok on a SUN.

Too much junk on the stack, usually.  Another message you might get is

    unable to unwind stack because of invalid stack frame

Generally, I have found the "guard fault" is issued when procedures have
large local storage, and the "invalid stack frame" when the recursion
is very, very deep (for example, 10,000 deep).

If you can move some data items which are local to the recursive routine into
a global area, or mark them as STATIC, that will help.

The maximum stack size is 262144 bytes.  This means you can call a procedure
with about a quarter meg of local storage, or recurse twice on a procedure
with about 0.12 MB, recurse three times on 0.08 MB, etc.
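
As a rough sketch of that arithmetic (Python used purely for illustration;
max_depth is a made-up helper, and it ignores the per-call overhead for
return addresses and saved registers, so real depths will come out somewhat
lower):

```python
STACK_BYTES = 262144  # maximum stack size quoted above


def max_depth(frame_bytes, stack_bytes=STACK_BYTES):
    """Rough upper bound on recursion depth for a given frame size.

    Assumption: each call uses exactly frame_bytes of stack and nothing else.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    return stack_bytes // frame_bytes


# Frame sizes of a quarter meg, an eighth, a twelfth, and a small 1 KB frame.
for frame in (262144, 131072, 87381, 1024):
    print(frame, "bytes/frame ->", max_depth(frame), "calls deep")
```

Under that assumption a 1 KB frame runs out after 256 calls, so shrinking
the locals of the recursive routine buys depth in direct proportion.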

I have no idea where the limit comes from.  It seems pretty silly to have
64 MB or whatever of address space and limit applications to such a small
stack space.  Maybe someone at Apollo can shed some light?...

Dave Erstad
Honeywell SSEC
DERSTAD@cim-vax.honeywell.com

krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (12/13/89)

A guard fault means you overran the end of the stack space.
I believe Apollo puts a write protected page of memory at
the end of the stack space to "guard" against programs
using more stack space than is available, thus causing
a memory page fault when you try to call a recursive routine
one time too many. You can increase the default stack size
with a switch to the binder (at least on SR10.2).


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

oj@apollo.HP.COM (Ellis Oliver Jones) (12/14/89)

In article <8912131524.AA17942@richter.mit.edu> krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) writes:
>A guard fault means you overran the end of the stack space.

Yes. If you use the "las" ("list-address-space") utility
in /usr/apollo/bin/las  (Not Just Like Real Unix) on a typical process,
you'll get something like this (349 is the Unix PID, and this is SR10.2,
but "las" in some form works on all SRs).  <area nnn> means a hunk of 
virtual memory which is not associated with a file.  The change which allowed
the Prism to take 0.5 meg per process instead of 

% las 349
       VA Range    Obj Start   Pathname
    8000 -     FFFF        0   <area 157>
   10000 -    17FFF        0   <area 164>
3B2A0000 - 3B2AFFFF 7FFF0000   <area 159>
3B2D0000 - 3B2EFFFF 7FFE0000   <area 176>
3B340000 - 3B377FFF        0   /sys/node_data/systmp/dm_mbx
3B378000 - 3B37FFFF 7FFF8000   <area 163>
3B380000 - 3B3C7FFF 7FFB8000   <area 142> <---stack pointer in here.
3B3C8000 - 3B3CFFFF        0   /sys/node_data/systmp/stack_guard_file 
3B3D0000 - 3B3F7FFF        0   <area 173>
 . . .
3B528000 - 3B537FFF        0   /lib/kslib
3B538000 - 3B54FFFF        0   /lib/rgylib
3B550000 - 3B55FFFF        0   /lib/ddslib
3B560000 - 3B56FFFF        0   /lib/ftnlib
 . . .
4672 KB mapped.
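
To read sizes off a listing like this, subtract the ends of each VA range.
A small sketch (Python for illustration only; range_kb is a hypothetical
helper, and it assumes the ranges are inclusive hex addresses as they
appear above):

```python
def range_kb(start_hex, end_hex):
    """Size in KB of an inclusive virtual-address range given as hex strings."""
    start = int(start_hex, 16)
    end = int(end_hex, 16)
    if end < start:
        raise ValueError("end address is below start address")
    return (end - start + 1) // 1024


# The two ranges of interest from the listing above.
print("stack area:", range_kb("3B380000", "3B3C7FFF"), "KB")
print("guard file:", range_kb("3B3C8000", "3B3CFFFF"), "KB")
```

For this listing that works out to 288 KB for the area holding the stack
pointer and 32 KB for the stack_guard_file mapping next to it.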

/Ollie Jones (speaking for myself, not necessarily for HP Apollo)

lampi@pnet02.gryphon.com (Michael Lampi) (12/15/89)

You've probably run out of stack space. You can extend the amount of stack
allocated by your application by rebinding (relinking) and specifying that
you want more than the default (256k?). The exact syntax is mentioned in the
Aegis bind command---ok, so it's not bsd4.2, but it's the best I can do!

Michael Lampi               MDL Corporation   213/782-7888   fax 213/782-7927

UUCP: {ames!elroy, <routing site>}!gryphon!pnet02!lampi
INET: lampi@pnet02.gryphon.com
"My opinions are that of my corporation!"

dbfunk@icaen.uiowa.edu (David B Funk) (12/15/89)

WRT posting: <923@tlc.tlc.com>
> Does anyone know what a GUARD FAULT is?  I got it in a deeply recursive
> routine on a DN-4000, SR 0.1, bsd4.2.  The same program runs ok on a SUN.
-- 

    A Guard Fault is the error that you get when your program runs off
the end of its stack segment, i.e. it runs out of stack space.
Apollo uses a "guard segment" to implement its stack overflow trapping.
This is a small (32Kb) segment that is put in memory just after the
stack segment. Its MMU attributes are set to trap on any access.
So when your program goes walking off the end of its stack, it runs into
the guard segment and triggers a guard fault trap.
    It is possible to run off the stack and not trigger a guard fault.
If a program goes "walking" down the stack in small steps (less than
32Kb) then it will trigger a guard fault when it hits the end, as in
the case of an overly deep recursive program with moderate amounts of
local storage. But if the process has large amounts of local storage
(>64Kb) so that it goes down the stack in large steps, it is possible
to "step over" the guard segment and mess up other things. This can
result in errors like "segmentation fault", "access violation" or the
dreaded "unable to unwind stack" (really FUBAR).
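
    A toy model of the "walking" versus "stepping over" cases, as a
hedged sketch only (Python for illustration; walk_stack is a made-up
name, the 256Kb stack size is the default mentioned elsewhere in this
thread, and the model assumes each call touches memory only at its new
stack-pointer offset, which real code need not do):

```python
STACK_BYTES = 256 * 1024   # assumed default stack size
GUARD_BYTES = 32 * 1024    # guard segment size described above


def walk_stack(frame_bytes, stack_bytes=STACK_BYTES, guard_bytes=GUARD_BYTES):
    """Return (depth, outcome) for the first call that leaves the stack.

    Offsets count bytes consumed from the start of the stack; the guard
    segment is modeled as the guard_bytes immediately past the stack.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    depth = 0
    offset = 0
    while True:
        depth += 1
        offset += frame_bytes
        if offset < stack_bytes:
            continue                      # still inside the stack
        if offset < stack_bytes + guard_bytes:
            return depth, "guard fault"   # landed in the guard segment
        return depth, "stepped over guard"


for kb in (1, 40, 100):
    print(kb, "KB frames:", walk_stack(kb * 1024))
```

In this model any frame no larger than the guard segment always lands in
it. Larger frames may or may not: 40Kb frames still hit the guard on the
7th call, while 100Kb frames jump clean past it on the 3rd.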
    In any case, you need more stack space. The system allocates a 256Kb
stack by default (do a man/help on "limits"). This is considered big
enough for general purposes but not so large that it will eat up lots
of disk space for each active process. (On the DN10k under sr10.0.p the
default stack size was 5Mb and people started screaming when they found
that they lost 5Mbytes of disk each time another process started up.)
There is an option to the loader (binder) that can be used to request
a larger stack when making your program.

Dave Funk