[comp.sys.sun] Compatibility problem?

onders@taac.ipl.rpi.edu (Timothy E. Onders) (08/07/90)

I've got a mysterious problem.  I've taken the following program:

main()
{
	int r, c;
	int testm[1024][1024];

	r = 0;
	c = 0;
	for (r = 0 ; r < 1024 ; r++)
	{
		for (c = 0 ; c < 1024 ; c++)
		{
			testm[c][r] = c * 10000 + r;
		}
	}
}

It compiles on both a Sun 3 and a Sun 4.  When run on the Sun 4, it does
fine.  When run on the Sun 3, attempting to write to the array gives a
SEGV.  Attempting to read the array from dbx gives a "bad address" error.
Judging by the addresses of the three variables, there seems to be
enough space for everything allocated.  There are 4 megs of space between
the array starting address and the address of r.  Any ideas what might be
causing this problem, and how I can get around it?

					-Tim Onders
					onders@ipl.rpi.edu

als@roxanne.mlb.semi.harris.com (Alan Sparks) (08/08/90)

In article <1990Aug7.003432.1984@rice.edu> onders@taac.ipl.rpi.edu (Timothy E. Onders) writes:

> [program omitted]
> When run on the Sun 3, attempting to write to the array gives a
> SEGV.  Attempting to read the array from dbx gives a "bad address" error.
> Judging by the addresses of the three variables, there seems to be
> enough space for everything allocated.  There are 4 megs of space between
> the array starting address and the address of r.  Any ideas what might be
> causing this problem, and how I can get around it?

I haven't taken much time to research this really well... but I suspect
you've blown out the stack on the 68xxx CPU (especially with such a mondo
array).  I duplicated your situation, then devised a couple of
workarounds.

One workaround is to make the array "testm" static:

	static int testm[1024][1024];

This moves it off the stack, into the static data area.  Another
workaround (especially if you want to reclaim storage) is to dynamically
allocate the array.  One way to do it is:

	int **testm, i;

	testm = (int **) calloc(1024, sizeof(int *));
	for (i = 0;  i < 1024;  ++i)
		testm[i] = (int *) calloc(1024, sizeof(int));

The remainder of the code stays the same.  To reclaim storage afterward:

	for (i = 0;  i < 1024;  ++i)
		cfree(testm[i]);
	cfree(testm);

Some variant of these workarounds will solve your problem.  They work on a
local Sun 3/60 here.
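
For comparison, here is a minimal sketch of a third variant: allocating the
whole matrix as one contiguous heap block through a pointer-to-row type, so
that a single free() releases it.  The names N, row_t, alloc_matrix and
fill_matrix are made up for illustration and are not from the original
program; the sketch also assumes a compiler that accepts ANSI prototypes.

```c
#include <stdlib.h>

#define N 1024

/* One row of the matrix; a row_t * can be indexed as m[c][r]. */
typedef int row_t[N];

/* Allocate the whole N x N matrix as a single zero-filled heap block.
   Returns NULL if the allocation fails. */
row_t *alloc_matrix(void)
{
    return calloc(N, sizeof(row_t));
}

/* Fill the matrix with the same values as the loop in the original post. */
void fill_matrix(row_t *m)
{
    int r, c;

    for (r = 0; r < N; r++)
        for (c = 0; c < N; c++)
            m[c][r] = c * 10000 + r;
}
```

A caller would do something like "row_t *testm = alloc_matrix();", check
the result against NULL, call fill_matrix(testm), and finish with
free(testm).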

Hope this helps.
-Alan

beau@uunet.uu.net (Beau James) (08/08/90)

Your automatic array is "too big" for the kernel heuristic that decides
when it's time to grow the stack, vs. when to declare a user program
error.  That heuristic is different on Sun-3s and Sun-4s, since stack
frames tend to be different sizes on those systems.  (The heuristic may
also change from one SunOS release to another.)  This isn't really a SunOS
issue, though; the behavior of most *nixes is similar.

Unix user programs never do anything to explicitly manage the growth of
their stack.  If the program makes a reference beyond the end of currently
allocated stack [virtual] memory, the hardware traps.  The kernel looks to
see "how far" the trapped reference was beyond the end of the existing
stack; if it was "close enough", the kernel decides that the program was
really just trying to grow the stack, so it allocates additional [virtual]
memory for the stack and reruns the instruction that caused the trap -
much like a standard VM page fault.  On the other hand, if the reference
is "too far" past the stack, the kernel decides that it was indeed an
invalid reference, and sends the process a SIGSEGV.
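
The decision described above can be pictured with a toy model.  This is
only an illustration, not SunOS kernel code: the function classify_fault,
the enum, and the max_gap threshold are all hypothetical, and it assumes a
stack that grows toward lower addresses.

```c
/* Toy model only: names and rule are illustrative assumptions,
   not taken from any real kernel. */
enum fault_action { GROW_STACK, SEND_SIGSEGV };

/* fault_addr: address the program tried to touch
   stack_low:  lowest address currently mapped for the stack
   max_gap:    how far below stack_low still counts as "close enough" */
enum fault_action classify_fault(unsigned long fault_addr,
                                 unsigned long stack_low,
                                 unsigned long max_gap)
{
    /* A reference just below the mapped stack is treated as growth. */
    if (fault_addr < stack_low && stack_low - fault_addr <= max_gap)
        return GROW_STACK;

    /* Anything else is treated as an invalid reference. */
    return SEND_SIGSEGV;
}
```

With a 2MB gap, a reference 4MB below the mapped stack comes back as
SEND_SIGSEGV, while the same reference with an 8MB gap comes back as
GROW_STACK.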

As a general rule, this means it's a bad idea to make "big" objects
automatic.  Better to have an automatic pointer to the object, and
malloc() it on the fly; or to make the object static.  That approach is
more portable, also, since the precise definitions of "big", "too far",
etc. are very system (hardware and OS version) dependent.

Beau James				beau@Ultra.COM
Ultra Network Technologies		{sun,ames}!ultra.com!beau

guy@uunet.uu.net (Guy Harris) (08/10/90)

>Your automatic array is "too big" for the kernel heuristic that decides
>when it's time to grow the stack, vs. when to declare a user program
>error.  That heuristic is different on Sun-3s and Sun-4s, since stack
>frames tend to be different sizes on those systems.

No, the heuristic is essentially the same; the parameter the heuristic
uses, namely the stack limit, is different - it defaults to 2MB on a Sun-3
(and probably a Sun-2), and 8MB on a SPARC.  His program worked just fine
on a Sun-3 (well, an NS5000, but it *does* have a 3E120 inside it, running
4.0.3) after I did "limit stacksize 8192k" from the C shell to boost the
stack limit to 8MB (can't set it in the SunOS Bourne shell; the Korn and
Bourne-again shells may let you set it).
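
The arithmetic behind this: with 4-byte ints the array is 1024 * 1024 * 4
bytes = 4MB, which is over a 2MB limit and under an 8MB one.  A program can
make a rough check of its own using getrlimit() with RLIMIT_STACK, as in the
sketch below.  The helper name fits_in_stack_limit is hypothetical, and the
check ignores whatever else is already on the stack, so treat a "yes" as
optimistic.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Rough check: returns 1 if an object of `bytes` bytes is smaller than
   the current soft stack limit, 0 if it is not, and -1 if the limit
   cannot be read.  Other stack usage is not accounted for. */
int fits_in_stack_limit(size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)bytes < rl.rlim_cur;
}
```

For the program in question the call would be
fits_in_stack_limit(sizeof(int[1024][1024])); if it returns 0, fall back to
a static or heap-allocated array, or raise the limit before running.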