[net.lang.c] C stack frame sizes

whm@arizona.UUCP (Bill Mitchell) (11/30/84)

Has anyone ever empirically studied the sizes of C stack frames for UNIX
applications?  The particular question I have is:  How serious would a
65kbyte maximum frame size be for a 32 bit machine.  I don't think I've
ever seen a function that would come close to this limit, but I really
don't have a good feel for how reachable this limit would be in the
majority of cases.

					Bill Mitchell
					whm.arizona@csnet-relay
					{noao,mcnc,utah-cs}!arizona!whm

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/01/84)

> Has anyone ever empirically studied the sizes of C stack frames for UNIX
> applications?  The particular question I have is:  How serious would a
> 65kbyte maximum frame size be for a 32 bit machine.

I have recently had some practical experience with this issue.  On the
Gould UTX-32 system (and I believe on IBM 370s too), the architecture
rather strongly encouraged the C/UNIX implementors to use a relatively
small chunk of address space for the run-time stack.  The exact amount
can be adjusted when one link-edits a binary executable image; using
around 24Kb worked well for nearly all UNIX System V utilities.  64Kb
would probably suffice for almost all applications; those having trouble
most likely could be fixed by changing large "auto" arrays to "static".
I have yet to see a practical example where a huge amount of recursion
was required; however, if one turned up, it would not be easy to fix.

So a SINGLE stack frame limit of 64Kb would certainly not be very
troublesome.  I imagine someone has written code that needs more, but
they have already severely limited their target machine choices.

grunwald@uiucdcsb.UUCP (12/02/84)

/* Written  9:27 pm  Nov 30, 1984 by malcolm@ecn-ee in uiucdcsb:net.lang.c */
...

I wonder what this type of programming style would do to a Berkeley style
RISC machine?

								Malcolm
/* End of text from uiucdcsb:net.lang.c */

On the pryamid, which is essentially a Berkeley style RISC machine, the
array would be stored in real memory, not registers. Same goes for structs
and (I think) unions.
   They only keep the "normal stuff" in registers. So, if you defined:

	int *i, j[1000], *k;

both "i" and "k" would be in registers and j would be in memory.

malcolm@ecn-ee.UUCP (12/02/84)

I commonly put up to a megabyte into a single stack frame.  I write heavy
duty number crunching for a living and can't live without C's modularity
(compared to Fortrash.)

I find it is much more elegant to put all of my temporary arrays on
the stack so that they are hidden from other routines and they don't take
up memory space except when the routine is active.  This is especially
a win for rarely called routines.  Of course I could use malloc but this
would be a real pain in the a**.

I wonder what this type of programming style would do to a Berkeley style
RISC machine?

								Malcolm

henry@utzoo.UUCP (Henry Spencer) (12/03/84)

> I commonly put up to a megabyte into a single stack frame. ...
> 
> I wonder what this type of programming style would do to a Berkeley style
> RISC machine?

Not much.  RISC compilers are supposed to display some intelligence about
data structures that are obviously too big to fit in the register bank.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (12/03/84)

I'm told that the total stack use by practically all V7 programs is tiny,
a few K at most.  The major exception is the Ritchie cc, which uses a
recursive-descent parser.  Many things have grown some since V7, of course...
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

chris@umcp-cs.UUCP (Chris Torek) (12/04/84)

[Before I begin, I have to add a disclaimer that the information given
here was obtained almost entirely by examining the output of the C
compiler on one particular Pyramid 90-X.  Nothing stated herein should
be taken as official.  End disclaimer.]

The Pyramid is a wonderful machine for exercising ``portable'' C code!
Not only are unions and structs kept only on the stack, the evaluation
order for subroutine calls is different.

First, here's a little bit of information about the hardware
architecture.  There are always 64 registers avaliable at any point in
code.  They are divided into four groups of sixteen registers.  Four
registers in each group are used for special things, so you really wind
up with 12 registers in each group.

-----------------------------------------------------------------

			proc A
			-------
			| pr0 |
			-------
			| ... |
			-------
			| pr15|
			-------
			| lr0 |
			-------
			| ... |
			-------
			| lr15| proc B
			------- -------
			| tr0 | | pr0 |
			------- -------
			| ... | | ... |
			------- -------
			| tr15| | pr15|
			------- -------
				| ... |
				| ... |
				-------
				| tr0 |
				-------
				| ... |
				-------
				| tr15|
				-------

	     Register mapping: proc A calls proc B
			    Figure 1.
-----------------------------------------------------------------

The registers are named by a group description and a number.  The four
groups are "global", "parameter", "local", and "transfer".  The 16
(really 12) global registers, gr0 through gr15 (gr11), are the same in
every procedure; the others are windowed, with some hardware and
software taking care of overflows when calling and returning.  Transfer
registers are used to pass parameters (where they become parameter
registers); parameter registers are used for local parameters and
return values; and local registers are used as in conventional
architectures.  See Figure 1 for a pictorial representation of
parameter, local, and transfer registers.

As shown in Figure 1, a called procedure's registers (the parameter
registers for proc B) and the calling procedure's registers (the
transfer registers for proc A) overlap, allowing the "transfer" of
"parameters" through registers (clever, no?).  Now, of course, only
fixed-size units can be passed via registers; when arrays or other
"large" parameters are passed, conventional mechanisms are used.
(There are no special cases for "small" structures in the present
implementation of the C compiler.)

Now, this presents another problem: what if more than 12 parameters
are passed?  Again, the Pyramid compiler uses conventional mechanisms
-- but only when the first 12 registers are used up.  This means
that if you compile the code

	f() {
		g(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
	}

the last four arguments (12 through 15) are passed on the stack, with
the first 12 (0 through 11) being in the corresponding transfer/parameter
register.

Since the transfer registers are also used as temporary registers when
evaluating expressions, the compiler naturally evaluates parameters
which will be placed on the stack before those that will go into
registers.  As with the Vax compilers, these are evaluated in reverse
order (right to left), so that each can be pushed onto the stack as
soon as the actual value is known.  However, the compiler evaluates the
first 12 parameters left to right!

In other words, the Pyramid compiler evaluates arguments from the
outside in.

----------

By the way, this reminds me of a bug we discovered: if you have a
C function that returns a value which is an expression involving the
first parameter, the compiler accidently clobbers the parameter
before completing the evaluation.  That is, since pr0 (the caller's
tr0) is used hold the return value, someone decided to make it
available for temporary storage, resulting in code like

	get pr0
	mask with 0xff
	shift left 8 bits
	store in pr0		# oops!
	get pr0
	mask with 0xff00
	shift right 8 bits
	or into pr0
	return

for

	return ((x & 0xff) << 8) | ((x & 0xff00) >> 8);

This is not good, to say the least.

----------

[In case you've read this far: we (Maryland) have an info-pyramid mailing
list to discuss anything to do with Pyramid machines; if you'd like to
be added to the list, send mail to info-pyramid-request@MARYLAND.ARPA,
info-pyramid-request@umcp-cs.CSNet, or seismo!umcp-cs!info-pyramid-request.]
-- 
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

mjl@ritcv.UUCP (Mike Lutz) (12/04/84)

>From: ecn-ee!malcolm    Nov 30 21:27:00 1984

>I commonly put up to a megabyte into a single stack frame...I find
>it is much more elegant to put all of my temporary arrays on
>the stack...I wonder what this type of programming style would do to a
>Berkeley style RISC machine?

Probably wouldn't have much effect, as the philosophy of RISC is to
hold scalars in the register bank.  However, it does point up an
interesting problem -- does the Berkeley RISC require a parallel stack,
maintained by software protocols, to hold structures, arrays, and other
humongous local variables?

-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
ARPA:		ritcv!mjl@Rochester.ARPA

henry@utzoo.UUCP (Henry Spencer) (12/06/84)

> ... does the Berkeley RISC require a parallel stack,
> maintained by software protocols, to hold structures, arrays, and other
> humongous local variables?

The RISC needs a memory stack anyway, because of the possibility that the
"register stack" will overflow.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mjl@ritcv.UUCP (Mike Lutz) (12/10/84)

>> ... does the Berkeley RISC require a parallel stack,
>> maintained by software protocols, to hold structures, arrays, and other
>> humongous local variables?
>
>The RISC needs a memory stack anyway, because of the possibility that the
>"register stack" will overflow.

True, but the memory overflow stack is basically a 'shadow' of the
register stack, or, alternatively, one can view the register stack as a
cache for the most recently activated procedures.  The original RISC-I
papers from Berkeley imply this when discussing the software handlers
for register stack overflow/underflow.  Note that the plan to have the
registers addressable as memory locations agrees with the shadow/cache
model.

However, since the idea the register stack is designed for scalars and
pointers, I still think there is a need for a separate stack for large,
automatic, data structures - no?
-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
CSNET:		mjl%rit@csnet-relay.ARPA

jmc@ist.UUCP (John Collins) (12/11/84)

One point which hasn't been made about the issue of stack frame sizes,
at least not that I've seen, is that the stack never gets deallocated in
UNIX, and if you call a function right at the start of your program which
uses lots of stack space it stays allocated.

The same is not true if you sbrk(2) the space you require, and tidy up
afterwards, although I admit this is less convenient (I like the idea of
the computer doing all the donkey work), but it does give you control if
you need it, as you often do on small machines.
-- 
John Collins calling courtesy of ist
	Please reply to ...!mcvax!ist!inset!jmc
Phone: +44 727 57267
Snail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England

henry@utzoo.UUCP (Henry Spencer) (12/11/84)

Using one memory stack for both register-stack overflow and big-data stack
is not inconceivable, although it would have to be thought out carefully.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mjl@ritcv.UUCP (Mike Lutz) (12/15/84)

>One point which hasn't been made about the issue of stack frame sizes,
>... [is that] if you call a function right at the start of your program which
>uses lots of stack space it stays allocated.

Actually, this was use to advantage in the original Pascal compiler
system from Amsterdam.  One of the C programs in the set (I forget
which one) used *gobs* of data space, so to make it run on a PDP-11 the
I/O buffers were declared as local variables in the main function, and
thus were on the stack.  Given the constraints imposed by the PDP-11
MMU, this actually resulted in a net gain in useful data memory as
compared to allocating buffers dynamically using sbrk.
-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
CSNET:		mjl%rit@csnet-relay.ARPA

jmc@ist.UUCP (John Collins) (12/21/84)

>The I/O buffers were declared as local variables in the main function....
>thus were on the stack.  Given the constraints imposed by the PDP-11
>MMU, this actually resulted in a net gain in useful data memory as
>compared to allocating buffers dynamically using sbrk.
>Mike Lutz

Yes - but that isn't the same as what I said. Fine if it is the main routine
but if you DON'T want a big stack, but early on you call (from main) a
routine which does, you eat up memory forever.
-- 
John Collins calling courtesy of ist
	Please reply to ...!mcvax!ist!inset!jmc
Phone: +44 727 57267
Snail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England