whm@arizona.UUCP (Bill Mitchell) (11/30/84)
Has anyone ever empirically studied the sizes of C stack frames for UNIX applications? The particular question I have is: How serious would a 65kbyte maximum frame size be for a 32 bit machine. I don't think I've ever seen a function that would come close to this limit, but I really don't have a good feel for how reachable this limit would be in the majority of cases. Bill Mitchell whm.arizona@csnet-relay {noao,mcnc,utah-cs}!arizona!whm
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/01/84)
> Has anyone ever empirically studied the sizes of C stack frames for UNIX > applications? The particular question I have is: How serious would a > 65kbyte maximum frame size be for a 32 bit machine. I have recently had some practical experience with this issue. On the Gould UTX-32 system (and I believe on IBM 370s too), the architecture rather strongly encouraged the C/UNIX implementors to use a relatively small chunk of address space for the run-time stack. The exact amount can be adjusted when one link-edits a binary executable image; using around 24Kb worked well for nearly all UNIX System V utilities. 64Kb would probably suffice for almost all applications; those having trouble most likely could be fixed by changing large "auto" arrays to "static". I have yet to see a practical example where a huge amount of recursion was required; however, if one turned up, it would not be easy to fix. So a SINGLE stack frame limit of 64Kb would certainly not be very troublesome. I imagine someone has written code that needs more, but they have already severely limited their target machine choices.
grunwald@uiucdcsb.UUCP (12/02/84)
/* Written 9:27 pm Nov 30, 1984 by malcolm@ecn-ee in uiucdcsb:net.lang.c */ ... I wonder what this type of programming style would do to a Berkeley style RISC machine? Malcolm /* End of text from uiucdcsb:net.lang.c */ On the pryamid, which is essentially a Berkeley style RISC machine, the array would be stored in real memory, not registers. Same goes for structs and (I think) unions. They only keep the "normal stuff" in registers. So, if you defined: int *i, j[1000], *k; both "i" and "k" would be in registers and j would be in memory.
malcolm@ecn-ee.UUCP (12/02/84)
I commonly put up to a megabyte into a single stack frame. I write heavy duty number crunching for a living and can't live without C's modularity (compared to Fortrash.) I find it is much more elegant to put all of my temporary arrays on the stack so that they are hidden from other routines and they don't take up memory space except when the routine is active. This is especially a win for rarely called routines. Of course I could use malloc but this would be a real pain in the a**. I wonder what this type of programming style would do to a Berkeley style RISC machine? Malcolm
henry@utzoo.UUCP (Henry Spencer) (12/03/84)
> I commonly put up to a megabyte into a single stack frame. ... > > I wonder what this type of programming style would do to a Berkeley style > RISC machine? Not much. RISC compilers are supposed to display some intelligence about data structures that are obviously too big to fit in the register bank. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
henry@utzoo.UUCP (Henry Spencer) (12/03/84)
I'm told that the total stack use by practically all V7 programs is tiny, a few K at most. The major exception is the Ritchie cc, which uses a recursive-descent parser. Many things have grown some since V7, of course... -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
chris@umcp-cs.UUCP (Chris Torek) (12/04/84)
[Before I begin, I have to add a disclaimer that the information given here was obtained almost entirely by examining the output of the C compiler on one particular Pyramid 90-X. Nothing stated herein should be taken as official. End disclaimer.] The Pyramid is a wonderful machine for exercising ``portable'' C code! Not only are unions and structs kept only on the stack, the evaluation order for subroutine calls is different. First, here's a little bit of information about the hardware architecture. There are always 64 registers avaliable at any point in code. They are divided into four groups of sixteen registers. Four registers in each group are used for special things, so you really wind up with 12 registers in each group. ----------------------------------------------------------------- proc A ------- | pr0 | ------- | ... | ------- | pr15| ------- | lr0 | ------- | ... | ------- | lr15| proc B ------- ------- | tr0 | | pr0 | ------- ------- | ... | | ... | ------- ------- | tr15| | pr15| ------- ------- | ... | | ... | ------- | tr0 | ------- | ... | ------- | tr15| ------- Register mapping: proc A calls proc B Figure 1. ----------------------------------------------------------------- The registers are named by a group description and a number. The four groups are "global", "parameter", "local", and "transfer". The 16 (really 12) global registers, gr0 through gr15 (gr11), are the same in every procedure; the others are windowed, with some hardware and software taking care of overflows when calling and returning. Transfer registers are used to pass parameters (where they become parameter registers); parameter registers are used for local parameters and return values; and local registers are used as in conventional architectures. See Figure 1 for a pictorial representation of parameter, local, and transfer registers. As shown in Figure 1, a called procedure's registers (the parameter registers for proc B) and the calling procedure's registers (the transfer registers for proc A) overlap, allowing the "transfer" of "parameters" through registers (clever, no?). Now, of course, only fixed-size units can be passed via registers; when arrays or other "large" parameters are passed, conventional mechanisms are used. (There are no special cases for "small" structures in the present implementation of the C compiler.) Now, this presents another problem: what if more than 12 parameters are passed? Again, the Pyramid compiler uses conventional mechanisms -- but only when the first 12 registers are used up. This means that if you compile the code f() { g(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15); } the last four arguments (12 through 15) are passed on the stack, with the first 12 (0 through 11) being in the corresponding transfer/parameter register. Since the transfer registers are also used as temporary registers when evaluating expressions, the compiler naturally evaluates parameters which will be placed on the stack before those that will go into registers. As with the Vax compilers, these are evaluated in reverse order (right to left), so that each can be pushed onto the stack as soon as the actual value is known. However, the compiler evaluates the first 12 parameters left to right! In other words, the Pyramid compiler evaluates arguments from the outside in. ---------- By the way, this reminds me of a bug we discovered: if you have a C function that returns a value which is an expression involving the first parameter, the compiler accidently clobbers the parameter before completing the evaluation. That is, since pr0 (the caller's tr0) is used hold the return value, someone decided to make it available for temporary storage, resulting in code like get pr0 mask with 0xff shift left 8 bits store in pr0 # oops! get pr0 mask with 0xff00 shift right 8 bits or into pr0 return for return ((x & 0xff) << 8) | ((x & 0xff00) >> 8); This is not good, to say the least. ---------- [In case you've read this far: we (Maryland) have an info-pyramid mailing list to discuss anything to do with Pyramid machines; if you'd like to be added to the list, send mail to info-pyramid-request@MARYLAND.ARPA, info-pyramid-request@umcp-cs.CSNet, or seismo!umcp-cs!info-pyramid-request.] -- (This line accidently left nonblank.) In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690 UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@maryland
mjl@ritcv.UUCP (Mike Lutz) (12/04/84)
>From: ecn-ee!malcolm Nov 30 21:27:00 1984 >I commonly put up to a megabyte into a single stack frame...I find >it is much more elegant to put all of my temporary arrays on >the stack...I wonder what this type of programming style would do to a >Berkeley style RISC machine? Probably wouldn't have much effect, as the philosophy of RISC is to hold scalars in the register bank. However, it does point up an interesting problem -- does the Berkeley RISC require a parallel stack, maintained by software protocols, to hold structures, arrays, and other humongous local variables? -- Mike Lutz Rochester Institute of Technology, Rochester NY UUCP: {allegra,seismo}!rochester!ritcv!mjl ARPA: ritcv!mjl@Rochester.ARPA
henry@utzoo.UUCP (Henry Spencer) (12/06/84)
> ... does the Berkeley RISC require a parallel stack, > maintained by software protocols, to hold structures, arrays, and other > humongous local variables? The RISC needs a memory stack anyway, because of the possibility that the "register stack" will overflow. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
mjl@ritcv.UUCP (Mike Lutz) (12/10/84)
>> ... does the Berkeley RISC require a parallel stack, >> maintained by software protocols, to hold structures, arrays, and other >> humongous local variables? > >The RISC needs a memory stack anyway, because of the possibility that the >"register stack" will overflow. True, but the memory overflow stack is basically a 'shadow' of the register stack, or, alternatively, one can view the register stack as a cache for the most recently activated procedures. The original RISC-I papers from Berkeley imply this when discussing the software handlers for register stack overflow/underflow. Note that the plan to have the registers addressable as memory locations agrees with the shadow/cache model. However, since the idea the register stack is designed for scalars and pointers, I still think there is a need for a separate stack for large, automatic, data structures - no? -- Mike Lutz Rochester Institute of Technology, Rochester NY UUCP: {allegra,seismo}!rochester!ritcv!mjl CSNET: mjl%rit@csnet-relay.ARPA
jmc@ist.UUCP (John Collins) (12/11/84)
One point which hasn't been made about the issue of stack frame sizes, at least not that I've seen, is that the stack never gets deallocated in UNIX, and if you call a function right at the start of your program which uses lots of stack space it stays allocated. The same is not true if you sbrk(2) the space you require, and tidy up afterwards, although I admit this is less convenient (I like the idea of the computer doing all the donkey work), but it does give you control if you need it, as you often do on small machines. -- John Collins calling courtesy of ist Please reply to ...!mcvax!ist!inset!jmc Phone: +44 727 57267 Snail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England
henry@utzoo.UUCP (Henry Spencer) (12/11/84)
Using one memory stack for both register-stack overflow and big-data stack is not inconceivable, although it would have to be thought out carefully. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
mjl@ritcv.UUCP (Mike Lutz) (12/15/84)
>One point which hasn't been made about the issue of stack frame sizes, >... [is that] if you call a function right at the start of your program which >uses lots of stack space it stays allocated. Actually, this was use to advantage in the original Pascal compiler system from Amsterdam. One of the C programs in the set (I forget which one) used *gobs* of data space, so to make it run on a PDP-11 the I/O buffers were declared as local variables in the main function, and thus were on the stack. Given the constraints imposed by the PDP-11 MMU, this actually resulted in a net gain in useful data memory as compared to allocating buffers dynamically using sbrk. -- Mike Lutz Rochester Institute of Technology, Rochester NY UUCP: {allegra,seismo}!rochester!ritcv!mjl CSNET: mjl%rit@csnet-relay.ARPA
jmc@ist.UUCP (John Collins) (12/21/84)
>The I/O buffers were declared as local variables in the main function.... >thus were on the stack. Given the constraints imposed by the PDP-11 >MMU, this actually resulted in a net gain in useful data memory as >compared to allocating buffers dynamically using sbrk. >Mike Lutz Yes - but that isn't the same as what I said. Fine if it is the main routine but if you DON'T want a big stack, but early on you call (from main) a routine which does, you eat up memory forever. -- John Collins calling courtesy of ist Please reply to ...!mcvax!ist!inset!jmc Phone: +44 727 57267 Snail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England