ressler@cs.cornell.edu (Gene Ressler) (03/21/91)
We are developing code generators for the Sun SPARCstations and need to compile a bibliography of applicable (i.e. SPARC and SPARC optimization-specific) references. If you'll e-mail your suggestions, I'll post a summary. I've seen the discussion referring to chip manufacturers' literature. Is there anything better? Anything Sun-specific? _Any_ suggestion would be helpful, as even our assembler references are mighty thin. Gene Ressler -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
salomon@ccu.umanitoba.ca (Dan Salomon) (03/25/91)
Check out the book "Microprocessors: A Programmer's View" by Dewar & Smosna. They compare the SPARC and the MIPS and give clues to optimizations that had not occurred to me from reading the SPARC architecture manual. The book is very informative, even though the writing is flat-footed.

--
Dan Salomon -- salomon@ccu.UManitoba.CA
Dept. of Computer Science / University of Manitoba
Winnipeg, Manitoba, Canada R3T 2N2 / (204) 275-6682
jpff@maths.bath.ac.uk (John ffitch) (03/28/91)
There was reference to a book by Dewar and Smosna which compared SPARC and MIPS. Does anyone know the publisher? I have been unable to trace it in our library.

So far my experience of SPARC has been less than good, as the explanation in the Architecture manual leaves a lot to the imagination. Something which really explained what happens to the stack or window roll, how va_args is supposed to work, and so on would be really helpful.

In my code someone keeps trampling on the stack, and honest guv' it's not me!

==John
salomon@ccu.umanitoba.ca (Dan Salomon) (03/30/91)
In article <1991Mar28.115715.3545@maths.bath.ac.uk> John ffitch <jpff@maths.bath.ac.uk> writes:
>There was reference to a book by Dewar and Smosna which compared SPARC and
>MIPS. Does anyone know the publisher? I have been unable to trace it in
>our library.

I was the original poster of the information. Unfortunately my copy was lent out at the time, so all I could provide were the authors' last names and the title. Here is the full reference:

        Microprocessors: A Programmer's View
        Robert B.K. Dewar & Matthew Smosna
        McGraw Hill (c) 1990
        ISBN 0-07-016639-0

>So far my experience of SPARC has been less than good, as
>the explanation in the Architecture manual leaves a lot to the
>imagination. Something which really explained what happens to the stack
>or window roll, how va_args is supposed to work, and so on would be really
>helpful.

You won't find that kind of detail in Dewar and Smosna. Check out the manual "Porting Software to SPARC Systems" that comes bound with the "Assembler Language Reference Manual". It has a little information on this topic in the section "Porting C Programs" and the subsection on varargs().

>In my code someone keeps trampling on the stack, and honest guv'
>it's not me!

This is indeed a "gotcha" of programming on the SPARC. If you read the specifications of the calling conventions carefully (in the appendix called "Software Considerations" of the architecture manual) you will find that the top of the stack must at all times contain 23 words that can be clobbered by called procedures. So any space that you use on the stack must have been allocated below the stack top by the SAVE instruction.

The difficulty that I have had is in finding specific details on the calling conventions that the C compiler uses. I believe, for instance, that it will say that the caller, not the callee, is responsible for saving the floating-point registers.

--
Dan Salomon -- salomon@ccu.UManitoba.CA
Dept. of Computer Science / University of Manitoba
Winnipeg, Manitoba, Canada R3T 2N2 / (204) 275-6682
chased@Eng.Sun.COM (David Chase) (04/02/91)
> The difficulty that I have had is in finding specific details on the
> calling conventions that the C compiler uses. I believe, for instance,
> that it will say that the caller, not the callee, is responsible for
> saving the floating-point registers.

Get your hands on a copy of the Applications Binary Interface. There are two versions pertinent to your problems: the Generic ABI, and the SPARC Processor Supplement. These are published by Prentice-Hall, and describe the "AT&T System V Applications Binary Interface".

And yes, the caller is responsible for saving the floating-point registers.

David Chase
Sun
pardo@cs.washington.edu (David Keppel) (04/02/91)
>John ffitch <jpff@maths.bath.ac.uk> writes:
>>[Somebody's trampling the stack, and honest, it isn't me!]

Back when I was trying to port a threads package to the SPARC I had an opportunity to learn all about the stack layout conventions. I wrote up my experience, which is mostly concerned with the interaction between register windows and stack layout. Nonetheless, it might prove instructive for anybody who's interested in the stack layout.

For the next two weeks or so, you can get a copy of my ``what I learned'' writeup via anonymous ftp from `cs.washington.edu' (128.95.1.4) in `pub/pardo/README.SPARC'.

        ;-D on  ( SPARC dereferences, too! )  Pardo
torek@elf.ee.lbl.gov (Chris Torek) (04/12/91)
(I will try to stick to `language' issues here.)

>In article <1991Mar28.115715.3545@maths.bath.ac.uk> John ffitch
><jpff@maths.bath.ac.uk> writes: [about SPARC machines]
>>Something which really explained what happens to the stack or window
>>roll, how va_args is supposed to work, and so on would be really helpful.

In article <1991Mar29.214751.3045@ccu.umanitoba.ca> salomon@ccu.umanitoba.ca (Dan Salomon) writes:
>... the top of the stack must at all times contain 23 words that can be
>clobbered by called procedures. So any space that you use on the stack
>must have been allocated below the stack top by the SAVE instruction.

No and yes: you must typically reserve at least 96 bytes, but the reason given above is incomplete.

The SPARCstation has `register windows': there are some number of registers arranged in a circular fashion, with overlap. A five bit field in the CPU Processor Status Register (PSR), called the Current Window Pointer (CWP), tells which window is `current'. References to Input, Local, and Output registers are really references to registers in the current window.

The five bit field guarantees that no more than 32 windows will ever exist. Actual SPARC implementations have fewer windows (e.g., this SPARCstation-1 has 7). Call the actual number `nwindows'. Two special unprivileged instructions allow you to alter the CWP field:

 -- SAVE: this decrements CWP. If the result is 31, it is changed to nwindows - 1. In other words, this computes:

        psr<4:0> <- (psr<4:0> - 1) mod nwindows;

 -- RESTORE: this increments CWP. If the result is nwindows, it is changed to 0. In other words, this computes:

        psr<4:0> <- (psr<4:0> + 1) mod nwindows;

Another register, the privileged Window Invalid Mask (WIM), holds a bitmask of `invalid' windows. This is used, e.g., to keep subroutines from `stepping on' the contents of some other subroutine's window.
SAVE and RESTORE trap to the operating system, rather than doing their usual job, if the bit corresponding to the new CWP field is set in the WIM. That is, SAVE and RESTORE really do this:

        new_cwp <- (psr<4:0> OP 1) mod nwindows;   // OP => + or -
        if ((1 << new_cwp) & WIM) then trap;
        else psr<4:0> <- new_cwp;

Every trap begins by doing an implicit SAVE (even if the result makes CWP indicate an invalid window) and writing some trap recovery information into the Local registers in the new window; thus the operating system must always maintain at least one invalid window (for traps). (Trap handlers must either run entirely within their special window, or else go through some fairly major gyrations, which makes writing the trap code very interesting, but this is mainly an architectural issue....)

For simplicity, let us assume that the machine has 7 windows, leaving at most 6 to user programs. Suppose that a user program is started with CWP=6 and window 0 marked invalid. This means the user program can use windows 6, 5, 4, 3, 2, and 1 without causing a trap. Let us also assume that nothing else uses any windows (e.g., all interrupts are disabled), and that each subroutine uses one new window. Then we might have a situation like this:

        _startup        window = 6
        main()          window = 5
        init()          window = 4
        initobj()       window = 3
        initobjtab()    window = 2
        emalloc()       window = 1

Now emalloc() calls malloc(), which attempts a SAVE instruction. Window 0 is invalid and this therefore traps. What must the trap handler do?

Somehow, the trap handler must make window 0 `available'. For window 0 to be available, window 6 must not be in use (it must be available for traps to scribble into), but window 6 contains values that the C library startup code may need. These must be saved somewhere. SunOS, Sprite, and 4BSD all use the same technique: they write the contents of window 6 into the place to which window 6's stack pointer points.
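
The SAVE/RESTORE rule and the seven-window example above can be modeled in a few lines of Python. This is only an illustrative sketch: the names `WindowModel`, `WindowTrap`, and `run_call_chain` are invented here, a trap is represented as a Python exception, and nothing about real trap handling (the implicit SAVE, the trap window) is modeled.

```python
class WindowTrap(Exception):
    """Raised when SAVE or RESTORE would enter a window marked invalid in WIM."""


class WindowModel:
    """Toy model of the CWP field and the Window Invalid Mask (hypothetical names)."""

    def __init__(self, nwindows, cwp, wim):
        self.nwindows = nwindows  # number of implemented windows
        self.cwp = cwp            # Current Window Pointer
        self.wim = wim            # bit i set => window i is invalid

    def _step(self, delta):
        # new_cwp <- (cwp OP 1) mod nwindows; trap if that window is invalid.
        new_cwp = (self.cwp + delta) % self.nwindows
        if (1 << new_cwp) & self.wim:
            raise WindowTrap(new_cwp)
        self.cwp = new_cwp

    def save(self):
        self._step(-1)

    def restore(self):
        self._step(+1)


def run_call_chain():
    """Replay the example: start in window 6 with window 0 invalid."""
    cpu = WindowModel(nwindows=7, cwp=6, wim=1 << 0)
    used = [cpu.cwp]              # _startup runs in window 6
    for _ in range(5):            # main() through emalloc(): one SAVE each
        cpu.save()
        used.append(cpu.cwp)
    trapped_on = None
    try:
        cpu.save()                # malloc() attempts one more SAVE
    except WindowTrap as trap:
        trapped_on = trap.args[0]
    return used, trapped_on


if __name__ == "__main__":
    print(run_call_chain())
```

In this model the five calls below _startup succeed and the sixth SAVE is the one that raises, which is the point at which a real kernel would have to spill a window.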
Clearly window 6's registers must be saved into some location unique to this invocation of _startup. One technique, used (I believe) on the Pyramid, is to have a separate `control stack'. The advantage here is that if the control stack pointer is not user-modifiable, the O/S can be sure that it points to a valid place. Existing SPARC window save code goes through the above-noted gyrations in order to verify the user stack pointer. Things are particularly exciting when the user stack happens to have been paged out. With a control stack, the O/S can guarantee a minimal in-core region whenever the user process is runnable. Depending on time pressures and compatibility bugaboos, we may investigate using a control stack instead of, or in addition to, the user's stack, in 4BSD, assuming I ever get 4BSD going (maybe if I stopped working on this news article... :-) ).

Control stacks have the disadvantage that one must partition the virtual space in advance. If the partitioning is a mismatch for the process, this may put an artificially low limit on the number of stack frames. One can move the control stack in virtual space, but this quickly becomes complex.

In any case, current systems want to do the following within the trap handler:

        (change to window 6)
        std     %l0, [%sp + (0*8)]      ! store Local registers into stack
        std     %l2, [%sp + (1*8)]
        std     %l4, [%sp + (2*8)]
        std     %l6, [%sp + (3*8)]
        std     %i0, [%sp + (4*8)]      ! store Input registers into stack
        std     %i2, [%sp + (5*8)]
        std     %i4, [%sp + (6*8)]
        std     %i6, [%sp + (7*8)]
        (change back to window 0, set WIM to 1 << 6, return from trap)

This whole sequence imposes one constraint, and the `std' instructions impose another:

    A. There must be at least 64 bytes at each window's %sp that are otherwise unused.

    B. Each window's %sp must be doubleword (8 byte) aligned.

If these conditions are not met, SunOS and Sprite kill the process.
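
To make the two constraints concrete, here is a small Python sketch of the spill step. It is an assumption-laden model rather than kernel code: memory is a plain dictionary keyed by byte address, and `spill_window`, `std_offsets`, and `BadStackPointer` are names made up for this illustration.

```python
WORD = 4                          # bytes per SPARC word
WINDOW_SAVE_BYTES = 16 * WORD     # 8 Local + 8 Input registers = 64 bytes


class BadStackPointer(Exception):
    """Stands in for the kernel killing a process with a misaligned %sp."""


def std_offsets():
    """Byte offsets used by the eight std instructions, in program order."""
    names = ["%l0", "%l2", "%l4", "%l6", "%i0", "%i2", "%i4", "%i6"]
    return {name: k * 8 for k, name in enumerate(names)}


def spill_window(memory, sp, local_regs, in_regs):
    """Write 16 register words to memory[sp .. sp+63], as the std sequence does."""
    if sp % 8 != 0:
        # Constraint B: std needs a doubleword-aligned address.
        raise BadStackPointer(hex(sp))
    if len(local_regs) != 8 or len(in_regs) != 8:
        raise ValueError("expected 8 Local and 8 Input registers")
    # Constraint A: the 64 bytes starting at sp are overwritten here,
    # so the program must not keep anything else in them.
    for i, value in enumerate(list(local_regs) + list(in_regs)):
        memory[sp + i * WORD] = value


if __name__ == "__main__":
    mem = {}
    spill_window(mem, 0x1000, range(8), range(8, 16))
    print(len(mem) * WORD, std_offsets())
```

Each `std` stores a register pair, which is why eight instructions cover sixteen words and why the last one lands at offset 56.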
(My kernel uses a special per-process save area to hold the values until C code can store them into the user stack, and I do not bother to check for 8-byte alignment in this code, so in theory your program will continue to run, albeit slowly, if you goof up the alignment.)

The obvious inverse sequence occurs when RESTORE instructions trap (the trap handler is somewhat peculiar since the implicit SAVE on each trap moves the CWP in the wrong direction).

This takes care of the first 64 bytes, or 16 words, that Dan Salomon mentioned. What about the other 7 words?

Sun defined their stack frame format to include another 8 words on every stack frame, partitioned as follows (see <machine/frame.h>):

        1*4 bytes:  fr_stret, `struct return addr'
        6*4 bytes:  fr_argd[6], `arg dump area'
        1*4 bytes:  fr_argx[1], `array of args past the sixth'

The `struct return addr' field is normally unused. For C functions that return a structure object, however, Sun's compiler does the following. Suppose function f() returns a structure. Then:

 1. Routines that call f() set their own fr_stret frame element to point to the place in which f() should store its return value.

 2. Routines that call f() do so with the sequence:

        call    _f
        nop             ! or pass an argument
        unimp   SIZE

    Here SIZE is the number of bytes the caller expects f() to store through fr_stret; this is stored in an otherwise unused bit field within the UNIMP instruction.

 3. f() returns not with the usual `ret; restore' sequence, but rather through a jump to .stret1, .stret2, .stret4, or .stret8. These library routines are given the address of f()'s return value (which f() has built somewhere in its own stack frame) and the size of this value. They then:

    A. Check the instruction at the return location. If it is not an UNIMP, they just return (thus discarding f()'s return value). Otherwise:

    B. Read the `size' field out of the UNIMP instruction. If this matches the number of bytes f() wants to return, they copy that many bytes from f()'s return structure to wherever fr_stret points, and then advance the return address over the UNIMP instruction. If the sizes do not match, they leave the return address alone. They then return from f(); if the sizes did not match, this causes a runtime error (a core dump) because of the UNIMP instruction.

The only difference between the four `.stret' routines is the loop used to copy the return value: .stret1 copies bytes, .stret2 copies halfwords, .stret4 and .stret8 copy words (.stret8 could copy doublewords but on SPARCstation-1s this is no faster). (I have to wonder why Sun do not simply have f() do the work inline and call bcopy() or memmove().)

There are a number of reasons why this is the wrong approach, but I will not go into them here. GCC uses the correct technique: callers of f() pass an extra `hidden' argument which points to the place f() should write its return value, and small structures are returned entirely within registers, without any copying. This does not catch runtime errors but is considerably more efficient (oops, I said I was not going to go into this :-) ).

This leaves the arg dump area and arg extension space. fr_argx is simply an array of `arguments that did not fit into the 6 Input registers'; its size actually varies, and is usually 0 (which is then rounded up to 1 because of the stack alignment constraints). The arg dump area has two essentially separate uses:

    A. Functions that run out of registers may spill some of them into the arg dump area. Sun's compiler only ever spills input registers 0 through 5 into the corresponding space (probably because of the next use). Functions whose arguments' addresses are taken can likewise use the arg dump area to store these.

    B. Functions that take variable arguments can write their input registers here. Since the arg dump area immediately precedes the extension space, this puts all parameters into a single contiguous region on the stack. This means that old (broken) code that assumes contiguous addressable arguments continues to work.

Both of these can be (and, I claim, should be) done differently. Functions that must spill registers, or take addresses of parameters, can allocate their own stack space. Functions with variable argument lists can write the register arguments into local stack space and retrieve arguments using something like:

        next_arg = --regarg >= 0 ? *regblock++ : *argx++;

(note that large objects such as arguments of type `double' must be handled differently). This slows down varargs functions, but they are rare.

At the least, the `fr_stret' field should not exist; this would allow input registers to be stored with `std' instructions, which *are* faster on some implementations. It would also allow the argument extension area to occupy no space in the usual case.

Anyway, this is where the 96 bytes per stack frame disappears to.

--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
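
The 96-byte figure in the post above can be checked with a little arithmetic. The Python sketch below assumes only what the post states (a 16-word window save area, then fr_stret, fr_argd[6], and a variable-length fr_argx, with the total rounded up to a doubleword); the function names and the "window save area" label are invented for illustration and are not taken from <machine/frame.h>.

```python
WORD = 4          # bytes per word
ALIGN = 8         # %sp must stay doubleword aligned


def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def frame_layout():
    """Return {field: (byte offset from %sp, size in bytes)} for the fixed part."""
    fields = (
        ("window save area", 16),   # 8 Local + 8 Input registers
        ("fr_stret", 1),            # struct return address
        ("fr_argd", 6),             # arg dump area
        ("fr_argx", 1),             # first word of the arg extension area
    )
    layout, offset = {}, 0
    for name, words in fields:
        layout[name] = (offset, words * WORD)
        offset += words * WORD
    return layout


def min_frame_size(extra_args=0):
    """Smallest aligned frame for a call passing extra_args words beyond six."""
    raw = (16 + 1 + 6 + extra_args) * WORD
    return round_up(raw, ALIGN)


if __name__ == "__main__":
    print(frame_layout())
    print([min_frame_size(n) for n in range(4)])
```

With no extra arguments the raw size is 23 words (92 bytes), and alignment pads it to 96, which matches both the 23 words quoted earlier in the thread and the 96 bytes in the reply.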