grunwald@m.cs.uiuc.edu (10/28/88)
This question follows the line of recent questions concerning register save/restore conventions. Many (most UNIX) systems apply the convention that the callee must save any registers used in a procedure. Other systems dictate that the caller must save the registers.

Question One: Is there an advantage? I can think of many practical advantages to the former method (callee saves) vs. caller saves, but I can also think of advantages to the latter.

Now, the second question concerns a saving convention. I'd like to know if this has been implemented/modelled anywhere, and what the advantages are. Presume that we have a mask of dirty bits for the register file. Presume that each procedure specifies a bit mask of registers that are used (as is already done on the 680x0 and NS32x32). The only registers that need to be saved are those denoted by the AND of the two bit masks. The initial dirty mask of the called procedure would contain those bits that didn't get AND-ed (i.e. those registers that aren't used by the current procedure but still contain live data). When registers get over-written, their dirty bit is set.

This has the advantage of saving only those registers that actually need to be saved. The cost is similar to current register load/unload masks, for common architectures. There's a cost/performance tradeoff as register sets get larger. For 64 registers, you'd need a double-longword of bits for the masks (although you'd probably split it into two 32-bit sets, since very few procedures would use more than 32 registers, and those procedures could just execute two instructions).

It would require some slight changes to compilers. You'd like to randomize register accesses across subroutines, or perhaps record information about register accesses in other subroutines so you don't select the same registers they used.

So, has this been done before? Modelled? Any data?

Dirk Grunwald
Univ. of Illinois
grunwald@m.cs.uiuc.edu
henry@utzoo.uucp (Henry Spencer) (10/30/88)
In article <3300037@m.cs.uiuc.edu> grunwald@m.cs.uiuc.edu writes:
>Many (most UNIX ) systems apply the convention that the callee must save
>any registers used in a procedure. Other systems dictate that the caller
>must save the registers. ... Is there an advantage?

As usual, it depends. Ideally, one wants to save and restore registers as little as possible, because it costs time and memory. The caller knows which registers don't have to be saved because they don't contain anything interesting. The callee knows which registers don't have to be saved because he's going to leave them alone anyway. Neither is clearly superior for all situations. It's not unheard-of to split the available registers into a callee-saved group and a caller-saved group. (What does MIPS do?)

The callee-saves bias in Unix is basically historical. On the 11, there were so few free registers that the calling sequence simply saved and restored all of them, and doing this in the callee saved code space, since the save/restore code appeared once per procedure rather than at every call site. (This is a slight oversimplification.) On the VAX, the wonderful all-singing all-dancing standard calling sequence provided by the hardware encouraged callee-saves. Not everyone has bothered to rethink the issues when changing processors.
--
The dream *IS* alive...  |  Henry Spencer at U of Toronto Zoology
but not at NASA.         |  uunet!attcan!utzoo!henry  henry@zoo.toronto.edu
yuval@taux02.UUCP (Gideon Yuval) (10/31/88)
In article <1988Oct30.013510.16861@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >...................................... On the VAX, the wonderful all- >singing all-dancing standard calling sequence provided by the hardware >encouraged callee-saves. Not everyone has bothered to rethink the issues >when changing processors. The VAX all-singing CALLS/CALLG was also VERY slow; but the only way to find this out was to time it by (e.g.) the Unix "time" command -- DEC never published any Vax timing-notes (that I know of). -- Gideon Yuval, yuval@taux01.nsc.com, +972-2-690992 (home) ,-52-522255(work) Paper-mail: National Semiconductor, 6 Maskit St., Herzliyah, Israel TWX: 33691, fax: +972-52-558322
dietz@cs.rochester.edu (Paul Dietz) (10/31/88)
"Callee-saves" and "caller-saves" are just two instances of a more general register saving strategy. Suppose we know the call graph of the program, and we know which registers each procedure uses. Register save/restore code can be thought of as occurring on the edges of this call graph.

If we know the execution frequency of each arc in the graph, and if the time to save/restore a register is independent of the number of registers being saved (a big if), the optimal placement of save/restore code for each register can be found using an algorithm for maximum network flow (finding the cut of minimum total frequency that disconnects all procedures using the register).

Paul F. Dietz
dietz@cs.rochester.edu
bcase@cup.portal.com (Brian bcase Case) (11/01/88)
>The VAX all-singing CALLS/CALLG was also VERY slow; but the only way to
>find this out was to time it by (e.g.) the Unix "time" command -- DEC never
>published any Vax timing-notes (that I know of).

Yes, this never ceased to amaze me. How in the world would you write a good compiler for this machine? It seems impossible to make good code selection decisions without knowing how long things take. I guess they thought the only consideration was code size; at least the architecture manual *had* to tell you the instruction encodings! :-) Oh, yeah, I forgot: smaller code is faster code. Sigh.
lindsay@k.gp.cs.cmu.edu (Donald Lindsay) (11/02/88)
In article <228@taux02.UUCP> yuval@taux02.UUCP (Gideon Yuval) writes:
>The VAX all-singing CALLS/CALLG was also VERY slow;

On the 11/780, it was slow because the write-to-memory FIFO wasn't very deep. The typical CALLS, with a typical register save mask, wrote more words than the FIFO could absorb. So, the CPU stalled, waiting for the memory to make more room in the FIFO. Of course, there are engineering reasons for avoiding deep FIFOs. Since this single design decision caused a bottleneck, I assume that it was an oversight.

The Nautilus (8700, etc.) was carefully tuned against real instruction traces. I believe that CALLS runs somewhat better on these machines.
--
Don  lindsay@k.gp.cs.cmu.edu  CMU Computer Science
firth@sei.cmu.edu (Robert Firth) (11/03/88)
Register Saving across Procedure Calls
--------------------------------------

Which is better - caller saves or callee saves?

A. Is this the right question?

First, and most important, if you are designing a professional-quality production compiler, this is the wrong question. Such a compiler must perform interprocedural optimisation if it is to be respectably state of the art.

However, if you want to design a prototype, amateur, or deliberately low-cost compiler, the issue is probably one worth considering. To keep this note short, I'm going to assume you understand the basic issue and are familiar with current hardware and software technology.

B. Are the strategies equally sound?

The point I consider most important, is that there is a definite semantic asymmetry between the two strategies. If the caller saves, then the caller is saving, locally, his own local state. This seems to me basically correct. If the callee saves, then the callee is saving, local to him, state that belongs to someone else. Moreover, he is saving state of greater extent - the caller's registers - in space of lesser extent - his own stack frame. This seems to me semantically unsound.

Now, I tend to let few things get in the way of efficiency, especially efficiency of something as crucial as the procedure call, but semantic correctness is one of those things. So in this case, I'm going to come out and say that "callee saves" is fundamentally wrong, and should be avoided if possible, even at some cost.

C. Which is more efficient?

Happily, however, the efficiency arguments, in my experience, support the "caller saves" strategy, so one can indeed do well by doing good.

The most blatant case is that of the longjump, which appears in other languages as a GOTO or RAISE statement. This causes a jump out of a procedure to somewhere further up the call chain, and so must reset the environment of the destination.
If the caller saves state, then this is simple: the jump is a jump, and the destination knows where all the state has been saved. In most implementations, one need only reset the frame pointer to the current incarnation of the destination procedure, and take the jump to the label.

But if the callee saves, then the caller has no idea how to recover his saved state, which may be buried any number of stack frames further down. It is therefore necessary to unwind the entire stack before taking the jump. The difference in cost can easily be a factor of 100 or more.

I do not regard this as a marginal point. The exception is beginning to be used as a normal programming tool; it is a feature of several modern languages and will probably be a feature of most new ones. Its efficient implementation is as desirable as the efficient implementation of, say, for-loops or array assignments.

Turning now to the main topic - which is the faster strategy for a normal call and return - I see two issues here: the number of registers to be saved and restored, and the cost of each save and restore.

D. Some facts and guesses

In my experience, there is almost no difference between the number of registers used by the caller and the number used by the callee. Small procedures tend to use fewer registers than larger ones, and leaf procedures tend to be a bit smaller, so on balance it seems marginally better for the callee to save. (What this also tells us is that interprocedural optimisation of leaves and leaf-callers only will give you big returns.)

But this is outweighed by two factors

* The callee must save all registers it will use throughout the body; the caller need save only the registers that are live at the point of call.

* When two or more calls occur in succession, both callees must save, but the caller need save only once.
Rough guesses I have accumulated over time are

* at any call point, caller is using ~ 2/3 of the registers it will use at all (though this is partly due to defects in register allocation strategies)

* on average, a procedure call is (almost) immediately followed by another about 2/3 of the time. This implies that if the caller saves, it will have to save ~ 45 times for every 100 calls.

These two factors together imply that the cost of a caller-saves protocol is about 1/3 that of a callee-saves protocol. (Do you believe that?)

Now consider the cost of a save and restore. There are two factors that make it cheaper for the caller to save

* the register may be slaving a known value. The caller then need not save at all, merely restore. I find this is true of at least 20% of live registers. (Consider for instance the MC68020. If you are working hard, you probably have 5 or 6 live D registers, of which at least one is holding a constant, and 4 or 5 live A registers, of which perhaps two are holding pointers to data structures. That's 3 out of 10.)

* the store may be combined with another operation. For example, the last operation on the register may have been an add

	ADDL2	X,R1

  This can be changed into

	ADDL3	X,R1,save_place

  A small saving, admittedly, but a saving.

There is a third factor I have not assessed satisfactorily, which applies especially to RISC machines. Is the caller or the callee better able to distribute loads and stores through the code, so as to overlap any load or store delays? I suspect it is the callee, but that is a hunch.

E. What about Hardware Help?

On the question of a "high-level procedure call" implemented in hardware, such as the VAX CALLS, I think my opinion is known: such instructions are worthless. But what about a simpler instruction, such as a hardware register save, or save-under-mask? A good example is provided by the MOVEM instruction of the MC68020, or the LDM/STM of the PE3200.
These are typical, and typically their break-even point is at about 4 registers. This suggests to me that they are of marginal value - after all, if you are already passing parameters in registers, how many are left to save around the typical call? The instructions are probably still worth having for things like a task context save (if only the PE3200 permitted them to be used that way!), but their contribution to a procedure protocol is slight.

F. Possibly Offensive Remark

I agree with Henry, that procedure calling protocols need to be thought afresh for each fresh machine. Unfortunately, very few compiler shops seem prepared to do this. I see over and again the model of a closed, downward-growing stack; caller pushes parameters; callee saves registers and allocates local space. One can do better than this by a factor of two or three, on almost any machine I know, with a different model more amenable to local optimisation.

We are used to making up in software for badly designed hardware. Now we have the reverse: making up in hardware for badly designed software. At least, that is my explanation for register windows.
daryl@hpcllla.HP.COM (Daryl Odnert) (11/03/88)
Dirk Grunwald (grunwald@m.cs.uiuc.edu) asks:
> Many (most UNIX ) systems apply the convention that the callee must save
> any registers used in a procedure. Other systems dictate that the caller
> must save the registers.

Some systems use a mixed strategy. For example, both the HP Precision Architecture (HPPA) and the MIPS R2000 split up the register set into two partitions, a caller-saves set and a callee-saves set. The compiler is free to use any register in the caller-saves set without saving or restoring that register. These registers cannot be used to hold live values across procedure calls. A register in the callee-saves (or entry-saves) set can only be used if the procedure saves its value on entry and restores it at exit. Of course, these registers can be used to hold live values across calls.

> Question One: Is there an advantage? I can think of many practical advantages
> to the former method (callee saves) vs. caller saves, but I can also think
> of advantages to the latter.

The mixed strategy seems to work well. The difficult question is determining the right number of registers to put in each of the two partitions. Some benchmarks favor larger caller-saves partitions; others could take advantage of a large callee-saves set.

> Now, the second question concerns a saving convention. I'd like to know if
> this has been implemented/modelled anywhere, and what the advantages are.

Two relevant papers:

"Minimizing Register Usage Penalty at Procedure Calls" by Fred C. Chow of MIPS Computer Systems. It is published in the "Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation" (pg 85-94).

"Register Windows vs. Register Allocation" by David W. Wall of DEC Western Research Lab. Same conference proceedings (pg 67-78).

Daryl Odnert  daryl%hpda@hplabs.hp.com
Hewlett Packard Information Software Division
johnl@ima.ima.isc.com (John R. Levine) (11/03/88)
In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes: >Which is better - caller saves or callee saves? > >[he argues that for a variety of reasons, caller saves is faster.] The argument I most often heard for the callee saving the registers is that it makes the object code smaller. Most subroutines are called from more than one place (otherwise, why make it a subroutine? I know, libraries, modularity, etc.) so that you would need N copies of caller-saves, one for each call, but you only need one copy of callee-saves. -- John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869 { bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something Rome fell, Babylon fell, Scarsdale will have its turn. -G. B. Shaw
bsy@PLAY.MACH.CS.CMU.EDU (Bennet Yee) (11/03/88)
In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
} Register Saving across Procedure Calls
}
}Which is better - caller saves or callee saves?
}
}A. Is this the right question?
}
}First, and most important, if you are designing a professional-quality production compiler, this is the wrong question. Such a compiler must perform interprocedural optimisation if it is to be respectably state of the art.
} ...

You must also consider the problems of separate compilation and multiple-language applications. If register saving differs from module to module, you'd better have language extensions that let you specify that external routines you call use some standard procedure call mechanism, as well as ways to specify that a function you're writing may be called by some external module and must likewise use a standard convention. The alternative is to require smart linkers. This is probably a religious issue, much like network byte ordering versus swap-only-as-required.

}B. Are the strategies equally sound?
}
}The point I consider most important, is that there is a definite semantic asymmetry between the two strategies. If the caller saves, then the caller is saving, locally, his own local state. This seems to me basically correct. If the callee saves, then the callee is saving, local to him, state that belongs to someone else.
} ...
}This seems to me semantically unsound.
}
} ... I'm going to come out and say that "callee saves" is fundamentally wrong, and should be avoided if possible, even at some cost.
}
}C. Which is more efficient?
}
}Happily, however, the efficiency arguments, in my experience, support the "caller saves" strategy, so one can indeed do well by doing good.
}
}The most blatant case is that of the longjump, which appears in other languages as a GOTO or RAISE statement. This causes a jump out of a procedure to somewhere further up the call chain, and so must reset the environment of the destination.
}If the caller saves state, then this is simple: the jump is a jump, and the destination knows where all the state has been saved. In most implementations, one need only reset the frame pointer to the current incarnation of the destination procedure, and take the jump to the label.
}
}But if the callee saves, then the caller has no idea how to recover his saved state, which may be buried any number of stack frames further down. It is therefore necessary to unwind the entire stack before taking the jump. The difference in cost can easily be a factor of 100 or more.

It's interesting to examine the ACIS implementation of longjmp/setjmp for the IBM RTs. The standard procedure call convention is callee-save, and longjmp does NOT unwind the stack. Contrast this with the Vaxen BSD implementation of longjmp/setjmp, which DOES unwind the stack. Vaxen BSD, of course, uses callee-save too.

What is the difference? Well, for the IBM RTs, your registers have the same values as when they returned from the setjmp. On Vaxen, your registers have the same values as they had when you called the next function from within the same function that called the setjmp. So depending on one or the other behaviour for your register variables is not safe. It's a minor but significant semantic difference. [Anybody know what POSIX decided for this?]

Now, how to avoid unwinding the stack and still retain the same semantics? It's actually not hard -- given that, for those ``other'' languages at least, GOTO and RAISE are part of the language, the compiler can just always save the contents of register variables before calling other functions _only for those functions that contain GOTO or RAISE_, and restore the register variables when the exception occurs. Thus, you can get the efficiency of callee-saving (a big win for those often-used, little leaf functions that use only a few scratch registers) and retain the semantics that you want.
Of course, it's hard to argue a similar case for C, since setjmp/longjmp is NOT part of the language.... And for those super-duper-smart compilers that put a variable into a register for the first half of a function and another variable into the same register for the second half, unwinding the stack to restore registers from stack frames isn't quite enough either!

-bsy
--
Internet: bsy@cs.cmu.edu  Bitnet: bsy%cs.cmu.edu%smtp@interbit  CSnet: bsy%cs.cmu.edu@relay.cs.net  Uucp: ...!seismo!cs.cmu.edu!bsy
USPS: Bennet Yee, CS Dept, CMU, Pittsburgh, PA 15213-3890
Voice: (412) 268-7571
jkjl@munnari.oz (John Lim) (11/03/88)
One issue about the caller/callee saves argument that hasn't been brought up is code size. Callee saves minimizes code size. For example, in C:

	a() {}
	b() {}
	c() {}
	f() { a(); b(); c(); }
	g() { b(); a(); c(); }

If the caller saves, assuming that saves and restores take 1 instruction each, we get 12 save/restore instructions for the above code (a save and a restore around each of the 6 call sites). If the callee saves, we need only 6 save/restore instructions (one save and one restore in each of a, b, and c).

Not too important you might think, but I remember that M'soft used the pascal calling convention in Windows to save 5% (if I remember) of code, which is similar in principle to the caller/callee argument. Luckily, this isn't so much of an issue when you aren't confined to 640K of mem...

john lim
amos@taux02.UUCP (Amos Shapir) (11/03/88)
I haven't seen anyone mention the idea of mixing caller-saves and callee-saves methods: the caller hands to the callee a mask of live registers; the callee ANDs this with a mask of the registers it uses and saves the registers whose corresponding bits are set. The mask the callee hands to the routines it calls is the OR of these two masks. (I hope that's clear). -- Amos Shapir amos@nsc.com National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel Tel. +972 52 522261 TWX: 33691, fax: +972-52-558322 34 48 E / 32 10 N (My other cpu is a NS32532)
cik@l.cc.purdue.edu (Herman Rubin) (11/04/88)
In article <7580@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes: > > Register Saving across Procedure Calls > -------------------------------------- > > Which is better - caller saves or callee saves? > > > A. Is this the right question? It is, if library subroutines are to be used. A compiler cannot change the procedure to be followed in this case. .................. | B. Are the strategies equally sound? | | The point I consider most important, is that there is a definite | semantic asymmetry between the two strategies. If the caller saves, | then the caller is saving, locally, his own local state. This seems | to me basically correct. If the callee saves, then the callee is | saving, local to him, state that belongs to someone else. Moreover, | he is saving state of greater extent - the caller's registers - in | space of lesser extent - his own stack frame. This seems to me | semantically unsound. | | Now, I tend to let few things get in the way of efficiency, especially | efficiency of something as crucial as the procedure call, but semantic | correctness is one of those things. So in this case, I'm going to | come out and say that "callee saves" is fundamentally wrong, and should | be avoided if possible, even at some cost. | | | C. Which is more efficient? | | Happily, however, the efficiency arguments, in my experience, support | the "caller saves" strategy, so one can indeed do well by doing good. | | The most blatant case is that of the longjump, which appears in other | languages as a GOTO or RAISE statement. This causes a jump out of a | procedure to somewhere further up the call chain, and so must reset | the environment of the destination. If the caller saves state, then | this is simple: the jump is a jump, and the destination knows where | all the state has been saved. In most implementations, one need only | reset the frame pointer to the current incarnation of the destination | procedure, and take the jump to the label. 
| But if the callee saves, then the caller has no idea how to recover his saved state, which may be buried any number of stack frames further down. It is therefore necessary to unwind the entire stack before taking the jump. The difference in cost can easily be a factor of 100 or more.

The only problem is in a dropback over several calls; it would then be necessary to have the callee place the information about what was saved and where, so the caller could find it quickly. Possibly this should be used instead of an automatic restore on return.

> I do not regard this as a marginal point. The exception is beginning to be used as a normal programming tool; it is a feature of several modern languages and will probably be a feature of most new ones. Its efficient implementation is as desirable as the efficient implementation of, say, for-loops or array assignments.
>
> Turning now to the main topic - which is the faster strategy for a normal call and return - I see two issues here: the number of registers to be saved and restored, and the cost of each save and restore.
>
> D. Some facts and guesses
>
> In my experience, there is almost no difference between the number of registers used by the caller and the number used by the callee.

| Small procedures tend to use fewer than less small, and leaf procedures tend to be a bit smaller, so on balance it seems marginally better for the callee to save. (What this also tells us is that interprocedural optimisation of leaves and leaf-callers only will give you big returns)
|
| But this is outweighed by two factors
|
| * The callee must save all registers it will use throughout the body; the caller need save only the registers that are live at the point of call.
|
| * When two or more calls occur in succession, both callees must save, but the caller need save only once.
| | Rough guesses I have accumulated over time are | > * at any call point, caller is using ~ 2/3 of the registers it will > use at all (though this is partly due to defects in register > allocation strategies) > > * on average, a procedure call is (almost) immediately followed by > another about 2/3 of the time. This implies that if the caller > saves, it will have to save ~ 45 times for every 100 calls. > > These two factors together imply that the cost of a caller-saves protocol > is about 1/3 that of a callee-saves protocol. (Do you believe that?) > No. My experience is quite different. The cases where I do this are because the callee does not save; if I have a conditional subroutine call (not that uncommon) it may be necessary to do a save before the call--the save is the second call. > Now consider the cost of a save and restore. There are two factors that > make it cheaper for the caller to save > > * the register may be slaving a known value. The caller then need not > save at all, merely restore. I find this is true of at least 20% of > live registers. (Consider for instance the MC68020. If you are working > hard, you probably have 5 or 6 live D registers, of which at least one > is holding a constant, and 4 or 5 live A registers, of which perhaps > two are holding pointers to data structures. That's 3 out of 10) > > * the store may be combined with another operation. For example, the > last operation on the register may have been an add > > ADDL2 X,R1 > > This can be changed into > > ADDL3 X,R1,save_place > > A small saving, admittedly, but a saving. > > There is a third factor I have not assessed satisfactorily, which applies > especially to RISC machines. Is the caller or the callee better able to > distribute loads and stores through the code, so as to overlap any load > or store delays? I suspect it is the callee, but that is a hunch. This assumes that there are only a few registers. 
Try this on a machine with many registers, such as the CYBER 205 with 256 registers. And consider the problem on a machine with vector registers. In most cases, most of the vector registers will be in use; I believe they all do not have nearly enough. This is likely to be at least 4096 bytes; the number of words depends on the word length. The problem is that any convention may or may not be right for the given application.

> E. What about Hardware Help?
>
> On the question of a "high-level procedure call" implemented in hardware, such as the VAX CALLS, I think my opinion is known: such instructions are worthless. But what about a simpler instruction, such as a hardware register save, or save-under-mask?

The VAX has register save (and restore) under mask.
....................

Probably the best help that hardware can give is to have a "dirty" bit or a mask on call as to what it may be necessary to save. The bit is cleared on saving and set again on restoring. If a mask is used, the subroutine would remove the bits corresponding to saved registers to recompute its mask. The problem with a mask is that if there are a large number of registers, the mask is long.

It seems from the references above that the correspondent is ignoring the saving of floating-point registers where they are separate. They fall into the same category. It is much less likely that floating-point registers will be of the restore-only type than pointers.

> F. Possibly Offensive Remark
>
> I agree with Henry, that procedure calling protocols need to be thought afresh for each fresh machine. Unfortunately, very few compiler shops seem prepared to do this. I see over and again the model of a closed, downward-growing stack; caller pushes parameters; callee saves registers and allocates local space. One can do better than this by a factor of two or three, on almost any machine I know, with a different model more amenable to local optimisation.
......................
It is necessary to consider the problem for each machine. And with planned exceptions, and especially with interrupts (we should also have programmed interrupts on conditions, instead of having to test at each occasion), I see no good alternative to having the callee save.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
pardo@june.cs.washington.edu (David Keppel) (11/04/88)
bcase@cup.portal.com (Brian bcase Case) writes:
>[ Amazing: DEC never gave instruction timings ]
>[ How are you supposed to write a good compiler? ]

In both defense of DEC and support of RISC designers, most CISCs have great variability in timing *for a given instruction*. Some relevant considerations:

* Addressing modes
* Current state of the pipeline
* Cache hit/miss for instruction & memory operands
* State of the cache (if it is busy handling coherency it will respond slower on a hit)

Note, for example, that based on subtle variations in usage a single 80386 instruction may execute a factor of 3 faster/slower according to the manual. That doesn't count pipeline stalls, and the '386 doesn't have any 3-memory-operand instructions.

One reason compiler-writers like RISCs is that you can *use* the machine description meaningfully.

;-D on ( Anybody got a 7-operand instruction? ) Pardo
--
pardo@cs.washington.edu  {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
baum@Apple.COM (Allen J. Baum) (11/04/88)
[] >In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes: . . >F. Possibly Offensive Remark > >I agree with Henry, that procedure calling protocols need to be thought >afresh for each fresh machine. Unfortunately, very few compiler shops >seem prepared to do this. I see over and again the model of a closed, >downward-growing stack; caller pushes parameters; callee saves registers >and allocates local space. One can do better than this by a factor of >two or three, on almost any machine I know, with a different model more >amenable to local optimisation. Would you care to elaborate on how to gain a factor of two or three? I'd like to see an example or two.. -- baum@apple.com (408)974-3385 {decwrl,hplabs}!amdahl!apple!baum
earl@wright (Earl Killian) (11/04/88)
In article <960006@hpcllla.HP.COM>, daryl@hpcllla (Daryl Odnert) writes: >Some systems use a mixed strategy. For example, both the HP Precision >Architecture (HPPA) and the MIPS R2000 split up the register set into >two partitions, a caller-saves set and a callee-saves set. Most people don't realize it, but so does 4.3bsd on the VAX. r0-r5 are caller saves, r6-r11 are callee-saves. Some VAX compilers (but not 4.3bsd cc!) take advantage of this, with good effect. --
cprice@mips.COM (Charlie Price) (11/04/88)
In article <960006@hpcllla.HP.COM> daryl@hpcllla.HP.COM (Daryl Odnert) writes:

On callee save versus caller save for registers, Daryl gives the references:

>"Minimizing Register Usage Penalty at Procedure Calls" by Fred C. Chow
>of MIPS Computer Systems. It is published in the "Proceedings of the
>SIGPLAN '88 Conference on Programming Language Design and Implementation"
>(pg 85-94.)
>
>"Register Windows vs. Register Allocation" by David W. Wall of
>DEC Western Research Lab. Same conference proceedings (pg 67-78).

It is probably worth mentioning that these proceedings are published as:
	SIGPLAN NOTICES, Volume 23, Number 7, July 1988
--
Charlie Price    cprice@mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086
cantrell@Alliant.COM (Paul Cantrell) (11/04/88)
I'd like to make some minor comments on a really good article by Robert
Firth on register save conventions across procedure calls.  Having
programmed several 680x0 systems with a registers-saved-by-callee
convention, and now working on a 680x0 architecture where the caller
saves its own registers, I've had the chance to program the same
instruction set with both conventions.

In article <7580@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>First, and most important, if you are designing a professional-quality
>production compiler, this is the wrong question. Such a compiler must
>perform interprocedural optimisation if it is to be respectably state
>of the art.
>
>However, if you want to design a prototype, amateur, or deliberately
>low-cost compiler, the issue is probably one worth considering. To
>keep this note short, I'm going to assume you understand the basic
>issue and are familiar with current hardware and software technology.

Well, you may have slightly overstated this - I'd guess that 98% of the
production-quality compilers available today do not do interprocedural
optimizations.  However, I agree that this is desirable.

>C. Which is more efficient?
>
>Happily, however, the efficiency arguments, in my experience, support
>the "caller saves" strategy, so one can indeed do well by doing good.

[he goes on to describe the longjump case as being more efficient when
caller saves]

I would tend to ignore longjump, since it is an infrequently used
mechanism compared to procedure calls in general; I think the efficiency
of the basic call/return is what needs to be looked at here.  He
strongly argues that I shouldn't feel that way, but I'll leave it at
that.

>In my experience, there is almost no difference between the number
>of registers used by the caller and the number used by the callee.
>
>Small procedures tend to use fewer than less small, and leaf procedures
>tend to be a bit smaller, so on balance it seems marginally better for
>the callee to save.
>(What this also tells us is that interprocedural optimisation of leaves
>and leaf-callers only will give you big returns)

Yes, this is one problem I have with caller saving - it substantially
increases the cost of calling small procedures that need very few
registers.  The register save/restore done by the caller can easily
outweigh the entire cost of the procedure itself, if it is something
simple like a queue manipulation or an assembly-language routine which
gives you access to a special instruction.  I don't think the word
'marginal' applies here - from doing code inspection I think this can
account for a lot of wasted time.  As you point out, it simply argues
strongly for interprocedural analysis.  (An obvious thing to do for such
simple leaf procedures is to inline them, and get rid of the
procedure-call overhead entirely.)

A nasty side effect of our compiler (you could argue that this is simply
a bug in the register allocation, but I think it's a little more
complicated than that) is that for small C routines, adding 'register'
declarations may actually slow the code down by causing many
save/restores to be generated.  This obviously is affected by where the
variable is used, how often, and where the procedure calls are in
relation to uses of the register variable.  My only point is that the
programmer expects that adding 'register' to those variables which are
used frequently should make his code run faster, not slower.  In the
callee-saves convention, it is usually trivial for the programmer to
determine whether 'register' is called for - it is almost certainly
based on how many times he uses the variable within the procedure.  But
for caller saves, it is almost impossible for him to tell.

>But this is outweighed by two factors
>
>* The callee must save all registers it will use throughout the body;
>  the caller need save only the registers that are live at the point of
>  call.
>
>* When two or more calls occur in succession, both callees must save,
>  but the caller need save only once.

From code inspection of typical C code, the first point doesn't seem to
be much of a win or loss.  It's true that only the live registers need
be saved if the caller is saving, but in 'good' C code there are
typically enough registers in use (if the compiler has done a decent job
of register allocation) that you always end up saving a large number of
registers anyway.

The second point, that you can avoid multiple save/restores when you
have several procedure calls in a row, is certainly true, but again, the
code inspection I have done shows that a fair amount of the time you end
up doing all the save/restoring on each one, because conditional
branching makes the path through the calls unpredictable at compile
time.  However, this sometimes can be a large win - I suspect that this
is the single largest reason that you can expect a performance gain with
caller saving.

Anyway, here is a list of what I consider the pros and cons of the
caller saving his own registers:

Pros:
	1) Avoids multiple save/restore operations across consecutive
	   procedure calls.
	2) Saved register state is local to its owner, not buried on the
	   stack by the various called procedures.
	3) Only 'live' registers need be saved.
	4) If a copy of the data exists and is easy to obtain, no save
	   need be done.

Cons:
	1) Often causes more saves than required when calling leaf
	   procedures; they are small, but calling them is the most
	   common operation, so the penalty becomes large.
	2) Makes programs slightly larger.  Instead of one copy of the
	   register save/restore, there has to be a copy at every
	   invocation.  This may have a performance impact because of
	   cache size and main memory size; however, Pro #1 may decrease
	   the impact of this some.
	3) For assembly-language programming, code may be slightly
	   harder to write and understand, since determining which
	   registers must be saved/restored depends on how the thread of
	   control can be affected by conditional branching, etc.
	   Typically, with the callee-saves convention, the registers
	   would be saved/restored at entry/exit time (I'm gonna get
	   flamed on that one).

Conclusion:  Neither convention seems to be all that much better.  I'd
say that caller saving has a slight edge performance-wise, and callee
saving has a slight edge in readability/maintainability (only if you are
using assembly language).  I think interprocedural analysis would be
enough of a win over either of these two methods that it strongly argues
for people to move in that direction.

					PC
hankd@pur-ee.UUCP (Hank Dietz) (11/05/88)
In article <1009@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> In article <7580@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
...[stuff omitted]...
> > Which is better - caller saves or callee saves?
> >
> > A. Is this the right question?
>
> It is, if library subroutines are to be used. A compiler cannot change
> the procedure to be followed in this case.
...[much more stuff omitted]...

This is actually a prime argument for caller saves.  The reason is
simple: compiler optimizations, if constrained to be performed without
changing the code of the called (library) routine, can only be applied
in the caller.  Hence you want to do as much of the call processing as
possible in the caller, because that's the only way you have a shot at
optimizing it.

It turns out that this is actually near-optimal, because even though the
compiler can't change the code inside called (library) routines, it can
consult summary information specifying things like which registers are
actually used in the called routine... essentially the "right" way to do
it.

					-hankd@ee.ecn.purdue.edu
pardo@june.cs.washington.edu (David Keppel) (11/05/88)
cantrell@alliant.Alliant.COM (Paul Cantrell) writes:
>[ function inlining avoids leaf-register-allocation problems ]
>[ also saves procedure call costs! ]

If the leaf procedure is called often and the hardware has an I-cache,
then inlining the leaves may make the code *slower*.  At an extreme
(cache-miss penalty is high and sequential instructions are not
prefetched), callee saves might win simply on the basis of code size and
the resulting miss-rate penalty.

	;-D on ( The walking virus )  Pardo
--
pardo@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
cprice@mips.COM (Charlie Price) (11/05/88)
In article <1009@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <7580@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
>> Register Saving across Procedure Calls
>> --------------------------------------
>> Which is better - caller saves or callee saves?
>>
>> A. Is this the right question?
>
>It is, if library subroutines are to be used. A compiler cannot change
>the procedure to be followed in this case.

Not necessarily.  MIPS has ucode libraries (an intermediate
representation) available, and compiling C with -O4 optimization
selected uses these libraries and does interprocedural register
allocation including the library modules.  For some applications this is
worth doing.

On the other hand, what about shared-library routines with dynamic
linkage at runtime?
--
Charlie Price    cprice@mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086
hutchson@mozart.uucp (Stephen Hutcheson) (11/05/88)
In a previous incarnation, I was an interested party (compiler
developer) to a project that defined a calling sequence for a new
architecture.  The various arguments about code size and information
available were bandied about, to little effect.  The final decision was
to let the caller save.

We had (and hoped to mix) code in several languages, including more than
the normal percentage of assembly-language code, some of it very crufty.
It was observed that if the caller saved his own context, the callee
could not easily foul it up.  The older architecture used callee-saves,
and clever callees often didn't save.  Half-clever callees didn't save
as often as they should have, and it had been a recurring problem.  The
new "defensive driving" approach would have that problem only within a
subroutine; the old approach had it across subroutines.
joe@modcomp.UUCP (11/05/88)
firth@bd.sei.cmu.edu (Robert Firth) writes:
> Which is better - caller saves or callee saves?
> [...]
> Which is more efficient?
> [...]
> The most blatant case is that of the longjump, which appears in other
> languages as a GOTO or RAISE statement. This causes a jump out of a
> procedure to somewhere further up the call chain, and so must reset the
> environment of the destination. If the caller saves state, then this is
> simple: the jump is a jump, and the destination knows where all the state
> has been saved. In most implementations, one need only reset the frame
> pointer to the current incarnation of the destination procedure, and take
> the jump to the label.
>
> But if the callee saves, then the caller has no idea how to recover his
> saved state, which may be buried any number of stack frames further down.
> It is therefore necessary to unwind the entire stack before taking the jump.
> The difference in cost can easily be a factor of 100 or more.
                                        ^^^^^^^^^^^^^^^^^^^^^^^

The callee-save implementations that I have seen all have a fast
longjump mechanism.  Typically, the setjmp(x) call saves (an adjusted
version of) the entire machine state in x, and longjmp(x) jumps simply
by restoring that state.  No attempt is made to depend on information
which may or may not be on the stack after the setjmp call.

Joe Korty			I'm suffering from a virus
...uunet!modcomp!joe		... but my machine isn't.
aglew@urbsdc.Urbana.Gould.COM (11/06/88)
>I haven't seen anyone mention the idea of mixing caller-saves and callee-saves
>methods: the caller hands to the callee a mask of live registers; the callee
>ANDs this with a mask of the registers it uses and saves the registers whose
>corresponding bits are set. The mask the callee hands to the routines it calls
>is the OR of these two masks. (I hope that's clear).
>
>	Amos Shapir				amos@nsc.com
>National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
>Tel. +972 52 522261  TWX: 33691, fax: +972-52-558322
>34 48 E / 32 10 N			(My other cpu is a NS32532)

There has been a paper on this; sorry, memory fails.

The best argument I heard against this sort of scheme is that it makes
high-performance instruction dispatch difficult - you can't dispatch
instructions after the SAVE-MASK until the mask has been computed,
because you don't know which registers are used.  In general,
instructions that do not have static register addressing imply an
instruction dispatch stall.

This is, of course, not an issue for the current generation of
microprocessors, which do not really use any advanced instruction
dispatch techniques.
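[The mask arithmetic Shapir describes is simple enough to sketch in C.
This is an illustrative toy, not any real ABI; the 32-bit mask width and
the function name are assumptions:]

```c
#include <stdint.h>

/* Hypothetical sketch of the mixed save convention quoted above.
 * live_in: mask of registers holding live caller data (handed down).
 * used:    mask of registers this procedure will overwrite.
 * On return, *to_save holds the registers this procedure must actually
 * save (live AND clobbered); the return value is the mask it hands to
 * its own callees (live OR used). */
static uint32_t combine_masks(uint32_t live_in, uint32_t used,
                              uint32_t *to_save)
{
    *to_save = live_in & used;  /* only these need saving here */
    return live_in | used;      /* everything live from here down */
}
```

[Note that the returned mask only ever gains bits as the call chain
deepens - which is exactly the objection raised later in the thread:
a few levels down it tends toward all 1's.]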
henry@utzoo.uucp (Henry Spencer) (11/06/88)
In article <3473@pt.cs.cmu.edu> bsy@PLAY.MACH.CS.CMU.EDU (Bennet Yee) writes:
>It's interesting to examine the ACIS implementation of longjmp/setjmp for
>the IBM RTs. The standard procedure call convention is callee-save, and
>longjmp does NOT unwind the stack. Contrast this with the Vaxen BSD
>implementation of longjmp/setjmp, which DOES unwind the stack. Vaxen BSD,
>of course, uses callee-save too. What is the difference? Well, for the IBM
>RTs, your registers have the same values as when they returned from the
>setjmp. On Vaxen, your registers have the same values as they had when you
>called the next function from within the same function that called the
>setjmp. So depending on one or the other behaviour for your register
>variables is not safe. It's a minor but significant semantic difference.
>[Anybody know what POSIX decided for this?]

X3J11, not POSIX, is the relevant group here.  And X3J11 has wimped out
on it, in a big way: the values of *any* local variables (not just
register variables -- the compiler may be quietly promoting things into
registers!) that have changed since the setjmp are *indeterminate* after
a longjmp, unless the variables are declared "volatile".  Note, there is
no guarantee that you get *either* of the above cases!  The values may
even be trash!  Some of us thought this was a damn stupid idea, since it
invalidates essentially every existing program that uses setjmp/longjmp,
but we were unable to convince X3J11 of this.  They claim that this is a
"quality of implementation" issue.

>Now, how to avoid unwinding the stack and still retain the same
>semantics? It's actually not hard -- given that, for those ``other''
>languages at least, GOTO and RAISE are part of the language, the
>compiler can just always save the contents of register variables before
>calling other functions _only for those functions that contain GOTO or
>RAISE_, and restore the register variables when the exception occurs...
>
>Of course, it's hard to argue a similar case for C, since setjmp/longjmp
>is NOT part of the language...

Au contraire, it *is* part of the language.  Realistically, the major C
library functions have to be considered part of the language.  Most
every C implementor has cursed this fact, since it puts significant
constraints on calling-sequence design.  X3J11 has put enough
constraints on the invocation of setjmp, in fact, that the compiler can
do the same sorts of things as it could for a language with built-in
setjmp/longjmp.  This is important, because it's the only sane way to
handle setjmp/longjmp if you are doing fancy register allocation and
can't do a stack unwind.  (Fancy register handling means non-register
variables, which most users expect are safe, are in fact in danger.
Stack unwinding relies on the stack layout being either invariant or
self-describing [pdp11 and vax respectively], and is not possible with
efficient calling sequences on most modern machines.)
--
The Earth is our mother.        |    Henry Spencer at U of Toronto Zoology
Our nine months are up.         |uunet!attcan!utzoo!henry henry@zoo.toronto.edu
csmith@mozart.uucp (Chris Smith) (11/07/88)
In article <7580@aw.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
> Which is better - caller saves or callee saves?

Convex computers use caller saves; here are a few more observations to
toss in.

> B. Are the strategies equally sound?
> The point I consider most important, is that there is a definite
> semantic asymmetry between the two strategies. If the caller saves,
> then the caller is saving, locally, his own local state.

It's worth noting that this also gives the caller a chance to do a free
"context switch" of register contents -- a burst of register-memory
traffic is inevitable no matter who does the save, but putting it in the
caller allows him to capitalize on the opportunity to load up a
different -- more useful -- set of values after the call.

> Rough guesses I have accumulated over time are
>
> * at any call point, caller is using ~ 2/3 of the registers it will
>   use at all (though this is partly due to defects in register
>   allocation strategies)
>
> * on average, a procedure call is (almost) immediately followed by
>   another about 2/3 of the time. This implies that if the caller
>   saves, it will have to save ~ 45 times for every 100 calls.
>
> These two factors together imply that the cost of a caller-saves protocol
> is about 1/3 that of a callee-saves protocol. (Do you believe that?)

Dynamically, our registers tend to be fuller than that -- they fill up
at the drop of a hat anyway, but if they don't, loop unrolling sees to
it that they do.  But this one:

> * the register may be slaving a known value. The caller then need not
>   save at all, merely restore. I find this is true of at least 20% of
>   live registers.

operates very powerfully on a register machine, where loads are required
for every memory operand.

All in all, on this machine and on *one* benchmark, the Fortran
validation tests, it looks like caller-saves is just under half the cost
of callee-saves.  (Counting only the register saves and restores.)
One other point: debuggers are up to hunting through saved frames to find a variable allocated to R4, but when the variables flit around as they are prone to do when the domain of register allocation is the intervals between calls, it puts quite a strain on the debugger tables.
dave@micropen (David F. Carlson) (11/08/88)
In article <2557@munnari.oz>, jkjl@munnari.oz (John Lim) writes:
>
> Not too important you might think, but i remember that M'soft used the
> pascal calling convention in Windows to save 5% (if i remember) of
> code, which is similar in principle to the caller/callee argument.
>
> Luckily, this isn't so much of an issue when you arent confined to 640K
> of mem...
>
> 	john lim

I thought Microsoft did this because they had source to Apple's Lisa,
which is the object of current litigation, and the Lisa used pascal
calling conventions because it was written in Pascal.  That made
"duplicating" the Apple easier, given the 1:1 transfer of Macintosh
software that already used the pascal conventions (M-Word, etc.).

Apple owns several Trademarks on the above.
Microsoft owns several Trademarks on the above.
--
David F. Carlson, Micropen, Inc.
micropen!dave@ee.rochester.edu

"The faster I go, the behinder I get." --Lewis Carroll
johnl@ima.ima.isc.com (John R. Levine) (11/08/88)
In article <578@micropen> dave@micropen (David F. Carlson) writes:
>In article <2557@munnari.oz>, jkjl@munnari.oz (John Lim) writes:
>>
>> Not too important you might think, but i remember that M'soft used the
>> pascal calling convention in Windows to save 5% (if i remember) of code, ...
>
>I thought Microsoft did this because they had source to Apple's Lisa ...
>[which was written in Pascal.]

Probably not.  A highly reliable source who wrote a lot of the Windows
code tells me that Windows is written in C and assembler.  Considering
that the MS C compiler emits special code for Windows linkages and the
Pascal compiler doesn't, I believe him.

The main difference between C and Pascal calling sequences on an 8086 is
that the C sequence has the caller pop the arguments off the stack,
while Pascal has the callee pop.  There is a "return and pop N"
instruction that favors callee pop, thus a code saving.  Naturally, if
the caller and callee disagree about the number of arguments passed,
chaos ensues.  Pascal also passes arguments left to right while C passes
them right to left; except for the rare varargs function this makes no
practical difference.

When I was working on Javelin, we also went from C to Pascal calling,
and also noticed about a 5% space saving.  In that case, though, the
compiler returned pointer values in ES:BX or BX, depending on size,
which was a big win because the BX register can be dereferenced
directly, while AX (the normal value register) can't.

To return to the original point: before that we switched from Lattice C,
which was caller-save, to Wizard (father of Turbo), which was mixed, the
callee saving SI and DI and the caller saving anything else.  The mixed
convention definitely saved a little space, though since the save and
restore instructions are only a byte apiece it wasn't much.
--
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | spdcc | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Disclaimer:  This is not a disclaimer.
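[That "rare varargs function" is the main reason C defaults to
caller-pops: only the caller knows how many arguments it actually
pushed, so only it can remove them - a callee-pops "return and pop N"
needs a fixed N.  A sketch; the function is invented for illustration:]

```c
#include <stdarg.h>

/* Sum n trailing int arguments.  Under a callee-pops (Pascal-style)
 * convention a function like this could not work, because the callee
 * has no single argument count to encode in its "return and pop N". */
static int sum_ints(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    while (n-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```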
srg@quick.COM (Spencer Garrett) (11/09/88)
In article <234@taux02.UUCP>, amos@taux02.UUCP (Amos Shapir) writes:
- I haven't seen anyone mention the idea of mixing caller-saves and callee-saves
- methods: the caller hands to the callee a mask of live registers; the callee
- ANDs this with a mask of the registers it uses and saves the registers whose
- corresponding bits are set. The mask the callee hands to the routines it calls
- is the OR of these two masks. (I hope that's clear).
Problem is, the mask thus generated quickly tends toward all 1's if
it's saving any stores, and is equivalent to callee-saves if it isn't.
jjw@celerity.UUCP (Jim ) (11/10/88)
In article <6800006@modcomp> joe@modcomp.UUCP writes in response to
firth@bd.sei.cmu.edu (Robert Firth):
>The callee-save implementations that I have seen all have a fast longjump
>mechanism. Typically, the setjmp(x) call saves (an adjusted version of) the
>entire machine state in x, and longjmp(x) jumps simply by restoring that
>state. No attempt is made to depend on information which may or may not be
>on the stack after the setjmp call.

Of course, this is the mechanism which results in the situation
described by bsy@PLAY.MACH.CS.CMU.EDU (Bennet Yee) in article
<3473@pt.cs.cmu.edu>:

> ... your registers have the same values as when they returned from the
>setjmp. On Vaxen, your registers have the same values as they had when you
>called the next function from within the same function that called the
>setjmp. So depending on one or the other behaviour for your register
>variables is not safe. It's a minor but significant semantic difference.

This is more than a minor difference, since optimizing compilers will
keep any variable in a register when necessary.

henry@utzoo.uucp (Henry Spencer) in article <17965@utzoo.uucp> points
out one solution:

> ... X3J11 has wimped out on
>it, in a big way: the values of *any* local variables (not just register
>variables -- the compiler may be quietly promoting things into registers!)
>that have changed since the setjmp are *indeterminate* after a longjmp,
>unless the variables are declared "volatile". Note, there is no guarantee
>that you get *either* of the above cases! The values may even be trash!

As Henry Spencer stated, this "invalidates essentially every existing
program that uses setjmp/longjmp."

It is possible for the information saved in the "jump buffer" to
indicate where the register information is being saved on the stack so
that it can be restored by longjmp.  This is the mechanism used in the
FPS Model 500.
There are some difficulties with signal handlers using longjumps, since
the callee could be in the middle of saving the state when the signal
occurs.  The use of setjmp does require that the compiler generate some
additional code on calls (especially if signal handlers can perform
longjmp's), so there are compiler optimization options which do not
guarantee "non-volatile" variable contents.
jjw@celerity.UUCP (Jim ) (11/10/88)
>In article <234@taux02.UUCP>, amos@taux02.UUCP (Amos Shapir) writes:
>I haven't seen anyone mention the idea of mixing caller-saves and
>callee-saves methods: the caller hands to the callee a mask of live
>registers ...

One "problem" with this is that managing and testing the bits in the
mask can cost more than just saving the registers.  This is definitely
the case in the FPS Model 500, where the registers can be saved on the
register stack at a cost of a machine cycle each.

This is what we refer to as a "smart cycle" problem -- you can spend
more cycles trying to be clever than it costs to do it the "dumb" way.
andrew@frip.gwd.tek.com (Andrew Klossner) (11/11/88)
>>> Which is better - caller saves or callee saves?
>>> Is this the right question?

>> It is, if library subroutines are to be used. A compiler cannot change
>> the procedure to be followed in this case.

> This is actually a prime argument for caller saves. The reason is simple:
> the compiler optimizations, if constrained to be performed without changing
> the code of the called (library) routine, can only be applied in the caller.

It is also a prime argument for callee saves.  When the library routine
is being compiled, the compiler optimizations are constrained to be
performed without changing the code of the calling routine.  The choice
between optimizing routines near the frontier of the call graph and
optimizing routines back toward the root of the graph should (IMHO) be
made in favor of the frontier.

  -=- Andrew Klossner   (uunet!tektronix!tekecs!frip!andrew)      [UUCP]
                        (andrew%frip.gwd.tek.com@relay.cs.net)    [ARPA]
hankd@pur-ee.UUCP (Hank Dietz) (11/12/88)
In article <10600@tekecs.TEK.COM>, andrew@frip.gwd.tek.com (Andrew Klossner) writes:
> >>> Which is better - caller saves or callee saves? ...
> >> It is, if library subroutines are to be used. A compiler cannot change
> >> the procedure to be followed in this case.
>
> > This is actually a prime argument for caller saves. The reason is simple:
> > the compiler optimizations, if constrained to be performed without changing
> > the code of the called (library) routine, can only be applied in the caller.
>
> It is also a prime argument for callee saves. When the library routine
> is being compiled, the compiler optimizations are constrained to be
> performed without changing the code of the calling routine. The choice

Not true!  The fact that while compiling the library routines you can't
change the callers is not an argument for callee saves.  The reason is
that no knowledge of the caller is available when compiling the callee
(the callee being the library routine), whereas complete info is
available about the callee when compiling the caller.  It is the
availability of information about the routine you can't change which
makes the best optimizations possible, and since library routines are
generally callee routines which predate the callers, you can't win with
callee saves.

					-hankd
chris@mimsy.UUCP (Chris Torek) (11/13/88)
In article <196@celerity.UUCP> jjw@celerity.UUCP (Jim) writes:
>It is possible for the information saved in the "jump buffer" to indicate
>where the register information is being saved on the stack so that it can be
>restored by longjmp. This is the mechanism used in the FPS Model 500.

This is probably the best approach.  It does affect optimisation,
obviously, but at least at the moment, calls to setjmp() are rare; not
too many functions should have much trouble here.

>There are some difficulties with signal handlers using longjumps since the
>callee could be in the middle of saving the state when the signal occurs.

This is not hard to solve: save all registers in memory before entry to
setjmp(); have setjmp() note (in some fashion) when it is done saving
state; and have longjmp() check to be sure the state is done, and if
not, restore nothing.  The last will succeed since there are no live
registers around the setjmp call itself.

>The use of setjmp does require that the compiler generate some additional
>code on calls (especially if signal handlers can perform longjmp's) so there
>are compiler optimization options which do not guarantee "non-volatile"
>variable contents.

This is certainly acceptable.  The situation is much worse in GCC, which
decides which variables should be placed in registers, and adheres to
the letter of X3J11 by guaranteeing only volatile variables.  Under
`-traditional', GCC attempts to accept old PCC-based code, to the extent
of turning off the `volatile' keyword, but NOT to the extent of not
promoting variables into registers, so there is no way to guarantee any
local variable!
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
mangler@cit-vax.Caltech.Edu (Don Speck) (11/17/88)
In article <10723@cup.portal.com>, bcase@cup.portal.com (Brian bcase Case) writes:
> Oh, yeah, I forgot: smaller code is faster code. Sigh.

On a machine with only a 4KB direct-mapped cache, this is TRUE.  My
vax-750's spend 30% of their cycles in MEM STALL.  (The back cover has a
chart that shows where to find this pin.)  That's about one cache miss
per instruction!

Has anybody tried putting larger cache RAM chips into a vax-750?