alverson@decwrl.dec.com (Robert Alverson) (04/21/88)
After hearing about the 80960 for the last few days, I still have a few
questions:

1. How are parameters passed into procedures? In Berkeley's RISC, this
   was accomplished with overlapping windows. What I have read seems to
   imply that the 80960 windows do not overlap. How about procedure
   return values?

2. Just how much extra delay do register windows cost? There may be extra
   decoding, or extra loading on lines, or extra time during call & ret (to
   switch register sets). Nothing is free.

Bob
marc@oahu.cs.ucla.edu (Marc Tremblay) (04/21/88)
In article <385@bacchus.DEC.COM> alverson@decwrl.UUCP (Robert Alverson) writes:
>After hearing about the 80960 for the last few days, I still have a few
>questions:
>
>1. How are parameters passed into procedures? In Berkeley's RISC, this
>   was accomplished with overlapping windows. What I have read seems to
>   imply that the 80960 windows do not overlap. How about procedure
>   return values?

Indeed, the windows do not overlap. There are a few ways to get around
this problem, even though none is as efficient as overlapping windows:

1) Parameters can be passed through global registers. Obviously, problems
   occur when the depth of subroutine calls gets large (for example,
   recursive calls).

2) Parameters can be pushed onto the stack (in memory), as in a
   conventional register scheme.

3) For a long list of parameters, a pointer to an argument list can be
   placed in a global register.

4) Finally, the 80960 provides an instruction (flushreg) which writes the
   contents of all the local register sets (in the register cache) to
   their associated stack frames in memory. This method could be used to
   pass parameters through the local registers of the caller: the callee
   would then read them from the caller's stack frame in memory.

>2. Just how much extra delay do register windows cost? There may be extra
>   decoding, or extra loading on lines, or extra time during call & ret (to
>   switch register sets). Nothing is free.
>

In Berkeley-like window schemes, the larger the number of windows, the
longer the READ delay; this is due to a longer data bus, which increases
the load capacitance. Intel partly solves this problem by using a register
cache. I do not have access to their layouts, but I doubt that the internal
data bus goes through the register cache. In this way they can increase the
number of local register sets that can be saved on chip without *directly*
lengthening the data bus. One important "indirect delay" introduced by
adding more sets is related to the saving of those sets (done four words at
a time!): a larger register cache will increase the saving time. For a few
more sets it may not even be in the critical path, though.

I also wrote a paper describing another method; I will send you the
reference if you request it via e-mail.

Marc Tremblay
marc@CS.UCLA.EDU
...!(ihnp4,ucbvax)!ucla-cs!marc
Computer Science Department, UCLA
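
The "indirect delay" point above can be put in rough numbers. The following is only a back-of-the-envelope sketch in C, not anything taken from Intel documentation: it assumes 16 local registers per set (a figure the post does not state), uses the post's "four words at a time" as the store width, and treats the cost of one four-word store as an input rather than a known value. The helper names `stores_to_save` and `save_cycles` are hypothetical.

```c
#include <assert.h>

/* Assumption: each local register set holds 16 32-bit registers.
   The post does not give this figure. */
#define LOCAL_REGS_PER_SET 16

/* From the post: sets are saved "four words at a time". */
#define WORDS_PER_STORE 4

/* Hypothetical helper: number of four-word stores needed to write
   `sets` local register sets from the register cache to memory. */
unsigned stores_to_save(unsigned sets)
{
    return sets * (LOCAL_REGS_PER_SET / WORDS_PER_STORE);
}

/* Hypothetical helper: total cycles to save `sets` sets, given an
   assumed (not measured) cost in cycles for one four-word store. */
unsigned long save_cycles(unsigned sets, unsigned cycles_per_store)
{
    assert(cycles_per_store > 0);
    return (unsigned long)stores_to_save(sets) * cycles_per_store;
}
```

Under these assumptions the saving time grows linearly with the number of cached sets, which is the trade-off the post describes: more sets on chip means fewer spills during normal call/return, but a longer wait whenever all of them have to be written out.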
bobdi@omepd (Bob Dietrich) (04/23/88)
In article <385@bacchus.DEC.COM> alverson@decwrl.UUCP (Robert Alverson) writes:
>After hearing about the 80960 for the last few days, I still have a few
>questions:
>
>1. How are parameters passed into procedures? In Berkeley's RISC, this
>   was accomplished with overlapping windows. What I have read seems to
>   imply that the 80960 windows do not overlap. How about procedure
>   return values?
> ...
>Bob

In the 80960 C compiler, parameters are typically passed in up to 12 of
the global registers (G0-G11). If a function has more than 12 registers'
worth of parameters, the calling function allocates a parameter block on
the stack and passes a pointer to it in G14. Note that some parameters may
take more than one register, such as doubles and aggregates up to four
32-bit words in size.

Return values are placed in G0-G3, depending on size. If a structure or
aggregate is to be returned, the caller places a pointer to the return
space in G13.

Bob Dietrich
Intel Corporation, Hillsboro, Oregon
(503) 696-4400 or 2092 (messages x4188,2111)
usenet: tektronix!ogcvax!omepd!bobdi
   or   tektronix!psu-cs!omepd!bobdi
   or   ihnp4!verdix!omepd!bobdi
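
The parameter-passing rule described above can be modelled with a short C sketch. This is an illustration only, not the compiler's actual algorithm. The function `assign_params` is hypothetical, and it makes two assumptions the post does not confirm: registers are handed out in order with no alignment padding, and once one parameter fails to fit in G0-G11, it and all later parameters go to the in-memory block addressed by G14.

```c
#include <assert.h>

/* From the post: up to 12 global registers (G0-G11) carry parameters. */
#define MAX_PARAM_REGS 12

/*
 * Hypothetical model. words[i] is the size of parameter i in 32-bit
 * words (1 to 4, per the post). On return, first_reg[i] is the index
 * of the first global register holding parameter i, or -1 if the
 * parameter is placed in the parameter block on the stack.
 * Returns the number of words placed in that block.
 *
 * Assumptions (not from the post): no alignment padding between
 * parameters, and everything after the first spilled parameter is
 * also spilled.
 */
int assign_params(const int *words, int n, int *first_reg)
{
    int next_reg = 0;     /* next free register among G0-G11 */
    int block_words = 0;  /* words in the G14-addressed block */
    int spilled = 0;      /* set once a parameter goes to memory */

    for (int i = 0; i < n; i++) {
        assert(words[i] >= 1 && words[i] <= 4);
        if (!spilled && next_reg + words[i] <= MAX_PARAM_REGS) {
            first_reg[i] = next_reg;
            next_reg += words[i];
        } else {
            spilled = 1;
            first_reg[i] = -1;
            block_words += words[i];
        }
    }
    return block_words;
}
```

For example, with parameter sizes {1, 2, 4, 4, 2, 1} (say an int, a double, two four-word structs, a double, and an int), the model puts the first four parameters in G0, G1-G2, G3-G6, and G7-G10, and sends the last two (three words in total) to the parameter block, since the double no longer fits in the one remaining register.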