andy@Shasta.UUCP (02/25/87)
The sliding register window had an area for parameters passed to the procedure, locals, and parameters passed to other procedures. Calling lots of little procedures (that don't call other procedures) that don't use their window fully is equivalent to doing a better job allocating but having one less window. The first call might overflow, but subsequent ones won't. There were also a set of global registers. One argument for fixed window sizes is that register access shouldn't require an addition. Bit-wise or is one thing (or the register number with the current base to determine the register); addition is an expensive operation to put on the register access data path. This basically restricts window sizes to powers of two. (Then again, if there are cheap functions that one could use to implement variable size register windows, I'm sure someone on the net will mention them.) It then is probably easier to pick a power than support a range of them. Then again, I seem to remember that the window size wasn't a power of two on the Berkeley RISC, which surprised me. Maybe I'm wrong. -andy -- Andy Freeman UUCP: ...!decwrl!shasta!andy forwards to ARPA: andy@sushi.stanford.edu (415) 329-1718/723-3088 home/cubicle
bcase@amdcad.UUCP (02/25/87)
In article <1335@Shasta.STANFORD.EDU> andy@Shasta.UUCP (Andy Freeman) writes: >One argument for fixed window sizes is that register access >shouldn't require an addition. Bit-wise or is one thing (or the >register number with the current base to determine the register); >addition is an expensive operation to put on the register access >data path. This basically restricts window sizes to powers of two. > >Then again, I seem to remember that the window size wasn't a power >of two on the Berkeley RISC, which surprised me. Maybe I'm wrong. > True, the entire window in the Berkeley scheme (and Pyramid) isn't a power of two in size, but the amount allocated/deallocated on call/return *is* a power of two: 6 in, 10 local, 6 out = 22. But 6 of those are overlapped and aren't allocated on call, thus, 16 is the allocation size. Also, the Berkeley scheme requires more than a simple or function; see Katevenis' thesis. bcase --------- It'll be worth it. Keep watching this space.
johnw@astroatc.UUCP (02/26/87)
Its clear to me from the questions, that many people don't understand RISC. RISC stands for Reduced Istruction Set Computer. Features common to RISC machines: + One load/one store insturction. (All others are reg-to-reg.) + Regular (read: easy to decode) instructions uaually a single (or limited set of) instruction format(s) + usually a fairly small set of insturctions Additional features of the Berkeley RISC machine: + Register Windows + delayed branchs + Reg 0 is always 0 + a one-bit field in the instuction to set codition codes (This was a great feature, in my opionion) + 3 operands/instruction The Purpose of Register Windows was to remove the need for a cache. As a student project, a real cache design was deemed too hard. In practice, I think the Sun/3's and similar machines are proving that windows are not the "only-true-way" to get good performace. The division of registers: ------------------------- Reg 0 always 0 Reg 1-9 Global (static) registers Reg 10-16 (6) input parameters Reg 17-27 (10) locals Reg 28-31 (6) output parameters This 6/10/6 division of register is arbitrary, but is kept fixed to insure maximum speed. To make it dynamic, would require a size-register (effectively nargs() value) probably *IN* the window (otherwize it would not be possible to unwind the call-stack). The division of register (6/10/6) was selected carfully, after weeks of study on real code (see the RISC article in Computer, a few years back) - - - - - - - - - - - - - - As for dealing with register window over-flow, under-flow, the Berkeley paper (as I recall) said that it's occurence was not frequent enough to sweat about. As for "dribbling" registers in or out (at the ends of the window stack -- a circular queue with a dead-zone to represent the stack), would be fine for a small machine. Thought experiment: Lets build a *FAST* machine (I-cache, and DATA-cache) with windows, and a memory system that's wide enough to fill a full window of registers in one (memory access + split-transfer) time unit. Now what happens when you want to run a new process??? The state you must save is not just your current window, but your *WHOLE* stack of windows! Windows are great for running benchmarks, but are not so good for doing context switching. Anyone know how Pyramid solved this??? How many users (vi, csh, compiles, etc.) does it take to bring a Pyramid to its knees?? > >From: eppstein@tom.columbia.edu (David Eppstein) > Subject: register window machine questions > > (1) Has anyone tried making the window block size be just one register, > i.e. the window can have an arbitrary alignment in relation to memory? > I would expect this to be more efficient in terms of registers used (and > therefore also memory), but register access time might suffer. I don't understand this question! The instruction has bit fields to specify the register, so you can't go over the max-register number, and if you don't use all you're registers, you're wasing resources anyway! I would venture to guess that FAST machines spend less that 10% of there time running within 10% of their PEAK performance!! > (2) dirty regs and (3) staic/window --> see above > David Eppstein, eppstein@cs.columbia.edu, Columbia U. Computer Science Dept. > --------------------------------------------- > > >From: chuck@amdahl.UUCP (Charles Simmons) > Subject: Re: subroutine frequency > > Are there reasons I am missing that make fixed sized windows extremely > advantageous? Just that fixed it simple (fast), easy to do, and doesn't require alot of state to keep track of. If you can show how dynamic windows are better when you assume that memory *BANDWITH* is high enough, but its just a *LATENCY* problem we have to deal with, then I'll vote with you. NOTES: bandwith refers to thruput, or a long-term performace average. latency refers to responce, or the short-term time it takes to get what you ask for. It's frequently easy to trade bandwith and latency, but its hard to improve *BOTH* at the same time! John W - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: John F. Wardale UUCP: ... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw arpa: astroatc!johnw@rsch.wisc.edu snail: 5800 Cottage Gr. Rd. ;;; Madison WI 53716 audio: 608-221-9001 eXt 110 To err is human, to really foul up world news requires the net!
howard@cpocd2.UUCP (02/26/87)
In article <1335@Shasta.STANFORD.EDU> andy@Shasta.UUCP (Andy Freeman) writes: >The sliding register window had an area for parameters passed >to the procedure, locals, and parameters passed to other procedures. >There were also a set of global registers. > >One argument for fixed window sizes is that register access >shouldn't require an addition. Bit-wise or is one thing (or the >register number with the current base to determine the register); >addition is an expensive operation to put on the register access >data path. This basically restricts window sizes to powers of two. Actually we didn't even use bitwise manipulation, we just built double decoders for the overlapped registers so that two addresses would both activate the same register. You can easily see this in the chip photos, in the upper left, as light and dark bands in the regfile decoder. In retrospect I'm not sure that was the best method. Dave Johannsen once suggested to me an alternate window structure in which the overlap could be implemented with a "normal" RAM and only 2 extra gates! >Then again, I seem to remember that the window size wasn't a power >of two on the Berkeley RISC, which surprised me. Maybe I'm wrong. You are. In RISC-I, say, in each window you have 6 "high" and 6 "low" registers which overlap with the next window, 10 "local" ones which don't, and 10 "global" ones which are the same in all windows. That adds up to 32. However, the 10 global registers only exist in one place, not in each window, so each window has 22 registers which are non-global; this may be where you got your non-power-of-2-idea. On the other hand, adding an extra window costs 16 registers, because either the high or the low (I forget which) already exist in the previous window, so we're back to a power of 2 again. The numbers in RISC-II are different, but could be plugged into the preceding paragraph without making it untrue (I think). Let H = # of high registers, M = # of local registers, L = # of low registers, G = # of global registers, R = # of registers seen in each window, W = # of windows, P = # of physical registers. The following must hold for 2 or more windows, assuming circular overflow (top-window high = bottom-window low): H = L R = H + M + L + G P = W*(M+L) + G -- Howard A. Landman ...!intelca!mipos3!cpocd2!howard
bcase@amdcad.UUCP (02/26/87)
In article <161@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes: >Its clear to me from the questions, that many people don't understand RISC. Well, I am confident in *my* understanding. >RISC stands for Reduced Istruction Set Computer. True, this is true. But the choice of the RISC acronym was unfortunate. Reducing the number of instructions is incidental. I know that the Berkeley people emphasized this point in the beginning, but I think they (Patterson and Katevenis (yes, I know he is out of the country)) would no longer emphasize this aspect of simple machines. >Features common to RISC machines: >[a list of features] Sigh, RISC machines have whatever features are necessary to achieve an instruction rate of close to one per cycle and achieve a good compiler interface. >The Purpose of Register Windows was to remove the need for a cache. As a >student project, a real cache design was deemed too hard. In practice, I >think the Sun/3's and similar machines are proving that windows are not >the "only-true-way" to get good performace. Who ever said that windows were the only true way to get good performance? >This 6/10/6 division of register is arbitrary, but is kept fixed to insure No, not arbitrary. The amount allocated (in this case 16 regs) must be a power of two. >maximum speed. To make it dynamic, would require a size-register >(effectively nargs() value) probably *IN* the window (otherwize it would >not be possible to unwind the call-stack). Well, you can use dynamic links too. (This makes the protocol for variable sized windows more efficient.) >Lets build a *FAST* machine (I-cache, and DATA-cache) with >windows, and a memory system that's wide enough to fill a full window of >registers in one (memory access + split-transfer) time unit. Now what Wow. That's a wide bus, but I like the idea! Bandwidth or bust! >happens when you want to run a new process??? The state you must save is >not just your current window, but your *WHOLE* stack of windows! Windows >are great for running benchmarks, but are not so good for doing context >switching. Anyone know how Pyramid solved this??? How many users (vi, >csh, compiles, etc.) does it take to bring a Pyramid to its knees?? I worked at Pyramid, my thesis details a machine with variable sized windows, and I have build a simple machine with variable size windows (er, activation records that is). Windows are good for running benchmarks *and* real programs. Variable sized windows are just better. You are right, the entire active portion of the register file must be swapped on a full context switch. However, you are completely ignoring the fact that swapping the register file is only a part, a small part, of the work that must be done to perform a context switch. The number of users needed to bring a Pyramid to its knees is not correlated to the fact that it has register windows. To be sure, however, a real-time machine might be better off without windows (latency vs. throughput). Context switch overhead is mittigated by variable sized windows: at least the registers you are swpping are holding useful data. With fixed size windows, there is a high probability that some of the registers you are swapping are not being used. A smaller register file allowing variable sized windows achieves the performance of a larger register file with fixed sized windows. bcase --------- Are you watching this space?