[net.arch] "another RISC" machine

baskett@decwrl.UUCP (05/20/84)

<>
While discussing Pyramid, Mark Weiser claims the Ridge 32 is not
"another RISC" machine because it doesn't have register banking.

The first machine I know of to have register banks that interacted
with procedure calls was the MAC-8, an 8-bit microprocessor done at
Bell Labs sometime in the dark ages of the 1970's.  Then TI
picked up the idea in the TI 9900 microprocessor but blew the
implementation rather badly.  Dick Sites tried to get people back
on the track with a paper he gave at the first annual Caltech
Conference on VLSI, "How to use 1000 registers" in January
of 1979.  We did some experimental simulation studies on this
scheme at Stanford and I presented those results at a talk at
Berkeley in the fall of '79.  Patterson and company decided
to incorporate this idea into their then just starting
microprocessor design project.  I don't remember when they
started calling it the RISC project.  Pyramid was then influenced
by the subsequent work at Berkeley.

However, "RISC" is an acronym for "Reduced Instruction Set Computer".
The idea of reduced instruction set computers is mostly orthogonal
to the use of register banks that interact with procedure calls.
The RISC idea grew out of the many studies of static and dynamic
instruction frequencies done in the 60's and 70's, mostly on 360's
and 370's, that showed that the lion's share of the computing work
was done by a small subset of the simple instructions.  My favorite
example result was Len Shustek's showing that for speed of execution,
sequences of loads or sequences of stores were better than single
load multiple or store multiple instructions on the 370/168.  The
crossover point was 15 registers and the average number was 4!
Some of these studies were at least partially motivated by the
desire to understand how Seymour Cray could build such fast machines
while the rest of the world just poked along.  And by 1979 word
had already begun to leak out of the IBM Yorktown Heights 801
project that a very simple machine with little or no microcode
might be "the thing to do".

Thus we might say that the Pyramid machine is NOT a RISC machine
because it has moderate amounts of microcode rather than saying
that it is a RISC machine because it has register windows.
Likewise, the Ridge machine would not be a RISC machine because
it also has a moderate amount of microcode.  (Yes, we know that
the marketing departments at Pyramid and Ridge have already
asserted that their machines are RISC machines.)

Forest Baskett - Digital Equipment Corporation - Western Research Lab

mark@umcp-cs.UUCP (05/20/84)

	...Thus we might say that the Pyramid machine is NOT a RISC machine
	because it has moderate amounts of microcode rather than saying
	that it is a RISC machine because it has register windows.
	Likewise, the Ridge machine would not be a RISC machine because
	it also has a moderate amount of microcode.  (Yes, we know that
	the marketing departments at Pyramid and Ridge have already
	asserted that their machines are RISC machines.)...

Thanks for the clarification, Forest.  

Assuming a RISC machine needs a fast equivalent of a procedure call,
what are the alternatives to register banking?  Three things happen
at procedure call time: (a) the PC changes, (b) parameters must be passed,
and (c) new working storage is required.  The first is common to branches,
so I won't discuss it further (aside: Ridge has bits in its branch instructions
to help the instruction pre-fetcher guess which way to go).
Overlapping register banking, as in the Pyramid and Berkeley-"RISC"
architectures, achieves (b) and (c) wonderfully.  Alternatives:

1. Expand calls in-line.  This adds to compile time and code space,
and doesn't work for external procedures.  By itself it does not
actually eliminate much overhead (because one must still simulate
by variable assignment the moving of parameters into and out of
the in-line procedure's name space).  But coupled with a real good
global optimizer it could be a win.

2. Half way to in-line expansion is to compile-in working 
storage for called procedures in the calling procedure.
The Pascal compiler of MDSI, written up 5 years ago in Transactions on
Software Engineering, used this trick for internal procedures.  The idea
is for the compiler to use the information that a certain procedure is 
only called by certain other procedures to improve their data sharing.
Specifically, the caller, when it is called, can pre-allocate any
working storage for the callee, so there is no overhead at call time
to the callee, and very little additional when the caller is called.
Furthermore, the compiler can be clever about where it puts the parameters
in the caller's space so parameter passing overhead is minimized and the callee
can get the parameters easily, possibly by reaching into the caller's data
area.  The disadvantages of this scheme are that it only works if callers and
callees are simply related (no external procedures, please), and even
then often gets too complicated to work out.

3.  (this one is made up, I know of no implementations that do it)
Do a pipe-lined mini-bank switch.  The idea is that if
the processor could start the procedure call before it was needed
it could do the call in parallel with the last few instructions
before the call.  (Several machines have a delayed branch instruction,
in which the branch is taken an instruction or two after it is 
first seen in the instruction stream.  Are there machines that do this for 
calls?)  Only thing is, usually those last few instructions before the
call are getting ready for the call, and the call itself is mostly
moving data values around.  The bank-switch call permits pipelining
because it involves changing the machine's internal memory mapping
so that after the call certain caller locations are secretly mapped
into certain callee locations.  This can happen in parallel with
placing values into those locations because the call instruction
doesn't move the data, it just remaps it.  This is a lot like register
remapping, as in SOME RISC machines, but may be less work for
the compiler at the expense of complicated hardware.  Since
the trend these days is to overwork compilers to simplify hardware,
it may not be a good idea.
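
To make the remapping idea concrete, here is a toy Python model of
overlapping register banks.  It is only a sketch under assumed sizes
(8 windows, 4 "in", 4 "local" and 4 "out" registers per window); the
class name, the method names, and the convention of returning a result
through the shared registers are all hypothetical and are not taken
from the Pyramid, Berkeley, or any other real design.  The point it
shows is that a call moves only a window pointer, so the caller's "out"
registers become the callee's "in" registers without any data being
copied.

```python
class WindowedRegisterFile:
    """Toy model of overlapping register windows (all sizes are assumptions)."""

    def __init__(self, n_windows=8, n_in=4, n_local=4):
        # Each window owns n_in + n_local physical registers of its own;
        # its "out" registers are physically the next window's "in" registers.
        self.n_in = n_in
        self.n_local = n_local
        self.n_windows = n_windows
        self.stride = n_in + n_local
        self.regs = [0] * (self.stride * n_windows + n_in)
        self.cwp = 0  # current window pointer

    def _phys(self, kind, i):
        """Map a (kind, index) register name to a physical register number."""
        base = self.cwp * self.stride
        if kind == "in" and 0 <= i < self.n_in:
            return base + i
        if kind == "local" and 0 <= i < self.n_local:
            return base + self.n_in + i
        if kind == "out" and 0 <= i < self.n_in:
            return base + self.stride + i
        raise ValueError(f"bad register {kind}[{i}]")

    def read(self, kind, i):
        return self.regs[self._phys(kind, i)]

    def write(self, kind, i, value):
        self.regs[self._phys(kind, i)] = value

    def call(self):
        # No data moves: only the mapping changes.  Real hardware would
        # spill a window to memory on overflow; that is not modeled here.
        if self.cwp + 1 >= self.n_windows:
            raise OverflowError("out of register windows")
        self.cwp += 1

    def ret(self):
        if self.cwp == 0:
            raise OverflowError("return without matching call")
        self.cwp -= 1


rf = WindowedRegisterFile()
rf.write("local", 0, 7)      # caller's private working storage
rf.write("out", 0, 42)       # caller places a parameter
rf.call()                    # bank switch: pointer bump only
arg = rf.read("in", 0)       # callee sees the parameter as in[0]
rf.write("local", 0, 9)      # callee's locals do not disturb the caller's
rf.write("in", 0, arg + 1)   # result left in the shared register
rf.ret()
result = rf.read("out", 0)   # caller picks the result up from out[0]
caller_local = rf.read("local", 0)
```

Because the parameter is written into a register that both windows
share, the write can happen before or after the pointer moves, which is
what would let the bank switch overlap with the instructions that set
up the call.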

What other alternatives to register bank switching are there to
speed up call instructions?
-- 
Spoken: Mark Weiser 	ARPA:	mark@maryland
CSNet:	mark@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!mark

n0ano@asgb.UUCP (05/21/84)

I would like to point out that decwrl!baskett made one mistake
when talking about the use of register banks by the TI9900 and
the Bell Labs MAC 8.  TI was the company that originally came up
with the idea and Bell followed suit, not the other way around.

Don Dugger
bmcg!asgb!n0ano

ron@brl-vgr.ARPA (Ron Natalie <ron>) (05/23/84)

I always felt that the CARDIAC was the best RISC machine I ever saw.

-Ron