[net.arch] Risc Over-blown

fostel@ncsu.UUCP (11/29/83)

    I agree that there have been too many RISC papers, saying too little.
    Clearly, they have a good idea.  Clearly it will eventually prove its
    worth.  Perhaps other designs will try to use this neat idea.  I think
    the important step is not going to be CPUs with reduced counts of
    instruction types, but rather CPUs designed with reduced effort to
    produce a "nice" instruction set.  In another decade or so, human-written
    asm code will finally, really be gone.  Too much emphasis is still placed
    on the needs of the human coder.  Compilers are quite content to produce
    boring, ugly code that happens to be fast on a particular CPU.

    One question I have about the RISC is whether all the space taken by the
    registers might not have been more profitably used as a high speed cache.
    Again, with a very highly restricted instruction set.  As a cache, some
    portion might be used to hold "stack-like" data as in the RISC, but some
    might also be used for the 15 instructions of a tight loop.  My intuition
    is that a cache would be more effective.
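
    To make the trade-off concrete, here is a rough sketch in C of the kind
    of direct-mapped lookup such a small on-chip cache would do, whether the
    line holds a stack word or one instruction of that tight loop.  All of
    the sizes and names are illustrative, not a real design:

        /* Illustrative only: a tiny direct-mapped on-chip cache. */
        #include <stdint.h>
        #include <stdbool.h>

        #define LINES 64     /* roughly the area of a large register file? */

        struct line { bool valid; uint32_t tag; uint32_t word; };
        static struct line cache[LINES];

        bool lookup(uint32_t addr, uint32_t *out)
        {
            struct line *l = &cache[(addr >> 2) % LINES];  /* word index */
            if (l->valid && l->tag == (addr >> 2)) {
                *out = l->word;            /* hit: no trip to slow memory */
                return true;
            }
            return false;      /* miss: fetch from slow memory and refill */
        }
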
    ----GaryFostel----

henry@utzoo.UUCP (Henry Spencer) (12/04/83)

Gary Fostel asks whether it would be good to use some of the RISC's
on-chip register RAM for an instruction cache instead.  As far as I
know, the Berkeley folks have always assumed that a production RISC
would have not only the register stack but *also* an instruction cache.
In fact I've seen at least one recent paper from them on the design
of a separate cache chip, also incorporating expansion of a "tighter"
instruction format into the RISC's rather bulky instructions.  Looks
good to me.
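
The expansion step is easy to picture.  Purely as an illustration (the
Berkeley compact format isn't described here, so every field width below
is invented), it amounts to something like:

        /* Invented formats, for illustration only: a compact 16-bit
           encoding stored off-chip is widened into a bulky 32-bit
           RISC-style word on its way into the processor. */
        #include <stdint.h>

        uint32_t expand(uint16_t c)
        {
            uint32_t op   = (c >> 12) & 0xF;          /* 4-bit opcode   */
            uint32_t rd   = (c >>  8) & 0xF;          /* 4-bit dest reg */
            uint32_t rs   = (c >>  4) & 0xF;          /* 4-bit src reg  */
            uint32_t imm4 =  c        & 0xF;          /* 4-bit constant */

            /* sign-extend the constant to 13 bits and spread the fields
               into fixed positions so the decoder has nothing left to do */
            uint32_t imm13 = (imm4 & 0x8) ? (imm4 | 0x1FF0) : imm4;
            return (op << 28) | (rd << 24) | (rs << 20) | imm13;
        }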

He also suggests that the important point of the RISC is not simplicity
but the firm intent that all code be compiler-generated.  True in some
ways, but if you abandon the simplicity constraint you get a rather
more complex machine, much more like the Stanford MIPS project.  One of
the major points of the RISC concept is that a simpler design lets you
invest more resources in making the hardware fast (since the "basic"
resource requirements are much lower).  "Resources" here means things
like chip area, which are not subject to massive expansion by simply
waving a magic wand, so the simplicity really is a major win.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bhyde@inmet.UUCP (12/10/83)

I thought the problem was to consume all those transistors that the
VLSI people have to burn.  Friends of mine build systems that have
enough transistors to implement a dozen vaxen.  Seeing as a fine-grained
multiprocessor is such a pain, wouldn't a complex processor with a
compact, rich instruction format be a reasonable place to spend the chip area?

It would seem this is really a question of where one spends the transistors.
The speed issues seem unclear to me; the time to get on and off chip
usually provides enough time for a few microcycles.  There are lots of
things to spend transistors on: memory mapping caches, instruction stream
caches, decoded instruction caches, stack caches, microcode, registers,
capability addressing, memory, memory, memory.  Some of these, particularly
the memory mapping cache, have very high bandwidth.  If the number of
transistors were very, very large, so that the incremental cost of the next
thousand transistors was very low, it would seem that one would always spend
them on increased complexity, reliability, etc.  The RISC designs seem to me
to be a design whose moment in history was about 6 months long: it was that
period when the transistors that bought a 6809 could buy a 32-bit RISC
instead.  Now that we have so many more transistors, the problem is how to
spend them.

There aren't that many chips in the MicroVAX implementation that is on
the market today.  There were, what, 3 years between this stage in
the PDP-11 line and the single-chip stage?  Then, given that the cost
difference was only a small factor, would you buy a RISC or a VAX?
     ben hyde

stevel@haddock.UUCP (12/11/83)

One of the features of RISCs that I have not seen mentioned
is that it is much easier to write CORRECT compilers for them.

There are not so many instruction interactions to screw up on in a
RISC chip.  If you do good flow analysis at the intermediate code
stage, it should then be relatively easy to write code generators
for different RISC implementations.

This allows CPU designers to change technologies (NMOS to ECL to
GaAs) and take advantage of gate implementation considerations to
modify the instruction set.  The new code generator would be
fairly easy to create, and then the whole system would come up and
run.  It would only take about 1 to 1.5 man-years to get the
system software (i.e., UNIX) ported, instead of having to spend the
current 3-5 man-years.
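
To make that concrete, here is a toy sketch in C; the intermediate form,
the opcodes, and the mnemonics are all made up, but it shows how thin the
final code generator can be when the target is a uniform three-register
RISC:

        /* Toy illustration only: a "code generator" for a made-up IR
           targeting a made-up three-register RISC.  The point is how
           little machinery is needed, not the details. */
        #include <stdio.h>

        typedef enum { IR_ADD, IR_SUB, IR_AND, IR_OR } IrOp;
        typedef struct { IrOp op; int dst, src1, src2; } IrInsn;

        static const char *mnemonic[] = { "add", "sub", "and", "or" };

        void emit(const IrInsn *ir, int n)
        {
            for (int i = 0; i < n; i++)     /* one table lookup per IR op */
                printf("%s r%d, r%d, r%d\n", mnemonic[ir[i].op],
                       ir[i].dst, ir[i].src1, ir[i].src2);
        }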

Steve Ludlum, decvax!yale-co!ima!stevel, {ucbvax|ihnp4}!cbosgd!ima!stevel

howard@metheus.UUCP (Howard A. Landman) (12/16/83)

In response to inmet!bhyde (ben hyde):

	I thought the problem was to consume all those transistors that the
	VLSI people have to burn.  ... wouldn't a complex processor with a
	compact, rich instruction format be a reasonable place to spend the chip area?

	... lots of things to spend transistors on, memory mapping caches,
	instruction stream caches, decoded instruction caches, stack caches,
	microcode, registers, capability addressing, memory, memory, memory.

	If the number of transistors were very, very large, so that the
	incremental cost of the next thousand transistors was very low, it
	would seem that one would always spend them on increased complexity,
	reliability, etc.

	... given that the cost difference was only a small factor, would you
	buy a RISC or a VAX?

Consider the Hewlett-Packard "wonder chip".  It has nearly half a million
transistors and is implemented in a 1-micron (!) technology that inherently
is 4+ times faster and 16+ times denser than the 4-micron technology that
RISC I and RISC II were implemented in.  I will not speak of the enormous
expense of its development.  And yet ... it is not faster than the RISC
chips for executing HLL programs.

Consider the 432.  Consider its multi-year $50,000,000 development effort.
Consider its performance.  Consider the 17 man-months needed to develop the
RISC I chip (maybe double that to account for the architectural studies),
which would have cost a company less than .005 as much, taken less than .1
as long, and resulted in a better-performing chip.  Consider what could have
been done by the RISC designers if THEY had been given $50,000,000 and 5 years
in which to finish their design and turn it into a commercial product.  Or
$10,000,000 and 1 year.  Or $5,000,000 and 6 months.

Now think about these things very carefully and ask yourself why your
conclusions are wrong.  It may take several months for you to fully discard
the wrong assumptions that led you to them.

When RISC I was being designed, we asked some very simple questions.
Q: What do we want to do with computers?
A: Execute HLL programs.
Q: What's a good current architecture for doing this?
A: VAX 11/780 seems quite reasonable.
Q: What limits the performance of the VAX?
A: Memory bandwidth.
Q: What is using up most of the VAX's memory bandwidth?
A: It turns out that procedure calls and returns account for almost half of
   the memory traffic.
Q: How could something faster than a VAX be built?
A: Build hardware support for calls and returns, cutting memory accesses in
   half.

etc.  What I am trying to say is that you have to carefully look at the issues
and spend chip real estate on things that will improve the price/performance
of the entire system.  For example, the importance of code density is related
to the cost of memory and the extent to which instruction fetching memory
accesses limit performance.  So it may be VERY important, or it may be
unimportant, depending on architecture and RAM chip prices.
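
The hardware support mentioned in the call/return answer above is RISC's
large register file managed as overlapping windows.  Here is a schematic,
in C, of why that cuts memory traffic; the sizes and names are illustrative
only, and window overflow is ignored:

        /* Schematic only: a conventional call saves registers to memory,
           while a windowed call just advances a pointer into a large
           on-chip register file.  Memory is touched only when the windows
           overflow, which is not handled here. */

        #define NWINDOWS         8
        #define REGS_PER_WINDOW 16

        static int regfile[NWINDOWS * REGS_PER_WINDOW];
        static int cwp;                     /* current window pointer */

        /* conventional call: one memory write per register saved */
        void call_conventional(int *stack, int *sp, const int *regs, int nregs)
        {
            for (int i = 0; i < nregs; i++)
                stack[(*sp)++] = regs[i];
        }

        /* windowed call: no memory traffic at all on the common path */
        int *call_windowed(void)
        {
            cwp = (cwp + 1) % NWINDOWS;
            return &regfile[cwp * REGS_PER_WINDOW];
        }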

Complex instruction sets and complex architectures require complex control.
All commercial microprocessors ever produced use 50-75% of the chip for
control.  RISC I used 6-8% for control.  For a given chip size, this implies
that a RISCy architecture would get to use 2 to 4 times as many transistors
for all the wonderful performance-enhancing additions you mentioned above
as would a complex architecture.  Not to mention that the designers would
be spending less time on implementing the basic architecture and more time
on how to make it run fast.  So the question is, would you rather spend
silicon on adding more instructions to your instruction set, or on speeding
up the execution of the instructions you have?  EVERY analysis that we did
in the RISC design indicated that the former was pretty much useless and the
latter was highly desirable, once the instruction set passes the point where
it is adequate to support HLLs and operating systems.  Many people still feel
otherwise and design systems accordingly.

I do not mean to say that the RISC chips are the last word in microprocessor
architecture.  But I believe that any effort to do something better should be
asking itself some important questions:
Q: What do we want to do with this chip?
A: Execute HLL programs.
Q: What's a good current architecture for doing this?
A: RISC seems quite reasonable.
Q: What limits the performance of the RISC?
A: (?)
Q: How can we solve this and go faster still?
A: (?!)

As is usual for this field, there are many theses and/or dollars awaiting
those who answer these questions correctly.

	Howard A. Landman (just another RISC I designer)
	ogcvax!metheus!howard

ucbesvax.turner@ucbcad.UUCP (12/17/83)

Not to quibble overmuch with Howard's defense of the RISC methodology
(hi howard!), but he neglects to mention that the HP "wonder chip"
outperforms RISC I by far in floating point computation.  The benchmarks
that showcase RISC I performance are all integer codes.

Much of the complexity of the HP chip springs from its implementation
of the IEEE floating-point standard.  The arithmetic algorithms are
not terribly involved, but they are carefully designed with a view
toward preserving accuracy, and representing odd cases (plus/minus
infinity/infinitesimal).  As we all know, program complexity is in
large part a function of boundary conditions--programs typically
spend a lot of time trying to decide whether or not to add 1 to
something.  (As do programmers, come to think of it :-)  Microcode
of this kind is not much different, I think.
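
To see why those boundary conditions eat so much code, here is a rough
sketch in C rather than microcode; it is just the standard IEEE case
analysis, nothing HP-specific:

        /* Sketch only: classifying an operand is the first step of every
           IEEE operation.  Each operand of an add can be NaN, +/-infinity,
           normal, denormal, or +/-zero, and most of the special-case code
           lives in those corners rather than in the add itself. */
        #include <math.h>

        const char *fp_kind(double x)
        {
            switch (fpclassify(x)) {
            case FP_NAN:       return "NaN";
            case FP_INFINITE:  return signbit(x) ? "-inf" : "+inf";
            case FP_SUBNORMAL: return "denormal";
            case FP_ZERO:      return signbit(x) ? "-0" : "+0";
            default:           return "normal";
            }
        }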

A fair amount (~40%?) of the HP microcode is taken up by IEEE FP.  The
natural question is: if you put an IEEE FP implementation in RISC native
code in ROM on the RISC chip, and redesigned the timing to take advantage
of the proximity of this "floating point microcode", would you have
something faster than the HP chip?  Maybe not, but, for reasons given
previously, you'd probably have more chip-space to work with afterward.

As Howard points out, there are potential dissertations and gold in such
questions.
---
Michael Turner (ucbvax!ucbesvax.turner)

jlg@lanl-a.UUCP (12/20/83)

Someone has probably mentioned this before, but the really FAST mainframe
computers on the market today are all closer to RISC than to HP superchips
or any DEC product.  This is for the reasons described in the preceding
notes.  To operate at really high speeds, a computer has to make use of
very high-power, very simple chips.  VLSI isn't fast enough -- discrete
components are used.  To build an 80 MHz machine with an unRISCy instruction
set would require a room full of components, and a nuclear power plant to
power it.

And if you think that mainframes are on their way out -- the parallel
machines that replace them will be too hard to program and verify if
they aren't kept simple.

Of course, my main objection to unRISCy architectures is that they glorify
the memory bus; you just can't do anything without going to slow memory.
With enough registers (and a pipelined architecture, so memory can be
referenced asynchronously) you can execute incredible amounts of code without
being held up by memory once.  This problem would go away if memory were as
fast as registers, but memory is always made with older technology (for price)
while registers are always state-of-the-art (for speed).
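
A trivial C illustration of the difference (the volatile qualifier is
just a blunt way of forcing the second version to touch memory on every
iteration):

        /* Illustration only: the same reduction with the running sum kept
           in a register versus forced out to memory each time around. */

        double sum_in_register(const double *a, int n)
        {
            double s = 0.0;       /* lives in a register for the whole loop */
            for (int i = 0; i < n; i++)
                s += a[i];
            return s;
        }

        double sum_through_memory(const double *a, int n, volatile double *s)
        {
            *s = 0.0;             /* read and written in memory every pass */
            for (int i = 0; i < n; i++)
                *s += a[i];
            return *s;
        }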

		J.L. Giles 
		Los Alamos National Lab