[net.arch] What RISC is REALLY all about!

rb@cci632.UUCP (Rex Ballard) (07/08/86)

In article <481@elmgate.UUCP> jdg@elmgate.UUCP (Jeff Gortatowsky) writes:
>
>That's not what I meant.  I was just citing an example.  The same DBcc
>instruction could, of course be 3 separate CISC instructions as well, using
>32 bit quantities. If the CISC vs. RISC subject was the same as saying "the 
>68000 vs. RISC", I might better understand the hoopla associated with RISC. 
>Indeed most of the talk is comparing less well known mini or mainframe
>computers, not micros. One exception to this is the VAX.  It's classified
>as a mini (CISC) but is, of course, very well known.

Sure, RISC is not new; it has existed in larger architectures for some
time, and those machines provide a good source of background.
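For concreteness, the DBcc semantics behind the quoted example can be
sketched in C.  The decrement-and-branch behavior is the real 68000
semantics; the copy routine wrapped around it is a hypothetical
illustration, not generated code:

```c
/* Sketch of the 68000 DBcc loop primitive: if the condition cc is
   false, decrement the counter; if the result is not -1, branch back.
   One instruction replaces a test, a decrement, and a branch.
   dbcc_copy is a made-up example: copy up to n bytes, stopping
   early when the terminator (the "cc" condition) is seen. */
static int dbcc_copy(char *dst, const char *src, int n) {
    int count = n - 1;          /* DBcc counts down to -1 */
    int copied = 0;
    do {
        char c = *src++;
        *dst++ = c;
        copied++;
        if (c == '\0')          /* cc true: fall out of the loop */
            break;
    } while (--count != -1);    /* the DBcc step: decrement and branch */
    return copied;
}
```

On a CISC this whole tail is one instruction; on a RISC it is the three
simple instructions the quote mentions.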

>In short I was not asking what is wrong with the 68000 CPU's.  Just whether
>RISC is REALLY an improvement in computer design?  If so why?

Yes and no.  This is like asking if an ALL ROM operating system is better
than an all RAM operating system.  There are some tradeoffs.

Remember the early Apple IIs, TRS-80's, and Ataris?  These boxes, among
others had a large ROM operating system, command language, utilities, and
drivers.  This was primarily because these early boxes were cassette based
systems.

Now, with floppies and hard disks as part of the "minimum package", it is
less desirable to have the OS in ROM.  Even systems that do carry heavy ROM
tend to use it as a library of primitives, rather than as the "final
system interface".

These same factors are now becoming significant at the CPU level.  RAM
cache is a major factor, as is pipelining.  When it is
possible to get an order of magnitude faster "local storage", even in small
quantities, the costs and benefits become worth considering.

As you pointed out earlier, much of the RISC architecture comes right
off the old minis and mainframes.  Anybody remember when a computer was
really powerful if it had a 4K "core" (magnetic beads), a large "drum",
and a tape drive?

Just as those old machines had special "controllers" which managed, loaded,
and stored data to/from these successively slower media, the RISC chips
tend to have sub-chip level "controllers" of their own.  Remember, CPUs
and "controllers" are just different flavors of finite state machines.

Fortunately, most of these mechanics have been hidden by modern hardware
and software techniques, again mostly taken from minis and mainframes.

>It was always
>my feelings that, if a CPU manufacturer were to write the language compilers
>first, THEN generate a CPU design to run it, we'd all be a lot happier.  Yes,
>that sounds like a CISC design. But, am I wrong saying that INMOS took 
>that approach with the TRANSPUTER?  Am I wrong in assuming the TRANSPUTER 
>is a RISC CPU?  If I'm not wrong then RISC vs CISC seems like a useless 
>argument, as INMOS' product proves that the CPU should fit the language it
>runs, not the other way around. CISC or RISC notwithstanding.

One of the nice features of the TRANSPUTER is that the primitives used in
OCCAM could be used in other languages.  In addition, other primitives could
be added, changed, or deleted, to make a super-fast 'C' machine, or a Forth
machine, or Prolog, or Smalltalk, or Lisp, or ??.
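The idea of a small, shared primitive set serving several languages can be
sketched as a tiny stack machine in C.  The opcode names and the
interpreter are invented for illustration; they are not the TRANSPUTER's
actual instruction set:

```c
/* Hypothetical sketch: a handful of stack primitives, in the spirit of
   building a fast C machine, Forth machine, etc. on shared micro-ops.
   Different front ends would compile down to the same primitives. */
enum op { PUSH, ADD, MUL, HALT };

static int run(const int *prog) {
    int stack[16], sp = 0;
    for (;;) {
        switch (*prog++) {
        case PUSH: stack[sp++] = *prog++;            break;
        case ADD:  sp--; stack[sp-1] += stack[sp];   break;
        case MUL:  sp--; stack[sp-1] *= stack[sp];   break;
        case HALT: return stack[sp-1];               /* top of stack */
        }
    }
}
```

A program like `{ PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT }` computes
(2 + 3) * 4 regardless of which source language emitted it.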

In some ways, RISC is a possible LOSE.  You have been looking at only the
functionality of a single instruction vs. three instructions.  In some
cases such as the "frame save" or "poly" or "context switch" operations,
it may even be necessary to add the overhead of a "call" instruction,
but the call instruction can also be easily and quickly optimized.

Most programmers today are "top down" trained, and not used to thinking
in terms of primitives.  RISC, however, makes thinking in terms of primitives
even more significant, not only at the compiler level but at the project level.

Here is where one begins to see the advantages of RISC.  When it becomes
possible to pack more and more "primitives" into the system, and have them
automatically arrange themselves into the most efficient configuration within
a few hundred cycles, things start to speed up at the application level.

Ironically, your DBcc example is interesting.  What would you put inside
the loop?  Suppose you could use 16 or 32 bytes of instructions inside the
loop.  Suppose, in addition, that you wanted to loop 2000 times.  Now on
a CISC, you might want to put the DBcc in "ROM", but how about strncpy,
and the rest of libc.a?  Since you only need at most a few hundred
bytes, you can afford to buy and use very FAST RAM.  You could build
a little 4K cache and have about 5 levels of nesting, without
ever stalling (slowing the CPU with wait cycles) on a cache miss, because the
"pre-fetcher" is loading the next subroutine while you're inside a
"sister" subroutine's loop.  Not only that, but you've got more
time to prefetch from slower memory, because you are in a loop.

If you want to watch the RISC system get a little crazy, take a massive
routine of, say, 6K inside one big loop, loaded with "macro
expansions", and watch performance drop through the floor :-).
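The effect is easy to see with a toy model in C.  The cache parameters
(4K, 16-byte lines, direct-mapped) match the example above but are
otherwise arbitrary; this is an illustration, not a model of any real
chip:

```c
/* Toy direct-mapped instruction-cache model: count misses when a loop
   body of body_bytes of straight-line code runs for `iterations`
   passes.  A tight body stays resident; a 6K body thrashes. */
enum { CACHE_SIZE = 4096, LINE = 16, NLINES = CACHE_SIZE / LINE };

static long run_loop(int body_bytes, int iterations) {
    long tags[NLINES];
    long misses = 0;
    for (int i = 0; i < NLINES; i++)
        tags[i] = -1;                          /* cache starts cold */
    for (int it = 0; it < iterations; it++) {
        for (int pc = 0; pc < body_bytes; pc += LINE) {
            int  line = (pc / LINE) % NLINES;  /* which cache line */
            long tag  = pc / CACHE_SIZE;       /* which memory block */
            if (tags[line] != tag) {           /* miss: fetch the line */
                tags[line] = tag;
                misses++;
            }
        }
    }
    return misses;
}
```

A 32-byte body run 2000 times misses exactly twice (the cold start); a
6K body conflicts with itself and misses hundreds of times on every
single pass.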

CISC on the other hand, can save you a few FETCH cycles.  Of course,
there is nothing to stop you from putting these RISC features in a
CISC, except time, money, chip and board space, effort co-ordinating
arbitrary length instructions with co-processors, extra delays for
microcode synchronization, data-path turn-around, microbus arbitration,
external bus arbitration,..... :-).

Seriously, there may be features of CISC that will need to be incorporated
into RISC.  But as these features are added, they will probably be done
via hardware, rather than micro-code.  Things like address calculations,
multiply and divide, TLB changes, and pre-fetchers may end up becoming
smarter as individual units.

silbey@rb-dc1.UUCP (07/17/86)

>It was always
>my feelings that, if a CPU manufacturer were to write the language compilers
>first, THEN generate a CPU design to run it, we'd all be a lot happier.  Yes,
>that sounds like a CISC design. ...

On the contrary: one aspect of RISC design consists of taking a set of
instructions that can be easily generated by one *or more* compilers
and implementing those instructions in hardware.  Designing a
compiler/optimizer for a CISC is much more complicated in some ways,
because each instruction may have many side effects, and there may be
many ways to accomplish a function.

I think you'll find that most CISC machines provide HLL-like
instructions which are suitable only for a single language.  Take a
DO-loop type instruction, for example.  The instruction might be quite
well suited to Fortran 66, but unusable for Fortran 77, because of the
difference in the specifications of the trip count for the loops.  So
providing the instruction hasn't gained you much compared to
synthesizing it from simpler instructions (except in special cases,
such as benchmarking).  And it has probably hurt you because you could
have used the control store space (or hardwired logic area) to
implement some function more universally useful.
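The trip-count difference is concrete: a Fortran 66 DO loop tests at the
bottom and always runs at least once, while Fortran 77 computes the trip
count up front and may run zero times.  In C terms (function names
invented for illustration):

```c
/* Fortran 66-style DO loop: test at the bottom, body runs at least
   once, like C's do/while.  A hardware DO-loop instruction wired for
   this semantics cannot serve the Fortran 77 case below. */
static int trips_f66(int lo, int hi) {
    int trips = 0, i = lo;
    do { trips++; i++; } while (i <= hi);
    return trips;
}

/* Fortran 77-style DO loop: trip count evaluated before entry, may be
   zero, like C's for loop. */
static int trips_f77(int lo, int hi) {
    int trips = 0;
    for (int i = lo; i <= hi; i++) trips++;
    return trips;
}
```

For an empty range like DO I = 5, 1 the two disagree: one trip versus
zero, so a single "DO-loop instruction" cannot be right for both.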

The IBM 801 architecture is an example of an architecture designed in
concert with a compiler.  The targetable 801 compiler was used to
generate code for some System/370 benchmarks, and the resulting code
ran 50% faster than the best code turned out by previous 370
compilers.  What's more, the resulting code used only a small subset of
the 370 instruction set.

-------------------------------------------------------------------------
{cbosgd|ihnp4|pyramid|sdcsvax}!gould9!rb-dc1!silbey

Alex Silbey
Gould Inc., Computer Systems Division
15378 Avenue of Science
Rancho Bernardo, CA 92128

kds@mipos3.UUCP (07/18/86)

Actually, from my perspective, one of the main objectives of a RISC is the
simplification of the hardware so that you can fit more other things on
the chip, i.e., caches or whatever, that could potentially have a greater
impact on the overall system performance than your fancy-dancy instructions.
Certainly, you don't want to do anything that makes it impossible to generate
code, but in terms of what you put in or what you take out, I'd say that
the hardware of the implementation is the highest consideration.  Maybe I'm
very mistaken, but you'd have to work long and hard to convince me that
the removal of inter-pipeline interlocks, the addition of the deferred
jump feature, or fixed-length instructions simplify code generation or
assembly language programming.  They are only there because they simplify
the hardware required to take out jump penalties, or because they were something
that a compiler could "work around" and therefore wasn't required in
the hardware.

And my last word: my firm belief is that any processor designed with
little or no consideration for either the software that the
thing is going to run, OR the hardware implementation that is required
to execute the instructions, is doomed to failure.  And this includes
processors that are designed by compiler writers living in a vacuum.
-- 
The above views are personal.

I've seen the future, I can't afford it...

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California

jdg@elmgate.UUCP (Jeff Gortatowsky) (07/21/86)

In article <136@mipos3.UUCP>, kds@mipos3.UUCP (Ken Shoemaker ~) writes:
> And my last word: my firm belief is that any processor designed with
> little or no consideration for either the software that the
> thing is going to run, OR the hardware implementation that is required
> to execute the instructions, is doomed to failure.  And this includes
> processors that are designed by compiler writers living in a vacuum.
> -- 
> The above views are personal.
> 
> I've seen the future, I can't afford it...
> 
> Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California


First a tongue-in-cheek remark to Ken on the above paragraph:
Where the heck were you when they were designing the 8088/86/186/286, hmmm?
Of course it was not doomed to failure.  But one can hardly say they are
compiler friendly 8-) !!   It must have been a perfectly airless
environment back in those days, aye?  8^)

Second a completely different topic:
Ken, with the success of the PCs, how much outside influence (if any) did
certain large(!) computer makers have on the design of the 386?  Even if
you were not directly involved, you must have some idea.  Further, did
your employer work with outside SOFTWARE developers to find out what they
wanted, and did that influence the design?

To everyone:
Does anyone know of a successful or failed CPU design that was
significantly changed or built from scratch because of outside influence?


-- 
Jeff Gortatowsky       {allegra,seismo}!rochester!kodak!elmgate!jdg
Eastman Kodak Company  
<Kodak won't be responsible for the above comments, only those below>

srm@ucbiris.berkeley.edu (Richard Mateosian) (07/23/86)

>Does anyone know of a successful or failed CPU design that was
>significantly changed or built from scratch, because of outside influence?

The 68020 design was significantly changed along the way in response to 
the "outside influence" of what National and Zilog (among others) claimed
to be doing.

Richard Mateosian    ...ucbvax!ucbiris!srm 	     2919 Forest Avenue     
415/540-7745         srm%ucbiris@Berkeley.EDU        Berkeley, CA  94705