[comp.arch] D-machine helped spawn RISC

eggert@sdcrdcf.UUCP (09/06/87)

In article <288@tropix.UUCP> mjl@tropix.UUCP (Mike Lutz) writes

	... the B1700 was a pleasure to work with at the microcode level (and
	anyone who has done serious microprogramming knows what an amazing
	statement that is!)  While not a "RISC" machine, the B1700 was
	optimized for emulation, and the pieces just fit together well....

What irony!  David Patterson, Mr. RISC, wrote his PhD thesis at UCLA in 1975 on
formal verification of microcode for the D-machine (as Lutz says, really the
Burroughs 1700).  Partly because of the D-machine's pleasantness, Patterson was
surprisingly successful.  But the aggravation of verifying microcode convinced
him that microcode causes more problems than it cures; he turned to the design
of machines that don't need microcode.  So that unlikely couple, formal
verification and dynamically microcodable CISC, helped spawn RISC!

bpendlet@esunix.UUCP (Bob Pendleton) (09/08/87)

in article <4782@sdcrdcf.UUCP>, eggert@sdcrdcf.UUCP (Paul Eggert) says:
-
-In article <288@tropix.UUCP> mjl@tropix.UUCP (Mike Lutz) writes
-
-	... the B1700 was a pleasure to work with at the microcode level (and
-	anyone who has done serious microprogramming knows what an amazing
-	statement that is!)  While not a "RISC" machine, the B1700 was
-	optimized for emulation, and the pieces just fit together well....
-
-What irony! David Patterson, Mr. RISC, wrote his PhD thesis at UCLA in 1975 on
-formal verification of microcode for the D-machine (as Lutz says, really the
-Burroughs 1700). Partly because of the D-machine's pleasantness, Patterson was
-surprisingly successful. But the aggravation of verifying microcode convinced
-him that microcode causes more problems than it cures; he turned to the design
-of machines that don't need microcode.  So that unlikely couple, formal
-verification and dynamically microcodable CISC, helped spawn RISC!

What a weird coincidence! I just happen to have copies of David Patterson's
papers, "Strum: Structured Microprogram Development System for Correct
Firmware," IEEE Transactions on Computers, Vol. C-25, No. 10, October 1976, and
"An Experiment In High Level Language Microprogramming and Verification,"
CACM, October 1981, Volume 24, Number 10, sitting on my desk. I was rereading
them late last week.

I worked on the B1700 at the University of Utah in the early 1970s and have 
been telling people for several years that I thought RISC looked like
vertical microcode "done right." If what you say is true, then I was right.
Imagine that.

In five years I expect that RISC will be passe, that WISC ( wide instruction
set computers ) will be all the rage. WISC will be horizontal microcode
"done right." It will have all the advantages of RISC, but WISC machines will
run faster and cost less. We haven't abandoned microcode, we've just let it
out of the closet.

		Bob Pendleton
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet
        I am solely responsible for what I say.

mjl@tropix.UUCP (Mike Lutz) (09/08/87)

In article <4782@sdcrdcf.UUCP> eggert@SM.Unisys.com (Paul Eggert) writes:
>In article <288@tropix.UUCP> mjl@tropix.UUCP (Mike Lutz) writes
>
>	... the B1700 was a pleasure to work with at the microcode level (and
>	anyone who has done serious microprogramming knows what an amazing
>	statement that is!)
>What irony!  David Patterson, Mr. RISC, wrote his PhD thesis at UCLA in 1975 on
>formal verification of microcode for the D-machine (as Lutz says, really the
>Burroughs 1700).

Just to clear up a misconception: the B1700 was *not* the D-machine.  The
two were designed and built by two different divisions in Burroughs, and,
as far as I can tell, had little influence on one another.  The D-machine
had two levels of emulation; the B1700 was a vertical microengine.  I pity
the poor soul who might have tried to nanocode a D-machine to make it into
a B1700.

However, the comments on David Patterson were right on target.  What he
demonstrated in his thesis was startling to the microprogramming community.
Most folks were just starting to address the problems of high level languages
on horizontal machines, when Patterson showed a system that a) balanced
the vertical and horizontal resource utilization in the D-machine, b) was
verifiable, and c) in one case generated a smaller & faster emulator than
one coded by hand (the accepted norm at the time)!

sd@erc3ba.UUCP (S.Davidson) (09/11/87)

In article <475@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) writes:
> 
> In five years I expect that RISC will be passe, that WISC ( wide instruction
> set computers ) will be all the rage. WISC will be horizontal microcode
> "done right." It will have all the advantages of RISC, but WISC machines will
> run faster and cost less. We haven't abandoned microcode, we've just let it
> out of the closet.
> 
> 		Bob Pendleton
> -- 


It's happened already, though they are not all the rage yet.  They are
called Very Long Instruction Word machines, and one of the originators,
Josh Fisher, did his dissertation on global compaction of horizontal
microcode.  Josh moved to Yale after he graduated, and then moved to a
company to build a VLIW machine.  I don't know the current status of this
machine.  At Yale, though, Josh got some very impressive speedups from
unrolling loops and basically running compaction on them, assuming a lot of
available resources.  I don't know of any results on real hardware, however.
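
To see why unrolling helps a compactor, here is a toy sketch in
Python-flavored pseudocode, with an invented loop body: the copies of the
body from different iterations are independent of each other, so one
operation from each copy can be packed into the same wide word.

    # Toy illustration of unrolling for compaction: the unrolled copies of
    # the body are mutually independent, so one op per copy can share a word.

    def body(i):            # invented three-op loop body for iteration i
        return [f"load  t{i} = a[{i}]", f"add   t{i} += 1", f"store a[{i}] = t{i}"]

    UNROLL = 4
    unrolled = [op for i in range(UNROLL) for op in body(i)]

    # Group by position in the body: each group holds one independent op per
    # iteration, i.e. one candidate wide instruction word.
    for j in range(3):
        print(unrolled[j::3])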

By the way, I wouldn't say that RISCs are vertical microcode engines done right.
They just include a lot of stuff not necessary in microcode, like direct
addressing and multiplies.  It has never been that hard to generate compilers
for vertical microcode.

carl@otto.COM (Carl Shapiro) (09/13/87)

>... Josh Fisher, did his dissertation on global compaction of horizontal
>microcode.  Josh moved to Yale after he graduated, and then moved to a
>company to build a VLIW machine.  I don't know the current status of
>this machine ...

He (and others, some of whom also worked on the project at Yale) has built
the machine in question.  It's called the TRACE computer, and is being
produced and sold by Multiflow Computer, Inc. in Branford, Connecticut.

bpendlet@esunix.UUCP (Bob Pendleton) (09/14/87)

in article <347@erc3ba.UUCP>, sd@erc3ba.UUCP (S.Davidson) says:
- 
- In article <475@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) writes:
-- 
-- In five years I expect that RISC will be passe, that WISC ( wide instruction
-- set computers ) will be all the rage. WISC will be horizontal microcode
-- "done right." It will have all the advantages of RISC, but WISC machines will
-- run faster and cost less. We haven't abandoned microcode, we've just let it
-- out of the closet.
-- 
-- 		Bob Pendleton
-- -- 
- 
- 
- It's happened already, though they are not all the rage yet.  They are

Read my words.  Did I say that such machines did not exist?  It is hard
to make predictions about things that don't exist, and easy when they already
do.  In fact, such machines have existed for at least 15 years.  Recent
developments in compiler technology have made them practical for
use by people with ordinary budgets. Microcoding has been very expensive.

- called Very Long Instruction Word machines, and one of the originators,
- Josh Fisher, did his dissertation on global compaction of horizontal

A good reference is "Trace Scheduling: A Technique for Global Microcode
Compatction" Joseph A. Fisher. IEEE Transactions on Computers, vol. c-30,
no. 7, July 1981.

By the by, VLIW(TM) and Trace Scheduling(TM) are trademarks of Multiflow
Computers, Inc. So I chose to use WISC instead.

- microcode.  Josh moved to Yale after he graduated, and then moved to a
- company to build a VLIW machine.  I don't know the current status of this machine,
- though.  At Yale, though, Josh got some very impressive speedups from unrolling
- loops and basically running compaction on them, assuming a lot of available resources.
- I don't know of any results on real hardware, however.

A good reference is "Bulldog: A Compiler for VLIW Architectures" John R. Ellis
MIT Press, 1986 ISBN 262-05034-X

The hardware is advertised regularly in Aviation Week & Space Technology.
Aviation trade publications have lots of computer-related information that
some CS types seem to be totally unaware of.

- 
- By the way, I wouldn't say that RISCs are vertical microcode engines done right.
- They just include a lot of stuff not necessary in microcode, like direct
- addressing and multiplies.  It has never been that hard to generate compilers
- for vertical microcode.

The world YOU live in doesn't need direct addressing and multiplies in 
microcode. The world I live in requires direct addressing and multiplies,
even floating multiplies, in microcode. Outside of your own world, your
assumptions do not apply.

It has been very hard to write GOOD compilers for horizontal microcode.

Please read what I said, not what you wanted me to say. The original article
contained references to two key papers in this field. Using anecdotes and
rumors to "correct" me is as pointless as my flaming you in this reply. At
least I've provided references to cover your anecdotes.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet
        I am solely responsible for what I say.

turner@uicsrd.UUCP (09/16/87)

/* Written  9:59 am  Sep 14, 1987 by fay@encore.UUCP in uicsrd:comp.arch */
> 
> Clancy et al. - Proc. Summer 1987 Usenix Conf.) which describes some of
> Multiflow's hardware and software. Truly incredible stuff, if it's for
> real.
> ....
> Normally "conditional jumps occur every five to eight instructions",
> making parallelization very difficult. So simply take a trace of
> normal program execution and have the compiler assume it will USUALLY
> execute that trace.
> ....
> Then compile the new program as if it were not going to take the
> seldom-used branches and plunge ahead.
> ....
> 
> My question to those parallel machine compiler writers out there: is anyone
> writing compilers for non VLIW machines using the same methods? Why can't,
> say, an Alliant-type (or Cedar-type, etc.) machine with hardware lock-step
> between computational elements get a trace execution, recompile assuming
> no branches, and when the 1000th instruction diverts from the "chosen
> path", just back up the CE's and undo the damage?
         ^^^^^^^^^^^^^^^^^^^
> 
> 			peter fay
> 			fay@multimax.arpa
> 
/* End of text from uicsrd:comp.arch */

The problem is one of dynamic vs. static allocation of operations to
functional units.  In a VLIW machine the compiler allocates operations
to functional units at COMPILE time.  The compiler knows which
operations should be undone when an unexpected (unpredicted?)
branch occurs.  In any machine that dynamically allocates iterations
of a loop to CE's it is VERY difficult to determine what operations
must be undone, since an early iteration could branch out of the loop
after some number of other iterations have finished.
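
To make the contrast concrete, here is a toy sketch (Python-flavored
pseudocode; the blocks, probabilities, and ops are invented, and this is
not Multiflow's actual compiler) of the static side: pick the most probable
trace, compact it as straight-line code, and note the off-trace edges where
compensation code has to be placed.

    # Toy trace scheduling sketch.  A real trace scheduler tracks data
    # dependences and moves operations across block boundaries; here we just
    # pick the trace, concatenate its ops, and record the off-trace edges
    # where compensation code would be needed.

    from dataclasses import dataclass, field

    @dataclass
    class Block:
        ops: list                                   # operations, in order
        succs: dict = field(default_factory=dict)   # successor -> branch probability

    blocks = {
        "B0": Block(["load r1", "add r2,r1"], {"B1": 0.9, "B2": 0.1}),
        "B1": Block(["mul r3,r2", "store r3"], {"B3": 1.0}),
        "B2": Block(["call fixup"], {"B3": 1.0}),
        "B3": Block(["ret"]),
    }

    def pick_trace(blocks, entry):
        """Greedily follow the most probable successor until the trace ends."""
        trace, cur = [], entry
        while cur is not None and cur not in trace:
            trace.append(cur)
            succs = blocks[cur].succs
            cur = max(succs, key=succs.get) if succs else None
        return trace

    def schedule(blocks, trace):
        """Compact the trace into one straight-line sequence and list the
        split edges (branches off the trace) that need compensation code."""
        compacted, split_edges = [], []
        for name in trace:
            b = blocks[name]
            compacted.extend(b.ops)    # a real scheduler would reorder/overlap here
            split_edges += [(name, s) for s in b.succs if s not in trace]
        return compacted, split_edges

    trace = pick_trace(blocks, "B0")                 # ['B0', 'B1', 'B3']
    code, fixups = schedule(blocks, trace)
    print(trace, code, fixups, sep="\n")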

Notice that vector operations within a CE are statically allocated to
the sections of the pipe, so vector operations could support conditional
branches by allowing 'back up'.  Unless the vector register length is
very long, I have doubts as to the effectiveness of this, however.
---------------------------------------------------------------------------
 Steve Turner (on the Si prairie  - UIUC CSRD)

UUCP:	 {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!turner
ARPANET: turner%uicsrd@a.cs.uiuc.edu           
CSNET:	 turner%uicsrd@uiuc.csnet            *-))    Mutants for
BITNET:	 turner@uicsrd.csrd.uiuc.edu                Nuclear Power  (-%

eugene@pioneer.arpa (Eugene Miya N.) (09/16/87)

In article <478@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>-- In five years I expect that RISC will be passe, that WISC ( wide instruction
>-- set computers ) will be all the rage. WISC will be horizontal microcode
>- It's happened already, though they are not all the rage yet.  They are
>A good reference is "Trace Scheduling: A Technique for Global Microcode
>Compaction" Joseph A. Fisher. IEEE Transactions on Computers, vol. c-30,
>no. 7, July 1981.
>
>By the by, VLIW(TM) and Trace Scheduling(TM) are trademarks of Multiflow
>Computers, Inc. So I chose to use WISC instead.

Added note: the latest copy of Computer has yet another CISC/RISC
debate (this time including Mike Flynn).  I don't think RISCs will
disappear, they will become passe as a design fad.  I expect more
specialized RISCs, tuned toward specific applications like signal
processing but more general purpose than systolic arrays.  You won't see
debate in this group in favor of CISC because too few software people
read the group, and I don't mean programmers, I mean the kinds of people
pushing tagged architectures, etc.  The group lacks balance.  You will
know when CISC is dead when the 370 & the VAX disappear (I have this
bridge...).  There isn't enough experience with ELI/VLIW/WISC and too
few people (like count on one hand) know how to code these types of
machines to make me confident this is the next fad.  The Trace (machine)
that I saw at least didn't crash (running U*x).

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

johnl@ima.ISC.COM (John R. Levine) (09/16/87)

In article <347@erc3ba.UUCP> sd@erc3ba.UUCP (S.Davidson) writes:
>[Horizontal microcode RISC machines have]
>happened already, though they are not all the rage yet.  They are
>called Very Long Instruction Word machines, and one of the originators,
>Josh Fisher, did his dissertation on global compaction of horizontal
>microcode.  Josh moved to Yale after he graduated, and then moved to a
>company to build a VLIW machine.  ...

Josh's company, Multiflow Computer, is shipping their smallest minisuper,
the Trace 7/200.  It runs real fast, e.g. LINPACK 6.0 mflops compared to,
say, an IBM 3090-200's 6.8 mflops, which is not bad for a machine that
costs $300K.  According to people I know there, it turned out to run faster
than they projected it to, and in some customer benchmarks outperformed a
Cray X/MP.

The Trace 7 has a 256 bit instruction word, they're working on 512 bit and
1024 bit versions.  Unlike most other minisupers, there is no vector
processing hardware.  It executes one enormous instruction at a time, and
there is considerable compiler cleverness involved in getting as much useful
work as possible done in a single enormous instruction.  Not using vectors
means that existing cruddy Fortran code can be compiled effectively without
having to rework it to make it more easily parallelizable.
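
The packing idea itself is simple to sketch.  Below is a toy version in
Python-flavored pseudocode; the slot layout and operations are invented,
and a real VLIW compiler (Multiflow's included) also does trace scheduling
and models latencies and register ports.

    # Toy sketch of packing independent operations into wide instruction words.
    # Each op is (functional unit, text, registers read, registers written).

    SLOTS = ("ialu0", "ialu1", "falu", "mem", "branch")  # one wide word = 5 op fields

    ops = [
        ("ialu", "r1 = r2 + r3",      set(),  {"r1"}),
        ("mem",  "r4 = load a[i]",    set(),  {"r4"}),
        ("falu", "f1 = f2 * f3",      set(),  {"f1"}),
        ("ialu", "r5 = r1 + 1",       {"r1"}, {"r5"}),   # depends on the first op
        ("mem",  "store r5 -> b[i]",  {"r5"}, set()),
    ]

    def pack(ops):
        """Fill wide words in program order: keep adding ops to the current word
        until one has no free slot of its kind or reads a value written earlier
        in the same word, then start a new word."""
        words, i = [], 0
        while i < len(ops):
            word, written = {}, set()
            while i < len(ops):
                unit, text, reads, writes = ops[i]
                slot = next((s for s in SLOTS
                             if s.rstrip("01") == unit and s not in word), None)
                if slot is None or (reads & written):
                    if not word:
                        raise ValueError(f"no slot for unit {unit!r}")
                    break
                word[slot] = text
                written |= writes
                i += 1
            words.append(word)
        return words

    for n, w in enumerate(pack(ops)):
        print(f"word {n}: {w}")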

[Disclaimer:  No connection to Multiflow except that I know a lot of the
people who work there.]
-- 
John R. Levine, Cambridge MA, +1 617 492 3869
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
The Iran-Contra affair:  None of this would have happened if Ronald Reagan
were still alive.

bcase@apple.UUCP (09/17/87)

In article <2785@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>Added note: the latest copy of Computer has yet another CISC/RISC
>debate (this time including Mike Flynn).

Sigh.

>I don't think RISCs will
>disappear, they will become passe as a design fad.  I expect more
>specialized RISCs, tuned toward specific applications like signal
>processing but more general purpose than systolic arrays.  You won't see
>debate in this group in favor of CISC because too few software people
>read the group, and I don't mean programmers, I mean the kinds of people
>pushing tagged architectures, etc.

RISC will become passe as a design fad as soon as something comes along
to replace the compiler.  In other words, the "passeness" of RISC
is nowhere in sight.  I agree that specialized RISCs will appear, but
that has nothing to do with RISC being a fad; rather it has to do with
the need for specialized processors and the ease with which RISC processors
can be specialized.  Arguments in favor of tagged architectures aren't
necessarily arguments against RISC.  See SOAR, SPUR, and SPARC.

>The group lacks balance.  You will
>know when CISC is dead when the 370 & the VAX disappear (I have this
>bridge...).  There isn't enough experience with ELI/VLIW/WISC and too
>few people (like count on one hand) know how to code these types of
>machines to make me confident this is the next fad.

You are right, there isn't enough experience, but that is true of every
new thing when it is new!  Compilers will (should) be the "people"
coding these machines.  Personally, I consider VLIW one of the very
few truly new ideas to come along (and in some sense, it isn't really
new).

>The Trace (machine) that I saw at least didn't crash (running U*x).

See?  It's a great machine!  :-)

andy@rocky.UUCP (09/18/87)

In article <6266@apple.UUCP> bcase@apple.UUCP (Brian Case) writes:
>In article <2785@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>>Added note: the latest copy of Computer has yet another CISC/RISC
>>debate (this time including Mike Flynn).

>Sigh.

>RISC will become passe as a design fad as soon as something comes along
>to replace the compiler.  In other words, there the "passeness" of RISC
>is nowhere in sight.

Flynn used the same compiler/optimizer with different final code generators
to study a number of different architectures.  (The compiler and optimizer
were written under John Hennessy's direction a few years ago.  Yes, that
Hennessy.)  All of the architectures had the same ALU; they differed in
instruction format and register set architecture.  (They compared different
register window schemes with monolithic register sets of various sizes.)
Since all of the tests used the same compiler and optimizer, much of the
remaining differences were due to differences between the architectures.

One result was that more compact instruction formats were more effective
at reducing instruction traffic than expanding the instruction cache.
``[The 360-like CISC] achieves the same memory performance as [the RISC
architecture], but uses an instruction cache of only half the [RISC]
cache size.''  Flynn et al. argue that the extra decoding hardware for the
denser format is smaller than the I-cache needed for equivalent RISC
performance.  Remember,
the critical path in MIPS, MIPS-X, and the Berkeley RISC processors is
not in the control logic; I don't know about MIPS Co's product.

``From data traffic considerations, it seems that the [360-like CISC]
with a register set of about size 16 plus a small data cache is preferable
to multiple register sets for most area combinations.''

Maybe instruction bandwidth isn't important, but data bandwidth seems
to be.  As Flynn and company conclude, ``Balanced optimization is
the key to overall instruction set efficiency.''  Let's see some data
from RISC folks.

-andy

ps - The article is in the September 87 issue of IEEE Computer.
-- 
Andy Freeman
UUCP:  {arpa gateways, decwrl, sun, hplabs, rutgers}!sushi.stanford.edu!andy
ARPA:  andy@sushi.stanford.edu
(415) 329-1718/723-3088 home/cubicle

sd@erc3ba.UUCP (S.Davidson) (09/18/87)

> 
> A good reference is "Trace Scheduling: A Technique for Global Microcode
> Compaction" Joseph A. Fisher. IEEE Transactions on Computers, vol. c-30,
> no. 7, July 1981.
Right after our paper on microcode compaction techniques, in which we
had the pleasure of killing off a research area by showing that the
problem was solved.  We didn't really invent any of the compaction techniques
in that paper, by the way, but implemented and compared the popular ones.
The reference is "Some Experiments in Local Microcode Compaction for Horizontal
Machines," by S. Davidson, D. Landskov, B. D. Shriver, and P. W. Mallett,
reprinted in "Advances in Microprogramming," ed. by Mallach and Sondak,
Artech House, 1983 (second ed.).  A better article (but without the
results) is "Local Microcode Compaction Techniques," Computing Surveys,
Sept. 1980, with David Landskov as first author.  Unfortunately I haven't
seen any complete solutions for the global compaction problem.  Josh's ideas
are still the best, I think, but the jury is still out.
> 
> By the by, VLIW(TM) and Trace Scheduling(TM) are trademarks of Multiflow
> Computers, Inc. So I chose to use WISC instead.
> 
I wonder if Trace Scheduling as a trademark would stand up in court.  It was
used as a description of a particular algorithm in several papers before
this company existed.  Wide Instruction _Set_ Computer doesn't seem right:
a reduced instruction set makes sense (the set is reduced, not the
instructions), but what is a wide instruction set?  How about wide
instruction computer?  I'm not sure that makes any more sense, though.
VLIW seems the best description; too bad they grabbed the term.
> - microcode.  Josh moved to Yale after he graduated, and then moved to a
> - company to build a VLIW machine.  I don't know the current status of this machine,
> - though.  At Yale, though, Josh got some very impressive speedups from unrolling
> - loops and basically running compaction on them, assuming a lot of available resources.
> - I don't know of any results on real hardware, however.
> 
> A good reference is "Bulldog: A Compiler for VLIW Architectures" John R. Ellis
> MIT Press, 1986 ISBN 262-05034-X
> 
The reference to their simulated results is "Using an Oracle to Measure 
Potential Parallelism in Single Instruction Stream Programs," by Alex Nicolau
and Josh, 14th Annual Microprogramming Conference, pp. 171 - 182.
Alex was Josh's student; he is now a professor at Cornell (or was two
years ago, when he wrote a paper for Micro 18).

> - 
> - By the way, I wouldn't say that RISCs are vertical microcode engines done right.
> - They just include a lot of stuff not necessary in microcode, like direct
> - addressing and multiplies.  It has never been that hard to generate compilers
> - for vertical microcode.
> 
> The world YOU live in doesn't need direct addressing and multiplies in 
> microcode. The world I live in requires direct addressing and multiplies,
> even floating multiplies, in microcode. Out side of your own world, your
> assumptions do not apply.

You might be interested in reading "The Cultures of Microprogramming" by
Nick Tredennick, Micro 15, pp. 79 - 83.  Sheraga and Gieser have done some
very nice work on compilers for microcode with floating point and all that
stuff (one paper is in Micro 14; others have been in IEEE Trans. Comput. or
Software Eng., I forget which).  The real issue, however, is how RISC
machines are done "right" in comparison to vertical microcode engines,
considering the difference between a microcode engine and a computer.
> 
> It has been very hard to write GOOD compilers for horizontal microcode.
> 
I think I know that, having written one (a compiler, not necessarily a good
one).  It is very hard to write even bad compilers for horizontal microcode.
Your audience for the compiler makes a big difference too.  See 
my article "Progress in High Level Microprogramming," in the July 1986 
IEEE Software.  Not enough references there; Bruce made me take most of them
out.  There is a more extended article on high level microprogramming languages
in the book "Microprogramming Handbook," ed. by Stan Habib, out Real 
Soon Now.
> Please read what I said, not what you wanted me to say. The original article
> contained references to two key papers in this field. Using anecdotes and
> rumors to "correct" me is as pointless as my flaming you in this reply. At
> least I've provided references to cover your anecdotes.
> 
Never meant to correct you, since there was nothing to correct.  I'm not sure
all the readers of this group are up on WICs or whatever.  

By the way, what is a reference for a 15 year old WIC?  I mean one like the
Multiflow, Bulldog machine, not a big heterogeneous horizontal word.
> 			Bob P.
> -- 
> Bob Pendleton @ Evans & Sutherland
> UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
> Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet
>         I am solely responsible for what I say.

Scott Davidson
{ihnp4,allegra}!erc3ba!sd

bcase@apple.UUCP (09/18/87)

In article <600@rocky.STANFORD.EDU> andy@rocky.UUCP (Andy Freeman) writes:
>Flynn used the same compiler/optimizer with different final code generators
>to study a number of different architectures.  (The compiler and optimizer
>were written under John Hennessy's direction a few years ago.  Yes, that
>Hennessy.)

John Hennessy certainly knows up from down when it comes to compilers.
But, in my humble opinion, many really important optimizations happen
*after* code generation; this is especially true for RISCs, I believe.
Looking at the output of modern, commercial "optimizing" compilers, I
am appalled at the code quality for certain cases.  Just because a text
book says that optimization occurs before code generation doesn't mean
that's the best way.

> All of the architectures had the same ALU; they differed in
>instruction format and register set architecture.  (They compared different
>register window schemes with monolithic register sets of various sizes.)
**>Since all of the tests used the same compiler and optimizer, much of the**
**>remaining differences were due to differences between the architectures.**

This is the claim I don't believe, not even a little bit.

>Remember,
>the critical path in MIPS, MIPS-X, and the Berkeley RISC processors is
>not in the control logic; I don't know about MIPS Co's product.

I'm not so sure that I believe this statement.  It is true that, in most
of the cases listed, little *area* was spent, but, at least for the
original Stanford MIPS, the master pipeline controller was a real
problem.  Remember, whether or not to "complexify" instruction set
definition is driven (or should be) by what software (compiler, OS)
wants/can deal with, not *only* by what hardware can stand.  The fact
that I can maintain cycle time even if I "complexify" the instruction
set does not mean it is the right thing to do!  What if the compiler
never emits those complex instructions?

>``From data traffic considerations, it seems that the [360-like CISC]
>with a register set of about size 16 plus a small data cache is preferable
>to multiple register sets for most area combinations.''

Again, I question the compiler effort here.

>Maybe instruction bandwidth isn't important, but data bandwidth seems
>to be.  As Flynn and company conclude, ``@i[Balanced optimization] is
>the key to overall instruction set efficiency.''  Let's see some data
>from RISC folks.

Bandwidth is not the only consideration:  LATENCY is often more important
where loads/stores are concerned (at least in machines like RISC II, Am29000,
and, I suspect, SPARC, that have a relatively low percentage of loads/stores).
High instruction bandwidth is very important for RISC machines; latency
is also important but there are techniques for dealing with this so that
it won't be so apparent at the chip boundary.  Techniques like interleaving
and using burst-mode memories (VDRAMS, SCDRAM, nibble-mode, etc.) can deal
with sequential bandwidth, but if it takes 2 milliseconds to get the first
word, who cares?  Latency, latency, latency.  Thus, arguments against RISC
founded on bandwidth requirements will, at least with me, fall on deaf
ears.
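
A back-of-the-envelope sketch (Python-flavored pseudocode, all numbers made
up) of why the first word's latency dominates a block fetch even when the
burst rate is high:

    # Time to fetch a block from a burst-mode memory: the first word pays the
    # full access latency, the rest stream at the burst rate.  Numbers invented.

    def fetch_cycles(latency, burst_per_word, words=8):
        return latency + (words - 1) * burst_per_word

    for latency in (4, 10, 40):
        t = fetch_cycles(latency, burst_per_word=1)
        print(f"latency {latency:2d} -> {t:2d} cycles for 8 words "
              f"({t / 8:.1f} cycles/word effective)")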

About the only real data that I can offer is that the percentage of
loads/stores for stack-cache machines (RISC II, SPARC, Am29000, etc) is
often about 1/2 that observed in machines with only flat register files
(MIPS, etc.).

martin@felix.UUCP (Martin McKendry) (09/18/87)

In article <6266@apple.UUCP> bcase@apple.UUCP (Brian Case) writes:
>In article <2785@ames.arpa> eugene@pioneer.UUCP (Eugene Miya N.) writes:
>>Added note: the lastest copy of Computer has yet another CISC/RISC
>>debate (this time including Mike Flynn).
>
>Sigh.

One of the vocal opponents of RISC (on whatever grounds) is/was Doug
Jensen, of CMU.  I have just heard that he is joining a startup
called "Kendall Square Research", who are building new hardware
and software for atomic laser blasters, or real-time control,
or some such.  Now an interesting thing to know would be what
the hardware is to look like.  Anyone know?  Kendall Square Research
is in Boston, 'behind MIT'.



--
Martin S. McKendry;    FileNet Corp;	{hplabs,trwrb}!felix!martin
Strictly my opinion; all of it

earl@mips.UUCP (Earl Killian) (09/19/87)

In article <6281@apple.UUCP>, bcase@apple.UUCP (Brian Case) writes:
> About the only real data that I can offer is that the percentage of
> loads/stores for stack-cache machines (RISC II, SPARC, Am29000, etc) is
> often about 1/2 that observed in machines with only flat register files
> (MIPS, etc.).

If the percentage for those machines is really half of the MIPS-style
RISC machines, I suspect it is because either those machines have
compilers that generate unnecessary non-load/store instructions, or
the architecture requires extra non-load/store instructions to get the
same work done (e.g. using condition codes in RISC II, SPARC, and
address arithmetic in Am29000, etc.).  We really need a unit of real
work for integer programs, like the flop for fp programs.  Then we
could measure load/stores per workunit.

The data that I have below suggests that for the MIPSco
architecture/compiler the savings varies widely, but never gets to
50%.  This data is basically the % of load/stores that are due to
register save/restore.  Register windows would eliminate some, but not
all of these (load/stores for window overflow/underflow should be
factored in).  So these are an upper bound on the savings.  (That is,
savings of load/stores, not of cycles).

espresso	 0.6%
spice		 4.0%
wolf		 5.6%
yacc		10%
diff		12%
compress	12%
uopt		18%
nroff		28%
ccom		38%

P.S. I'm actually a fan of register windows, even though the MIPSco
architecture doesn't have them.  However, I think some of the common
wisdom about register windows is wrong (e.g. how many load/stores they
save) and overstates their usefulness.  This is because the early work
was done without the benefit of optimizing compilers.  Too bad; with an
optimizing compiler they would have found that only half as many physical
registers (i.e. silicon) are necessary to get the same
performance.  The SPARC folks, who do have a good compiler, discovered
this too (but too late to change the architecture to take advantage of
it).

The worst thing about register windows is that they are sometimes used
to justify multi-cycle load/store.  For a program like spice, you save
at most 4% and pay 32% (the % of remaining load/stores in spice) for
every extra cycle added to load/store.  Yuck.
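
For what it's worth, the arithmetic behind that spice figure, as
Python-flavored pseudocode.  The load/store fraction is my assumption
(picked so the remaining load/stores come to about 32%), and the one-cycle
accounting is a simplification:

    # Rough cycle accounting for the spice example: windows eliminate the
    # save/restore load/stores but cost one extra cycle on every remaining
    # load/store.  Baseline assumed to be one cycle per instruction.

    ldst_frac    = 1.0 / 3.0   # assumed: load/stores as a fraction of all instructions
    save_restore = 0.04        # 4% of spice's load/stores are register save/restore
    extra_cycles = 1           # added cycles per load/store with windows

    saved = ldst_frac * save_restore                       # cycles/instr no longer spent
    paid  = ldst_frac * (1 - save_restore) * extra_cycles  # cycles/instr added
    print(f"saved {saved:.3f} cycles/instr, paid {paid:.3f} cycles/instr")
    # -> save ~0.013, pay ~0.32: the extra load/store cycle swamps the savings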

aglew@ccvaxa.UUCP (09/20/87)

..> Talking about Flynn's article in Computer, CISC vs. RISC,
..> somebody quoted the conclusion that ``A 360-like CISC
..> with 16 registers and a moderate sized data cache''
..> may be the way to go. (paraphrased).

By the way, I am not so sure that you should call a 360-like instruction
set "CISC". The IBM 360 is actually quite a simple machine: a limited
number of instruction formats, fairly regular register use (too few
registers), a limited number of addressing modes...

DON'T FLAME ME, PLEASE!!!! I know all too well the CISCy aspects of the 360:
translate instructions, block moves, and so on - but just want to point out
that, if you subtract a few things, the 360 doesn't look too bad in a RISC
light.
    Of course, RISC began when the IBM 801 group took up where the 360
left off, without marketing pressure to force CISCy kluges...

Flynn is trading off register set size for memory+register->register
operations, both in the context of a simple instruction set. Note that
he is not trading off against all the complicated addressing modes that
a VAX, true CISC, has.
    It isn't written in Stone that a RISC has to be a load-store architecture.
Most are, true, but only because critical evaluation seems to fall on that
side. Flynn examines the alternative...
    I've often thought that RISC might better be described as "Reduced
Addressing Mode Machine", RAMM. In fact, I wrote a paper in an undergrad
course on "RAMM/RISC/SEISM".

After finishing my undergrad course, being all fired up with RISC, I went to
work for a minicomputer manufacturer. While waiting for a US visa, I had
them send up a processor manual. The instruction set looked a lot like a
360 - base register, etc. I wondered what I was getting into.
    But then I mapped out the instruction set and register usage patterns,
and I said to myself "Damn! This machine could run damned fast!" You see,
while it looked like an IBM 360, there weren't very many of the CISCy
features that had been forced upon Amdahl.
    In fact, our CPUs do typically run one instruction per cycle.
Sometimes more.


Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa

I always felt that disclaimers were silly and affected, but there are people
who let themselves be affected by silly things, so: my opinions are my own,
and not the opinions of my employer, or any other organisation with which I am
affiliated. I indicate my employer only so that other people may account for
any possible bias I may have towards my employer's products or systems.

ebg@clsib21.UUCP (Ed Gordon) (09/29/87)

Having never worked explicitly with the D-machine, but having worked with
D-machine graduates and with the Mini-D machine, I cannot claim explicit
knowledge of its inner workings.  But, if my memory does not fail me, the
D-machine had a highly parallel, rather complicated instruction set, with
each instruction made up of sub-instructions to handle the registers, ALU,
and branch processing.  All of this was done in parallel within each
instruction, producing a fairly complex instruction, the simplistic nature
of the register set notwithstanding (which was understandable considering
the nature of the system).  The processor was essentially a microcontroller,
with explicit mechanisms for control of the bus.

I was also involved in the development of a "RISC"-like, pre-"RISC"
era processor, but I don't know what became of it.

If I understand the RISC architecture, it does not strive for parallelism,
but strives for a reduced instruction set, to simplify chip design, in order
to speed up execution times of instructions (one cycle per?).  The extent
of the effort was defined to me as an attempt to produce "assembler"
instructions, without the complications of (slow) HLL constructs,
not "firmware" instructions with explicit bus manipulation, as does the
Mini-D.  Comparing RISCs with the Mini-D is like comparing "Apples" with
"Oranges" (or is that "Apples" with "Compaqs"?).  Do I misunderstand
the concept?

			--Ed Gordon
			--Data System Associates

"I know it's only rock and roll, and the opinions expressed are my own,
and are not necessarily those of any of the major recording studios, or
any other semi-coherent organization."