[comp.arch] Proposed architecture characterization survey form

earl@mips.COM (Earl Killian) (04/19/88)

Now that Motorola has announced the 88000, I believe all the
commercial "RISC"s are out in the open (or am I missing something?).
This list includes the MIPS R2000/R3000, the Fairchild Clipper, the
IBM RT, the HP Precision, the AMD 29000, Sun SPARC, Intel 80960, and
Motorola 88000 (speak up if I left anyone out!).

I propose that comp.arch develop a standard form for describing "RISC"
architectures and apply it to the above.  (We could include military
and research machines as well, if people so desire.)  Below I propose
such a form, which will, no doubt, require generalization.  Once we
agree on what it takes to fairly well characterize an architecture and
its implementation, we can fill in the answers for all of the above
(unless people think this is a worthless exercise?).

First some definitions of my terminology are in order, because it's
probably different from everyone else's.  The latency of an operation
is the time it takes for the entire operation to complete.  The issue
time is the time before you can start the next instruction, and the
rate is the time until you can start another instruction of the same
type.

For example, a machine might require 3 cycles for a load instruction
(1 to calculate the address, 2 to access the cache), allow a new
load only every 2 cycles, but allow a non-load to start immediately.  I
describe this load as 3/1/2.  What is commonly called the load delay
(as opposed to latency) is the time after the load before you can
reference the result.  This is the latency minus the issue time (3 - 1
= 2) in this case.  Don't confuse latency with delay.

Some latency/issue/rate examples from the Cray-1S (from memory, so
don't quote me):
	logicals:	 1/1/1
	shift:		 2/1/1
	integer add:	 3/1/1
	load:		11/2/2
An example of a multi-cycle latency, non-pipelined floating point
unit might have:
	add:		 2/1/2
	mul:		 4/1/4
I hope that is clear enough.  If not, I'll try to clarify.
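To make the notation concrete, here is a toy issue-time calculator that applies the latency/issue/rate rules above to a short instruction sequence.  All the triples are illustrative (the hypothetical 3/1/2 load), not measurements from any machine discussed here:

```python
# Toy calculator for the latency/issue/rate notation described above.
# Timing triples are illustrative, not taken from any real machine.
def schedule(ops, timing):
    """ops: list of op names; timing: name -> (latency, issue, rate).
    Returns the cycle on which each op starts, honoring the issue time
    (next instruction of any type) and rate (next of the same type)."""
    same_type_free = {}   # op name -> earliest start for same-type op
    next_issue = 0        # earliest start for any next instruction
    starts = []
    for op in ops:
        latency, issue, rate = timing[op]
        t = max(next_issue, same_type_free.get(op, 0))
        starts.append(t)
        next_issue = t + issue          # any instruction may start here
        same_type_free[op] = t + rate   # same-type must wait for rate
    return starts

timing = {"load": (3, 1, 2), "add": (1, 1, 1)}
# Two back-to-back loads: the second waits out the 2-cycle rate limit,
# but an add after a load can start on the very next cycle.
print(schedule(["load", "load", "add"], timing))  # [0, 2, 3]
```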

Here is my proposed form to characterize architectures and their
implementations.  I'll post the MIPSco numbers once we agree on the
data to collect.

> Peak native MIPS

What is the clock cycle time?
What is the peak native MIPS rate?

> Implementation technology

What are the parameters of the implementation technology?

> Instruction format

What instruction sizes are used?
What size are immediate operands?
What size are branch displacements?

> Integer Registers

How are the registers organized [simple, windowed]?
How many total integer registers?
Hardwired zero register?

For windowed machines:
How many registers are addressed by an instruction?
How many of these are not windowed?
What window increments are supported?
Window overflow and underflow are handled in [software, hardware]?

> Integer Alu

What is the logical latency/issue/rate?
What is the shift latency/issue/rate?
What is the add latency/issue/rate?
What is the compare latency/issue/rate?

> Branches

Which operand comparisons are implemented in the conditional branch
instruction, and which require a separate instruction?

Where is the result of separate comparisons stored [registers,
condition codes]?

Which forms of branch delay are present in instruction set
[execute N if no branch, execute N if branch, execute N always]?

What are the taken and not-taken cycle counts for each branch type?

> Loads/Stores

What addressing mode(s) do load instructions use?
What addressing mode(s) do store instructions use?
Which load/store sizes are supported [8, 16, 32, 64]?
What is the load latency/issue/rate?
What is the store latency/issue/rate?

> Integer Multiply/Divide

How is multiply implemented [software, multiply step, hardware]?
How many cycles to perform 32x32->32 multiply?
How is divide implemented [software, divide step, hardware]?
How many cycles to perform 32x32->32 divide?

> Floating Point

Are floating point registers separate from integer registers?
How many 32-bit floating point registers?
How many 64-bit floating point registers?
How many 80-bit floating point registers?

How is floating point implemented [software, coprocessor, on-chip]?
What are the floating point operation latency/issue/rates?

		32-bit		64-bit		80-bit
	add
	mul
	div

Which floating point units can operate in parallel?
Can floating point operate in parallel with integer?
Are floating point exceptions precise?

> Memory management

Page size?
Translation cache [none, off-chip, on-chip]?
Translation cache size in entries?
Translation cache associativity [direct-mapped, 2-set, 4-set, full]?
Translation cache miss handled by [software, hardware]?

> Caches

Instruction cache [none, off-chip, on-chip]?
Data cache [none, off-chip, on-chip]?
Are I and D caches separate?
I-cache total size in bytes?
I-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
I-cache address block size in bytes (bytes per tag)?
I-cache transfer block size in bytes (bytes read on cache miss)?
I-cache index [virtual, physical]?
I-cache tag [virtual, physical]?
D-cache total size in bytes?
D-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
D-cache writes [write-through, write-back]?
D-cache address block size in bytes (bytes per tag)?
D-cache transfer block size in bytes (bytes read on cache miss)?
D-cache index [virtual, physical]?
D-cache tag [virtual, physical]?
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

ram%shukra@Sun.COM (Renu Raman, Taco Bell Microsystems) (04/19/88)

In article <2048@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>Now that Motorola has announced the 88000, I believe all the
>commercial "RISC"s are out in the open (or am I missing something?).
>This list includes the MIPS R2000/R3000, the Fairchild Clipper, the
>IBM RT, the HP Precision, the AMD 29000, Sun SPARC, Intel 80960, and
>Motorola 88000 (speak up if I left anyone out!).

Here is a brief listing of RISCs from university, commercial & Govt projects
I am sure I have missed out many more. Can somebody email me or fill in
the rest?


Universities:

   Berkeley RISC I/II, SOAR, SPUR
   Stanford MIPS-X
   Purdue (With RCA?)

Commercial

   Acorn: ARM
   AMD: 29000
   ATT: CRISP
   Fairchild: Clipper
   HP: Spectrum
   IBM: RT(ROMP) & its next generation
   Mips: R[2/3]000
   Motorola: 88000 (an example of how no. got changed before release:-))
   Pyramid: 90X
   Ridge: Ridge-32
   Sun: Sparc
   Xerox: ?? (ECL)
   Intel: 80960

Commercial(Announced and in-the-works)

   Apple: <code-named aquarius - so is it next february:-)>
   Apollo: <no. to-be-stamped :-)>
   DEC: ???

DoD & Darpa Contracted

   GE: RPM-40
   RCA: ??
   TI: ??
   MD: ??
   Rockwell: ??

---------------------
   Renukanthan Raman				ARPA:ram@sun.com
   Sun Microsystems			UUCP:{ucbvax,seismo,hplabs}!sun!ram
   M/S 18-41, 2500 Garcia Avenue,
   Mt. View,  CA 94043

csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/20/88)

In article <49983@sun.uucp> ram@sun.UUCP (Renu Raman) writes:
>Commercial
>
>   Pyramid: 90X

Also Pyramid 9810, same architecture. It has been debated whether the Pyramid
architecture is "really" RISC, since it has a rather bulky instruction set,
microcode, an interlocked pipeline, and instructions you'll never find on a
CPU designed by John Hennessy (like interruptable block move). On the other
hand, it *did* borrow liberally from RISC I and MIPS-X, the most visible
elements being the sliding register window for function calls, and the notion
of using smart compilers instead of smart hardware. The 90x and 9810 both use
a Schottky TTL implementation, but that is strictly an implementation issue.

One other commercial RISC:

    Celerity 1200, et al

What is ironic is that Celerity's product literature has been very low-key
about its RISC architecture, though the 1200 is more "RISC" than either Ridge
or Pyramid, the two vendors who were making the most noise about RISC at the
time the 1200 was announced. The machine's floating point performance scaled
well with its integer performance, something only a few other processors have
demonstrated (e.g. the MIPS R2000).

>   Apollo: <no. to-be-stamped :-)>

The *system* has been announced as the Apollo 10000. Sun's salespeople have
been making derisive noises about it, since Apollo announced the box before
two of its eight (six? ten?) gate arrays had seen silicon.

Actually, announcing processors that have only been simulated has become
commonplace. This is almost reasonable, given the quality of simulation tools
available these days.

>   DEC: ???

DEC's big RISC engine goes by the code name "Titan." It is something around 10
MIPS, up to 10 tightly-coupled CPUs. It's supposed to be a big secret, so don't
spread this around. :-)

A mildly interesting point is the success of commercial RISC products that
have been in the marketplace for a while. Ridge and Celerity are essentially
in the past tense, although Ridge is still trying to make a go of it. Pyramid
is thriving. The IBM PC/RT was a flop. The MIPS R2000 has done fairly well,
although I've been surprised by the number of vendors jumping on top of SPARC
when the R[23]000 has so much more going for it. (Save the flames, I've read
all the debates on this.) The SPARC is too new to call, but it appears that it
will be a smashing success.

<csg>

celerity@bucasb.bu.edu (Roger B.A. Klorese) (04/20/88)

In article <49983@sun.uucp> ram@sun.UUCP (Renu Raman) writes:
!Commercial
!
!   Acorn: ARM
!   AMD: 29000
!   ATT: CRISP
    Celerity: C1200 and C1230 Accel
!   Fairchild: Clipper
!   HP: Spectrum
!   IBM: RT(ROMP) & its next generation
!   Mips: R[2/3]000
!   Motorola: 88000 (an example of how no. got changed before release:-))
!   Pyramid: 90X
!   Ridge: Ridge-32
!   Sun: Sparc
!   Xerox: ?? (ECL)
!   Intel: 80960
!
!Commercial(Announced and in-the-works)
!
!   Apple: <code-named aquarius - so is it next february:-)>
!   Apollo: <no. to-be-stamped :-)>
    Celerity: 6000
!   DEC: ???
!

dre%ember@Sun.COM (David Emberson) (04/20/88)

Of course, the number of cycles to do this or that is a function of the
implementation, not the architecture.  And none of us will supply the
really interesting data--on the chips we haven't announced yet!

Earl, if the purpose of this exercise is to prove that the R3000 will
outbench the 16 MHz Fujitsu SPARC, then on behalf of Sun Microsystems I concede
(assuming your published data to be correct--I have never seen an R3000).

How about adding to the list "total dollars being invested in new
implementations?"  And don't forget "number of engineers worldwide working
on this architecture."

Ah, this is going to be one fun war--and we all win!

			Dave Emberson (dre@sun.com)

csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/20/88)

In article <20123@pyramid.pyramid.com> I wrote:
>On the other hand, it [the Pyramid 90x] *did* borrow liberally from RISC I
>and MIPS-X....

Foo. I should look where I type. I meant the original Stanford MIPS (what was
it called?), not the MIPS-X.

<csg>

celerity@bucasb.bu.edu (Roger B.A. Klorese) (04/20/88)

In article <20123@pyramid.pyramid.com> csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>A mildly interesting point is the success of commercial RISC products that
>have been in the marketplace for a while. Ridge and Celerity are essentially
>in the past tense, although Ridge is still tying to make a go of it. 

It's funny, Carl: every time you've posted on RISC before, I've made some 
piddling correction about the Celerity information.  Now that I'm no longer
a Celerity employee (despite the borrowed account), I'm gonna do it again!

Last week, it was announced that Floating Point Systems is in the process of
acquiring Celerity's assets and liabilities, and will continue the development
of the Celerity 6000, with the remaining engineering staff, as well as picking
up support of the installed base.  So yes, Celerity the totally independent
company is in the past tense, but Celerity the FPS subsidiary is not.

---
Roger B.A. Klorese                      MIPS Computer Systems, Inc.
{ames,decwrl,prls,pyramid}!mips!rogerk  25 Burlington Mall Rd, Suite 300
rogerk@mips.COM                         Burlington, MA 01803
* Your witticism here.*                 +1 617 270-0613

ram%shukra@Sun.COM (Renu Raman, Taco Bell Microsystems) (04/20/88)

In article csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>>
>>   Pyramid: 90X
>
>Also Pyramid 9810, same architecture. It has been debated whether the Pyramid
>architecture is "really" RISC, since it has a rather bulky instruction set,

    Exactly.  I had included "RISC" and "claimed RISC".
    I guess one of the objectives of Earl's original 
    note is to settle this thing called "what is RISC".

>>   Apollo: <no. to-be-stamped :-)>
>
>The *system* has been announced as the Apollo 10000.

    Wrongo!  That is the machine. Now that I think back, the processor goes
    by the name - PRISM.  The "no-to-be-stamped" was a smiley remark to some
    previous discussion here about when & how machines/processors get their
    marketing ids.

>A mildly interesting point is the success of commercial RISC products that
>have been in the marketplace for a while.

    Lower development cost, better simulation tools, and Unix making
    compatibility a non-issue are partly responsible for RISC successes.

    Query:  Are there any RISC(y) processors that are running an OS other
    than UNIX?

><csg>

Renu

csg@pyramid.pyramid.com (Carl S. Gutekunst) (04/20/88)

In article <577503463.14723@bucasb.bu.edu> rogerk@mips.com (Roger B.A. Klorese) writes:
>It's funny, Carl: every time you've posted on RISC before, I've made some 
>piddling correction about the Celerity information.

Deja vu? I don't think this correction was piddling, though, since I may have
unintentionally scared some Celerity users into thinking that their machines
are orphans, which is certainly not true.

>Last week, it was announced that Floating Point Systems is in the process of
>acquiring Celerity's assets and liabilities....

Yes, I knew that, and have guarded hopes that the 6000 will be a reality. I
say guarded, since we really don't know what FPS will do. (If you've ever been
in the middle of a takeover, you'll know that even what your old president is
told is suspect, let alone the grunt engineers, let alone the media.)

My point, though, was that out of the first five commercial RISC ventures,
three flopped. Now don't misunderstand; I suspect that if Celerity had the
kind of financial backing that Ridge and the PC/RT had (Ridge raised $20
Million after they had already failed once), it would have been successful.
But the road to RISC has been a rocky one. 

Can someone from Ridge comment on their health?

<csg>

butcher@G.GP.CS.CMU.EDU (Lawrence Butcher) (04/20/88)

When is a RISC not a RISC?  Today I got copies of the 80960KB Programmer's
and Hardware Designer's reference manuals.  32 AND 64 bit instructions.
Enthusiastic addressing modes.  Multiple-cycle instructions.  Confused
call/return instructions.  Decimal data type.  Trig functions in microcode
instead of manufacturer-sanctioned subroutines.  No delayed branching.
Zero-cycle branches anyway by making other instructions SO SLOW that the
branch is finished before the previous instruction is done.  Multiplexed
address/data bus.  No memory management.  No support for page faults.
Maximum instruction time 75878 clocks +- 40%. (probably typo :-)

Maybe the 8087 is a RISC?  But really Intel does not advertise this chip
as a RISC.  They have targeted the "embedded-processor" market.  The KB chip
doesn't suggest workstations to me.  I had hoped that this chip would help
AMD, Motorola, and MIPS revise the price of their chip sets downward.
Maybe next one, Intel?  :-)

Weitek has a family of processors called the XL-8000/XL-8032/XL-8064.  I
don't think that they are advertised as being RISC, but I think that they are.
The 3 chip set contains no memory management, but can deal with page faults.
The architecture has separate instruction address, 64 bit instruction data,
data address, and 32 or 64 bit data busses.  At most an integer instruction,
a floating point multiply-accumulate, and a short conditional branch can be
executed each clock.  A complete cross-development system is available.  The
set comes 8 MHz, 10 MHz, and 12 MHz.  The 8 MHz part dhrystones around 6500,
I think.  It is MUCH faster at floating point than that number suggests.

Let me point out an article that might be interesting to readers.  The
Volume 16 Number 1 March 1988 issue of Computer Architecture News has an
article by Wm. A. Wulf on "The WM Computer Architecture".  Wulf has a
background in compiler-design and has a very good idea of what instruction
sequences occur in real code.  He describes a RISC instruction set with
32-bit instructions which name 3 source registers and 2 ALU operations
per instruction.  He argues that the compiler can juggle ALU ops so that
the second operation frequently does useful work.  His machine transfers
instructions to the Integer ALU and Floating Point ALU thru fifo's,
condition codes from the ALUs to the IFU thru fifo's, and data to and
from memory thru fifo's.
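The two-ops-per-instruction idea attributed to Wulf above can be sketched as follows.  The semantics here are a guess from the description (roughly dest = (s1 op1 s2) op2 s3); nothing in this snippet comes from the WM paper itself:

```python
# Sketch of a WM-style instruction: three source operands and two ALU
# operations packed into one instruction.  The exact WM semantics are
# assumed here, not taken from the paper.
import operator

def wm_execute(op1, op2, s1, s2, s3):
    """Evaluate (s1 op1 s2) op2 s3, as one instruction might."""
    return op2(op1(s1, s2), s3)

# A shift-and-add in a single instruction: (5 << 2) + 3
print(wm_execute(operator.lshift, operator.add, 5, 2, 3))  # 23
```

The compiler's job, per the argument quoted above, is to find pairs of ALU operations often enough that the second slot does useful work.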

This thing seems like a step in the direction of a RISC VLIW.  If things
like page faults were figured out, and if interrupts could happen without
causing registers to be overwritten before being used, and if delayed
branching really wasn't important as claimed, and if the ALU ops were
simple (only one ALU could multiply or divide), would this instruction
set really be 2 or more times faster than today's RISCs at the same speed
for roughly the SAME cost?  Would it be as economical for a conventional
RISC to fetch 2 instructions at the same time and execute them in parallel
if there were no data dependency??

earl@mips.COM (Earl Killian) (04/20/88)

In article <50070@sun.uucp> dre%ember@Sun.COM (David Emberson) writes:

   Of course, the number of cycles to do this or that is a function of the
   implementation, not the architecture.

Yes, that's why I consistently referred to "the architecture and its
implementation" in my posting.  I think the implementations are
actually more interesting than the instruction set architecture
underneath.  Good implementation is more difficult, and there's a lot
to be learned from such study.  The places where implementation and
instruction set design interact are especially interesting.  I've
noted numerous times in this forum cases where other designers said their
instruction set chose method X because Y was too hard, while we made the
opposite choice, and vice versa.

   And none of us will supply the really interesting data--on the
   chips we haven't announced yet!

Of course.  But as soon as something new is announced, we'll have a
good way to communicate information, right?  I'm certainly not
suggesting that we take a snapshot of April 88 and never update it.

(Nor am I letting the fact that MIPS' unannounced designs are oodles
better than the current ones stop me from talking about our current ones :-)

   Earl, if the purpose of this exercise is to prove that the R3000
   will outbench the 16 MHz Fujitsu SPARC, then on behalf of Sun
   Microsystems I concede (assuming your published data to be
   correct--I have never seen an R3000).

That was definitely not my intent.  My purpose was as an aid to help
me keep track of what's going on out in the wide world, because it's
getting tough with all the different machines and implementations.
The recent Moto/Intel announcements were the real spur.  I started
making a list of the features for everything I knew about, and
realized there were a lot of blanks.  I thought comp.arch would be
both helpful in filling in the blanks, and interested in the results.

   How about adding to the list "total dollars being invested in new
   implementations?"  And don't forget "number of engineers worldwide
   working on this architecture."

   Ah, this is going to be one fun war--and we all win!

One remark in the spirit of your posting: you seem to be suggesting
that you prefer comp.arch not discuss the Sun/Fujitsu SPARC
implementation because it's uncompetitive with respect to the others,
and instead you would rather wait for a worthier SPARC entrant.
If so, fine.  In the meantime would you care to comment on what data
is relevant for when you do have something to talk about?
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

paulr@granite.dec.com (Paul Richardson) (04/20/88)

I am kind of tired of listening to these arguments of what is and what is
not a RISC machine.  So that we can move on to more interesting architectural
topics, I propose the following definition.


RISC: 
      1) A machine in which the instruction set is designed/chosen based
         on what makes the most sense to put into the hardware.  For
         instance, you probably wouldn't want to piss away a lot of hardware
         on the equivalent of the *VAX* POLY instruction, yet on the other
         hand optimizing load/stores might turn out to be a big performance
         win.

      2) Although it is not necessarily a requirement, it has been a
         characteristic of RISC machines that they have a 'large'
         general purpose register file.  Just how many defines large
         is up to the designer, but from the papers on register allocation
         I have seen (especially David Wall's) it seems that something
         between 32 and 100 is about all that current compiler technology
         knows how to deal with.

 


I have had the opportunity to work with one so-called 'RISC' machine
(the DEC research box called the TITAN) and read about many others.  The
underlying similarity among them all seems to be that good engineering
practice was used in determining the architecture/instruction set.  Doing
things like taking statistics on the frequency of instructions used in REAL
programs and using that to determine what does and what does not go into the
hardware seems to make sense to me.  Making compilers smarter and having
them do things like schedule instructions seems to make sense to me.  Using
registers instead of main memory during run time makes sense to me (this is
not a RISC discovery).
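The instruction-frequency measurement described above is easy to sketch: count opcodes in a dynamic trace and let the counts guide what goes into hardware.  The trace here is made up purely for illustration:

```python
# Sketch of instruction-frequency statistics from a dynamic trace,
# as described above.  The trace is invented for illustration.
from collections import Counter

trace = ["load", "add", "load", "store", "add", "branch", "load"]
freq = Counter(trace)

# Loads dominate this toy trace, suggesting the load/store path is
# where hardware effort pays off.
print(freq.most_common())  # [('load', 3), ('add', 2), ...]
```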


OK OK OK, that's my two cents' worth.  Now could we talk about things like
the merits of delayed branching, or limits on pipeline depth, or nifty
floating point algorithms, etc.?  I know that there are plenty of bright
people out there just dying to spill their grey matter on other topics.

/pgr

oconnor@sungoddess.steinmetz (Dennis M. O'Connor) (04/20/88)

An article by ram@sun.UUCP (Renu Raman) says:
] In article <2048@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
] Here is a brief listing of RISCs from university, commercial & Govt projects
] I am sure I have missed out many more. Can somebody email me or fill in
] the rest?

] DoD & Darpa Contracted
] 
]    GE: RPM-40

  Bulk CMOS, Silicon exists.

]    RCA: ??

  RCA did TWO RISC designs for DARPA, one GaAs, one CMOS SOI : 
    One was called "GaAs Microprocessor"
    The other was called "High-Speed CMOS Microprocessor"

  Unfortunately, neither design was funded for production.
  BTW, GE and RCA are one company now. That happened about
  halfway through the RCA micro's design period. No silicon.

]    TI: ??

  TI and CDC ( Control Data ) teamed up to do, for DARPA :
    "High-Speed GaAs Microprocessor", which is still in development I think.
    Recently announced fab of a below-target-speed version, I think.

]    MD: ??

  McDonnell-Douglas Astronautics Company did the MD 484, in GaAs.
  Still in development, I think. 

]    Rockwell: ??

  I don't know anything about the Rockwell effort. Is it DARPA ?

  You left out Sperry. Sperry ( now part of Unisys ) did the
  "High Speed CMOS Microprocessor", also for DARPA. Silicon exists.

]    Renukanthan Raman				ARPA:ram@sun.com

The GE RPM-40 used to be called "High Speed CMOS Microprocessor" too,
among other things. We decided it took too long to say, and was
eating space on our viewgraphs. So RPM-40 ( RISC Pipelined
Microprocessor, 40MIPS ) was born. 

Check the Government Printing Office for reports on these efforts.
They are not classified, but are ITAR restricted, I think.
--
 Dennis O'Connor   oconnor%sungod@steinmetz.UUCP  ARPA: OCONNORDM@ge-crd.arpa
        ( I wish I could be polite all the time, like Eugene Miya )
  (-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)

baum@apple.UUCP (Allen J. Baum) (04/20/88)

--------
[]
>In article <1468@pt.cs.cmu.edu> butcher@G.GP.CS.CMU.EDU (Lawrence Butcher) writes:
>When is a RISC not a RISC?  Today I got copies of the 80960KB Programmer's
>and Hardware Designer's reference manuals.
> 32 AND 64 bit instructions.
> Enthusiastic addressing modes.
	Well, I might agree with you there. They are not as bad as, say,
        Clipper, but I'm not sure I'd use that as an argument why something
        was or wasn't RISC.
> Multiple-cycle instructions.
        You mean like HP Spectrum, or IBM RT/PC? Not much of an argument?
> Confused call/return instructions.
        I'm not sure that calling the variations on call/return 'confused'
        is terribly technical. They allow choices of their full 'call', with
        argument passing, etc., when they need it, and an optimised version
        for when you don't. They are covering all their bases.
> Decimal data type.
        Their decimal instructions will add or subtract a single decimal
        digit. This doesn't seem to be a horrendous amount of support.
        HP Spectrum has decimal support as well.
> Trig functions in microcode instead of manufacturer-sanctioned subroutines.
        I'm not sure if the microcode gives some advantages over using the
        built-in floating point add/sub/mul/div, but I seem to recall that
        they are both faster and more accurate. Intel has had Prof. Kahan
        (of IEEE standards fame) on their consulting list since well before
        the IEEE standard. If they believe that high performance, accurate
	trig routines are important for their embedded market, I'd say that
	this was probably a good choice. Besides, they can always trap on the
	opcodes if they don't want to implement them.
> No delayed branching.
	You mean like Ridge or CRISP or Clipper? Based on conversations I've
	had with Ridge and CRISP people, I'm now mostly satisfied that a
	software branch prediction bit can perform about as well as delayed
	branching, depending on the success rate of filling branch holes.
	If you can fill holes better than you can predict, then delayed
	branching is better. Papers I've seen from Berkeley show an 80%
	correct branch prediction rate can be achieved.  It's not clear that
	you can maintain 80% of branch holes being filled, especially if
	you also have to fill load holes.
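The prediction-versus-delay-slot tradeoff above can be put in rough numbers.  Everything here is an illustrative assumption (a 1-cycle mispredict penalty, a single delay slot), not data from any of the machines discussed:

```python
# Back-of-envelope model of the tradeoff argued above.  The penalty
# and slot counts are assumptions for illustration only.
def predicted_branch_cost(accuracy, mispredict_penalty):
    """Expected extra cycles per branch with a prediction bit."""
    return (1 - accuracy) * mispredict_penalty

def delayed_branch_cost(fill_rate, slots=1):
    """Expected wasted cycles per branch with delayed branching:
    each unfilled delay slot executes a no-op."""
    return (1 - fill_rate) * slots

# 80% correct prediction (1-cycle penalty) ties 80% slot filling,
# which is the crossover point the argument above turns on.
print(predicted_branch_cost(0.8, 1) == delayed_branch_cost(0.8))  # True
```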
>Zero-cycle branches anyway by making other instructions SO SLOW that the
>branch is finished before the previous instruction is done.
	Like SPARC? This is implementation, not architecture. You can be sure
	that the next implementation won't be so (dare I say it?) wimpy.
> Multiplexed address/data bus.
	What does this have to do with RISC? You may as well complain about
	using the address bus twice a cycle, ala MIPS.
> No memory management.  No support for page faults.
	The -MC models have all the memory management you would want. An
	embedded controller is not so dependent on an MMU, so they left it
	out of SOME versions of the chip.
>Maximum instruction time 75878 clocks +- 40%. (probably typo :-)
	Maybe not a typo. It's for the Remainder Real instruction. The trig
	and log functions take 104-441 cycles otherwise, and are interruptible.


>Let me point out an article that might be interesting to readers.  The
>Volume 16 Number 1 March 1988 issue of Computer Architecture News has an
>article by Wm. A. Wulf on "The WM Computer Architecture".  Wulf has a
>background in compiler-design and has a very good idea of what instruction
>sequences occur in real code.  He describes a RISC instruction set with
>32-bit instructions which name 3 source registers and 2 alu operations
>per instruction.  He argues that the compiler can juggle ALU ops so that
>the second operation frequently does useful work.

	This sounds a lot like the original Stanford MIPS. They gave
it up as a bad idea. I'll look for the article, though. 

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

fotland@hpihoah.HP.COM (Dave Fotland) (04/20/88)

Many of the items on your form are implementation
rather than architecture, so you will need a separate
entry for each implementation.  Maybe you could
collect the architecture stuff at the front of the
form so we wouldn't have to repeat it.  For example
HP Precision architecture has several implementations:

Model		CPU	FPU	Cache	Bus
HP9000/825	1	1	1	1	
HP9000/835	1+	2	2	1
HP9000/840	2	3	3	1
HP9000/850	1	1 or 2	4	2	
HP9000/855	3	2	5	2

(And the equivalent HP3000 machines.  The difference is that the
HP9000 runs HP-UX and the HP3000 runs MPE.)

There are 3 completely different CPU's (one with two versions),
3 different floating point coprocessors, 5 different caches
(different size and/or organization), and two different
busses (with 3 different memory systems).

-David Fotland

fotland@hpda.HP.COM

garyb@hpmwtla.HP.COM (Gary Bringhurst) (04/21/88)

Why has no one mentioned the Inmos Transputers?  They are certainly RISC-ish.

Gary L. Bringhurst
Hewlett-Packard Company

allen@granite.dec.com (Allen Akin) (04/21/88)

In article <20123@pyramid.pyramid.com> csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>
>DEC's big RISC engine goes by the code name "Titan." It is something around 10
>MIPS, up to 10 tightly-coupled CPUs. It's supposed to be a big secret, so don't
>spread this around. :-)
>
><csg>

Just to clarify things for the masses:

Titan is a research RISC machine designed and implemented several years
ago by DEC's Western Research Lab in Palo Alto.  It's been mentioned in
a number of papers published by WRL (see Wall and Powell's paper in
ASPLOS II, for example) so feel free to spread it around. :-)

Allen

walter@garth.UUCP (Walter Bays) (04/21/88)

In article <49983@sun.uucp> ram@sun.UUCP (Renu Raman) writes:
>Here is a brief listing of RISCs from university, commercial & Govt projects
>I am sure I have missed out many more. Can somebody email me or fill in
>the rest?
>   ...
>   Fairchild: Clipper
    Intergraph: Clipper C100, C300

Intergraph bought the Fairchild Advanced Processor Division which makes
the Clipper.  National Semiconductor owns the rest of Fairchild.
-- 
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------

phil@osiris.UUCP (Philip Kos) (04/21/88)

Earl (and everyone who has responded so far) -

Great idea.  I have a suggestion about the terminology, though.  It's
probably too late to do anything about this, but what the hell...

In article <2048@gumby.mips.COM>, earl@mips.COM (Earl Killian) writes:
> First some definitions of my terminology are in order.... the rate is
> the time until you can start another instruction of the same
> type.

While I don't have a really big problem with this one, the common (to
me anyway) definition of "rate" is not a unit of time, but a measure of
something else reduced to standard units of time.  For instance, "one
SP floating add instruction can be started every cycle" is a "rate",
but "one cycle" is not.  What Earl suggests is actually the reciprocal
of my understood meaning, and seems to me something like using "Hz" to
indicate the period of a wave instead of its frequency.

It doesn't really make any difference to the discussion as long as
everyone understands what is meant by the term "rate".  However, anyone
walking in on the middle of a discussion where the meaning of "rate" is
assumed as above by all the participants is likely to get REALLY
confused if he assumes the "traditional" meaning.

Suggestions for better terms (or roots, anyway): "delay", "lag",
"hold", "period", etc. (you get the picture), probably qualified as
"type-interlock delay" or something to differentiate it from the basic
pipeline issue delay or whatever.  (I think that this differentiation
is probably what Earl was going for when he proposed the term "rate"
anyway.)

Of course, since I'm not contributing anything substantial to the
discussion, you should all ignore me anyway... :-)


                                                               Phil Kos
...!decvax!decuac!\                                 Information Systems
  ...!uunet!mimsy!aplcen!osiris!phil         The Johns Hopkins Hospital
...!allegra!/                                             Baltimore, MD

dre%ember@Sun.COM (David Emberson) (04/21/88)

Earl, I would like to apologize for being cynical about your intent.  On
further reflection, I think there is some value to this cataloguing of
RISCs--if only, as you say, to communicate information.  We are all in
search of the Holy Instruction Set which solves all our problems of bit
efficiency, ease of implementation, etc. so such a list may actually inspire
someone to insight on the problem of architecture comparisons.

One thing which is missing from the list (which might fall under the category
of "parameters of implementation technology") and which is the thing which
does make the Fujitsu SPARC competitive (if you will forgive my flirting with
delivery of a commercial message for the moment) is price.  As such, I think
the component compares favorably when price-performance rather than performance
alone is considered.  And no, I do not wish to limit anyone's discussion of
SPARC or anything else.  It would certainly be nice, though, if we could talk
about "the latest stuff."  Some of you MIPS guys are very dear friends of mine
and it would be nice to compare notes--although it would not surprise me if
you knew in detail what was going on here anyway!  I had an interviewee the
other day describe one of my most secret activities in great detail!  It seems
he had a previous interview at one of our "technology partners."  Ah, the
joys of the free enterprise system...

I vaguely remember hearing someone about ten years ago give a talk on the
subject of architecture comparison.  Unfortunately I do not remember who it
was, but they defined two measures of an architecture, R and S.  The R measure
was a metric of the number of register references and the S measure was a
metric of the number of memory references (storage) in a given piece of code.
Presumably a high R/S is desirable, although this is far from certain in the
presence of write-back caches.  In any case, these indicators are independent
of implementation technology.  It would be nice if we could develop some such
set of metrics which would allow architectures to be compared for efficiency
and ease of implementation.  I haven't a clue as to how to measure something
for ease of implementation--I'll leave that to some enterprising person.  I am
in full agreement with your statement that implementations are the most
interesting.  We can get real numbers from them and identify real areas for
improvement.

On another subject, does anyone know how the 88200 cache consistency scheme
works?

			Dave Emberson
			(dre@sun.com)

root@mfci.UUCP (SuperUser) (04/21/88)

In article <1468@pt.cs.cmu.edu> butcher@G.GP.CS.CMU.EDU (Lawrence Butcher) writes:
>When is a RISC not a RISC?  Today I got copies of the 80960KB Programmer's
>and Hardware Designer's reference manuals.  32 AND 64 bit instructions.
>Enthusiastic addressing modes.  Multiple-cycle instructions.  Confused
>call/return instructions.  Decimal data type.  Trig functions in microcode
>instead of manufacturer-sanctioned subroutines.  No delayed branching.
>Zero-cycle branches anyway by making other instructions SO SLOW that the
>branch is finished before the previous instruction is done.  Multiplexed
>address/data bus.  No memory management.  No support for page faults.
>Maximum instruction time 75878 clocks +- 40%. (probably typo :-)

I'll skip this question in the hopes of keeping whatever friends I
still have at Intel...:-)

>This thing seems like a step in the direction of a RISC VLIW.  If things
>like page faults were figured out, and if interrupts could happen without
>causing registers to be overwritten before being used, and if delayed
>branching really wasn't important as claimed, and if the ALU ops were
>simple (only one ALU could multiply or divide), would this instruction
>set really be 2 or more times faster than today's RISCs at the same speed
>for roughly the SAME cost?  Would it be as economical for a conventional
>RISC to fetch 2 instructions at the same time and execute them in parallel
>if there were no data dependency??

I had a fairly sarcastic reply all typed in, but I'll spare
you...Multiflow's TRACE does all of the above, and I think I can make
a much stronger claim for its being a RISC than some of the other
machines listed in the other thread of discussion currently going on
in this newsgroup:  load/store, no microcode, simple instructions,
delayed branches, and the ultimate in moving runtime functionality to
compile-time (trace-scheduling!).  We could have an interesting
discussion on what it would take to realize a similar VLIW on a chip,
though -- you need pretty high interconnectivity, and a very wide
instruction cache to tell all the functional units what to do.  

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

root@mfci.UUCP (SuperUser) (04/21/88)

In article <50217@sun.uucp> dre%ember@Sun.COM (David Emberson) writes:
>
>
>I vaguely remember hearing someone about ten years ago give a talk on the
>subject of architecture comparison.  Unfortunately I do not remember who it
>was, but they defined two measures of an architecture, R and S.  The R measure
>was a metric of the number of register references and the S measure was a
>metric of the number of memory references (storage) in a given piece of code.
>Presumably a high R/S is desirable, although this is far from certain in the
>presence of write-back caches.  In any case, these indicators are independent
>of implementation technology.  It would be nice if we could develop some such
>set of metrics which would allow architectures to be compared for efficiency
>and ease of implementation.  I haven't a clue as to how to measure something
>for ease of implementation--I'll leave that to some enterprising person.  I am
>in full agreement with your statement that implementations are the most
>interesting.  We can get real numbers from them and identify real areas for
>improvement.
>			Dave Emberson
>			(dre@sun.com)

I bet you're remembering the Military Computer Family work of the
mid-to-late '70s done at Carnegie-Mellon.  R was the "canonical
processor cycles" for a benchmark; S was the program size, and M was
the memory bus traffic.  In our "Computers, Complexity, and
Controversy" paper in Computer magazine Sept. 1985 we applied this
evaluation method to Berkeley's RISC-II, mostly as an intellectual
exercise, but partly to show that the field had already outgrown this
kind of approach to architectural evaluation.  My feeling was that
the fundamental problem was that MCF was extremely careful to
separate implementation from architecture, while RISC is quite willing
to mix the two freely (trading object code compatibility across
products in a company's product line (Sun-4/Sun-3) for the added
performance available when you can max out a given set of
implementation constraints).

It's probably easier to gauge "difficulty-of-implementation" than
"ease"; if, in a blindfold test, you gave me the VAX instruction set
and RISC-I's, I'd have no problem picking the one I'd find easier to
implement, and I'd have a list of reasons why.  But of course, then
you'd want to quantify how much easier, and that's a good question.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

paulr@granite.dec.com (Paul Richardson) (04/21/88)

In article <221@granite.dec.com> allen@decwrl.dec.com (Allen Akin) writes:
>In article <20123@pyramid.pyramid.com> csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
>>
>>DEC's big RISC engine goes by the code name "Titan." It is something around 10
>>MIPS, up to 10 tightly-coupled CPUs. It's supposed to be a big secret, so don't
>>spead this around. :-)
>>
>><csg>
>
>Just to clarify things for the masses:
>
>Titan is a research RISC machine designed and implemented several years
>ago by DEC's Western Research Lab in Palo Alto.  It's been mentioned in
>a number of papers published by WRL (see Wall and Powell's paper in
>ASPLOS II, for example) so feel free to spread it around. :-)
>
>Allen


More Titan History:

	I was on a team of engineers trying to turn the research Titan
into Titan the product.  Obviously we never succeeded, mostly for
political reasons, some valid, some not:


Titan:
	'Risc' machine designed to run at 40 ns; I believe they are
	running at 42.

	Scalar processor consisted of a datapath and split 64kb i and
	d caches; line size was 4 longwords.

	Separate coprocessor.

	4 banks of 64 32-bit general purpose registers.

	128 Mbytes of main store.

 	Entire processor (icache, dcache, datapath, and floating point
	coprocessor) was constructed from 24-pin DIP components
	(100K ECL).

	Processor boards were approx. 20" x 28", something like that.

	7-slot I/O bay supported disks (currently RA81s), enet, serial
	lines, and a fiber optic link.

	Machine was designed as a single-user workstation for members
	of WRL.

	Languages at the time included Modula-2, C, and Fortran (I
	think they have Lisp up now too).

	A system (the above hardware running 4.3 BSD) performed, on an
	aggregate basis, at 10 times a 780.

	Fully functional prototypes were completed 2 years ago.

	I think it is still the fastest running uniprocessor in DEC.

	Approximately the same compiler technology as MIPS.  The papers
	mentioned by Allen should clue you in.

walter@garth.UUCP (Walter Bays) (04/22/88)

In article <50110@sun.uucp> ram@sun.UUCP (Renu Raman) writes:
>    Query:  Are there any RISC(y) processors that is running an OS other
>    than UNIX?

Query 2: What new commercial processors have been introduced in the last
five (or so) years that run an OS other than UNIX?
Partial Answer 2:  IBM PC, Apple Macintosh, Apollo

Query 3: What new commercial processors have been introduced in the last
five (or so) years that do not run UNIX?
-- 
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------

liz@hpcupt1.HP.COM (Liz Peters) (04/23/88)

>    Query:  Are there any RISC(y) processors that is running an OS other
>    than UNIX?
>
>><csg>
>
>Renu
>----------

HP's commercial OS, MPE, runs on HP's Precision Architecture.  This
combination is offered in the HP3000 line of computers.

				Liz Peters
				hplabs!hpda!liz

cdshaw@alberta.UUCP (Chris Shaw) (04/26/88)

In article <219@granite.dec.com> paulr@granite.UUCP (Paul Richardson) writes:
>RISC: 
>      1) A machine in which the instruction set is designed/chosen based
>         on what makes the most sense to put into the hardware....
>
>      2) ..a characteristic of RISC machines that they have a 'large'
>	 general purpose register file. ...
>/pgr

One of the 801 people (Blasgen) gave a talk here a while ago about 801, and 
the associated philosophy. Back then, what "Reduced" meant was "reduced
instruction time". That is, the design goals were to have the simplest 
instructions (nop/add/logic...) take one clock. Clearly, more complicated
stuff like multiply would take longer, but shortness of TIME was the main 
design goal.

Now, an ethic of this kind will lead a designer down a restricted design path:
Simple addressing, pipelines, caches, etc. If I recall right, the 801 was not 
a single-chip machine, so area restrictions did not apply (as much). Given that
the Berkeley and Stanford people wanted a single-chip CPU, the silicon area
restriction applies, so "Reduced" starts to mean "reduce the NUMBER of
instructions (so we can fit something useful on chip)".

I think that applying RISC to mean Reduced TIME is the only thing that makes
100% sense as a "commandment". Reducing NUMBER of instructions will probably
come out in the wash.

-- 
Chris Shaw    cdshaw@alberta.UUCP (via watmath, ihnp4 or ubc-vision)
University of Alberta
CatchPhrase: Bogus as HELL !

mcp@ziebmef.UUCP (Marc Plumb) (04/29/88)

garyb@hpmwtla.HP.COM (Gary Bringhurst) writes:

>Why has no one mentioned the Inmos Transputers?  They are certainly Risc'ish.

Sigh...

Is a processor with message passing, time-slicing, and context-switching
in microcode a RISC?  I honestly don't know where the Transputer belongs,
but 3 registers is a bit of a change from traditional RISC architectures.

The Transputer is a RISC in the "Relegate Important Stuff to Compiler"
sense - the amount of useful stuff that's been stripped from the instruction
set on the grounds that it can be implemented in terms of existing
instructions is astounding.  For example, since the Transputer considers
any non-zero value to be true, the magnitude comparison operations have
been reduced to signed greater-than, subtraction (zero result means equal
inputs) and "equal to constant", which can be used with a zero argument
to implement logical not.

Sufficient, certainly (it's Turing-equivalent), but pleasant to use??

Sorry to go on, but the RISCiness of Transputers is a fabrication of
buzzword-happy marketroids, and I wouldn't want them to delude reasonably
sane people.
--
	-Colin (ncrcan!ziebmef!mcp)

livesey@sun.uucp (Jon Livesey) (05/02/88)

In article <358@ziebmef.UUCP>, mcp@ziebmef.UUCP (Marc Plumb) writes:
> 
> garyb@hpmwtla.HP.COM (Gary Bringhurst) writes:
> 
> >Why has no one mentioned the Inmos Transputers?  They are certainly Risc'ish.
> 
> Sigh...
> 
>	[much deleted]
> 
> Sufficient, certainly (it's turing-equivalent), but pleasant to use??
> 
> Sorry to go on, but the RISCiness of Transputers is a fabrication of
> buzzword-happy marketroids, and I wouldn't want them to delude reasonably
> sane people.

	You make some very good points about the transputer.   Unfortunately
you went a tiny bit overboard in the last two sentences.   Pleasantness-
of-use is not an implicit guarantee for RISC machines.    Why should it be?
The RISCiness of Transputers is not a fabrication of marketeers.   The
Transputer turns up in perfectly respectable academic surveys of RISC machines.
One reference is Tabak D. "RISC Architecture", Research Studies Press, 1987.
Tabak is Abrahams-Curiel Professor of Computer Engineering at Ben Gurion
University, Israel, and has a cross appointment at George Mason University.
Tabak is careful to explain why he includes the Transputer as a RISC machine:

	    "Although the machine language has 111 instructions, 
	(approximately as in the Pyramid or Ridge), there is only 
	a *single instruction format* [Tabak's emphasis] and a 
	very simple one."
				{page 98}

    Tabak goes on to explain the Transputer instruction format and 
instruction set, emphasising that they "*eliminate the need* for
*complicated addressing modes*" [Tabak's emphasis again].   He describes
their "prefix", which allows any operand to be manipulated in the Operand 
Register before being used, and the "operate" code which allows an 
instruction to be applied to the operands already loaded into the 
three operand Evaluation Stack.   Clearly, this does not make for simple
or intuitive assembler language programming, but Tabak makes the comment:

	    "It should be stressed that the regular user is not
	supposed to program in the machine language [he gives a short 
	description of Occam, deleted here]"
						{page 100}

    In an introductory section, Tabak lists eight criteria for RISCness.

							Transputer
							----------
	1. Few instructions (< 100 is best)		111
	2. Few addressing modes (1 or 2)		one
	3. Few instruction formats.			one
	4. Single cycle execution.			true for 80% of inst.
	5. Memory access by load/store
		instruction only.			yes
	6. Large register set.				none, but 4k on-chip memory.
							[there are six utility regs,
							such as PC, etc.]
	7. Hardwired control unit.			no, microcoded.
	8. HLL support reflected 
			in architecture			yes

    Using Tabak's criteria, the Transputer violates one of eight, satisfies
three, at least loosely, and satisfies four more completely.   Tabak comments
that the violation of using microcode is also seen in some other systems, and 
may be forgiven by advancing technology.


Jon.