[comp.arch] Motorola 88000 and others

sef@csun.UUCP (Sean Fagan) (04/03/88)

I was reading in some trade rag (sorry, forget which) about the MC88000,
which uses a 'scoreboard' to have up to 3 instructions executing
simultaneously (plus whatever pipelining the thing has).  Also, someone
mentioned here a few weeks ago that his company's chip was capable of doing
a floating-point divide (or multiply, I forget that and which company; may
have been MIPS) along with some other instruction, also simultaneously.
The thing I find funny is that these people seem to think that these things
are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
which can do a divide, a multiply, a floating point add or subtract, an
integer add or subtract, and an address calculation, all at the same time.
Not only that, but every unit except the divide is pipelined, so each
can start a new operation each clock cycle (except the multiply, which
can only start one every other cycle).  (The Cray can do more, as can the Cyber 205 and ETA 10.)
Now, after the comment, a question:  does anybody know what non-mainframes
and non-supers have parallel functional units (preferably pipelined)?
How about how popular these machines are, what their speeds are, etc?
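
A toy sketch of the issue pattern described above (the timings are made
up for illustration, not the CDC 170's actual latencies): each unit has
an initiation interval and a result latency, so several operations can
be in flight at once.

```python
# Toy model of pipelined functional units: each unit has an
# initiation interval (cycles between successive issues) and a
# latency (cycles until the result is ready).  Timings are
# illustrative only, not the real CDC 170 numbers.

UNITS = {
    "fp_add":   {"interval": 1, "latency": 4},   # new op every cycle
    "multiply": {"interval": 2, "latency": 5},   # every other cycle
    "divide":   {"interval": 18, "latency": 18}, # not pipelined
}

def schedule(ops):
    """Return (op, issue_cycle, done_cycle) for a stream of ops,
    modeling only issue conflicts within each unit."""
    next_free = {u: 0 for u in UNITS}  # cycle each unit can issue again
    out = []
    for cycle, op in enumerate(ops):
        unit = UNITS[op]
        issue = max(cycle, next_free[op])        # wait for the unit
        next_free[op] = issue + unit["interval"]
        out.append((op, issue, issue + unit["latency"]))
    return out

# Three adds and two multiplies, all overlapped:
for op, issue, done in schedule(["fp_add", "fp_add", "multiply",
                                 "fp_add", "multiply"]):
    print(f"{op:8s} issues cycle {issue}, completes cycle {done}")
```

Note how the second multiply waits a cycle for the multiplier's
initiation interval, while the adds keep issuing back to back.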


-- 
Sean Fagan                   uucp:   {ihnp4,hplabs,psivax}!csun!sef
CSUN Computer Center         BITNET: 1GTLSEF@CALSTATE
Northridge, CA 91330         (818) 885-2790
"I just build fast machines."  -- S. Cray

jesup@pawl23.pawl.rpi.edu (Randell E. Jesup) (04/03/88)

In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
>The thing I find funny is that these people seem to think that these things
>are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
>which can do a divide, a multiply, a floating point add or subtract, an
>integer add or subtract, and an address calculation, all at the same time.

	Remember that to do so in a mainframe, or even a mini, is relatively
easy: just throw more hardware at it.  Doing it in a single-chip microprocessor
isn't quite as easy (ALU's are BIG).

     //	Randell Jesup			      Lunge Software Development
    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup

(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)

walter@garth.UUCP (Walter Bays) (04/05/88)

In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
>I was reading in some trade rag (sorry, forget which) about the MC88000,
>which uses a 'scoreboard' to have up to 3 instructions executing
>simultaneously (plus whatever pipelining the thing has).  [...]
>The thing I find funny is that these people seem to think that these things
>are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
>[They are old ideas.]
>Now, after the comment, a question:  does anybody know what non-mainframes
>and non-supers have parallel functional units (preferably pipelined)?
>How about how popular these machines are, what their speeds are, etc?

The Intergraph Clipper has parallel integer and (on-chip) floating
point units, with pipelining.  The integer unit is multi-stage and the
pipeline is controlled by a scoreboard.  It has separate instruction
and data busses through two 4KB integrated cache/MMU chips to interface
with slow DRAM's.  The C100 was introduced in 1985 and runs at 33 MHz.
The C300 will be introduced this year, runs at 50 MHz, and has an
improved pipeline.  Not surprisingly, the original design team came to
(then Fairchild) from Cray.

In college I worked on a CDC 6600 (Seymour's machine) which had
multiple functional units, pipelining, scoreboarding, and I/O
co-processors.  Those who had to use assembly language cursed its
load/store architecture and limited addressing modes.  The rest of us
were just glad it was so fast.  The 6600 was a significant improvement
over most of its successors.
-- 
------------------------------------------------------------------------------
Any similarities between my opinions and those of the
person who signs my paychecks is purely coincidental.
E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------

root@mfci.UUCP (SuperUser) (04/06/88)

In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
]I was reading in some trade rag (sorry, forget which) about the MC88000,
]which uses a 'scoreboard' to have up to 3 instructions executing
]simultaneously (plus whatever pipelining the thing has).  Also, someone
]mentioned here a few weeks ago that his company's chip was capable of doing
]a floating-point divide (or multiply, I forget that and which company; may
]have been MIPS) along with some other instruction, also simultaneously.
]The thing I find funny is that these people seem to think that these things
]are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
]which can do a divide, a multiply, a floating point add or subtract, an
]integer add or subtract, and an address calculation, all at the same time.
]Not only that, but every unit except the divide is pipelined, so each
]can start a new operation each clock cycle (except the multiply, which
]can only start one every other cycle).  (The Cray can do more, as can the Cyber 205 and ETA 10.)
]Now, after the comment, a question:  does anybody know what non-mainframes
]and non-supers have parallel functional units (preferably pipelined)?
]How about how popular these machines are, what their speeds are, etc?
]
]
]-- 
]Sean Fagan                   uucp:   {ihnp4,hplabs,psivax}!csun!sef
]CSUN Computer Center         BITNET: 1GTLSEF@CALSTATE
]Northridge, CA 91330         (818) 885-2790
]"I just build fast machines."  -- S. Cray

Just having multiple functional units is interesting but insufficient.
You also require the means to control these functional units, and you
need a way to give them enough things to do that your application 
achieves the performance you require.  Vector machines like those you
mention above do indeed have parallelism built into their architecture,
but the way they invoke it at run time is to execute vector instructions,
which can only do repetitive operations on aggregate data sets.  The
main task of their compilers is to find low-level parallelism that they
can somehow coerce into the operators provided in the hardware.  If your
application does not need the operator built in, but rather some other
one, you're pretty much out of luck.
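
As a rough illustration of the point above (hypothetical code, no real
ISA implied): a loop whose body maps onto a built-in elementwise
operator vectorizes directly, while a recurrence, where each element
depends on the previous one, cannot be expressed as one repetitive
vector operation.

```python
# Sketch of the point above: vector hardware wins only when the loop
# body maps onto an operator the machine provides.  Illustrative only.

def vec_fma(a, b, c):
    """Elementwise a*b + c: maps directly onto a vector multiply
    followed by a vector add."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def recurrence(a):
    """x[i] = x[i-1] * a[i]: each element depends on the last, so
    no single repetitive vector operation covers it."""
    x, out = 1.0, []
    for ai in a:
        x = x * ai
        out.append(x)
    return out

print(vec_fma([1, 2], [3, 4], [5, 6]))   # vectorizable
print(recurrence([2.0, 3.0, 4.0]))       # serial dependence chain
```

If your hardware only provides the first kind of operator, the second
loop runs at scalar speed no matter how wide the vector units are.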

The machine we make tackles this problem directly, but rather than bore
other readers I'd refer you to our ASPLOS-II paper for starters.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

root@mfci.UUCP (SuperUser) (04/06/88)

In article <606@imagine.PAWL.RPI.EDU> beowulf!lunge!jesup@steinmetz.UUCP writes:
]In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
]]The thing I find funny is that these people seem to think that these things
]]are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
]]which can do a divide, a multiply, a floating point add or subtract, an
]]integer add or subtract, and an address calculation, all at the same time.
]
]	Remember that to do so in a mainframe, or even a mini, is relatively
]easy: just throw more hardware at it.  Doing it in a single-chip microprocessor
]isn't quite as easy (ALU's are BIG).
]
]     //	Randell Jesup			      Lunge Software Development
]    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
] \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
]  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup
]
](-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)

Guys, please, it ain't that simple.  (I was once a microprocessor
designer, too).  The rules are different, but they aren't easier.
The grass only looks greener.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

aglew@urbsdc.Urbana.Gould.COM (04/06/88)

Sorry about the crud.

lamaster@ames.arpa (Hugh LaMaster) (04/07/88)

In article <323@m3.mfci.UUCP> mfci!colwell@uunet.UUCP (Robert Colwell) writes:

>In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:

>]I was reading in some trade rag (sorry, forget which) about the MC88000,
>]which uses a 'scoreboard' to have up to 3 instructions executing
>]simultaneously (plus whatever pipelining the thing has).  Also, someone

>achieves the performance you require.  Vector machines like those you
>mention above do indeed have parallelism built into their architecture,
>but the way they invoke it at run time is to execute vector instructions,
>which can only do repetitive operations on aggregate data sets.  The

Not to beat a dead horse too hard, but these are two different kinds of
parallelism in the hardware, and some machines, such as the Cyber 205
for example, use both simultaneously.  Specifically, vector instructions
are memory to memory instructions on the 205, and scalar instruction
issue continues as long as there are no conflicts with vector
instructions caused by memory references AND as long as there are no
conflicts with previously issued scalar instructions because of register
references.  So these two different kinds of parallelism are taking
place simultaneously, and, in fact, performance on vectorized code 
with short vectors depends on the hardware preparing additional
descriptors for the vector units while they are operating on a
previously issued instruction.  The ETA-10 is the same, and the
Cray machines, which use vector registers, are similar:
register-to-register instructions continue to be issued until there
is a register conflict.
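
A toy model of that issue rule (latencies invented for illustration,
not the 205's or Cray's real timings): instructions issue in order,
one per cycle, until one references a register still being written by
an earlier, unfinished instruction.

```python
# Toy register scoreboard: issue in order each cycle, stalling only
# when a referenced register has a pending write.  Latency is a
# made-up illustrative number.

def issue_cycles(program, latency=3):
    """program: list of (dest, src1, src2) register names.
    Returns the cycle on which each instruction issues."""
    busy_until = {}          # reg -> cycle its pending write completes
    cycles, clock = [], 0
    for dest, s1, s2 in program:
        # stall until every referenced register is free
        clock = max([clock] + [busy_until.get(r, 0)
                               for r in (dest, s1, s2)])
        cycles.append(clock)
        busy_until[dest] = clock + latency
        clock += 1
    return cycles

prog = [("r1", "r2", "r3"),   # r1 = r2 op r3
        ("r4", "r5", "r6"),   # independent: issues the next cycle
        ("r7", "r1", "r4")]   # reads r1 and r4: waits for results
print(issue_cycles(prog))
```

The independent instruction issues right behind the first, and only
the third, which needs the earlier results, stalls, which is the
overlap the 205 and the Crays exploit.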

This is not to knock some other architectures such as Multiflow.  The
Multiflow machine has a certain elegant simplicity because it doesn't
need the vector part - it vectorizes using parallel functional units,
which also work with scalar operands.  Architectures like Multiflow
require a very sophisticated compiler, and Multiflow takes it even
further by optimizing outside of basic blocks, something neither the
Cray nor CDC/ETA compilers do.

sedwards@esunix.UUCP (Scott Edwards) (04/14/88)

From article <323@m3.mfci.UUCP>, by root@mfci.UUCP (SuperUser):
> 
> The machine we make tackles this problem directly, but rather than bore
> other readers I'd refer you to our ASPLOS-II paper for starters.
> 

Aw c'mon, you could bore us just a little, couldn't you?