sef@csun.UUCP (Sean Fagan) (04/03/88)
I was reading in some trade rag (sorry, forget which) about the MC88000,
which uses a 'scoreboard' to have up to 3 instructions executing
simultaneously (plus whatever pipelining the thing has). Also, someone
mentioned here a few weeks ago that his company's chip was capable of doing
a floating-point divide (or multiply, I forget that and which company; may
have been MIPS) along with some other instruction, also simultaneously.

The thing I find funny is that these people seem to think that these things
are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
which can do a divide, a multiply, a floating point add or subtract, an
integer add or subtract, and an address calculation, all at the same time.
Not only that, but all except the divide are pipelined, so each can start a
new operation every clock cycle (except the multiply, which can start one
every other cycle). (The Cray can do more, as can the Cyber 205 and ETA 10.)

Now, after the comment, a question: does anybody know what non-mainframes
and non-supers have parallel functional units (preferably pipelined)?
How about how popular these machines are, what their speeds are, etc?

--
Sean Fagan              uucp:   {ihnp4,hplabs,psivax}!csun!sef
CSUN Computer Center    BITNET: 1GTLSEF@CALSTATE
Northridge, CA 91330    (818) 885-2790
"I just build fast machines."  -- S. Cray
jesup@pawl23.pawl.rpi.edu (Randell E. Jesup) (04/03/88)
In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
>The thing I find funny is that these people seem to think that these things
>are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
>which can do a divide, a multiply, a floating point add or subtract, an
>integer add or subtract, and an address calculation, all at the same time.

	Remember that to do so in a mainframe, or even a mini, is relatively
easy: just throw more hardware at it. Doing it in a single-chip
microprocessor isn't quite as easy (ALUs are BIG).

     //	Randell Jesup                  Lunge Software Development
    //	Dedicated Amiga Programmer     13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/	(uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
walter@garth.UUCP (Walter Bays) (04/05/88)
In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
>I was reading in some trade rag (sorry, forget which) about the MC88000,
>which uses a 'scoreboard' to have up to 3 instructions executing
>simultaneously (plus whatever pipelining the thing has). [...]
>The thing I find funny is that these people seem to think that these things
>are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
>[They are old ideas.]
>Now, after the comment, a question: does anybody know what non-mainframes
>and non-supers have parallel functional units (preferably pipelined)?
>How about how popular these machines are, what their speeds are, etc?

The Intergraph Clipper has parallel integer and (on-chip) floating point
units, with pipelining. The integer unit is multi-stage and the pipeline
is controlled by a scoreboard. It has separate instruction and data busses
through two 4KB integrated cache/MMU chips to interface with slow DRAMs.
The C100 was introduced in 1985 and runs at 33 MHz. The C300 will be
introduced this year, runs at 50 MHz, and has an improved pipeline. Not
surprisingly, the original design team came to (then Fairchild) from Cray.

In college I worked on a CDC 6600 (Seymour's machine) which had multiple
functional units, pipelining, scoreboarding, and I/O co-processors. Those
who had to use assembly language cursed its load/store architecture and
limited addressing modes. The rest of us were just glad it was so fast.
The 6600 was a significant improvement over most of its successors.

--
------------------------------------------------------------------------------
Any similarities between my opinions and those of the person who signs my
paychecks are purely coincidental.  E-Mail route: ...!pyramid!garth!walter
USPS: Intergraph APD, 2400 Geng Road, Palo Alto, California 94303
Phone: (415) 852-2384
------------------------------------------------------------------------------
root@mfci.UUCP (SuperUser) (04/06/88)
In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
]I was reading in some trade rag (sorry, forget which) about the MC88000,
]which uses a 'scoreboard' to have up to 3 instructions executing
]simultaneously (plus whatever pipelining the thing has). Also, someone
]mentioned here a few weeks ago that his company's chip was capable of doing
]a floating-point divide (or multiply, I forget that and which company; may
]have been MIPS) along with some other instruction, also simultaneously.
]The thing I find funny is that these people seem to think that these things
]are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
]which can do a divide, a multiply, a floating point add or subtract, an
]integer add or subtract, and an address calculation, all at the same time.
]Not only that, but all but the divide are pipelined so that it can start
]a new operation each clock cycle (except the multiply, which needs to do it
]every other cycle). (The Cray can do more, as can the Cyber 205 and ETA 10.)
]Now, after the comment, a question: does anybody know what non-mainframes
]and non-supers have parallel functional units (preferably pipelined)?
]How about how popular these machines are, what their speeds are, etc?
Just having multiple functional units is interesting but insufficient.
You also require the means to control these functional units, and you
need a way to give them enough things to do that your application
achieves the performance you require. Vector machines like those you
mention above do indeed have parallelism built into their architecture,
but the way they invoke it at run time is to execute vector instructions,
which can only do repetitive operations on aggregate data sets. The
main task of their compilers is to find low-level parallelism that they
can somehow coerce into the operators provided in the hardware. If your
application does not need the operator built in, but rather some other
one, you're pretty much out of luck.
The machine we make tackles this problem directly, but rather than bore
other readers I'd refer you to our ASPLOS-II paper for starters.
Bob Colwell mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405 203-488-6090
root@mfci.UUCP (SuperUser) (04/06/88)
In article <606@imagine.PAWL.RPI.EDU> beowulf!lunge!jesup@steinmetz.UUCP writes:
]In article <1168@csun.UUCP> sef@csun.UUCP (Sean Fagan) writes:
]]The thing I find funny is that these people seem to think that these things
]]are wonderful new ideas, yet I routinely work on a CDC 170 type machine,
]]which can do a divide, a multiply, a floating point add or subtract, an
]]integer add or subtract, and an address calculation, all at the same time.
]
]	Remember that to do so in a mainframe, or even a mini, is relatively
]easy, just throw more hardware at it. Doing it in a single chip microprocessor
]isn't quite as easy (ALU's are BIG).

Guys, please, it ain't that simple. (I was once a microprocessor designer,
too.) The rules are different, but they aren't easier. The grass only looks
greener.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090
aglew@urbsdc.Urbana.Gould.COM (04/06/88)
Sorry about the crud.
lamaster@ames.arpa (Hugh LaMaster) (04/07/88)
In article <323@m3.mfci.UUCP> mfci!colwell@uunet.UUCP (Robert Colwell) writes:
>Vector machines like those you
>mention above do indeed have parallelism built into their architecture,
>but the way they invoke it at run time is to execute vector instructions,
>which can only do repetitive operations on aggregate data sets.

Not to beat a dead horse too hard, but these are two different kinds of
parallelism in the hardware, and some machines, such as the Cyber 205, use
both simultaneously. Specifically, vector instructions are memory-to-memory
instructions on the 205, and scalar instruction issue continues as long as
there are no conflicts with vector instructions caused by memory references
AND as long as there are no conflicts with previously issued scalar
instructions because of register references. So these two kinds of
parallelism take place simultaneously; in fact, performance on vectorized
code with short vectors depends on the hardware preparing additional
descriptors for the vector units while they are operating on a previously
issued instruction. The ETA-10 is the same, and the Cray machines, which
use vector registers, are similar: register-to-register instructions
continue to be issued until there is a register conflict.

This is not to knock other architectures such as Multiflow. The Multiflow
machine has a certain elegant simplicity because it doesn't need the vector
part - it vectorizes using parallel functional units, which also work with
scalar operands.

Architectures like Multiflow require a very sophisticated compiler, and
Multiflow takes it even further by optimizing outside of basic blocks,
something neither the Cray nor CDC/ETA compilers do.
sedwards@esunix.UUCP (Scott Edwards) (04/14/88)
From article <323@m3.mfci.UUCP>, by root@mfci.UUCP (SuperUser):
> The machine we make tackles this problem directly, but rather than bore
> other readers I'd refer you to our ASPLOS-II paper for starters.

Aw c'mon, you could bore us just a little, couldn't you?