[comp.arch] GOTO considered essential??

rcd@ico.isc.com (Dick Dunn) (10/27/89)

One more about the recent publicity given the IBM "America" chip set...

The trade press (such as it is) would have us believe that America has made
some major innovation in providing five instructions executing in parallel.
However, to achieve this rate--which has actually been billed as "5
instructions/cycle"--you have to do a branch in one of every five
instructions!  Since the magic figure of 5 requires using the floating-point
unit's multiply-and-accumulate capability, if you're doing only integer
work, you need a branch every three instructions!  You NEED goto's!
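The arithmetic behind those two figures fits in a few lines. This is a minimal sketch that assumes, as the post does, that at most one branch can issue per cycle, so keeping every issue slot full means exactly one branch per cycle. The slot counts (5 with floating multiply-accumulate, 3 for integer-only work) come from the post; the function names are hypothetical.

```python
def instructions_per_branch(slots_per_cycle: int, branches_per_cycle: int = 1) -> float:
    """Average run length between branches when every slot is filled each cycle."""
    return slots_per_cycle / branches_per_cycle

def branch_share(slots_per_cycle: int, branches_per_cycle: int = 1) -> float:
    """Fraction of the dynamic instruction stream that must be branches."""
    return branches_per_cycle / slots_per_cycle

# Slot counts from the post: 5 with FP multiply-accumulate, 3 for integer-only code.
for label, slots in (("with FP multiply-accumulate", 5), ("integer only", 3)):
    print(f"{label}: a branch every {instructions_per_branch(slots):.0f} instructions "
          f"({branch_share(slots):.0%} of the stream)")
```

For integer-only work this prints a branch every 3 instructions, a third of the stream.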

I do *not* want to maintain the sort of code that will keep this processor
running at max issue rate!

Now, mind you, I am not trying to belittle the work IBM has done.  Branches
are in some sense the nemesis of very fast CPUs, and it helps to work
ahead and to be able to handle branches without too much delay.  I'm only
pissed off at the pseudomarketing we're seeing in the trade press.  The
processor may be able to issue 5 instructions in some ideal cycle, but it
does NOT run at 5 instructions/cycle for any believable piece of code!

(Or maybe it could...who knows; you might be able to replicate code and
scramble the branches around enough to get close for some codes.  But it
would take a compiler which could look down various paths and figure out
how instructions could be scheduled along the different paths and
replicated, sorting out the hazards...I suppose you could call it something
like "trace scheduling"...Bob Colwell, you there?  I see a customer for
your technology!:-):-):-)
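"Looking down various paths" is what the trace-selection step of trace scheduling does: pick the most likely path through the flow graph so it can be scheduled as one long straight-line block. The sketch below is a simplified, forward-only version that assumes branch probabilities are already known (for example, from profiling). The name `pick_trace` and the dictionary CFG format are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical CFG format: block -> list of (successor, branch probability).
CFG = Dict[str, List[Tuple[str, float]]]

def pick_trace(cfg: CFG, start: str, visited: Optional[Set[str]] = None) -> List[str]:
    """Grow one trace forward from `start`, always taking the most probable
    successor that has not already been placed on some trace."""
    if visited is None:
        visited = set()
    trace = [start]
    visited.add(start)
    block = start
    while True:
        candidates = [(prob, succ) for succ, prob in cfg.get(block, [])
                      if succ not in visited]
        if not candidates:
            break
        _, block = max(candidates)  # highest probability wins
        trace.append(block)
        visited.add(block)
    return trace

# An if-then-else where the "then" side is taken 90% of the time.
cfg: CFG = {
    "entry": [("then", 0.9), ("else", 0.1)],
    "then": [("join", 1.0)],
    "else": [("join", 1.0)],
    "join": [("exit", 1.0)],
}
seen: Set[str] = set()
main_trace = pick_trace(cfg, "entry", seen)  # the likely path
side_trace = pick_trace(cfg, "else", seen)   # what is left over
print(main_trace, side_trace)
```

Once the main trace is scheduled as straight-line code, operations moved across the split or the join need compensation copies on the `else` path. That replication is the "replicated, sorting out the hazards" part, and this sketch leaves it out.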
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...No DOS.  UNIX.

billms@dip.eecs.umich.edu (Bill Mangione-Smith) (10/27/89)

In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!
>
>Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
>   ...No DOS.  UNIX.

In all of the writings I have seen, including the iccd paper, the stated
performance goal is something just over 1 instruction issued per clock.  To
do this with 'real' code, you obviously need a peak issue rate of over 1 
instruction per clock.  IBM, at least the R&D types, doesn't seem to be
trying to fool anyone that the actual performance is anywhere near
4 or 5 instructions per clock. But then something like 1.3 instructions/clock
for *real live code* would be a big step up in performance anyway. 
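A quick way to see why 1.3 instructions/clock matters: at a fixed clock rate, run time is instruction count times CPI divided by clock rate, so raising IPC from 1.0 to 1.3 cuts run time by that same factor. The sketch assumes an arbitrary instruction count and a hypothetical 25 MHz clock; both cancel out of the speedup.

```python
def cpi(ipc: float) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc

def run_time_seconds(instructions: float, ipc: float, clock_hz: float) -> float:
    # time = instruction count * cycles per instruction / cycles per second
    return instructions * cpi(ipc) / clock_hz

CLOCK_HZ = 25e6        # hypothetical clock rate
INSTRUCTIONS = 1e9     # arbitrary program length
base = run_time_seconds(INSTRUCTIONS, 1.0, CLOCK_HZ)
better = run_time_seconds(INSTRUCTIONS, 1.3, CLOCK_HZ)
print(f"CPI at 1.3 IPC: {cpi(1.3):.3f}, speedup: {base / better:.2f}x")
```

This prints a CPI of 0.769 and a speedup of 1.30x.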

Have you been talking to sales guys? Or which magazines are making these claims? 

bill mangione-smith
advanced computer architecture lab
university of michigan
ann arbor

billms@dip.eecs.umich.edu  

colwell@mfci.UUCP (Robert Colwell) (10/27/89)

In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>The trade press (such as it is) would have us believe that America has made
>some major innovation in providing five instructions executing in parallel.
>However, to achieve this rate--which has actually been billed as "5
>instructions/cycle"--you have to do a branch in one of every five
>instructions!  Since the magic figure of 5 requires using the floating-point
>unit's multiply-and-accumulate capability, if you're doing only integer
>work, you need a branch every three instructions!  You NEED goto's!
>
>I do *not* want to maintain the sort of code that will keep this processor
>running at max issue rate!

You're obviously talking assembly here, right?  

>Now, mind you, I am not trying to belittle the work IBM has done.  Branches
>are in some sense the nemesis of very fast CPUs, and it helps to work
>ahead and to be able to handle branches without too much delay.  I'm only
>pissed off at the pseudomarketing we're seeing in the trade press.  The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!

Aw, geez, Dick, you're just mired in reality.  Next you're going to turn
to the i860, 88000, and all the other new processors that have the hardware
to do multiple ops at once and realize that they have similar problems.
Keep this up and Eugene Brooks is going to sic his KILLERS on you.

>(Or maybe it could...who knows; you might be able to replicate code and
>scramble the branches around enough to get close for some codes.  But it
>would take a compiler which could look down various paths and figure out
>how instructions could be scheduled along the different paths and
>replicated, sorting out the hazards...I suppose you could call it something
>like "trace scheduling"...Bob Colwell, you there?  I see a customer for
>your technology!:-):-):-)

We discussed this problem in our IEEE Transactions paper, and Ellis 
also goes over it in his thesis.  When you compact a lot of different
instructions into a wide-word instruction, in a sense, you drag their
branches in too.  So it helps to have your branching abilities scale
with the number of functional units you're keeping busy.  Of course,
lots of other things, such as the number of memory ports you can 
keep busy at once, the number of register read/write ports plus the
number of registers, and the instruction stream bandwidth also need
to scale with the number of functional units if you are trying to 
create a balanced architecture.
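The scaling point reduces to linear arithmetic: if every functional unit is kept busy, each per-operation demand is multiplied by the number of units. The sketch below assumes a made-up operation mix (two register reads and one write per operation, 30% memory operations, 20% branches, 32-bit operation encodings). None of these numbers come from the thread or from any real machine, and the names are hypothetical. It only shows that every listed resource, branches per wide instruction included, grows in proportion to the width.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class OpMix:
    """Hypothetical per-operation demands (illustrative, not measured)."""
    reg_reads: float = 2.0        # source operands read per operation
    reg_writes: float = 1.0       # results written per operation
    mem_fraction: float = 0.3     # share of operations that are loads/stores
    branch_fraction: float = 0.2  # share of operations that are branches
    op_bits: int = 32             # encoding size of one operation

def required_per_cycle(units: int, mix: OpMix) -> Dict[str, float]:
    """Average per-cycle resources needed to keep `units` functional units busy."""
    return {
        "register read ports": units * mix.reg_reads,
        "register write ports": units * mix.reg_writes,
        "memory ports": units * mix.mem_fraction,
        "branches per wide instruction": units * mix.branch_fraction,
        "instruction bits per cycle": units * mix.op_bits,
    }

for width in (1, 5, 14):
    needs = required_per_cycle(width, OpMix())
    print(width, {k: round(v, 1) for k, v in needs.items()})
```

Under this invented mix, a 14-wide machine needs about 2.8 branches per wide instruction on average, which is why a single branch per word would starve it.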

Bob Colwell               ..!uunet!mfci!colwell
Multiflow Computer     or colwell@multiflow.com
31 Business Park Dr.
Branford, CT 06405     203-488-6090

rcd@ico.isc.com (Dick Dunn) (10/28/89)

[I had complained about the hype saying "America" would run at 5
instructions per cycle.]

billms@dip.eecs.umich.edu (Bill Mangione-Smith) writes:
> In all of the writings I have seen, including the iccd paper, the stated
> performance goal is something just over 1 instruction issued per clock...

OK, fine.  I haven't seen the paper(s).  What I was complaining about was
the hype surrounding it, NOT the technical characteristics of the processor
itself.  I'll give a couple of examples from 10/9 _EE_Times_ since that's
the one I have handy right now:

	"In technical papers presented at the International Conference on
	Computer Design, IBM claimed peak operation of five instructions
	per cycle..."

Note the wording.  Somehow, somewhere along the way, I suspect that a
technical statement--that it is possible to issue five instructions in one
cycle--got turned into "peak operation" with a rate.

> ...IBM, at least the R&D types, doesn't seem to be
> trying to fool anyone that the actual performance is anywhere near
> 4 or 5 instructions per clock...
>...Have you been talking to sales guys?...
[see the original posting--I said I was talking about the trade press]

Here's another one, and again you have to think carefully about the
wording:
	"Randy D. Groves, manager of RISC workstations at the Austin
	Advanced Workstation division...[said]...`While both Apollo's Prism
	and Intel's i860 had the same second-generation RISC goals--
	compound function instructions and a superscalar machine with more
	than one instruction per cycle--we actually met our goal of
	executing four, and with the compound accumulate instruction, five
	instructions simultaneously, in one cycle,'..."

This statement leads you right to the edge of the idea of a rate of five
instructions per cycle, if you're thinking carelessly.  But there's no real
connection made between the possibility of issuing five instructions in a
cycle (an event) and what any believable rate (series of events over time)
might be.  The trade press is more than happy to supply that nonexistent
connection.
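The event-versus-rate distinction is easy to make concrete. Given a record of how many instructions issued in each cycle, the peak is the maximum over single cycles, while the rate is total instructions issued divided by total cycles. The counts below are invented for illustration, not measurements of America or any other processor, and the function name is hypothetical.

```python
from typing import List, Tuple

def peak_and_sustained(issued_per_cycle: List[int]) -> Tuple[int, float]:
    """Peak is the best single cycle (an event); sustained is the average
    over every cycle in the window (a rate)."""
    peak = max(issued_per_cycle)
    sustained = sum(issued_per_cycle) / len(issued_per_cycle)
    return peak, sustained

# Invented per-cycle issue counts for a ten-cycle window.
trace = [5, 1, 0, 2, 1, 0, 1, 0, 1, 1]
peak, sustained = peak_and_sustained(trace)
print(f"peak: {peak} per cycle, sustained: {sustained:.2f} per cycle")
```

One cycle with five issues is enough to advertise "5 instructions/cycle", yet this window averages 1.20 per cycle; the sustained figure can never exceed the peak.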

Again, I am NOT flaming the processor design.  Yes, you need to issue
multiple instructions per cycle if you're going to beat the 1 CPI
goal--that's "obvious".  What I'm after is that we (or at least I, so far)
haven't seen any realistic figure for instruction issue rate, yet I keep
seeing this magic "5" thrown around.  People should be saying there are 5
(or maybe 4, accounting for multiply/accumulate) independent functional
units which can execute instructions, and get rid of this "5 instructions/
cycle" crap.
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Worst-case analysis must never begin with "No one would ever want..."

philf@xymox.metaphor.com (Phil Fernandez) (10/28/89)

In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>... The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!
 
At an all-day IBM briefing last August on the new machine
architecture, the IBM folks told me over and over, "5
instructions/cycle".  As this went on, I became increasingly skeptical
and inquisitive, and finally pushed the issue.  In the end, IBM
admitted that in real-world situations, they saw more like 1.0-1.2
instructions/cycle. 5i/c is just a theoretical maximum.  Don't believe
the hype, boys and girls...

pmf

(These opinions are mine only, and do not reflect the opinions of Metaphor)



+-----------------------------+----------------------------------------------+
| Phil Fernandez              |             philf@metaphor.com               |
|                             |     ...!{apple|decwrl}!metaphor!philf        |
| Metaphor Computer Systems   |"Does the body rule the mind, or does the mind|
| Mountain View, CA           | rule the body?  I dunno..." - Morrissey      |
+-----------------------------+----------------------------------------------+

henry@utzoo.uucp (Henry Spencer) (10/29/89)

In article <863@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>... In the end, IBM
>admitted that in real-world situations, they saw more like 1.0-1.2
>instructions/cycle. 5i/c is just a theoretical maximum...

As John Mashey has observed in regard to things like MIPS ratings, such
numbers should always be considered "guaranteed not to exceed" ratings.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

presser@mfci.UUCP (Marshall E. Presser) (11/10/89)

In article <863@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>>... The
>>processor may be able to issue 5 instructions in some ideal cycle, but it
>>does NOT run at 5 instructions/cycle for any believable piece of code!
> 
>At an all-day IBM briefing last August on the new machine
>architecture, the IBM folks told me over and over, "5
>instructions/cycle".  As this went on, I became increasingly skeptical
>and inquisitive, and finally pushed the issue.  In the end, IBM
>admitted that in real-world situations, they saw more like 1.0-1.2
>instructions/cycle. 5i/c is just a theoretical maximum.  Don't believe
>the hype, boys and girls...
>
>pmf
>
>(These opinions are mine only, and do not reflect the opinions of Metaphor)
>

Please excuse me if you have heard it here before, but the
Trace Scheduling Compacting Compiler (TM) here at Multiflow
Computer frequently schedules 10 or more of a maximum of 14
available instructions on our TRACE 14/300 computer.  Is it
easy? No.  Would you want to write code like this by hand? No.
Can I produce a pathology in which only sequential code is
generated?  Of course I can.

But the compiler technology exists today to find the inherent
low-level (fine-grained) parallelism in lots of real-world
situations.  As dramatic improvements in cycle time become more
difficult to produce, it is the compiler-generated
parallelization of code that will ultimately produce minimal
time to solution.
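"10 or more of 14" is a slot-utilization figure for compacted code. The sketch below shows only the compaction step, applied to a trace that has already been picked: greedy list scheduling of a straight-line sequence of operations into wide instructions of at most `width` slots, honoring data dependences. It is not Multiflow's compiler. The unit latency, the program-order priority (a real scheduler would favor operations on the longest dependence chain), the names `compact` and `utilization`, and the example dependence graph are all assumptions for illustration.

```python
from typing import Dict, List

def compact(ops: List[str], deps: Dict[str, List[str]], width: int,
            latency: int = 1) -> List[List[str]]:
    """Greedy list scheduling of a straight-line trace into wide instructions.

    ops are tried in the order given (program order); deps maps an op to the
    ops whose results it reads. Each op is assumed to take `latency` cycles.
    """
    ready_at: Dict[str, int] = {}  # op -> first cycle its result can be read
    remaining = list(ops)
    words: List[List[str]] = []
    cycle = 0
    while remaining:
        word: List[str] = []
        for op in remaining:
            if len(word) == width:
                break
            if all(p in ready_at and ready_at[p] <= cycle for p in deps.get(op, [])):
                word.append(op)
        for op in word:
            ready_at[op] = cycle + latency
            remaining.remove(op)
        words.append(word)  # an empty word would be a cycle of no-ops
        cycle += 1
    return words

def utilization(words: List[List[str]], width: int) -> float:
    """Fraction of issue slots that hold a real operation."""
    return sum(len(w) for w in words) / (len(words) * width)

ops = ["a", "b", "c", "d", "e", "f", "g"]
deps = {"c": ["a", "b"], "d": ["a"], "e": ["c", "d"], "g": ["e", "f"]}
words = compact(ops, deps, width=3)
print(words, f"{utilization(words, 3):.2f}")
```

Here seven operations fill four 3-slot words, about 58% of the slots. No schedule can use fewer than four words, because a, c, e and g form a chain of four dependent operations.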

(Usual disclaimer about source of opinions).
			Marshall Presser
**********************************************************************
*  Marshall E Presser                internet: presser@multiflow.com *
*  Multiflow Computer, Inc.          uucp:     uunet!mfci!presser    *
*  9175 Guilford Road, Suite 310     voice:    (301)880-4181         *
*  Columbia Maryland 21046           DC metro: (301)206-3244         *
**********************************************************************