rcd@ico.isc.com (Dick Dunn) (10/27/89)
One more about the recent publicity given the IBM "America" chip set...

The trade press (such as it is) would have us believe that America has made some major innovation in providing five instructions executing in parallel. However, to achieve this rate--which has actually been billed as "5 instructions/cycle"--you have to do a branch in one of every five instructions! Since the magic figure of 5 requires using the floating-point unit's multiply-and-accumulate capability, if you're doing only integer work, you need a branch every three instructions! You NEED goto's!

I do *not* want to maintain the sort of code that will keep this processor running at max issue rate!

Now, mind you, I am not trying to belittle the work IBM has done. Branches are in some sense the nemesis of very fast CPUs, and it helps to be working ahead and being able to handle branches without too much delay. I'm only pissed off at the pseudomarketing we're seeing in the trade press. The processor may be able to issue 5 instructions in some ideal cycle, but it does NOT run at 5 instructions/cycle for any believable piece of code!

(Or maybe it could...who knows; you might be able to replicate code and scramble the branches around enough to get close for some codes. But it would take a compiler which could look down various paths and figure out how instructions could be scheduled along the different paths and replicated, sorting out the hazards...I suppose you could call it something like "trace scheduling"...Bob Colwell, you there? I see a customer for your technology!:-):-):-)
--
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...No DOS. UNIX.
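The arithmetic behind the branch-frequency complaint can be sketched as follows. This assumes, as the post implies, that a full-width issue group contains exactly one branch; the function name and the `branch_slots` parameter are made up for illustration and do not come from any IBM documentation.

```python
from fractions import Fraction


def required_branch_fraction(peak_issue_width: int, branch_slots: int = 1) -> Fraction:
    """Share of the instruction stream that must be branches if every
    issue slot is to be filled on every cycle.

    Assumption (from the post, not from a spec): each full-width group
    holds `branch_slots` branches out of `peak_issue_width` instructions.
    """
    if peak_issue_width <= 0:
        raise ValueError("peak_issue_width must be positive")
    if not 0 <= branch_slots <= peak_issue_width:
        raise ValueError("branch_slots must lie between 0 and peak_issue_width")
    return Fraction(branch_slots, peak_issue_width)


# Peak of 5 (counting multiply-and-accumulate as two) versus the
# integer-only peak of 3 mentioned in the post.
with_fp = required_branch_fraction(5)
integer_only = required_branch_fraction(3)

print(f"with FP multiply-accumulate: {float(with_fp):.0%} of instructions must be branches")
print(f"integer-only code:           {float(integer_only):.0%} of instructions must be branches")
```

Under that assumption, sustaining the peak means one branch in every five instructions with floating-point work, and one in every three without it.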
billms@dip.eecs.umich.edu (Bill Mangione-Smith) (10/27/89)
In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!
>
>Dick Dunn rcd@ico.isc.com uucp: {ncar,nbires}!ico!rcd (303)449-2870
> ...No DOS. UNIX.

In all of the writings I have seen, including the ICCD paper, the stated performance goal is something just over 1 instruction issued per clock. To do this with 'real' code, you obviously need a peak issue rate of over 1 instruction per clock. IBM, at least the R&D types, doesn't seem to be trying to fool anyone that the actual performance is anywhere near 4 or 5 instructions per clock. But then something like 1.3 instructions/clock for *real live code* would be a big step up in performance anyway.

Have you been talking to sales guys? Or which magazines are making these claims?

bill mangione-smith
advanced computer architecture lab
university of michigan
ann arbor
billms@dip.eecs.umich.edu
colwell@mfci.UUCP (Robert Colwell) (10/27/89)
In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>The trade press (such as it is) would have us believe that America has made
>some major innovation in providing five instructions executing in parallel.
>However, to achieve this rate--which has actually been billed as "5
>instructions/cycle"--you have to do a branch in one of every five
>instructions! Since the magic figure of 5 requires using the floating-point
>unit's multiply-and-accumulate capability, if you're doing only integer
>work, you need a branch every three instructions! You NEED goto's!
>
>I do *not* want to maintain the sort of code that will keep this processor
>running at max issue rate!

You're obviously talking assembly here, right?

>Now, mind you, I am not trying to belittle the work IBM has done. Branches
>are in some sense the nemesis of very fast CPUs, and it helps to be working
>ahead and being able to handle branches without too much delay. I'm only
>pissed off at the pseudomarketing we're seeing in the trade press. The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!

Aw, geez, Dick, you're just mired in reality. Next you're going to turn to the i860, 88000, and all the other new processors that have the hardware to do multiple ops at once and realize that they have similar problems. Keep this up and Eugene Brooks is going to sic his KILLERS on you.

>(Or maybe it could...who knows; you might be able to replicate code and
>scramble the branches around enough to get close for some codes. But it
>would take a compiler which could look down various paths and figure out
>how instructions could be scheduled along the different paths and
>replicated, sorting out the hazards...I suppose you could call it something
>like "trace scheduling"...Bob Colwell, you there? I see a customer for
>your technology!:-):-):-)

We discussed this problem in our IEEE Transactions paper, and Ellis also goes over it in his thesis. When you compact a lot of different instructions into a wide-word instruction, in a sense, you drag their branches in too. So it helps to have your branching abilities scale with the number of functional units you're keeping busy.

Of course, lots of other things, such as the number of memory ports you can keep busy at once, the number of register read/write ports plus the number of registers, and the instruction stream bandwidth also need to scale with the number of functional units if you are trying to create a balanced architecture.

Bob Colwell               ..!uunet!mfci!colwell
Multiflow Computer     or colwell@multiflow.com
31 Business Park Dr.
Branford, CT 06405     203-488-6090
rcd@ico.isc.com (Dick Dunn) (10/28/89)
[I had complained about the hype saying "America" would run at 5 instructions per cycle.]

billms@dip.eecs.umich.edu (Bill Mangione-Smith) writes:
> In all of the writings I have seen, including the iccd paper, the stated
> performance goal is something just over 1 instruction issued per clock...

OK, fine. I haven't seen the paper(s). What I was complaining about was the hype surrounding it, NOT the technical characteristics of the processor itself. I'll give a couple of examples from 10/9 _EE_Times_ since that's the one I have handy right now:

    "In technical papers presented at the International Conference on
    Computer Design, IBM claimed peak operation of five instructions
    per cycle..."

Note the wording. Somehow, somewhere along the way, I suspect that a technical statement--that it is possible to issue five instructions in one cycle--got turned into "peak operation" with a rate.

> ...IBM, at least the R&D types, doesn't seem to be
> trying to fool anyone that the actual performance is anywhere near
> 4 or 5 instructions per clock...
>...Have you been talking to sales guys?...

[see the original posting--I said I was talking about the trade press]

Here's another one, and again you have to think carefully about the wording:

    "Randy D. Groves, manager of RISC workstations at the Austin
    Advanced Workstation division...[said]...`While both Apollo's
    Prism and Intel's i860 had the same second-generation RISC
    goals--compound function instructions and a superscalar machine
    with more than one instruction per cycle--we actually met our goal
    of executing four, and with the compound accumulate instruction,
    five instructions simultaneously, in one cycle,'..."

This statement leads you right to the edge of the idea of a rate of five instructions per cycle, if you're thinking carelessly. But there's no real connection made between the possibility of issuing five instructions in a cycle (an event) and what any believable rate (series of events over time) might be. The trade press is more than happy to supply that nonexistent connection.

Again, I am NOT flaming the processor design. Yes, you need to issue multiple instructions per cycle if you're going to beat the 1 CPI goal--that's "obvious". What I'm after is that we (or at least I, so far) haven't seen any realistic figure for instruction issue rate, yet I keep seeing this magic "5" thrown around. People should be saying there are 5 (or maybe 4, accounting for multiply/accumulate) independent functional units which can execute instructions, and get rid of this "5 instructions/cycle" crap.
--
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Worst-case analysis must never begin with "No one would ever want..."
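The event-versus-rate distinction drawn above can be made concrete with a small sketch. The per-cycle issue counts below are invented purely for illustration and are not measurements of America or any other processor; `issue_stats` and `hypothetical_trace` are made-up names.

```python
def issue_stats(issued_per_cycle):
    """Return (peak, sustained) for a list of per-cycle issue counts.

    peak      -- the largest number of instructions issued in any one cycle
                 (a single event)
    sustained -- total instructions divided by total cycles
                 (a rate over the whole run)
    """
    if not issued_per_cycle:
        raise ValueError("need at least one cycle")
    peak = max(issued_per_cycle)
    sustained = sum(issued_per_cycle) / len(issued_per_cycle)
    return peak, sustained


# HYPOTHETICAL trace: one ideal 5-issue cycle among stalls and
# single-issue cycles. These numbers are made up, not measured.
hypothetical_trace = [5, 1, 0, 1, 2, 1, 0, 1, 1, 1]

peak, sustained = issue_stats(hypothetical_trace)
print(f"peak issue in one cycle: {peak}")
print(f"sustained rate:          {sustained:.2f} instructions/cycle")
```

A single cycle that issues five instructions sets the peak, yet barely moves the average once the stalled and single-issue cycles around it are counted.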
philf@xymox.metaphor.com (Phil Fernandez) (10/28/89)
In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>... The
>processor may be able to issue 5 instructions in some ideal cycle, but it
>does NOT run at 5 instructions/cycle for any believable piece of code!

At an all-day IBM briefing last August on the new machine architecture, the IBM folks told me over and over, "5 instructions/cycle". As this went on, I became increasingly skeptical and inquisitive, and finally pushed the issue. In the end, IBM admitted that in real-world situations, they saw more like 1.0-1.2 instructions/cycle. 5i/c is just a theoretical maximum. Don't believe the hype, boys and girls...

pmf

(These opinions are mine only, and do not reflect the opinions of Metaphor)

+-----------------------------+----------------------------------------------+
| Phil Fernandez              | philf@metaphor.com                           |
|                             | ...!{apple|decwrl}!metaphor!philf            |
| Metaphor Computer Systems   |"Does the body rule the mind, or does the mind|
| Mountain View, CA           | rule the body? I dunno..." - Morrissey       |
+-----------------------------+----------------------------------------------+
henry@utzoo.uucp (Henry Spencer) (10/29/89)
In article <863@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>... In the end, IBM
>admitted that in real-world situations, they saw more like 1.0-1.2
>instructions/cycle. 5i/c is just a theoretical maximum...

As John Mashey has observed in regard to things like MIPS ratings, such numbers should always be considered "guaranteed not to exceed" ratings.
--
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
presser@mfci.UUCP (Marshall E. Presser) (11/10/89)
In article <863@metaphor.Metaphor.COM> philf@xymox.metaphor.com (Phil Fernandez) writes:
>In article <1989Oct27.050923.5294@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>>... The
>>processor may be able to issue 5 instructions in some ideal cycle, but it
>>does NOT run at 5 instructions/cycle for any believable piece of code!
>
>At an all-day IBM briefing last August on the new machine
>architecture, the IBM folks told me over and over, "5
>instructions/cycle". As this went on, I became increasingly skeptical
>and inquisitive, and finally pushed the issue. In the end, IBM
>admitted that in real-world situations, they saw more like 1.0-1.2
>instructions/cycle. 5i/c is just a theoretical maximum. Don't believe
>the hype, boys and girls...
>
>pmf
>
>(These opinions are mine only, and do not reflect the opinions of Metaphor)
>

Please excuse me if you have heard it here before, but the Trace Scheduling Compacting Compiler(TM) here at Multiflow Computer frequently schedules 10 or more of a maximum of 14 available instructions on our TRACE 14/300 Compiler.

Is it easy? No. Would you want to write code like this by hand? No. Can I produce a pathology in which only sequential code is generated? Of course I can.

But the compiler technology exists today to find the inherent low-level (fine-grained) parallelism in lots of real-world situations. As dramatic improvements in cycle time become more difficult to produce, it is the compiler-generated parallelization of code that will ultimately produce minimal time to solution.

(Usual disclaimer about source of opinions).

Marshall Presser

**********************************************************************
* Marshall E Presser               internet: presser@multiflow.com   *
* Multiflow Computer, Inc.         uucp:     uunet!mfci!presser      *
* 9175 Guilford Road, Suite 310    voice:    (301)880-4181           *
* Columbia Maryland 21046          DC metro: (301)206-3244           *
**********************************************************************