lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/23/89)
In article <16080@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>> Note that if you have a superscalar architecture, and can do two inst.
>>in parallel (see the Intel 80960CA paper in newest Compcon proceedings), you
>
>Exactly my point about superscalar.  But note that for the expense of the

For quite a while, I have heard "superscalar" used, and I think the term was
defined in a paper in IEEE Computer a while back, but I am still a little
fuzzy on it.  Is "superscalar" an exact concept, or is it a buzzword like
"RISC"?  Is a Multiflow machine a superscalar machine, or the i860, or
the Weitek XL-8064?

  Hugh LaMaster, m/s 233-9,   UUCP ames!lamaster
  NASA Ames Research Center   ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     Phone:  (415)694-6117
cliff@ficc.uu.net (cliff click) (03/23/89)
[This is my first posting, so please go easy on the flames! :-)]

> For quite a while, I have heard superscalar used, and I think the term was
> defined in a paper in IEEE Computer a while back, but I am still a little
> fuzzy on it.  Is "superscalar" an exact concept, or is it a buzzword like
> "RISC"?  Is a Multiflow machine a superscalar machine, or the i860, or
> the Weitek XL-8064?

The NOVIX 4000 chip has 2 stacks in external memory; all operations
implicitly use the top-of-stack (or next-of-stack), with no real registers.
Calls/branches take 1 cycle, subroutine returns take ZERO cycles, memory
loads and stores take 2 cycles (1 for the instruction, 1 for the memory
reference), and arithmetic takes 1 cycle.  The arithmetic instructions are
explicitly bit-coded - bits in the instruction run directly into the
ALU/stacks - so a single instruction can choose one of
(add/sub/neg/xor/or/and/...) and (shift left/shift right/rotate/...) and
(push results/pop results/...).  The chip has no pipeline and no cache.
Interrupts have a 2-cycle latency (recognize interrupt, push PC).  The chip
was running at 10 MHz with nearly 10 MIPS (no flames please) throughput.
Oh yeah, both stacks and main memory all have SEPARATE address and data
lines, and it is a 16-bit chip (2 x 8 stack address, 16 main address,
3 x 16 data lines = 80 address+data lines).

Is this (basically) register-less chip RISC?  Is this (basically)
instructions-as-horizontal-microcode "superscalar"?  Why isn't this
approach more popular?  With no pipeline and no cache, context switches
should be cheap (stacks swapped by using the MMU).  The builder didn't use
any fancy technology for the part - the MUCH smaller processes used by the
"big boys" (Motorola, Intel, HP...) should be able to double or triple the
clock rate on the part.

(All of the NOVIX stuff is from my head and is a couple of years old; I may
have forgotten some of it!  I know that a 32-bit part is in the works.)

Cliff Click, Xenix Support, Ferranti International Controls Corporation.
uunet.uu.net!ficc!cliff, cliff@ficc.uu.net, +1 713 274 5368.
Disclaimer: What's a disclaimer?
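The "instructions-as-horizontal-microcode" idea above can be sketched in a few
lines: one instruction word carries separate bit fields that steer the ALU and
the stack at the same time.  This is only a toy model; the opcode encodings,
field layout, and function names below are invented for illustration and do not
match the real NC4000 instruction set.

```python
# Toy model of a zero-operand stack machine in the spirit of the NOVIX
# chip described above.  Field encodings here are ASSUMED, not real:
# the point is only that one "horizontal" instruction word selects an
# ALU function and a stack effect simultaneously.

ALU_OPS = {0: lambda a, b: a + b,   # assumed encoding: add
           1: lambda a, b: a - b,   # sub
           2: lambda a, b: a ^ b,   # xor
           3: lambda a, b: a | b}   # or

def step(stack, insn):
    """Execute one bit-coded instruction on the data stack.

    Bits [1:0] pick the ALU function; bit 2 picks the stack effect
    (0 = pop both operands and push the result, 1 = keep the
    next-of-stack operand as well).  Both fields act in the same
    cycle, as if the bits fed the ALU and stack controls directly.
    """
    alu = ALU_OPS[insn & 0b11]
    top, nos = stack.pop(), stack.pop()
    result = alu(top, nos) & 0xFFFF          # 16-bit datapath
    if insn & 0b100:                         # "keep NOS" effect
        stack.append(nos)
    stack.append(result)
    return stack

stack = [3, 5]                 # next-of-stack = 3, top = 5
step(stack, 0b000)             # add, pop both, push sum
print(stack)                   # -> [8]
```

With no registers to name, the instruction needs no operand fields at all,
which is how the whole operation fits in one cycle.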
bcase@cup.portal.com (Brian bcase Case) (03/24/89)
>For quite a while, I have heard superscalar used, and I think the term was
>defined in a paper in IEEE Computer a while back, but I am still a little
>fuzzy on it.  Is "superscalar" an exact concept, or is it a buzzword like
>"RISC"?  Is a Multiflow machine a superscalar machine, or the i860, or
>the Weitek XL-8064?

Well, this is a good question.  Since I have been using the "buzz word"
superscalar, maybe I should give the definition I use.  To me, superscalar
is simply an implementation that executes multiple instructions per cycle
(at least for a RISC architecture) when dependencies permit.  It
accomplishes this multiple-instruction-per-cycle rate *without any help
from the instruction stream itself.*  That is, take an instruction stream
that executes just fine on the 29000; if the same instruction stream were
presented to the S-29000, the superscalar 29000, more than one instruction
would be executed per cycle when dependencies permit.  To accomplish this
to any reasonable degree, two or more (nearly?) identical pipelines must
be present (I think).

Note that this is significantly different from VLIW or the i860.  For
these implementations, multiple operations can execute in one cycle, but
that is because the instruction says, in an explicit way, to do so.  Said
another way, these machines will not execute multiple operations per cycle
*unless* the instruction stream says to do so.  A superscalar machine
needs no such help.  *However*, to squeeze the most from a superscalar
design, one would like to have the compiler arrange things so that
dependencies are minimized.  *But note*, even the compiler-arranged
instruction stream will still execute just fine on a non-superscalar
implementation of the architecture.
mbutts@mntgfx.mentor.com (Mike Butts) (03/28/89)
From article <22975@ames.arc.nasa.gov>, by lamaster@ames.arc.nasa.gov (Hugh LaMaster):
> For quite a while, I have heard superscalar used, and I think the term was
> defined in a paper in IEEE Computer a while back, but I am still a little
> fuzzy on it.  Is "superscalar" an exact concept, or is it a buzzword like
> "RISC"?  Is a Multiflow machine a superscalar machine, or the i860, or
> the Weitek XL-8064?

In "Superscalar vs. Superpipelined Machines" (Comp. Arch. News, ACM
SIGARCH, v.16, #3, June 1988, p. 71-80), Norman P. Jouppi of DEC West in
Palo Alto offers this definition: "A superscalar machine of degree n can
issue n instructions per cycle."  VLIW machines are similar.  It's a very
interesting paper, which I recommend to anyone interested in this subject.
He discusses superpipelined machines, which "can issue only one
instruction per cycle, but have cycle times shorter than the time required
for any operation", compares the alternatives, and discusses limits to
instruction-level parallelism.

In particular, let me quote from his concluding comments: "The most
important point to emphasize is that significant improvements in
uniprocessor performance via internally parallel processors will only
occur for applications with large amounts of instruction-level
parallelism.  These are applications that are often of more importance to
physical scientists than to computer scientists.  Many applications of
importance to computer scientists and computer engineers, such as
compilers, operating systems, and programs involving manipulation of
linked data structures, will not benefit from highly parallel
uniprocessors."
--
Mike Butts, Research Engineer      KC7IT          503-626-1302
Mentor Graphics Corp., 8500 SW Creekside Place, Beaverton OR 97005
...!{sequent,tessi,apollo}!mntgfx!mbutts OR mbutts@pdx.MENTOR.COM
These are my opinions, & not necessarily those of Mentor Graphics.
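Jouppi's two definitions suggest a simple back-of-the-envelope comparison: a
superscalar machine of degree n issues n instructions per full cycle, while a
superpipelined machine of the same degree issues one instruction every 1/n of
a cycle.  For an ideal stream with unlimited instruction-level parallelism the
two come out the same; the numbers below are purely illustrative, not from the
paper.

```python
# Rough throughput model for superscalar vs. superpipelined machines
# of equal degree, assuming ideal instruction-level parallelism.

def time_superscalar(n_insns, degree, cycle_ns):
    """Issue `degree` instructions per cycle of length `cycle_ns`."""
    cycles = -(-n_insns // degree)           # ceiling division
    return cycles * cycle_ns

def time_superpipelined(n_insns, degree, cycle_ns):
    """Issue one instruction every cycle_ns / degree."""
    return n_insns * (cycle_ns / degree)

# 1000 independent instructions, 100 ns base cycle, degree 2:
print(time_superscalar(1000, 2, 100))        # -> 50000 (ns)
print(time_superpipelined(1000, 2, 100))     # -> 50000.0 (ns)
```

The model also makes the concluding point visible: once dependencies cap how
many instructions can issue together, raising the degree of either design
stops paying off.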