reiter@harvard.UUCP (Ehud Reiter) (03/05/86)
In the various RISC vs CISC debates, the VAX POLYD instruction has often been pointed out as the "archetype" of a bad instruction - inefficient and difficult for compilers to handle. Now, in article <78@cad.UUCP>, Richard Rudell (rudell@cad.UUCP) points out that "common knowledge" to the contrary, his tests show that the VAX POLYD instruction is faster even than unrolled assembly language.

The point I wish to make is that it is irrelevant that the POLYD instruction is difficult for compilers to handle, because its main purpose was to speed up evaluation of mathematical functions (SIN, EXP, LOG, etc.), which nearly always come down to evaluating an approximation polynomial (see H. Levy and R. Eckhouse, COMPUTER PROGRAMMING AND ARCHITECTURE: THE VAX-11 (Digital Press), pg 167). Therefore, the intended "user" of the POLYD instruction was the run-time library, not compiled code.

Of course, one can argue that the best way to speed up trig functions is not with a polynomial evaluation instruction, but rather by hard coding the trig functions directly in the floating point unit (some of the new floating point chips, like the 80287, seem to be moving in this direction), but if RISC types get upset at POLYD, they must really hate the thought of primitive instructions for SIN, ATAN, etc.

Ehud Reiter
reiter@harvard.ARPA
...seismo!harvard!reiter.UUCP
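For readers who haven't met it: POLYD evaluates a polynomial - given an argument, a degree, and a table of coefficients - by Horner's rule, which is exactly the shape of the approximation polynomials Reiter mentions. A minimal C sketch of the equivalent computation follows; the coefficient ordering and the sin() approximation are illustrative only, not the exact VAX operand layout.

#include <stdio.h>

/* Horner's-rule polynomial evaluation, roughly what one POLYD does:
 * result = c[0]*x^degree + c[1]*x^(degree-1) + ... + c[degree].
 * Coefficient order here is illustrative; see the VAX manuals for the
 * exact operand layout. */
double poly(double x, int degree, const double c[])
{
    double r = c[0];
    for (int i = 1; i <= degree; i++)
        r = r * x + c[i];
    return r;
}

int main(void)
{
    /* Illustrative approximation: sin(x) ~ x - x^3/6 + x^5/120,
     * evaluated as a polynomial in x^2 and then scaled by x. */
    double x = 0.5, x2 = x * x;
    double c[] = { 1.0 / 120.0, -1.0 / 6.0, 1.0 };
    printf("approx sin(%g) = %g\n", x, x * poly(x2, 2, c));
    return 0;
}

The whole multiply-add loop above collapses into a single instruction, which is why the run-time library, not the compiler, was its natural customer.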
jlg@lanl.ARPA (Jim Giles) (03/06/86)
In article <759@harvard.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>Of course, one can argue that the best way to speed up trig functions is not
>with a polynomial evaluation instruction, but rather by hard coding the trig
>functions directly in the floating point unit (some of the new floating
>point chips, like the 80287, seem to be moving in this direction), but if RISC
>types get upset at POLYD, they must really hate the thought of primitive
>instructions for SIN, ATAN, etc.

Quite the contrary. The evaluation of these primitives in hardware is not necessarily a bad idea. As long as it's done in a separate functional unit (and the architecture is pipelined), it is not a bad idea at all. The problem with POLYD is not what it does, but that it slows down the rest of the machine's instructions even when it's not used (which it often isn't).

Of course (as you can see), I'm not a RISC purist. I don't think single-clock execution of ALL instructions is mandatory. Pipelining complicates the architecture because (among other things) you have to provide for instruction delays due to reserved registers. I think this is more than offset by the speed advantage of having several functional units, each of which can be simple because it is responsible for only part of the instruction set.

The most useful thing about RISC ideas is the reduction of addressing modes. The CRAY has the only two I've ever really needed: immediate and indexed (I COULD get by without immediate, but it's cheap to provide - apparently).

J. Giles
Los Alamos
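Giles' point about addressing modes is easy to miss if you've never written the address arithmetic out by hand. Here is a hedged C sketch of what a single "displacement plus scaled index" operand does, decomposed into the explicit arithmetic a compiler would emit for a machine that has only immediate and indexed addressing; the struct layout and names are invented for illustration.

#include <stddef.h>
#include <stdio.h>

struct rec { int key; int val; };   /* layout is illustrative */

/* One complex operand, base[i].val, spelled out as the address
 * arithmetic a compiler would emit with only immediate and
 * register-indexed addressing available. */
int lookup_val(const struct rec *base, long i)
{
    const char *addr = (const char *)base;       /* register: base          */
    addr += i * (long)sizeof(struct rec);        /* shift/add: i * sizeof   */
    addr += (long)offsetof(struct rec, val);     /* add immediate offset    */
    return *(const int *)addr;                   /* one indexed load        */
}

int main(void)
{
    struct rec table[3] = { {1, 10}, {2, 20}, {3, 30} };
    printf("%d\n", lookup_val(table, 2));        /* prints 30 */
    return 0;
}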
aglew@ccvaxa.UUCP (03/09/86)
This RISC type is not upset at implementing SIN as a "primitive" instruction. True, it pretty much has to be microcoded, but it is an example of a good microcoded instruction - one that will cook around in your floating point unit for a long time, and not require lots of memory accesses (see my earlier note about RISCs and coprocessors).

But let me qualify this acceptance: (1) the SIN instruction must be faster than I could do it in software, and it must handle the special cases as well and as fast as I can (there are different algorithms for different argument values). (2) It must be a reasonably good SIN, so that the library functions that use it don't have to go through contortions to determine whether it's going to provide an accurate result or not. Most importantly, (3) there should not be a more important potential use for the circuitry - and the delays - that SIN will add to the chip. E.g., if SIN adds one level of logic, slowing down all instructions in the pipeline by, say, 10%, and SINs take up less than 10% of execution time on your machine without the SIN, then throw the SIN out! This type of judgement can only be made with statistics and knowledge of how your users use your machine.

Let me qualify that: if SIN takes up less than 10% of the time, but the applications that use SIN are the applications and benchmarks that most influence your customers to buy your machine, fine, put SIN in. You don't build fast computers for the sake of building them, you build them to sell them. But watch out!: that comes close to selling your customer short, so if he finally determines that what he wanted was fast computers, not just fast benchmarks, he may not come back to you for his next machine.
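Aglew's rule of thumb in (3) is essentially Amdahl's-law arithmetic. A small C sketch of the break-even calculation, with all numbers invented purely for illustration:

#include <stdio.h>

/* Back-of-the-envelope version of the trade-off described above.
 *   cycle_penalty : fractional slowdown the extra logic adds to EVERY
 *                   instruction (e.g. 0.10 for 10%)
 *   sin_fraction  : fraction of current run time spent computing SIN
 *   sin_speedup   : how much faster the hardware SIN is than software
 * Returns the new run time relative to the old (1.0 = no change;
 * below 1.0 means the hardware SIN is a net win). */
double relative_runtime(double cycle_penalty, double sin_fraction,
                        double sin_speedup)
{
    double other = 1.0 - sin_fraction;
    return (other + sin_fraction / sin_speedup) * (1.0 + cycle_penalty);
}

int main(void)
{
    /* 10% pipeline penalty, 8% of time in SIN, hardware SIN 4x faster:
     * result is about 1.03, i.e. a net loss - the "throw the SIN out"
     * case in the post above. */
    printf("relative run time: %.3f\n", relative_runtime(0.10, 0.08, 4.0));
    return 0;
}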
stevev@tekchips.UUCP (03/12/86)
> Quite the contrary. The evaluation of these primitives in hardware is not
> necessarily a bad idea. As long as it's done in a separate functional
> unit (and the architecture is pipelined), it is not a bad idea at all.

One thing that has puzzled me about RISC machines is that their proponents argue that only a very basic machine should be on the main processor chip, with everything else done in separate functional units. Sounds fine so far. Then out of Berkeley comes the SOAR (Smalltalk On A RISC) machine, into which there is hard-wired support for a very specific language - Smalltalk. From what I hear from RISC proponents, the `proper' way to have done this would have been to use a vanilla RISC machine and then to put the Smalltalk support on a separate chip.

If a Smalltalk-specific RISC machine is a good idea, why not a LISP-specific RISC machine, a Prolog-specific RISC machine, and a Pascal-specific RISC machine? I thought that language-specific architectures were one of the things RISC types say are a bad idea.

Steve Vegdahl
Computer Research Lab.
Tektronix, Inc.
Beaverton, Oregon
thomas@utah-gr.UUCP (Spencer W. Thomas) (03/13/86)
In article <5100026@ccvaxa> aglew@ccvaxa.UUCP writes:
>But let me qualify this acceptance: (1) the SIN instruction must be faster
>than I could do it in software, and it must handle the special cases as
>well and as fast as I can (there are different algorithms for different
>argument values).

Interesting note: I was sitting in on an architecture course this quarter (which, of course, automatically qualifies me to post to this group :-). We were discussing (I think) the Symbolics Lisp Machine (3600). It has a floating point accelerator you can buy. If you don't have it, floating point is done in software, of course. Well, it turns out that even if you do have it, the software (microcode?) floating point routine is run in parallel. Whichever one finishes first "wins".

Now, I can hear you ask, "when would software EVER be faster than the FPA?" The software has code to take care of some easy special cases (e.g., multiplication by zero). For these cases it will finish before the FPA, because the FPA just grinds through the bits no matter what the input values are.

--
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
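The "software sometimes wins" behaviour Thomas describes comes down to a cheap special-case path racing a fixed-latency general path. Below is a sequential C sketch of the special-case side only; the 3600 reportedly runs both paths in parallel, and this analogy deliberately ignores IEEE details such as signed zeros and NaNs.

#include <stdio.h>

/* Handle the cheap cases directly and hand everything else to the
 * general mechanism (plain C multiplication stands in for the FPA).
 * The special cases need no bit-grinding, which is why they can
 * finish before the hardware does. */
double fast_mul(double a, double b, int *used_fast_path)
{
    *used_fast_path = 1;
    if (a == 0.0 || b == 0.0) return 0.0;   /* multiplication by zero */
    if (a == 1.0) return b;                 /* multiplication by one  */
    if (b == 1.0) return a;
    *used_fast_path = 0;
    return a * b;                           /* the "FPA" path         */
}

int main(void)
{
    int fast;
    double r = fast_mul(0.0, 3.14, &fast);
    printf("%g (fast path: %s)\n", r, fast ? "yes" : "no");
    return 0;
}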
peters@cubsvax.UUCP (Peter S. Shenkin) (03/13/86)
In article <ccvaxa.5100026> aglew@ccvaxa.UUCP writes:
>Let me qualify that: if SIN takes up less than 10% of the time, but the
>applications that use SIN are the applications and benchmarks that most
>influence your customers to buy your machine, fine, put SIN in....

To paraphrase J. Robert Oppenheimer, I guess now the computer designers have tasted SIN.... (Sorry, couldn't resist....)

Peter S. Shenkin  Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters        cubsvax!peters@columbia.ARPA
grr@cbm.UUCP (George Robbins) (03/16/86)
A point that people seem to miss is that an instruction like POLYD can be optimized on different models of the CPU, so that the design can be tuned for different applications and performance levels. It's much harder to make changes that will make 37 arbitrary software polynomial-evaluation routines all run faster.

Anyhow, why beat on the poor little vaxen? They are but pale shadows compared to a real *C*ISC like a Burroughs B6700. Kind of interesting when you review the old claims about the Burroughs architecture being designed for HLLs - what percentage of the instructions and whatnot did their compiler writers manage to use?

--
George Robbins - now working with,     uucp: {ihnp4|seismo|caip}!cbm!grr
but no way officially representing     arpa: cbm!grr@seismo.css.GOV
Commodore, Engineering Department      fone: 215-431-9255 (only by moonlite)
johnson@uiucdcsp.CS.UIUC.EDU (03/19/86)
/* Written 12:06 pm Mar 12, 1986 by stevev@tekchips.UUCP in net.arch */
>> Quite the contrary. The evaluation of these primitives in hardware is not
>> necessarily a bad idea. As long as it's done in a separate functional
>> unit (and the architecture is pipelined), it is not a bad idea at all.

>One thing that has puzzled me about RISC machines is that their proponents
>argue that only a very basic machine should be on the main processor chip,
>with everything else done in separate functional units.

"Separate functional unit" does not mean "separate chip", but a separate part of the chip devoted to a particular function.

SOAR is pretty much a standard RISC. Its special features would be useful for LISP as well as Smalltalk. There are essentially two new features. The first is that a check is made on each store that a particular memory-management invariant is being maintained. If it is not, the processor traps to a routine that fixes it. The second is that the arithmetic instructions check that their arguments are small integers. If they are not, they trap to more general routines in software.

There are groups working on RISCs for LISP and PROLOG. RISC proponents don't claim that special-purpose processor design is dead, just that special-purpose instructions are dead.
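Johnson's second SOAR feature - arithmetic that assumes small tagged integers and traps to a general routine otherwise - looks roughly like the C sketch below. The tag convention (low bit 0 means small integer, value shifted left one bit) is illustrative only, not SOAR's actual encoding.

#include <stdint.h>
#include <stdio.h>

typedef intptr_t oop;                       /* "pointer or small integer" */

#define IS_SMALL_INT(x)   (((x) & 1) == 0)
#define TO_SMALL_INT(n)   ((oop)(n) << 1)
#define FROM_SMALL_INT(x) ((long)(x) >> 1)

/* Stand-in stub for the general routine (boxed numbers, overflow, etc.);
 * a real system would unbox large integers here. */
static oop general_add(oop a, oop b)
{
    return TO_SMALL_INT(FROM_SMALL_INT(a) + FROM_SMALL_INT(b));
}

/* The common case is one tag test and one add; anything else "traps"
 * to the general routine.  A real implementation would also send
 * overflow down the slow path. */
static oop add(oop a, oop b)
{
    if (IS_SMALL_INT(a) && IS_SMALL_INT(b))
        return a + b;                       /* tag bits stay zero */
    return general_add(a, b);
}

int main(void)
{
    oop r = add(TO_SMALL_INT(20), TO_SMALL_INT(22));
    printf("%ld\n", FROM_SMALL_INT(r));     /* prints 42 */
    return 0;
}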
robison@uiucdcsb.CS.UIUC.EDU (03/20/86)
> Anyhow, why beat on the poor little vaxen? They are but pale shadows compared
> to a real *C*ISC like a Burroughs B6700. Kind of interesting when you review
> the old claims about the Burroughs architecture being designed for HLLs - what
> percentage of the instructions and whatnot did their compiler writers manage
> to use?

The Burroughs B6700 and its successors (the B7700 and the current A15) have aspects of both CISCs and RISCs. There are relatively few instructions, but each instruction has complex semantics. For example, there are only two non-immediate load instructions, "value call" and "name call", which have automatic chain dereferencing and "thunk" evaluation.

For the code I've looked at, good use was made of most of the instruction set by the ALGOL compiler. (This is to be expected, since Burroughs ALGOL has extensions based on the instruction set, e.g. field extractions.)

The marketing advantage of the complex semantics is that it allows a wide price range of machines with the same instruction set. Because the instructions try to describe what to do, and not exactly how to do it, the higher-priced machines can exploit more parallelism by rearranging the computations at run time (and in some cases not doing them at all!), which compilers cannot do.

The principal problem is other languages which the designers did not anticipate. E.g. there is no C compiler available because the current hardware cannot support C's pointers.

Arch D. Robison
University of Illinois
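For readers unfamiliar with the B6700's "name call": the operand may be a thunk - a little procedure that produces the value on demand - and the load evaluates it automatically. Below is a C analogy only; the names and the descriptor machinery are invented for illustration, and the real hardware works through tagged descriptors rather than function pointers.

#include <stdio.h>

/* A thunk: not a value, but a procedure that produces one when a
 * "name call" style load touches the operand. */
typedef struct {
    double (*eval)(void *env);
    void   *env;
} thunk;

static double name_call(thunk t)       /* "load" by evaluating the thunk */
{
    return t.eval(t.env);
}

static double twice_x(void *env)       /* recomputed at every reference */
{
    return *(double *)env * 2.0;
}

int main(void)
{
    double x = 21.0;
    thunk operand = { twice_x, &x };
    printf("%g\n", name_call(operand)); /* prints 42 */
    x = 0.5;
    printf("%g\n", name_call(operand)); /* prints 1: the value tracks x */
    return 0;
}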