[net.arch] VAX polyd instruction

reiter@harvard.UUCP (Ehud Reiter) (03/05/86)

In the various RISC vs CISC debates, the VAX POLYD instruction has often
been pointed out as the "archetype" of a bad instruction - inefficient and
difficult for compilers to handle.  Now, in article <78@cad.UUCP>, Richard
Rudell (rudell@cad.UUCP) points out that "common knowledge" to the contrary,
his tests show that the VAX POLYD instruction is faster even than unrolled
assembly language.  The point I wish to make is that it is irrelevant that
the POLYD instruction is difficult for compilers to handle, because its
main purpose was to speed up evaluation of mathematical functions (SIN, EXP,
LOG, etc.), which nearly always come down to evaluating an approximation
polynomial (see H. Levy and R. Eckhouse, COMPUTER PROGRAMMING AND ARCHITECTURE:
THE VAX-11 (Digital Press), pg 167).  Therefore, the intended "user" of the
POLYD instruction was the run time library system, not compiled code.

Of course, one can argue that the best way to speed up trig functions is not
with a polynomial evaluation instruction, but rather by hard coding the trig
functions directly in the floating point unit (some of the new floating
point chips, like the 80287, seem to be moving in this direction), but if RISC
types get upset at POLYD, they must really hate the thought of primitive
instructions for SIN, ATAN, etc.

					Ehud Reiter
					reiter@harvard.ARPA
					...seismo!harvard!reiter.UUCP

jlg@lanl.ARPA (Jim Giles) (03/06/86)

In article <759@harvard.UUCP> reiter@harvard.UUCP (Ehud reiter) writes:
>Of course, one can argue that the best way to speed up trig functions is not
>with a polynomial evaluation instruction, but rather by hard coding the trig
>functions directly in the floating point unit (some of the new floating
>point chips, like the 80287, seem to be moving in this direction), but if RISC
>types get upset at POLYD, they must really hate the thought of primitive
>instructions for SIN, ATAN, etc.

Quite the contrary.  The evaluation of these primitives in hardware is not
necessarily a bad idea.  As long as it's done in a separate functional
unit (and the architecture is pipelined) it is not a bad idea at all.  The
problem with POLYD is not what it does, but that it slows the rest of the
machine's instructions down even if it's not used (which it often isn't).

Of course (as you can see), I'm not a RISC purist.  I don't think single
clock execution of ALL instructions is mandatory.  Pipelining complicates
the architecture because (among other things) it means that you have to provide
for instruction delays because of reserved registers.  I think this is
more than offset by the speed advantage of having several functional units
which can each be simple because they are only responsible for part of
the instruction set.  The most useful thing about RISC ideas is the
reduction of addressing modes.  The CRAY has the only two I've ever really
needed: immediate and indexed (I COULD get by without immediate, but it's
cheap to provide - apparently).

J. Giles
Los Alamos

aglew@ccvaxa.UUCP (03/09/86)

This RISC type is not upset at implementing SIN as a "primitive"
instruction. True, it pretty much has to be microcoded, but it is
an example of a good microcoded instruction - one that will cook
around in your floating point unit for a long time, and not require
lots of memory accesses (see my earlier note about RISCs and coprocessors).

But let me qualify this acceptance: (1) instruction SIN must be faster than
I could do it in software. It must handle the special cases as well and as
fast as I can do (there are different algorithms for different values 
of SIN). (2) it must be a reasonably good SIN, so that the library functions
that use it don't have to go through contortions to determine if it's going
to provide an accurate result or not. Most importantly, (3) there should not
be a more important potential use of the circuitry and delays that SIN will
add to the chip.  E.g., if SIN adds one level of logic, slowing down all 
instructions in the pipeline by, say, 10%, and SINs take up less than 10%
of execution time on your machine without the SIN, then throw the SIN out!
This type of judgement can only be made with statistics and knowledge of how
your users use your machine.

Let me qualify that: if SIN takes up less than 10% of the time, but the
applications that use SIN are the applications and benchmarks that most 
influence your customers to buy your machine, fine, put SIN in. You don't 
build fast computers for the sake of building them, you build them to sell
them. But watch out!: that comes close to selling your customer short, so
if he finally determines that what he wanted was fast computers, not just
fast benchmarks, he may not come back to you for his next machine.

stevev@tekchips.UUCP (03/12/86)

> Quite the contrary.  The evaluation of these primitives in hardware is not
> necessarily a bad idea.  As long as it's done in a separate functional
> unit (and the architecture is pipelined) it is not a bad idea at all.

One thing that has puzzled me about RISC machines is that its proponents
argue that only a very basic machine should be on the main processor chip,
with everything else done in separate functional units.  Sounds fine so far.
Then out of Berkeley comes the SOAR (Smalltalk on a RISC) machine, into
which there is hard-wired support for a very specific language--Smalltalk.
From what I hear from RISC proponents, the `proper' way to have done
this would have been to use a vanilla RISC machine and then to put the
Smalltalk support on a separate chip.

If a Smalltalk-specific RISC machine is a good idea, why not a LISP-specific
RISC machine, a Prolog-specific RISC machine, and a Pascal-specific RISC
machine?  I thought that language-specific architectures were one of the
things that RISC-types say are a bad idea.

		Steve Vegdahl
		Computer Research Lab.
		Tektronix, Inc.
		Beaverton, Oregon

thomas@utah-gr.UUCP (Spencer W. Thomas) (03/13/86)

In article <5100026@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>But let me qualify this acceptance: (1) instruction SIN must be faster than
>I could do it in software. It must handle the special cases as well and as
>fast as I can do (there are different algorithms for different values 
>of SIN). 
Interesting note:  I was sitting in on an architecture course this
quarter (which, of course, automatically qualifies me to post to this
group :-).  We were discussing (I think) the Symbolics Lisp Machine
(3600).  It has a floating point accelerator you can buy.  If you don't
have it, floating point is done in software, of course.  Well, it turns
out that even if you do have it, the software (microcode?) floating
point routine is run in parallel.  Whichever one finishes first "wins".
Now, I can hear you ask "when would software EVER be faster than the
FPA?"  The software has code to take care of some easy special cases
(e.g., multiplication by zero).  For these cases, it will finish before
the FPA, because the FPA just grinds through the bits no matter what the
input values are.

-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)

peters@cubsvax.UUCP (Peter S. Shenkin) (03/13/86)

In article <ccvaxa.5100026> aglew@ccvaxa.UUCP writes:
>Let me qualify that: if SIN takes up less than 10% of the time, but the
>applications that use SIN are the applications and benchmarks that most 
>influence your customers to buy your machine, fine, put SIN in....

To paraphrase J. Robert Oppenheimer, I guess now the computer designers have
tasted SIN....

(Sorry, couldn't resist....)

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

grr@cbm.UUCP (George Robbins) (03/16/86)

A point that people seem to miss is that an instruction like polyd can
be optimized on different models of the CPU such that you tune the design
for different applications/performance levels.  It's much harder to make
changes that will make 37 arbitrary software polynomial evaluation routines
all run faster.

Anyhow, why beat on the poor little vaxen?  They are but pale shadows compared
to a real *C*ISC like a Burroughs B6700.  Kind of interesting when you review
the old claims about Burroughs architecture being designed for HLL's - what
percentage of the instructions and whatnot did their compiler writers manage
to use?
-- 
George Robbins - now working with,	uucp: {ihnp4|seismo|caip}!cbm!grr
but no way officially representing	arpa: cbm!grr@seismo.css.GOV
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

johnson@uiucdcsp.CS.UIUC.EDU (03/19/86)

/* Written 12:06 pm  Mar 12, 1986 by stevev@tekchips.UUCP in net.arch */
>> Quite the contrary.  The evaluation of these primitives in hardware is not
>> necessarily a bad idea.  As long as it's done in a separate functional
>> unit (and the architecture is pipelined) it is not a bad idea at all.

>One thing that has puzzled me about RISC machines is that its proponents
>argue that only a very basic machine should be on the main processor chip,
>with everything else done in separate functional units.

"Separate functional unit" does not mean "separate chip", but a separate
part of the chip devoted to a particular function.  SOAR is pretty much
a standard RISC.  Its special features would be useful for LISP as well as
Smalltalk.  There are essentially two new features.  The first is that a
check is made on each store that a particular memory-management invariant is
being maintained.  If it is not, the processor traps to a routine that fixes
it.  The second is that arithmetic routines check to be sure that the
arguments are small integers.  If not, the routines trap to the more general
solutions in subroutines.

There are groups working on RISCs for LISP and PROLOG.  RISC proponents
don't claim that special purpose processor design is dead, just that
special purpose instructions are dead.  

robison@uiucdcsb.CS.UIUC.EDU (03/20/86)

> Anyhow, why beat on the poor little vaxen?  They are but pale shadows compared
> to a real *C*ISC like a Burroughs B6700.  Kind of interesting when you review 
> the old claims about Burroughs architecture being designed for HLL's - what
> percentage of the instructions and whatnot did their compiler writers manage
> to use?

The Burroughs B6700 and its successors (B7700 and current A15) have aspects
of both CISCs and RISCs.  There are relatively few instructions, but each
instruction has complex semantics.  For example, there are only two
non-immediate load instructions: "value call" and "name call", which
have automatic chain dereferencing and "thunk" evaluation.
For the code I've looked at, good use was made of most of the instruction set
by the ALGOL compiler.  (This is to be expected, since Burroughs ALGOL
has extensions based on the instruction set.  E.g. field extractions.)

The marketing advantage of the complex semantics is that it allows
a wide price range of machines with the same instruction set.
Because the instructions try to describe what to do, and not exactly
how to do it, the higher-priced machines can exploit more parallelism
by rearranging the computations at run-time (and in some cases not doing them!),
which compilers cannot do.

The principal problem is other languages which the designers did not
anticipate.  E.g., there is no C compiler available because the current
hardware cannot support C's pointers.

Arch D. Robison
University of Illinois