[comp.lang.fortran] Why floating point hardware: micro-parallelism, micro-cycles

dgh@validgh.com (David G. Hough on validgh) (09/09/90)

Stephen Spackman's recent postings recall a question I asked during
the development of the SPARC instruction-set architecture at Sun: since
floating-point instructions can be decomposed into simple integer operations,
how can they be justified in a RISC architecture?  Why don't they run
as fast in software?  (They don't, and can't, though you might have to
try it to convince yourself.  Just look at what 64-bit double-precision
floating-point add/subtract requires on a 32-bit RISC architecture.)
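
To see why, consider what even a grossly simplified software
double-precision add has to do with integer instructions - unpack,
align, add or subtract, renormalize, repack - before you even get to
rounding, denormals, infinities, and NaNs.  Here is a minimal C sketch
of mine (not from any actual soft-float library; it assumes IEEE 754
doubles and a 64-bit integer type, handles only nonzero normal
operands, and doesn't round or check exponent overflow/underflow):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Simplified software double add: nonzero normal operands only,
     * no rounding (bits shifted off during alignment are just lost),
     * no NaN/infinity/denormal handling.  A real routine needs all
     * of those, plus guard/round/sticky bits. */
    static double soft_add(double xd, double yd)
    {
        uint64_t x, y;
        memcpy(&x, &xd, 8);
        memcpy(&y, &yd, 8);

        /* Unpack sign, biased exponent, significand with hidden bit. */
        int      xs = (int)(x >> 63),           ys = (int)(y >> 63);
        int      xe = (int)(x >> 52) & 0x7FF,   ye = (int)(y >> 52) & 0x7FF;
        uint64_t xm = (x & 0xFFFFFFFFFFFFFull) | (1ull << 52);
        uint64_t ym = (y & 0xFFFFFFFFFFFFFull) | (1ull << 52);

        /* Make x the operand with the larger exponent. */
        if (ye > xe) {
            int t;  uint64_t tm;
            t = xs; xs = ys; ys = t;
            t = xe; xe = ye; ye = t;
            tm = xm; xm = ym; ym = tm;
        }

        /* Align the smaller significand to the larger exponent. */
        int shift = xe - ye;
        ym = (shift > 63) ? 0 : ym >> shift;

        /* Add or subtract significands according to the signs. */
        int rs = xs, re = xe;
        uint64_t rm;
        if (xs == ys)      rm = xm + ym;
        else if (xm >= ym) rm = xm - ym;
        else             { rm = ym - xm; rs = ys; }

        if (rm == 0) return 0.0;

        /* Renormalize so the hidden bit sits at bit 52 again. */
        while (rm >= (1ull << 53)) { rm >>= 1; re++; }
        while (rm <  (1ull << 52)) { rm <<= 1; re--; }

        /* Repack. */
        uint64_t r = ((uint64_t)rs << 63) | ((uint64_t)re << 52)
                   | (rm & 0xFFFFFFFFFFFFFull);
        double rd;
        memcpy(&rd, &r, 8);
        return rd;
    }

    int main(void)
    {
        printf("%g %g\n", soft_add(1.5, 2.25), soft_add(1.0, -0.5));
        return 0;   /* prints: 3.75 0.5 */
    }

That's dozens of macro-instructions, and macro-cycles, per add - where
a hardware unit does the equivalent work in a few of its own faster
micro-cycles.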

Basically I was attacking the idea that RISC = 'a few simple instructions'.
This was an overly simple definition anyway.  The correct definition of RISC
architecture is 'good engineering' in the sense of 'good engineering
economy', although not everybody has realized this yet.

The underlying answer to the floating-point question is that while 
software floating point is limited by the macro-instruction cycle
time and parallelism is limited by the macro-instruction parallelism
potential, a hardware floating-point implementation can run at a
faster clock and have entirely different kinds of parallelism.  For
instance, one of the Hot Chips Symposium papers this year mentioned
a floating-point addition unit that simultaneously evaluates the
various cases that can arise and picks the correct one at the end.
And I mean really simultaneously.  Although high-performance hardware
floating point is not microcoded in the usual sense, it is often
implemented in hard-wired micro steps whose clock rate isn't limited
by instruction fetch bandwidth the way the macro-clock cycle rate is.
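
Software can only mimic that trick sequentially, but its shape can be
sketched in C (illustrative only, not a description of that paper's
actual design): compute every case unconditionally, then select.  In
hardware the three computations run concurrently and the selection
collapses to a multiplexer.

    #include <stdint.h>

    /* All-cases-at-once significand addition, assuming xm and ym are
     * already aligned.  In hardware the three results below are
     * computed in the same cycle; the if-chain is the final "pick". */
    uint64_t add_all_cases(int xs, int ys, uint64_t xm, uint64_t ym)
    {
        uint64_t sum     = xm + ym;   /* case: like signs                 */
        uint64_t diff_xy = xm - ym;   /* case: unlike signs, x dominates  */
        uint64_t diff_yx = ym - xm;   /* case: unlike signs, y dominates  */

        if (xs == ys) return sum;
        if (xm >= ym) return diff_xy;
        return diff_yx;
    }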

What tells you which complex multi-macro-cycle instructions (like floating-point
ops) are appropriate for inclusion in an instruction-set architecture?
One issue that arises if you want to be commercially successful
is that it's not a good idea to completely overlook any major application
area even if it's less than 1% of some "total", especially if your 
competitors didn't overlook it.  Thus MIPS put in integer multiplication
before SPARC and SPARC put in floating-point sqrt before MIPS.  Both
oversights were remedied in the second-generation instruction set
architectures, although I think MIPS has already implemented sqrt 
while no SPARC vendor has implemented integer multiplication.
Thus people like Silverman correctly point out that current SPARC
implementations aren't competitive for their kinds of problems; this is
embarrassing, and I used to worry that it would bother potential
customers - most of whom don't depend on integer multiplication but may
not know it - but it doesn't seem to be much of a problem.  Sun-3 sales
are down in the noise compared to Sun-4, even though a Sun-3 can do
some integer arithmetic problems faster at the same clock.
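
For scale, "no integer multiplication instruction" doesn't mean no
multiplication at all; it means the compiler emits a shift-and-add
routine built from simple operations, roughly like this C rendering
(my sketch, not actual SPARC compiler output):

    #include <stdint.h>

    /* 32x32 -> 32-bit multiply from simple RISC operations: test a
     * bit, conditionally add, shift - up to 32 iterations, versus
     * one instruction on a machine with a hardware multiplier. */
    uint32_t soft_mul(uint32_t a, uint32_t b)
    {
        uint32_t p = 0;
        while (b != 0) {
            if (b & 1)      /* low multiplier bit set: accumulate */
                p += a;
            a <<= 1;        /* multiplicand up one place   */
            b >>= 1;        /* multiplier down one place   */
        }
        return p;
    }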

Another aspect of commercial success is mass marketability -
general-purpose processors may be cheaper, faster, and more
cost-effective than specialized ones because of their higher run rates
and because vendors pay more attention to getting them into the latest
device technologies.

Spackman's speculation is that a totally different paradigm for non-integer
calculations could be more cost-effective than conventional floating point.
There are lots of candidate proposals; consult any recent proceedings from
the IEEE Computer Arithmetic Symposia.  But most of them are content to
prove feasibility rather than cost-effectiveness.

As mentioned, the issue is good engineering economy.  The quantitative
approach demonstrated in Hennessy and Patterson is the best place to
start, but it's much more expensive than thought-experiments posted
to news:  to really test an idea you have to build a hardware simulator
and a good optimizing compiler that properly exploits it, and possibly
design some language extensions to express what you can do.   And even
that's not enough; to avoid the kinds of embarrassments mentioned above
you need to learn as much as possible about what potential customers
actually do with computers and what they would do if they could.
It's a lifetime undertaking.
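
To make "quantitative" concrete, the simplest form of the exercise is
a frequency-weighted CPI comparison between designs.  Every number in
this sketch is an assumption invented for illustration, not a
measurement:

    #include <stdio.h>

    /* Toy Hennessy-and-Patterson-style comparison: average CPI as a
     * frequency-weighted sum over instruction classes.  All numbers
     * below are made up for illustration. */
    int main(void)
    {
        double f_fp  = 0.10;      /* assumed fraction of FP operations   */
        double f_int = 0.90;      /* everything else                     */
        double cpi_int   = 1.5;   /* assumed CPI, non-FP instructions    */
        double cpi_fp_hw = 3.0;   /* assumed cycles per FP op, hardware  */
        double cpi_fp_sw = 60.0;  /* assumed cycles per FP op, software  */

        double cpi_hw = f_int * cpi_int + f_fp * cpi_fp_hw;
        double cpi_sw = f_int * cpi_int + f_fp * cpi_fp_sw;

        printf("CPI with FP hardware: %.2f\n", cpi_hw);
        printf("CPI with soft float:  %.2f\n", cpi_sw);
        printf("speedup: %.2fx\n", cpi_sw / cpi_hw);
        return 0;
    }

The real version of this calculation is done with measured frequencies
from real workloads, which is exactly why the simulators and compilers
above are unavoidable.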

Besides Patterson, I should mention that Robert Garner, George Taylor,
John Mashey, and Earl Killian have helped me sort out what RISC
is all about.
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.stanford.edu