[comp.arch] HW sqrt/div

colwell@mfci.UUCP (Robert Colwell) (11/03/88)

In article <19811@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>Square root is the same category as divide. Hardware is slow, so algorithms
>tend to avoid them. The reason is fundamental. The hardware is slow, and it
>is exceedingly difficult to make it faster. Strangely enough, floating point
>divide can be made to run much faster, because of its normalized operands.

One of the biggest problems with hardware sqrt/divide is that the
hardware implementations want to be iterative, which makes these
ops non-pipelineable.  That's a very bad feature in machines where
all the other arithmetic ops, esp. flt. pt. multiplies and adds, are
pipelined.  A software implementation of sqrt or div is built out of
those pipelined ops, so the latency of a single sqrt or div will be
higher than with dedicated hardware, but the throughput over many
independent ops is much better, since they can overlap in the
pipeline.  Of course, the hardware can get you the last bit correctly
rounded to IEEE specifications; the software could too, in principle,
but I've not yet seen anyone do it.
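
To make the software approach concrete, here is a minimal sketch of
divide and sqrt built only from multiplies, adds, and subtracts.  It
assumes Newton-Raphson iteration, which the post above does not name;
the function names, seed values, and iteration counts are all
illustrative choices, and inputs are assumed to be finite, normal
doubles.

```python
import math

# Seed constants for the reciprocal: 48/17 and 32/17, precomputed so
# that no divide appears in the routine itself.
SEED_A = 2.823529411764706
SEED_B = 1.8823529411764706

def nr_reciprocal(d, iterations=4):
    """Approximate 1/d using only multiplies and subtracts."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # abs(d) == m * 2**e with 0.5 <= m < 1 (the normalized operand).
    m, e = math.frexp(abs(d))
    # Linear seed; its relative error is at most 1/17 on [0.5, 1).
    x = SEED_A - SEED_B * m
    # Each step roughly squares the relative error.
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    r = math.ldexp(x, -e)          # undo the scaling: 1/d = (1/m) * 2**-e
    return -r if d < 0.0 else r

def nr_divide(n, d):
    """Approximate n/d as n * (1/d)."""
    return n * nr_reciprocal(d)

def nr_sqrt(a, iterations=8):
    """Approximate sqrt(a) via the reciprocal square root."""
    if a < 0.0:
        raise ValueError("sqrt of a negative number")
    if a == 0.0:
        return 0.0
    m, e = math.frexp(a)
    if e % 2:                      # make the exponent even so it halves exactly
        m *= 2.0
        e -= 1
    # Now 0.5 <= m < 2, so a seed of 1.0 lies inside the convergence range.
    y = 1.0
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)   # y converges to 1/sqrt(m)
    # sqrt(a) = (m * 1/sqrt(m)) * 2**(e/2)
    return math.ldexp(m * y, e // 2)
```

Every loop body is a short chain of multiplies and subtracts, so on a
pipelined machine several independent calls could be interleaved to
keep the multiplier busy.  The results here are typically accurate to
within a few units in the last place, but nothing in this sketch
guarantees a correctly rounded last bit.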

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090