[comp.arch] integer multiplies on a Sparc

scottb@ogicse.cse.ogi.edu (Scott Baker) (02/02/91)

I am planning to implement a neural-net simulator using integer
arithmetic.  One motivation for choosing integer over floating point is
the higher speed of an integer implementation.  However, I have just
been told that a Sparc 1 may actually be -slower- at integer multiplies
than floating-point multiplies because of a lack of hardware support
for integer multiplies.  Is this true?! 

	Thanks:

	Scott Baker
	scottb@ogicse.cse.ogi.edu

henry@zoo.toronto.edu (Henry Spencer) (02/02/91)

In article <16864@ogicse.ogi.edu> scottb@ogicse.cse.ogi.edu (Scott Baker) writes:
>been told that a Sparc 1 may actually be -slower- at integer multiplies
>than floating-point multiplies because of a lack of hardware support
>for integer multiplies.  Is this true?! 

Yes.  The original Sparcs are somewhat unbalanced machines.  I don't think
it is definitively a mistake not to have a fast general integer multiply
(since an awful lot of integer multiplies are by small constants, which
can be done better by shift-add sequences), but it is a mistake to put
a lot of effort into fast floating-point and none into fast integer
multiply.  There are signs that the Sparc world now realizes this, but
it comes too late to help a lot of the early machines.
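
To illustrate the shift-add point, here is a minimal C sketch of
multiplying by a small constant without a multiply instruction.  The
function names times10 and times7 are made up for this example, and a
compiler would normally generate such sequences itself.

```c
#include <stdint.h>

/* Multiply by the constant 10 using two shifts and an add:
   10*x = 8*x + 2*x = (x << 3) + (x << 1).
   Unsigned arithmetic wraps modulo 2^32, matching what an ordinary
   32-bit multiply would produce. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Multiply by 7 using one shift and a subtract: 7*x = 8*x - x. */
uint32_t times7(uint32_t x)
{
    return (x << 3) - x;
}
```

Each constant costs two or three single-cycle operations, which is why
a slow general multiply hurts less for this common case than for
multiplies by arbitrary run-time values.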

Buy MIPS. :-)
-- 
"Maybe we should tell the truth?"      | Henry Spencer at U of Toronto Zoology
"Surely we aren't that desperate yet." |  henry@zoo.toronto.edu   utzoo!henry

shand@prl.dec.com (Mark Shand) (02/13/91)

Integer multiply on SPARC is indeed poor.  I recently added
an assembler kernel for SPARC to our bignum package and found
the fastest way to do multiprecision integer multiply was
through the FPU.  The primitive I use is 32-bit x 16-bit -> 48-bit, which
can be computed exactly in double precision.  I've only timed it
on a SPARCstation 1, which has a rather slow 9-cycle DP mult.
Overall, multiprecision integer multiplies run about 4 times
slower than on a MIPS R2000, which has a 12-16 cycle (depending
on how you count) 32x32->64 integer mult, but the FPU route is
still faster than any other way of doing full-word integer mult
on an early SPARC.
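
The idea can be sketched in portable C, although this is not the
assembler kernel from the bignum package.  The sketch assumes IEEE 754
doubles, whose 53-bit significand holds any 48-bit product exactly.
The names mul32x16_fp and mul32x32_fp are hypothetical.

```c
#include <stdint.h>

/* 32-bit x 16-bit -> 48-bit product computed in double precision.
   Both operands and the product (below 2^48) fit in the 53-bit
   significand of an IEEE 754 double, so no rounding occurs. */
uint64_t mul32x16_fp(uint32_t a, uint16_t b)
{
    double p = (double)a * (double)b;
    return (uint64_t)p;
}

/* Full 32x32 -> 64-bit product assembled from two 32x16 partial
   products: a*b = (a * b_high) * 2^16 + (a * b_low). */
uint64_t mul32x32_fp(uint32_t a, uint32_t b)
{
    uint64_t lo = mul32x16_fp(a, (uint16_t)(b & 0xFFFFu));
    uint64_t hi = mul32x16_fp(a, (uint16_t)(b >> 16));
    return (hi << 16) + lo;
}
```

The final shift and add stand in for the integer recombination a real
kernel would do on 32-bit registers.  On a SPARCstation 1 the cost of
moving operands between the integer and floating-point units would
also have to be counted.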

(our bignum package is available by mail from librarian@prl.dec.com,
we will be announcing an FTP server soon)

Even on a more balanced machine like the MIPS R2000/R3000, floating
mult, although more resource intensive than integer mult, is a
higher priority operation and, through the devotion of more hardware,
takes fewer cycles.

Moral: tradeoffs between integer and float are subtle; just because
an operation CAN be implemented more efficiently doesn't mean it
HAS BEEN.

Of course next year's CPU designers will benchmark your neural net code
that you've finally decided to cast in floats even though ints would
have served you equally well, and those designers will deprecate
integer multiply even further.

Questions:

Does anyone know which SPARC implementations include integer multiply
support beyond the multiply step instruction?  What is the opcode?
What happens if an early SPARC hits such an opcode?  Have these SPARC
implementations found their way into any product machines yet?

Another thing that bugged me about multiply step was that it doesn't
seem to give any way to get the high order part of the result.
MIPS on the contrary gives you lo and hi result registers.  This
is essential in multiprecision work.  Am I missing something in
multiply step?  Do the newer instructions help here?
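
To show why the high-order half matters, here is a minimal C sketch of
the inner loop of a multiprecision multiply (a bignum times one word).
It is not taken from the bignum package.  The name bn_mul_word and the
little-endian word layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply the n-word number a[] (least significant word first) by
   the single word m.  Stores the low n words of the result in r[]
   and returns the final carry, i.e. the most significant word. */
uint32_t bn_mul_word(uint32_t *r, const uint32_t *a, size_t n, uint32_t m)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Full 64-bit product plus incoming carry; this cannot
           overflow because (2^32-1)^2 + (2^32-1) < 2^64. */
        uint64_t p = (uint64_t)a[i] * m + carry;
        r[i] = (uint32_t)p;           /* low 32 bits stay in this digit */
        carry = (uint32_t)(p >> 32);  /* high 32 bits go to the next one */
    }
    return carry;
}
```

The two halves of each 64-bit product correspond to what the MIPS lo
and hi registers hold after a multiply.  Without access to the high
half, every digit's carry would be lost.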

Mark Shand.