[comp.sys.sun] Integer math on SPARCs

tac@cs.brown.edu (Theodore A. Camus) (05/25/90)

> Whenever two integer variables are multiplied together, the assembly
> code generated by the C compiler (both Sun's and gcc) contains the line:
>
>		call	.mul, 2
>
> rather than use the integer multiply operation. Why is that? Floating
> point multiplies generate an fmuld assembler instruction. What's going on?

Simple - in keeping with the RISC philosophy, there is no integer multiply
instruction on the SPARC ALU.  The ".mul" routine computes multiplication
literally by shifts and adds, just like we learned in 3rd grade.  There is
actually a "mulscc" instruction which does a shift and add in a single
cycle.  (There is also a .umul for unsigned multiplication, as well as
.div, .udiv, .rem, and .urem.)
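
For illustration, the shift-and-add idea can be sketched in C roughly as
follows.  This is not Sun's .mul (which is SPARC assembly built around
mulscc); the name shift_add_mul is made up for this example, and it
returns only the low 32 bits of the product.

```c
#include <stdint.h>

/* Hypothetical helper, not the real .mul: multiply two 32-bit unsigned
   values using only shifts and adds.  Returns the low 32 bits of the
   product, the same as C's unsigned multiplication. */
uint32_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)       /* low multiplier bit set: add the multiplicand */
            product += a;
        a <<= 1;          /* the next partial product is worth twice as much */
        b >>= 1;          /* move on to the next multiplier bit */
    }
    return product;
}
```

The loop stops as soon as the remaining multiplier bits are all zero, so
small multipliers finish in only a few iterations.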

- Ted

  CSnet:     tac@cs.brown.edu                          Ted Camus  
  ARPAnet:   tac%cs.brown.edu@relay.cs.net             Box 1910 CS Dept
  BITnet:    tac@browncs.BITNET                        Brown University
  "An ounce of example is worth a pound of theory."    Providence, RI 02912

dupuy@cs.columbia.edu (05/25/90)

There is no multiply instruction on a SPARC - it's a RISC architecture.
Instead of providing multiply in hardware, which would take up chip area
that could better be used for registers or other things that "make it go
fast", a multiply step (MULScc) operation is provided, and .mul is a
routine with a lot of mulscc instructions and a little logic at the
beginning and end to deal with overflow, etc.  Since it's a RISC machine,
mulscc's execute fast, and there's an overall performance win.  You could
inline .mul, but the call overhead is minimal (two jumps) since it doesn't
save or restore any registers.

See Appendix E of the SPARC Architecture Manual, pp. 165-181 for
multiplication and division routines.

@alex

guy@uunet.uu.net (Guy Harris) (05/27/90)

> The ".mul" routine computes multiplication literally by shifts and adds,
> just like we learned in 3rd grade.

Actually, the multiply *instruction* on a lot of machines does the same
(for unsigned multiply; the Booth algorithm is probably used for signed
multiplies).  Others may have parallel multipliers, and do it quicker.
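
For anyone who hasn't seen it, here is a minimal C sketch of radix-2
Booth recoding.  It is only meant to show the idea; booth_mul is a
hypothetical name, and nothing here is claimed to match what any
particular machine's hardware or microcode does.

```c
#include <stdint.h>

/* Hypothetical sketch of radix-2 Booth multiplication for signed 32-bit
   values.  The arithmetic is done on unsigned bit patterns so that
   wraparound is well defined; the result is the low 32 bits of the
   product.  The final conversion back to int32_t assumes the usual
   two's-complement behaviour. */
int32_t booth_mul(int32_t a, int32_t b)
{
    uint32_t multiplicand = (uint32_t)a;
    uint32_t multiplier = (uint32_t)b;
    uint32_t product = 0;
    unsigned prev = 0;      /* the implicit bit to the right of bit 0 */
    int i;

    for (i = 0; i < 32; i++) {
        unsigned cur = (multiplier >> i) & 1u;

        if (cur == 1 && prev == 0)        /* a run of 1s starts: subtract */
            product -= multiplicand << i;
        else if (cur == 0 && prev == 1)   /* a run of 1s ends: add */
            product += multiplicand << i;
        prev = cur;
    }
    return (int32_t)product;
}
```

Because a run of 1 bits costs one subtract and one add regardless of its
length, the sign bit of the multiplier needs no special case.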

>There is no multiply instruction on a SPARC - it's a RISC architecture.

Actually, there exist RISC architectures (or, shall we say, "architectures
generally considered RISC architectures"; for *any* given architecture
like that, you'll probably find *somebody* who thinks it's not RISC, but
so it goes) that *do* have multiply/divide instructions (MIPS, 88K) and
RISC architectures that don't (SPARC; HP-PA and IBM's ROMP, I think; dunno
about IBM's America).

I seem to remember somebody from TI saying in "comp.arch" that at some
point future SPARC implementations would have multiply-divide
instructions.  Two versions of the C shared library might be provided: one
for machines with those instructions, in which the multiply/divide
routines would use them, and one for machines without them, in which the
multiply/divide routines would work as they do now.  This lets old
binaries use the instructions, albeit with the penalty of a subroutine
call.
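
As a rough sketch of that idea in C: the caller always goes through one
entry point, and which routine sits behind it depends on the machine.
Everything here is hypothetical (select_mul, mul_software, mul_hardware,
and the has_hw_multiply flag are invented for the example); a real shared
library would make the choice by which library version is installed, not
through a function pointer.

```c
#include <stdint.h>

typedef uint32_t (*mul_fn)(uint32_t, uint32_t);

/* Software path: shift-and-add, for machines with no multiply
   instruction. */
static uint32_t mul_software(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)
            product += a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}

/* Hardware path: stands in for a routine that would use the multiply
   instruction; plain C multiplication is used here as a placeholder. */
static uint32_t mul_hardware(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Hypothetical selector: has_hw_multiply is an assumed flag saying
   whether the machine has the new instructions. */
mul_fn select_mul(int has_hw_multiply)
{
    return has_hw_multiply ? mul_hardware : mul_software;
}
```

Either way the caller pays for an indirect call, which is the
subroutine-call penalty mentioned above.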

There might also be compiler options specifying whether to use the new
instructions directly; either this option would cause the resulting
programs to run only on newer machines, or future SPARC software would
catch the illegal-instruction trap that those instructions would cause,
and emulate them in software (meaning that the resulting programs would
also run on older machines, as long as they had newer software).

(No, I don't know what Sun, or SPARC International, or whoever, plan to
do.  These are just some of the options.  There may be others, such as
having the multiply/divide routines replace their call point with code to
do the multiply/divide in line, if possible - yes, I think this is
possible under SunOS 4.x and S5R4; it'd just make the page containing the
code writable, and modify it, causing a copy-on-write - or having "ld.so"
or its S5R4 equivalent zip through the code and do the replacement.)

>Instead of providing multiply in hardware, which would take up chip area
>that could better be used for registers or other things that "make it go
>fast", a multiply step (MULScc) operation is provided, and .mul is a
>routine with a lot of mulscc instructions and a little logic at the
>beginning and end to deal with overflow, etc.  Since it's a RISC machine,
>mulscc's execute fast, and there's an overall performance win.

There are applications that *do* lose, though (this was brought up in the
"comp.arch" thread in which the TI person spoke), which may be why
multiply/divide instructions may appear in future SPARC implementations.