tac@cs.brown.edu (Theodore A. Camus) (05/25/90)
> Whenever two integer variables are multiplied together, the assembly
> code generated by the C compiler (both Sun's and gcc) generate the line:
>
>         call .mul, 2
>
> rather than use the integer multiply operation. Why is that? Floating
> point multiplies generate an fmuld assembler instruction. What's going on?

Simple - in keeping with the RISC philosophy, there is no integer multiply instruction on the SPARC ALU. The ".mul" routine computes multiplication literally by shifts and adds, just like we learned in 3rd grade. There is actually a "mulscc" instruction which does a shift and add in a single cycle. (There is also a .umul for unsigned multiplication, as well as .div, .udiv, .rem, and .urem.)

- Ted

CSnet:   tac@cs.brown.edu                  Ted Camus
ARPAnet: tac%cs.brown.edu@relay.cs.net     Box 1910 CS Dept
BITnet:  tac@browncs.BITNET                Brown University
"An ounce of example is worth a pound of theory."  Providence, RI 02912
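For readers who want to see the shift-and-add idea spelled out, here is a minimal sketch in C. It is not the actual .mul or .umul routine; the function name shift_add_umul is hypothetical, and the loop only illustrates the technique for unsigned 32-bit operands, keeping the low 32 bits of the product.

```c
#include <stdint.h>

/* Hypothetical illustration of shift-and-add multiplication.
 * Not the real .umul routine; it only shows the general technique.
 * Returns the low 32 bits of a * b. */
uint32_t shift_add_umul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)
            product += a;   /* this bit of the multiplier is set: add */
        a <<= 1;            /* next bit is worth twice as much */
        b >>= 1;            /* move on to the next multiplier bit */
    }
    return product;
}
```

Calling shift_add_umul(6, 7) walks the three set bits of 7 and adds 6, 12, and 24 to give 42, which is the same column-by-column procedure as pencil-and-paper multiplication in base 2.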
dupuy@cs.columbia.edu (05/25/90)
There is no multiply instruction on a SPARC - it's a RISC architecture. Instead of providing multiply in hardware, which would take up chip area that could better be used for registers or other things that "make it go fast", a multiply step (MULScc) operation is provided, and .mul is a routine with a lot of mulscc instructions and a little logic at the beginning and end to deal with overflow, etc. Since it's a RISC machine, mulscc's execute fast, and there's an overall performance win.

You could inline .mul, but the call overhead is minimal (two jumps) since it doesn't save or restore any registers.

See Appendix E of the SPARC Architecture Manual, pp. 165-181 for multiplication and division routines.

@alex
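The "multiply step" idea can be sketched in C as follows. This is a simplified model, not the exact MULScc semantics (the real instruction also involves the condition codes and orders its shift and add differently) and not the Appendix E code. The names mul_state, multiply_step, and umul_by_steps are hypothetical; the y field merely plays the role of a register that holds the multiplier and gradually fills with the low half of the product.

```c
#include <stdint.h>

/* Simplified model of a multiply step: 'hi' holds the running partial
 * product, 'y' holds the remaining multiplier bits and collects the
 * low half of the result. Hypothetical names, for illustration only. */
typedef struct {
    uint32_t hi;
    uint32_t y;
} mul_state;

/* One step: conditionally add the multiplicand, then shift the
 * combined (carry:hi:y) value right by one bit. */
static void multiply_step(mul_state *s, uint32_t multiplicand)
{
    uint64_t sum = s->hi;               /* widened so the carry is kept */

    if (s->y & 1u)
        sum += multiplicand;
    s->y  = (s->y >> 1) | ((uint32_t)(sum & 1u) << 31);
    s->hi = (uint32_t)(sum >> 1);
}

/* Thirty-two steps yield the full 64-bit unsigned product. */
uint64_t umul_by_steps(uint32_t a, uint32_t b)
{
    mul_state s = { 0, b };
    int i;

    for (i = 0; i < 32; i++)
        multiply_step(&s, a);
    return ((uint64_t)s.hi << 32) | s.y;
}
```

Each step does a fixed, small amount of work, which is why a single-cycle step instruction plus a straight-line sequence of 32 of them is a plausible substitute for a hardware multiplier. Signed operands need a correction at the end, which this sketch leaves out.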
guy@uunet.uu.net (Guy Harris) (05/27/90)
> The ".mul" routine computes multiplication literally by shifts and adds,
> just like we learned in 3rd grade.

Actually, the multiply *instruction* on a lot of machines does the same (for unsigned multiply; the Booth algorithm is probably used for signed multiplies). Others may have parallel multipliers, and do it quicker.

>There is no multiply instruction on a SPARC - it's a RISC architecture.

Actually, there exist RISC architectures (or, shall we say, "architectures generally considered RISC architectures"; for *any* given architecture like that, you'll probably find *somebody* who thinks it's not RISC, but so it goes) that *do* have multiply/divide instructions (MIPS, 88K) and RISC architectures that don't (SPARC; HP-PA and IBM's ROMP, I think; dunno about IBM's America).

I seem to remember somebody from TI saying in "comp.arch" that at some point future SPARC implementations would have multiply-divide instructions. Versions of the C shared library might be provided both for machines with those instructions, in which case the multiply/divide routines would use them, and for machines without them, in which case the multiply/divide routines would work as they do now. This lets old binaries use the instructions, albeit with the penalty of a subroutine call.

There might also be compiler options specifying whether to use the new instructions directly; either this option would cause the resulting programs to run only on newer machines, or future SPARC software would catch the illegal-instruction trap that those instructions would cause, and emulate them in software (meaning that the resulting programs would also run on older machines, as long as they had newer software).

(No, I don't know what Sun, or SPARC International, or whoever, plan to do. These are just some of the options. There may be others, such as having the multiply/divide routines replace their call point with code to do the multiply/divide in line, if possible - yes, I think this is possible under SunOS 4.x and S5R4; it'd just make the page containing the code writable, and modify it, causing a copy-on-write - or having "ld.so" or its S5R4 equivalent zip through the code and do the replacement.)

>Instead of providing multiply in hardware, which would take up chip area
>that could better be used for registers or other things that "make it go
>fast", a multiply step (MULScc) operation is provided, and .mul is a
>routine with a lot of mulscc instructions and a little logic at the
>beginning and end to deal with overflow, etc. Since it's a RISC machine,
>mulscc's execute fast, and there's an overall performance win.

Although there are applications that *do* lose (which was brought up in the "comp.arch" thread in which the TI person spoke), which may be why multiply/divide instructions may appear in future SPARC implementations.
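The shared-library option described above, where callers keep calling one routine and the library decides how to do the work, might look roughly like this in C. Everything here is a hypothetical sketch: the names lib_mul, select_mul, mul_software, and mul_hardware are made up, the has_hw_multiply flag stands in for whatever mechanism would really detect the machine type, and the C "*" operator stands in for a hardware multiply instruction.

```c
#include <stdint.h>

typedef uint32_t (*mul_fn)(uint32_t, uint32_t);

/* Software path: shift-and-add, as on a machine with no multiply. */
static uint32_t mul_software(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u)
            product += a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}

/* "Hardware" path: assumed to compile to a real multiply instruction. */
static uint32_t mul_hardware(uint32_t a, uint32_t b)
{
    return a * b;
}

/* The library starts out on the path that works everywhere. */
static mul_fn mul_impl = mul_software;

/* Hypothetical hook, called once when the library is set up. */
void select_mul(int has_hw_multiply)
{
    mul_impl = has_hw_multiply ? mul_hardware : mul_software;
}

/* The entry point that old and new binaries would both call. */
uint32_t lib_mul(uint32_t a, uint32_t b)
{
    return mul_impl(a, b);
}
```

Old binaries that call lib_mul get the faster path on newer machines without being recompiled, at the cost of the call and one indirection. The in-line patching and trap-emulation options avoid or move that cost, but they cannot be shown in portable C.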