[comp.arch] integer multiplies

lgy@pupthy2.PRINCETON.EDU (Larry Yaffe) (05/24/88)

In article <2231@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:

    [[ Much information about the R/3000 including: ]]

>> Integer Multiply/Divide
>
>How is multiply implemented [software, multiply step, hardware]? hardware
>How many cycles to perform 32x32->32 multiply? 13
						^^
>
>> Floating Point
>
>What are the floating point operation latency/issue/repeats?
>
>		 32-bit		 64-bit		 80-bit
>	mul	 4/ 1/ 4	 5/ 1/ 5	 n.a.
		^^
>UUCP: {ames,decwrl,prls,pyramid}!mips!earl
>USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

    I'm curious and a little surprised about the relative speed of
integer and floating point multiplies.  Are integer multiplies
so much less frequent than floating point mults (in some
selection of "real" programs) that an integer multiply being three
times slower than f.p. is not a significant bottleneck?
What fraction of integer multiplies are not multiplications
by small compile-time constants (in that same set of programs)?
Since MIPS Co. is reputed to have studied real programs when
considering design trade-off's, I'm hoping someone there can
offer some insight.

    Also, despite having looked through the MIPS architecture book
quite carefully, I cannot find any mention of the latency of integer
multiply or divide operations.  Am I just getting sloppy, or is the
information really missing?

------------------------------------------------------------------------
Laurence G. Yaffe			Internet: lgy@pupthy.princeton.edu
Department of Physics			Bitnet:   lgy@pucc
Princeton University			UUCP:     ...!princeton!pupthy!lgy
PO Box 708, Princeton NJ 08544

earl@mips.COM (Earl Killian) (05/25/88)

In article <3006@phoenix.Princeton.EDU> lgy@pupthy2.PRINCETON.EDU (Larry Yaffe) writes:

       I'm curious and a little surprised about the relative speed of
   integer and floating point multiplies.  Are integer multiplies
   so much less frequent than floating point mults (in some
   selection of "real" programs) that an integer multiply being three
   times slower than f.p. is not a significant bottleneck?
   What fraction of integer multiplies are not multiplications
   by small compile-time constants (in that same set of programs)?
   Since MIPS Co. is reputed to have studied real programs when
   considering design trade-off's, I'm hoping someone there can
   offer some insight.

There are several answers to this.  One is statistics showing that
integer multiplies are less common than floating point ones; they
indeed are in most applications.  DSP often uses integer multiply,
but given relative integer/fp speeds in the r3000, you'd be best
off changing it to fp!
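
For instance (a made-up kernel, purely to illustrate the point, not
anything we've measured):

/* Made-up kernel, for illustration only.  The same dot product
 * with integer multiplies and with single-precision fp; on the
 * r3000 the fp multiply (4 cycles) beats the integer one (13
 * cycles), so the fp version can come out ahead despite the
 * conversions at the boundaries.
 */
long dot_int(short *a, short *b, int n)
{
    long acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += (long)a[i] * b[i];    /* ~13-cycle integer multiply */
    return acc;
}

float dot_fp(float *a, float *b, int n)
{
    float acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc += a[i] * b[i];          /* 4-cycle fp multiply */
    return acc;
}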

Another answer is a practical one.  The cpu chip and fpu chip were
designed serially rather than in parallel (MIPS was a small company in
those days).  There just wasn't time to develop that kind of
multiplier technology during the cpu chip design (those are pretty
fast fp times).  This is not to say that the integer multiply is bad
(or a "bottleneck") -- it's several times faster than most micros.  Of
course, now that we've developed the technology for the fp, it would
not be unreasonable to apply it to a future cpu chip.
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

mash@mips.COM (John Mashey) (05/25/88)

In article <2241@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>In article <3006@phoenix.Princeton.EDU> lgy@pupthy2.PRINCETON.EDU (Larry Yaffe) writes:

>       I'm curious and a little surprised about the relative speed of
>   integer and floating point multiplies....

Here are the dynamic frequencies of integer and FP multiplies, as a percentage
of instructions (NOT instruction cycles, which is much more complex to 
describe, given instruction overlap), in a quick sample of programs.
No science presumed: I just grabbed a few programs believed to be OK:

Program		Integer		FP
as1		 .02%		 0
ccom		 .08%		 0
espresso	<.01%		<.01%
gnuchess	 .06%		 0
hspice		 .02%		2.4%
linpackd*	 .13%		8.2%
nroff		 .45%		0
whetd*		1.42%		7.38%
timberwolf	 .31%		.45%

As a gross cycle estimate, multiply the Integer # by 10, and the FP one by 5.
(* = benchmark, above)
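
To spell out the arithmetic (a throwaway sketch, nothing more than the
gross weights above applied to a few rows of the table):

#include <stdio.h>

/* Apply the gross estimate: ~10 cycles per integer multiply,
 * ~5 per fp multiply.  Output is multiply cycles per 100
 * instructions executed, not a precise cost.
 */
int main(void)
{
    static struct { char *name; double imul, fmul; } prog[] = {
        { "nroff",      0.45, 0.00 },
        { "whetd",      1.42, 7.38 },
        { "timberwolf", 0.31, 0.45 },
        { "linpackd",   0.13, 8.20 },
    };
    int i;

    for (i = 0; i < 4; i++)
        printf("%-10s int %5.2f  fp %5.2f\n", prog[i].name,
               prog[i].imul * 10.0,     /* ~10 cycles each */
               prog[i].fmul * 5.0);     /* ~5 cycles each */
    return 0;
}

nroff, for example, comes out around 4.5 multiply cycles per 100
instructions, consistent with the >2% figure in conclusion 2 below.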

Conclusions:
1) If a program uses FP in any serious fashion,  FP mult > integer mult.

2) A few programs (nroff, whetd, and timberwolf) use integer mult enough
that a faster one would be nice, i.e., a 10-cycle multiply accounts for
>2% of the instruction cycles.

3) Many programs don't execute integer mult enough to be noticeable.

Of course, our compilers do strength reduction, common subexpression
elimination, etc., and convert multiplies by constants into shifts/adds/subs;
i.e., they've squished out a lot of actual multiplies.
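
For instance (a sketch of the kind of rewrite meant, not actual
compiler output):

/* The kind of rewrite meant: 10*x = 8*x + 2*x, so a multiply by
 * the constant 10 becomes two shifts and an add (one cycle each)
 * instead of a ~13-cycle mult.
 */
int times10(int x)
{
    return (x << 3) + (x << 1);
}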

This is a perfect example of how your choice of benchmarks gives
completely different answers to the question: "should we have an integer
mult instruction, and if so, how much silicon are we willing to devote to
making it X cycles?"
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086