[comp.sys.sun] SPARC divide - really really slow!

djones@awesome.Berkeley.EDU (12/27/89)

I was faced with a program which ran as fast on a SUN 3/60 as it did on a
SUN 4/280, when there should be a factor of 2-3 difference if you believe the
MIPS rating.

Profiling with "cc -pg" and gprof made it evident that the culprit is
integer division.  I gather SPARC has no divide instruction, so division is
done by a software routine.  This is, of course, part of the RISC strategy.
I'm still just a bit surprised that SUN/SPARC hasn't figured out a way to
get integer divisions done a little faster on a SUN 4/280 than on a SUN 3/60!

I was amused to see some of the "functions" that gprof found using up all
my CPU time.  I gather the code checks to see if the numbers are
"not_really_big", or "not_too_big" to do the division (ahem) faster.

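For readers wondering why the profile is full of loop labels: a software
divide typically builds the quotient one bit (or a few bits) at a time.
Below is a minimal C sketch of plain shift-and-subtract division.  It is an
illustration only, not Sun's actual .udiv code; the name soft_udiv and the
zero-divisor behavior are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit divide by shift-and-subtract.
 * Produces one quotient bit per loop iteration, so a full divide
 * costs 32 iterations of shift, compare, and conditional subtract. */
static uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;   /* 64 bits so that (r << 1) cannot overflow */
    int i;

    if (d == 0) {     /* assumption: a real routine would trap here */
        *rem = n;
        return 0xFFFFFFFFu;
    }
    for (i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down next dividend bit */
        if (r >= d) {                     /* divisor fits: subtract it */
            r -= d;
            q |= (uint32_t)1 << i;        /* and set this quotient bit */
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```

Calling soft_udiv(100, 7, &r) returns 14 and leaves 2 in r.  Checks like
"not_really_big" in the profile presumably exist to cut the iteration count
when the operands are small.
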
So are we stuck with this poor multiply/divide performance in SPARC, or is
this shortcoming being addressed?  Heck, would it be faster to hand off
these operations to the Floating Point chip?
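
On the floating-point question, the idea can at least be sketched in C.  A
double has a 53-bit mantissa, so any 32-bit integer converts exactly, and as
far as I can tell the truncated quotient then matches integer division for
all 32-bit operands.  Whether this actually beats .div on a given machine
would have to be measured.  fp_div is a hypothetical name, not a library
routine.

```c
#include <stdint.h>

/* Hypothetical helper: signed 32-bit divide done in double precision.
 * Assumption: the caller guarantees b != 0 and rules out
 * a == INT32_MIN with b == -1, whose quotient does not fit in 32 bits. */
static int32_t fp_div(int32_t a, int32_t b)
{
    /* Both conversions are exact; the cast back truncates toward zero,
     * which is also how C integer division rounds. */
    return (int32_t)((double)a / (double)b);
}
```

For example, fp_div(100, 7) gives 14 and fp_div(-7, 2) gives -3.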

   %  cumulative    self              self    total          
 time   seconds   seconds    calls  ms/call  ms/call name    
 13.9     106.87    36.19                            divloop [4]
 13.8     142.71    35.84                            divloop [5]
  3.3     162.28     8.69                            divide [10]
  3.3     170.84     8.56                            not_really_big [11]
  3.2     179.13     8.29                            divide [12]
  3.1     187.27     8.14                            not_really_big [13]
  3.0     203.11     7.71                            end_regular_divide [15]
  2.9     210.67     7.56                            end_regular_divide [16]
  2.5     223.95     6.50  9326374     0.00     0.00  .rem [18]
  2.3     229.85     5.91  9326374     0.00     0.00  .div [20]
  1.6     239.22     4.27                            got_result [23]
  1.4     242.88     3.66                            got_result [24]
  0.6     248.88     1.69                            do_regular_divide [25]
  0.6     250.43     1.55                            do_regular_divide [26]
  0.5     251.65     1.22                            end_single_divloop [27]
  0.5     254.04     1.19                            end_single_divloop [29]
  0.2     256.09     0.62        4   155.02   155.02  .urem [33]
  0.1     257.94     0.38                            do_single_div [36]
  0.1     258.32     0.38                            do_single_div [37]
  0.1     259.03     0.36        5    71.01    71.01  .udiv [39]
  0.1     259.38     0.35                            not_too_big [40]
  0.1     259.64     0.27                            not_too_big [41]
  0.1     260.28     0.17                            single_divloop [45]
  0.0     260.35     0.07                            single_divloop [48]
  0.0     260.51     0.01                            zero_divide [55]

sritacco@hpdml93.hp.com (Steve Ritacco) (01/11/90)

Well, this really isn't the RISC strategy.  Check into some other RISC
chips and you will see better multiply and divide performance.  I think
the R2000/R3000 takes 12 cycles for multiply and 30 cycles for divide.
The multiply/divide unit is separate from the rest of the ALU, so you are
also allowed to perform other operations while the multiply or divide is
going on.