[comp.arch] VAX EDIV remainder

torek@elf.ee.lbl.gov (Chris Torek) (05/27/91)

In article <25874@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
[to do `z = x % y; /* z gets the remainder of x divided by y */' properly
 you need, assuming x in r6, y in r7, and z in r11:]

>	MOVL R6,R1		; construct the sign-extended 64-bit ...
>	ASHQ #-32,R0,R0		; dividend in the register pair <R0,R1>
>	EDIV r7,r0,r2,r11

[which, as Clark Coleman pointed out in a followup article that I seem
to have lost (his original article was
<1991May21.191034.25980@murdoch.acc.Virginia.EDU>, by
clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman)), should actually be

	movl	r6, r1
	ashl	#-32, r1, r0
	ediv	r7, r0, r2, r11

or perhaps, but this is slower, `movl r6, r0; ashq #-32, r0, r0']

>You might like to time THAT sequence, and rethink your post.  Or you
>could take my word for it, that when you include the cost of having
>to reserve and target into an even-odd register pair, the EDIV is
>almost always slower.

(Not even-odd; any pair of consecutive registers will do.)

According to my `VAX instruction timings (with FPA)', the original sequence

	divl3	r7, r6, r0	# r0 = x / y ...
	mull2	r7, r0		# ... * y
	subl3	r0, r6, r11	# z = x - r0

will take:

  VAX-11/780 vs. VAX-11/750 vs. VAX-11/730 WITH FPA
  INSTRUCTION                         <EXECUTION TIME MICROSECS> <TIMES 780>
                                          780     750     730    750     730

DIVL3 Reg, Reg, Reg                       9.64    8.88   16.15  1.086   0.597
MULL2 Reg, Reg                            1.85    5.68   12.05  0.326   0.154
ADDL3 Reg, Reg, Reg                       0.60    1.29    2.83  0.465   0.212
                                         -----   -----   -----
                                         12.09   15.85   31.03

while the EDIV sequence will take:

MOVL Reg, Reg                             0.40    0.93    1.69  0.430   0.237
ASHL #10, Reg, Reg                        2.00    4.03   11.33  0.496   0.177
EDIV Reg, Reg, Reg, Reg                  11.86   11.86  100.29  1.000   0.118
                                         -----   -----  ------
                                         14.26   16.82  113.31

(I have assumed a barrel shifter here, i.e., that the `ASHL #10' time
also holds for a shift by -32.)
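
For readers who want to check that the two sequences compute the same
thing, here is a minimal C sketch. It models only the arithmetic, not
the VAX instructions or their timing; the function names are
hypothetical, and it assumes C99 semantics (division truncates toward
zero, and the remainder takes the sign of the dividend).

```c
#include <assert.h>
#include <stdint.h>

/* Models divl3 / mull2 / subl3: z = x - (x / y) * y. */
int32_t rem_div_mul_sub(int32_t x, int32_t y)
{
    assert(y != 0);
    assert(!(x == INT32_MIN && y == -1));   /* quotient would overflow */
    int32_t q = x / y;                      /* divl3: truncating quotient */
    int32_t p = q * y;                      /* mull2 */
    return x - p;                           /* subl3 */
}

/* Models the EDIV route: sign-extend x to 64 bits, divide,
 * and keep only the remainder. */
int32_t rem_ediv_model(int32_t x, int32_t y)
{
    assert(y != 0);
    int64_t dividend = (int64_t)x;          /* sign-extended quadword */
    return (int32_t)(dividend % (int64_t)y);
}
```

Both functions return the same value as `x % y' for any inputs that
pass the assertions, for example 1 for (7, 2) and -1 for (-7, 2).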

In other words, on the 750 it is almost a wash (about 6% faster to
avoid ediv; this could easily be lost in testing---it is hard to get
accurate timings as things depend on, e.g., alignment), while on the
780 avoiding ediv is about 15% faster and on the 730, over 70% faster.
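
The percentages can be re-derived from the column totals in the two
tables. The sketch below assumes that "faster" means time saved
relative to the slower (EDIV) sequence; the helper name is
hypothetical.

```c
#include <assert.h>

/* Percent of time saved by the faster sequence,
 * measured against the slower one. */
double percent_saved(double slow_us, double fast_us)
{
    assert(slow_us > 0.0);
    return 100.0 * (slow_us - fast_us) / slow_us;
}
```

With the totals above, percent_saved(14.26, 12.09) is roughly 15.2 for
the 780, percent_saved(16.82, 15.85) roughly 5.8 for the 750, and
percent_saved(113.31, 31.03) roughly 72.6 for the 730.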

I do not have tables for anything but obsolete VAXen, and the ones I
have came with this disclaimer:

`The following VAX instruction timings were obtained from a former
 DEC employee.  I cannot vouch for their accuracy and have no idea
 how they were obtained.'

You should use your own judgement before charging off with this as
`the answer'.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman) (05/29/91)

In article <13587@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>
>According to my `VAX instruction timings (with FPA)', the original sequence

   [timing estimates deleted]

Actually, the timing disadvantage for EDIV is worse than the estimates. My
timings were based on a version of the assembly code that did not do the
longword-swap on the two halves of the quadword dividend (it used R0/R1
instead of R1/R0 pair, you might say) and it was producing overflow, which
is faster than producing the right answer. Hence, the "cc" compiler is
quite justified in NOT using the EDIV instruction.
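
To see why swapping the two halves of the dividend produces overflow,
consider this C sketch. It only models the condition that the quotient
must fit in a signed longword; it makes no claim about which register
holds which half, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a signed 64-bit dividend from two 32-bit halves. */
int64_t make_quad(int32_t high, uint32_t low)
{
    return (int64_t)high * 4294967296LL + (int64_t)low;
}

/* Returns 1 if dividend / divisor fits in a signed 32-bit quotient,
 * 0 if a 32-bit quotient would overflow. */
int quotient_fits(int64_t dividend, int32_t divisor)
{
    assert(divisor != 0);
    if (dividend == INT64_MIN && divisor == -1)
        return 0;                           /* not representable at all */
    int64_t q = dividend / divisor;
    return q >= INT32_MIN && q <= INT32_MAX;
}
```

With x = 7 and y = 2, the properly built dividend make_quad(0, 7)
gives a quotient that fits. With the halves swapped, make_quad(7, 0)
is 7 * 2^32, and dividing that by 2 cannot fit in 32 bits, so
quotient_fits returns 0.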

Unfortunately, as is often the case, a point was lost through use of a very
bad example.

-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be 
regarded as a criminal offence." E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)