torek@elf.ee.lbl.gov (Chris Torek) (05/27/91)
In article <25874@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:

[to do `z = x % y; /* z gets the remainder of x divided by y */'
properly you need, assuming x in r6, y in r7, and z in r11:]

>       MOVL R6,R1              ; construct the sign-extended 64-bit ...
>       ASHQ #-32,R0,R0         ; dividend in the register pair <R0,R1>
>       EDIV r7,r0,r2,r11

[which, as Clark Coleman pointed out in a followup article that I seem
to have lost---the original article was
<1991May21.191034.25980@murdoch.acc.Virginia.EDU> by
clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman)---should actually be

        movl    r6, r1
        ashl    #-32, r1, r0
        ediv    r7, r0, r2, r11

or perhaps, but this is slower, `movl r6, r0; ashq #-32, r0, r0']

>You might like to time THAT sequence, and rethink your post.  Or you
>could take my word for it, that when you include the cost of having
>to reserve and target into an even-odd register pair, the EDIV is
>almost always slower.

(Not even-odd, just register pair.)

According to my `VAX instruction timings (with FPA)', the original
sequence

        divl3   r7, r6, r0      # r0 = x / y ...
        mull2   r7, r0          # ... * y
        subl3   r0, r6, r11     # z = x - r0

will take:

        VAX-11/780 vs. VAX-11/750 vs. VAX-11/730 WITH FPA

INSTRUCTION                 <EXECUTION TIME MICROSECS>   <TIMES 780>
                              780      750      730       750    730
DIVL3 Reg, Reg, Reg          9.64     8.88    16.15     1.086  0.597
MULL2 Reg, Reg               1.85     5.68    12.05     0.326  0.154
ADDL3 Reg, Reg, Reg          0.60     1.29     2.83     0.465  0.212
                            -----    -----    -----
                            12.09    15.85    31.03

while the EDIV sequence will take:

MOVL Reg, Reg                0.40     0.93     1.69     0.430  0.237
ASHL #10, Reg, Reg           2.00     4.03    11.33     0.496  0.177
EDIV Reg, Reg, Reg, Reg     11.86    11.86   100.29     1.000  0.118
                            -----    -----   ------
                            14.26    16.82   113.31

(I have assumed a barrel shifter here.)

In other words, on the 750 it is almost a wash (about 5% faster to
avoid ediv; this could easily be lost in testing---it is hard to get
accurate timings as things depend on, e.g., alignment), while on the
780 avoiding ediv is about 15% faster and on the 730, over 70% faster.
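For readers who do not read VAX assembly, the two ways of getting the
remainder compared above can be sketched in C.  This is only an
illustration, not code from either poster: the function names are made
up here, and it assumes C99 or later, where integer division truncates
toward zero so that x - (x / y) * y equals x % y.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder the way the divl3/mull2/subl3
   sequence computes it, z = x - (x / y) * y.
   Assumes y != 0 and that x / y does not overflow
   (that is, not INT32_MIN / -1). */
int32_t rem_div_mul_sub(int32_t x, int32_t y)
{
    assert(y != 0);
    int32_t q = x / y;   /* divl3: quotient, truncated toward zero */
    int32_t p = q * y;   /* mull2: |p| <= |x|, so this cannot overflow */
    return x - p;        /* subl3: what is left over is the remainder */
}

/* Hypothetical helper: remainder the way the EDIV approach computes
   it, by sign-extending x to 64 bits and dividing the wide value. */
int32_t rem_widened(int32_t x, int32_t y)
{
    assert(y != 0);
    int64_t wide = (int64_t)x;            /* sign-extended dividend */
    return (int32_t)(wide % (int64_t)y);  /* |result| < |y|, fits in 32 bits */
}
```

Both functions return the same value for any inputs that satisfy the
stated assumptions; the argument in this thread is only about which
instruction sequence gets there faster.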
I do not have tables for anything but obsolete VAXen, and the ones I
have came with this disclaimer:

  `The following VAX instruction timings were obtained from a former
  DEC employee.  I cannot vouch for their accuracy and have no idea
  how they were obtained.'

You should use your own judgement before charging off with this as
`the answer'.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
clc5q@hemlock.cs.Virginia.EDU (Clark L. Coleman) (05/29/91)
In article <13587@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>
>According to my `VAX instruction timings (with FPA)', the original sequence

[timing estimates deleted]

Actually, the timing disadvantage for EDIV is worse than the estimates.
My timings were based on a version of the assembly code that did not do
the longword-swap on the two halves of the quadword dividend (it used
R0/R1 instead of R1/R0 pair, you might say) and it was producing
overflow, which is faster than producing the right answer.

Hence, the "cc" compiler is quite justified in NOT using the EDIV
instruction.  Unfortunately, as is often the case, a point was lost
through use of a very bad example.

-----------------------------------------------------------------------------
"The use of COBOL cripples the mind; its teaching should, therefore, be
regarded as a criminal offence."  E.W.Dijkstra, 18th June 1975.
|||  clc5q@virginia.edu (Clark L. Coleman)
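The overflow described above can be illustrated with a small C model.
This is a rough sketch, not a description of the hardware: ediv_model,
make_quad, rem_correct and rem_swapped are names invented here, the
model only captures "64-bit dividend, 32-bit divisor, overflow when the
quotient does not fit in 32 bits", and it leaves the outputs untouched
on overflow, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Join two 32-bit halves into one signed 64-bit dividend.
   Assumes the usual two's-complement conversion to int64_t. */
static int64_t make_quad(uint32_t low, uint32_t high)
{
    return (int64_t)(((uint64_t)high << 32) | (uint64_t)low);
}

/* Simplified, hypothetical model of a 64-by-32 divide.
   Returns true on overflow (zero divisor or quotient too wide). */
bool ediv_model(int32_t divisor, int64_t dividend,
                int32_t *quo, int32_t *rem)
{
    assert(quo && rem);
    if (divisor == 0)
        return true;
    if (divisor == -1 && dividend == INT64_MIN)
        return true;                 /* avoid C undefined behaviour */
    int64_t q = dividend / divisor;
    if (q < INT32_MIN || q > INT32_MAX)
        return true;                 /* quotient does not fit */
    *quo = (int32_t)q;
    *rem = (int32_t)(dividend % divisor);
    return false;
}

/* Halves in the right order: x in the low half, sign fill in the high. */
bool rem_correct(int32_t x, int32_t y, int32_t *z)
{
    int32_t q;
    uint32_t sign = (x < 0) ? 0xFFFFFFFFu : 0u;
    return ediv_model(y, make_quad((uint32_t)x, sign), &q, z);
}

/* Halves swapped: x lands in the high half, so the dividend is
   roughly x * 2^32. */
bool rem_swapped(int32_t x, int32_t y, int32_t *z)
{
    int32_t q;
    uint32_t sign = (x < 0) ? 0xFFFFFFFFu : 0u;
    return ediv_model(y, make_quad(sign, (uint32_t)x), &q, z);
}
```

With x = 100 and y = 7, rem_correct reports no overflow and a remainder
of 2, while rem_swapped reports overflow because the quotient of
100 * 2^32 by 7 does not fit in 32 bits.  For very small x the swapped
form may not overflow at all and simply produces a wrong remainder.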