stuart@wheaton (Stuart Ceilous Williams) (04/23/86)
................................................................................

Here are the results of a small timing study we did on some of the
instructions of our MicroVAX II (with FPU). We are running Ultrix 1.0; the
times were computed by executing a tight loop 10^7 times, timing it with a
call to getrusage, and subtracting the time it took to execute an empty loop
10^7 times. The first set of instructions was executed with register mode
addressing.

Command  Time (microseconds)
------------------------------
bitb   :   0.347
decl   :   0.348
bisw3  :   0.349
subb2  :   0.349
bicb3  :   0.352
bicl2  :   0.354
movw   :   0.354
incb   :   0.355
movb   :   0.355
incw   :   0.356
bicw2  :   0.357
movl   :   0.357
bisl2  :   0.358
subb3  :   0.358
addw3  :   0.359
mnegb  :   0.359
movl   :   0.359
movzwl :   0.359
movl   :   0.360
bisl2  :   0.361
bicb2  :   0.362
cmpw   :   0.362
xorl2  :   0.362
bitl   :   0.363
bicl2  :   0.364
bisw2  :   0.365
bitw   :   0.365
mnegw  :   0.365
movzbw :   0.366
addw2  :   0.367
movw   :   0.367
movw   :   0.368
cmpb   :   0.369
subl2  :   0.371
cmpl   :   0.375
mnegl  :   0.375
bitl   :   0.378
clrw   :   0.378
movb   :   0.378
subw3  :   0.378
subw2  :   0.379
subl3  :   0.384
movb   :   0.424
movb   :   0.439
movq   :   0.521
subw3  :   0.524
cvtwl  :   0.529
addl2  :   0.530
decb   :   0.532
bisb2  :   0.533
cvtbw  :   0.542
clrd   :   0.543
clrb   :   0.548
cvtbl  :   0.562
bicw3  :   0.563
incl   :   0.762
decw   :   0.764
cvtlw  :   0.766
movq   :   0.769
clrl   :   0.786
movf   :   0.960
cvtlb  :   0.999
bisb3  :   1.008
addl3  :   1.012
cvtwb  :   1.214
mnegf  :   1.230
brb    :   1.278
movd   :   1.401
mnegd  :   1.604
blbc   :   1.824
jmp    :   1.972
cvtfd  :   2.022
cvtfl  :   2.037
cvtfb  :   2.043
cvtfl  :   2.044
cvtfw  :   2.052
mulf2  :   2.313
divf3  :   2.314
divf2  :   2.319
mulf3  :   2.323
cvtdb  :   2.440
cvtdl  :   2.443
cvtdw  :   2.459
cvtdf  :   2.466
cvtdl  :   2.478
cmpf   :   2.502
bbcc   :   2.688
addf3  :   2.921
rotl   :   3.058
cvtwf  :   3.077
muld3  :   3.099
cvtlf  :   3.108
subf3  :   3.130
cmpd   :   3.137
addf2  :   3.267
divd2  :   3.268
cvtwd  :   3.271
muld2  :   3.273
subf2  :   3.275
bsbb   :   3.277
cvtld  :   3.290
cvtbf  :   3.543
divd3  :   3.703
subd3  :   3.726
cvtbd  :   3.733
addd3  :   3.739
ashl   :   3.776
jsb    :   3.857
subd2  :   3.906
addd2  :   3.914
ashq   :   4.237
mulb2  :   4.397
mulb3  :   4.757
mull2  :   4.999
mull3  :   5.410
mulw2  :   5.625
mulw3  :   5.830
divb2  :   6.440
acbf   :   6.475
divb3  :   6.657
divl2  :   7.931
divl3  :   8.110
emul   :   8.313
divw2  :   9.414
divw3  :   9.602
ediv   :  11.632
callg  :  12.676
calls  :  13.318
movp   :  94.498
cvtlp  : 104.533
cvtpl  : 128.586
subp4  : 285.064
mulp   : 389.623
-------------------

Comments:

The packed instructions above are implemented as operating system calls,
which explains the large execution times. Some of these we found were not
working in Ultrix 1.1 (they should be fixed by now). It is interesting to
note that printf uses some of the packed instructions.

Also, the time for the calls instruction should give some idea of when
it is better to use a macro than a function call.

(For the following instructions, l1 is the beginning of an array of longs).

Instruction           Addressing Mode                     Time (microseconds)
-----------------------------------------------------------------------------
movl $1,(r1)[r2]      # deferred index mode               1.538
movl $1,r1            # register mode                     0.557
movl $1,l1[r2]        # index mode                        1.246
movl $1,-(r1)         # autodecrement mode                0.567
movl $1,(r1)+         # autoincrement mode                1.195
movl $1,*(r1)+        # autoincrement deferred            1.624
movl $1,4(r1)         # displacement mode                 0.612
movl $1,*4(r1)        # displacement deferred mode        1.241
movl $1,r1            # immediate mode                    0.376
movl $1,l1            # relative mode                     1.240
movl $1,*l1           # relative deferred                 1.884
movl $1,(r2)[r2]      # index mode                        0.834
movl $1,(r1)+[r2]     # autoincrement indexed mode        1.577
movl $1,-(r1)[r2]     # autodecrement indexed mode        0.992
movl $1,4(r1)[r2]     # displacement indexed              1.220
movl $1,*4(r1)[r2]    # displacement deferred indexed     1.875
movl $1,*(r1)+[r2]    # autoincrement deferred indexed    2.267
movl (r1),(r2)        # double deferred                   1.128
moval l1,r1           # hmmmm....(initialization)         0.887
--------------------

Queries:

What are the possible sources of error for this study as briefly described?
Does the getrusage call provide an accurate accounting of the time used
(we used ru_utime)?

(Mail me responses)
--------------------------------------------------------------------------------
Stuart Williams
UUCP: ...ihnp4!wheaton!stuart
(Computer Science / Physics student at Wheaton College, near Chicago)
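
The method described in the post above (run a loop many times, read the user CPU time with getrusage, then subtract the cost of an empty loop) can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code used for the study: the names `user_usec`, `time_empty_loop`, `time_add_loop` and `per_iteration_usec` are hypothetical, and `x = x + 1` merely stands in for the instruction under test. The `volatile` qualifiers keep a modern optimizer from deleting the loops, at the price of extra memory traffic, so the result is not a register-mode timing. Any figure obtained this way is also limited by the resolution of `ru_utime` on the system at hand, which is one possible source of the error the post asks about.

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User CPU time consumed so far by this process, in microseconds.
   Returns -1.0 if getrusage fails. */
double user_usec(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec * 1e6 + (double)ru.ru_utime.tv_usec;
}

/* Hypothetical helper: cost of one iteration of the loop body, in
   microseconds, after subtracting the empty-loop overhead. */
double per_iteration_usec(double loaded_usec, double empty_usec, long iterations)
{
    assert(iterations > 0);
    return (loaded_usec - empty_usec) / (double)iterations;
}

/* User time, in microseconds, spent running an empty loop. */
double time_empty_loop(long iterations)
{
    volatile long i;            /* volatile: keep the loop from being removed */
    double start = user_usec();

    for (i = 0; i < iterations; i++)
        ;
    return user_usec() - start;
}

/* User time, in microseconds, spent running a loop with one operation in it. */
double time_add_loop(long iterations)
{
    volatile long i;
    volatile long x = 0;
    double start = user_usec();

    for (i = 0; i < iterations; i++)
        x = x + 1;              /* stand-in for the instruction under test */
    return user_usec() - start;
}
```

A caller would compute `per_iteration_usec(time_add_loop(n), time_empty_loop(n), n)` with `n` set to something like 10000000, and repeat the measurement several times to see how much the result varies from run to run.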
jbs@mit-eddie.MIT.EDU (Jeff Siegal) (04/25/86)
In article <90@wheaton> stuart@wheaton.UUCP (Stuart Ceilous Williams) writes:
>[...]
>brb : 1.278
>[...]
>jmp : 1.972
>[...]
>bsbb : 3.277
>[...]
>jsb : 3.857
>[...]
>callg : 12.676
>calls : 13.318
>[...]
> Also, the time for the calls instruction should give some idea of when
>it is better to use a macro than a function call.
>[...]

Of course, this has been said before, but the time for the call
instructions also gives some idea of how useful it is to have a
compiler which:

1) Expands function calls inline when appropriate.

2) Uses a faster, but less general, way to get to a function when
possible (usually only for "static" functions).

Jeff Siegal - MIT EECS
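
The macro-versus-function trade-off discussed above can be illustrated with a small C sketch. Taking the posted figures at face value, one calls (13.318 microseconds) costs about as much as 37 register-mode movl instructions (roughly 0.36 microseconds each), so a function whose body is only a few instructions is dominated by the call itself. All names below (`square_fn`, `SQUARE`, `cube`, and the three summing functions) are hypothetical, and whether a given compiler actually inlines the static function or gives it a cheaper linkage is an assumption, not something the thread establishes.

```c
/* Function version: every use pays for a call and a return. */
int square_fn(int x)
{
    return x * x;
}

/* Macro version: expanded in place, so no call is made.
   The argument is evaluated twice, so it must have no side effects. */
#define SQUARE(x) ((x) * (x))

/* static: not visible outside this file, so the compiler sees every
   caller and is free to inline it or to use a cheaper calling sequence. */
static int cube(int x)
{
    return x * x * x;
}

/* Sum of squares using the function: one call per element. */
int sum_squares_fn(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += square_fn(v[i]);
    return sum;
}

/* Sum of squares using the macro: no calls inside the loop. */
int sum_squares_macro(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += SQUARE(v[i]);
    return sum;
}

/* Sum of cubes using the static function. */
int sum_cubes(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += cube(v[i]);
    return sum;
}
```

Both summing variants return the same result; they differ only in whether a call is made on each pass through the loop, which is the cost the calls and callg timings put a number on.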
operators@watmath.UUCP (M.F.C.F. Operators) (05/04/86)
In article <1730@mit-eddie.MIT.EDU> jbs@mit-eddie.UUCP (Jeff Siegal) writes:
>Of course, this has been said before, but the time for the call
>instructions also gives some idea of how useful it is to have a
>compiler which:
>
>1) Expands function calls inline when appropriate.
>
>2) Uses a faster, but less general, way to get to a function when
>possible (usually only for "static" functions).
>

or

3) Has designers who spent a significant amount of time thinking about
and designing a call convention that is fast and flexible.

It has been observed a number of times that the manufacturer-suggested
call convention is seldom even close to being optimal for the machine.
Similarly, the ones used by most C compilers are simply quick adaptations
of the one used on the last machine the compiler implementor saw.

I have often wondered how much of the "RISC" performance comes simply from
the fact that registers were used to pass arguments, instead of blindly
having garbage values "saved" and restored on every call.