stevens@hsi.UUCP (12/05/87)
While trying to hand-optimize some C code for a graphics routine that I wanted to get as fast as possible, I performed some timings on the 3b1. What I wanted was the relative speeds of the basic operations on different data types, to see if there is anything "interesting". My results are:

                       add     sub     mul     div
                     -----   -----   -----   -----
  register short       1.0     1.0     4.0    11.0
  short                3.1     3.1     6.1    13.1
  register long        1.2     1.2    33.3    44.5
  long                 4.3     4.3    36.5    47.7
  register float     340.4   293.5   452.6   503.7
  float              344.1   296.5   458.7   504.3
  register double     98.8    90.5   211.9   258.9
  double              98.8    90.5   211.9   258.9

I didn't try to compare any absolute values for the 3b1 with any other system; I just wanted to know how to write "optimal" code when necessary (i.e., inner loops of graphics routines). The numbers above are all relative to the value 1.0 for a register short add. I used the cc optimizer for all timings.

A couple of observations:

- stick to shorts instead of ints or longs when possible, since a 32-bit multiply or divide gets very expensive. This is usually possible for graphics routines, and indeed I've noticed that some source (such as an implementation of Bresenham's line drawing algorithm from The Store) uses only shorts.
- registers don't buy you much except for adds and subtracts (and assignments too, I'd guess).
- avoid floats, and stick to doubles. The C rule that forces all float arithmetic to be performed in double precision kills you on this system: every float operand is converted to double and every result converted back, on top of the double arithmetic itself.
- this system really should have been designed with an FPU, as the floating point times are all 1 to 2 orders of magnitude greater than the integer times. Would anyone from AT&T who is "in the know" about the 3b1 care to comment on why there isn't one available??
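The "stick to shorts" advice, and the all-short Bresenham routine mentioned above, can be illustrated with a minimal sketch. This is not the code from The Store: the name `line_points` and its point-buffer interface are hypothetical, and a real graphics routine would set pixels instead of recording them. It is the common all-octant, integer-only form of Bresenham's algorithm, with coordinates and the error term kept in shorts. It assumes both line deltas stay below 8192 so that `2 * err` cannot overflow a 16-bit short. C promotes shorts to int inside expressions, so whether 16-bit instructions are actually emitted is up to the compiler, and `register` is only a hint.

```c
#include <stdlib.h>  /* abs */

/* Hypothetical helper: walk a line from (x0,y0) to (x1,y1) using
 * integer-only Bresenham, with every coordinate and the error term
 * in shorts. Points are stored in xs[]/ys[] (at most max entries);
 * the return value is the total number of points on the line.
 * Assumes |x1-x0| and |y1-y0| are below 8192 so 2*err fits in a short. */
int line_points(short x0, short y0, short x1, short y1,
                short *xs, short *ys, int max)
{
    register short dx = (short)abs(x1 - x0);
    register short dy = (short)-abs(y1 - y0);   /* kept negative */
    register short sx = (short)(x0 < x1 ? 1 : -1);
    register short sy = (short)(y0 < y1 ? 1 : -1);
    register short err = (short)(dx + dy);      /* dx - |dy| */
    short e2;
    int n = 0;

    for (;;) {
        if (n < max) {          /* record the point if there is room */
            xs[n] = x0;
            ys[n] = y0;
        }
        n++;
        if (x0 == x1 && y0 == y1)
            break;
        e2 = (short)(2 * err);
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
    return n;
}
```

For example, `line_points(0, 0, 5, 2, xs, ys, 16)` returns 6 and visits (0,0), (1,0), (2,1), (3,1), (4,2), (5,2), using only additions, subtractions, comparisons and a doubling in the loop, with no multiply or divide at all.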
There are a couple of other points that I figured out about the 3b1 that may be of interest:

- there are 6 short registers available (d2, d3, d4, d5, d6, d7)
- there are 6 long registers available (d2, d3, d4, d5, d6, d7)
- there are 6 float registers available (d2, d3, d4, d5, d6, d7)
- there are 4 pointer registers available (a2, a3, a4, a5)

Note that shorts, longs, and floats all draw on the same six data registers, so together they can't exceed six register variables; pointers use the separate address registers.

Overall I wasn't very impressed with the code quality of the C compiler, even with the optimizer.

Richard Stevens
Health Systems International, New Haven, CT
{ uunet | ihnp4 } ! hsi ! stevens
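One way to use those register limits is to keep an inner loop's hot variables within them. A minimal sketch, assuming a hypothetical `blit_words` routine that copies a rectangular block of 16-bit words between two bitmaps: it declares four register pointers (matching the four address registers listed above) and two register shorts (well under the six data registers). Which physical registers are used, and whether a register request is honoured at all, is the compiler's decision.

```c
/* Hypothetical helper: copy an h-row by w-word block from src to dst.
 * Rows are sstride words apart in src and dstride words apart in dst,
 * and each bitmap is assumed to hold h full rows.
 * Four register pointers (dst, src, d, s) and two register shorts
 * (row, col); register is only a request to the compiler. */
void blit_words(register short *dst, short dstride,
                register short *src, short sstride,
                short w, short h)
{
    register short *d, *s;      /* walk one row */
    register short row, col;    /* loop counters */

    for (row = h; --row >= 0; ) {
        d = dst;
        s = src;
        for (col = w; --col >= 0; )
            *d++ = *s++;
        dst += dstride;         /* advance to the next row of each bitmap */
        src += sstride;
    }
}
```

For instance, `blit_words(dst, 4, src, 3, 3, 2)` copies a 2-row by 3-word block from a source with 3-word rows into a destination with 4-word rows. Advancing the row pointers by the stride avoids computing `row * stride` on every iteration.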
andrew@teletron.UUCP (12/07/87)
In article <787@hsi.UUCP>, stevens@hsi.UUCP (Richard Stevens) writes:
> While trying to hand optimize some C code for a graphics routine
> that I wanted to get as fast as possible, I performed some
> timings on the 3b1.
> I used the cc optimizer for all timings. A couple of observations:
> - stick to shorts instead of ints or longs, when possible, since
> a 32-bit multiply or divide gets very expensive.

This holds true for array indices as well. Using a short as an array index results in the 68000 muls.w or mulu.w instruction being used for the address calculation, instead of the more expensive 32-bit multiplication subroutines used for ints or longs (the 68000's hardware multiply only takes 16-bit operands).

> - registers don't buy you much except for adds and subtracts (and
> assignments too, I'd guess).

Actually, register variables buy you a *lot* in most code. The 68000 family was designed so that register usage makes code sing. I would *highly* recommend register variables for heavily used pointer variables (such as in string processing or structure access routines).

> Overall I wasn't very impressed with the code quality of the C compiler,
> even with the optimizer.

Me neither.

Andrew Scott (..alberta!teletron!andrew)
TeleTronic Communications Ltd.
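Both suggestions in this reply can be sketched together. The first function indexes an array of a hypothetical `struct point` (typically 6 bytes, not a power of two, so `pts[i]` needs a genuine multiply in the address calculation) with a short index. The second copies a string through register pointers. The names and types are illustrative only, and which multiply instruction gets emitted is the poster's observation about the 3b1 compiler, not something the C code itself guarantees.

```c
/* Hypothetical record type; with three shorts it is typically 6 bytes. */
struct point { short x, y, color; };

/* Short index into an array of structs: per the post, the 3b1 compiler
 * can then scale the index with a 16-bit muls/mulu. */
long sum_colors(struct point *pts, short n)
{
    register short i;
    register long total = 0;

    for (i = 0; i < n; i++)
        total += pts[i].color;
    return total;
}

/* String processing with register pointers: copy src into dst and
 * return the number of characters copied, not counting the '\0'. */
int copy_string(char *dst, const char *src)
{
    register char *d = dst;
    register const char *s = src;

    while ((*d++ = *s++) != '\0')
        ;
    return (int)(d - dst - 1);  /* d ends one past the copied '\0' */
}
```

The pointer version never forms an index at all, so no address multiply is needed; the short-index version is the fallback when the code is clearer written with subscripts.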