[comp.sys.att] 3b1 instruction timings

stevens@hsi.UUCP (12/05/87)

While trying to hand optimize some C code for a graphics routine
that I wanted to get as fast as possible, I performed some
timings on the 3b1.  What I wanted was the relative speeds of
the basic operations on different data types, to see
if there was anything "interesting".  My results are:

				  add	  sub	  mul	  div
				-----	-----	-----	-----
	register short		  1.0	  1.0	  4.0	 11.0
	short			  3.1	  3.1	  6.1	 13.1

	register long		  1.2	  1.2	 33.3	 44.5
	long			  4.3	  4.3	 36.5	 47.7

	register float		340.4	293.5	452.6	503.7
	float			344.1	296.5	458.7	504.3

	register double		 98.8	 90.5	211.9	258.9
	double			 98.8	 90.5	211.9	258.9

I didn't try to compare any absolute values for the 3b1 with any
other system; I just wanted to know how to write "optimal" code
when necessary (i.e., inner loops of graphics routines).  The numbers
above are all relative to the value 1.0 for a register short add.
I used the cc optimizer for all timings.  A couple of observations:

- stick to shorts instead of ints or longs, when possible, since
	a 32-bit multiply or divide gets very expensive.  This is
	usually possible for graphics routines, and indeed I've noticed
	that some source (such as an implementation of Bresenham's
	line drawing algorithm from The Store) uses only shorts.

- registers don't buy you much except for adds and subtracts (and
	assignments too, I'd guess).

- avoid floats, and stick to doubles.  The C rule that forces all
	float arithmetic to be performed using double precision
	kills you on this system.

- this system really should have been designed with an FPU, as the
	floating point times are all 1 to 2 orders of magnitude greater
	than the integer times.  Would anyone from AT&T who is
	"in the know" about the 3b1 care to comment on why there isn't
	one available?
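
To make the first observation concrete, here is a minimal sketch of a
Bresenham-style line routine that keeps everything in shorts and uses only
adds, subtracts and compares inside the loop.  This is NOT the code from
The Store; the names draw_line, plot and grid, and the 16x16 buffer, are
made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define GRID_W 16
#define GRID_H 16

/* Hypothetical 16x16 frame buffer, one byte per pixel (illustration only). */
static unsigned char grid[GRID_H][GRID_W];

/* Mark one pixel; the assert catches coordinates outside the buffer. */
static void plot(short x, short y)
{
    assert(x >= 0 && x < GRID_W && y >= 0 && y < GRID_H);
    grid[y][x] = 1;
}

/*
 * Draw a line from (x0,y0) to (x1,y1) with the integer error-term form of
 * Bresenham's algorithm.  All working variables are shorts, and the loop
 * body needs no multiply or divide.  Returns the number of pixels plotted.
 */
short draw_line(short x0, short y0, short x1, short y1)
{
    register short dx = (short)abs(x1 - x0);
    register short dy = (short)-abs(y1 - y0);
    register short sx = (x0 < x1) ? 1 : -1;   /* step direction in x */
    register short sy = (y0 < y1) ? 1 : -1;   /* step direction in y */
    register short err = dx + dy;             /* running error term */
    register short e2;
    short count = 0;

    for (;;) {
        plot(x0, y0);
        count++;
        if (x0 == x1 && y0 == y1)
            break;
        e2 = err + err;                       /* doubled with an add */
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
    return count;
}
```

A call such as draw_line(0, 0, 5, 3) plots six pixels.  Whether a given
compiler really keeps all six register shorts in d2-d7 is an assumption;
check the assembler output to be sure.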

There are a couple of other points that I figured out about the 3b1,
that may be of interest:

    there are 6 short registers available (d2, d3, d4, d5, d6, d7)
    there are 6 long registers available (d2, d3, d4, d5, d6, d7)
    there are 6 float registers available (d2, d3, d4, d5, d6, d7)
    there are 4 pointer registers available (a2, a3, a4, a5)

Overall I wasn't very impressed with the code quality of the C compiler,
even with the optimizer.

	Richard Stevens
	Health Systems International, New Haven, CT
           { uunet | ihnp4 } ! hsi ! stevens

andrew@teletron.UUCP (12/07/87)

In article <787@hsi.UUCP>, stevens@hsi.UUCP (Richard Stevens) writes:
> While trying to hand optimize some C code for a graphics routine
> that I wanted to get as fast as possible, I performed some
> timings on the 3b1.

> I used the cc optimizer for all timings.  A couple of observations:

> - stick to shorts instead of ints or longs, when possible, since
> 	a 32-bit multiply or divide gets very expensive.

This holds true for array indices as well.  Using a short as an array
index results in the 68000 muls.w or mulu.w instruction being used for
the address calculation instead of the more expensive 32-bit multiplication
subroutines used for ints or longs.
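
A rough sketch of the two cases, assuming an array whose element size is
not a power of two, so that the index has to be scaled by a multiply.
The struct point type, the pts array and both function names are
hypothetical, and an optimizer is free to replace the multiply with a
stepped pointer, so treat this as an illustration rather than a guarantee
of what cc emits.

```c
#include <assert.h>

#define NPOINTS 8

/* Hypothetical element type; three shorts, typically 6 bytes in total. */
struct point {
    short x, y, color;
};

static struct point pts[NPOINTS];

/* Index with a short: the scaling of i can be done as a 16-bit multiply. */
long sum_x_short_index(short n)
{
    register short i;
    register long total = 0;

    assert(n >= 0 && n <= NPOINTS);
    for (i = 0; i < n; i++)
        total += pts[i].x;
    return total;
}

/* Same loop with a long index: the scaling is a 32-bit multiply. */
long sum_x_long_index(long n)
{
    register long i;
    register long total = 0;

    assert(n >= 0 && n <= NPOINTS);
    for (i = 0; i < n; i++)
        total += pts[i].x;
    return total;
}
```

Both functions return the same sum; only the type of the index differs,
which is the part that decides how the address calculation is done.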


> - registers don't buy you much except for adds and subtracts (and
> 	assignments too, I'd guess).

Actually, register variables buy you a *lot* in most code.  The 68000 family
was designed so that register usage makes code sing.  I would *highly*
recommend register variables for heavily used pointer variables (such as
in string processing or structure access routines).
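
For example, a bounded string copy along these lines keeps its three
working pointers in register variables.  The function name copy_string
and its interface are made up for this sketch; with 4 pointer registers
available, the assumption is that all three end up in address registers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into dst, writing at most dstsize bytes including the
 * terminating '\0'.  Returns the number of characters copied, not
 * counting the terminator.
 */
size_t copy_string(char *dst, const char *src, size_t dstsize)
{
    register char *d = dst;          /* write position */
    register const char *s = src;    /* read position */
    register char *end;              /* last slot, reserved for '\0' */

    assert(dst != NULL && src != NULL && dstsize > 0);
    end = dst + dstsize - 1;

    while (d < end && *s != '\0')
        *d++ = *s++;
    *d = '\0';

    return (size_t)(d - dst);
}
```

The inner loop is just a compare, a load through one pointer and a store
through another, which is where register pointers should pay off.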


> Overall I wasn't very impressed with the code quality of the C compiler,
> even with the optimizer.

Me neither.

	Andrew Scott			(..alberta!teletron!andrew)
	TeleTronic Communications Ltd.