[net.lang.c] 6809 C: Benchmark, and Speed Hints

knudsen@ihnss.UUCP (11/03/84)

A couple of weeks ago, someone posted benchmark results on micro.cbm for
the Sieve of Eratosthenes prime-finder on the 6502, and
solicited results from other 6502 machine owners (Atari, Apple, etc.).
Well, the 6809 is sort of a cross between the 6502 and the PDP-11, so I had
to jump right in too.  So far I have done only compiled C.  The results:
	Commodore 64 C:	 28 sec  (from the original posting)
	Coco OS-9 C:	 21 sec
Bear in mind three things: (1) The Color Computer clock rate is only 0.895 MHz;
I'm sure the C-64 runs faster, so my result is even better than it looks.
I'm not saying the 6809 is a superior micro to the other 8-bitters,
but lots of other people have already....
(Would someone please mail me what the C64 clock is?)  Yes, it IS legitimate
to directly compare 6800, 6502, and 6809 clocks (but not with 8080 types).
(2) I made trivial mods to the posted C code to take advantage of 6809's
auto-increment/decrement instructions, e.g.:
	for(i=0; i<8191; i++)  flags[i]=1;
becomes
	for(i=0; i<8191; )  flags[i++]=1;

(3) The Microware/TRS OS-9 C compiler allows global and static variables to
be declared DIRECT, meaning ZERO PAGE in 6502-ese.  This declaration bought me
about 1.5 seconds of real time over the 10 iterations posted.
Of course, 6502-based C compilers should allow this also, but note that the
6809 can move its "zero" page to any page in memory, so each process or routine
can have its own.
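
For anyone who wants to try points (2) and (3) together, here is a rough
sketch in present-day C.  Assumptions: the posted micro.cbm code is not
reproduced here, so this follows the widely circulated BYTE-style sieve;
the macro USE_OS9_DIRECT is a hypothetical switch of my own; and the
lowercase keyword `direct` is my best recollection of how Microware C spells
the storage class, so check your compiler manual before relying on it.

```c
#include <assert.h>

/* Hypothetical switch: define USE_OS9_DIRECT only on a compiler that
   really has a direct-page storage class (assumed spelled `direct`).
   Everywhere else DIRECT expands to nothing. */
#ifdef USE_OS9_DIRECT
#define DIRECT direct
#else
#define DIRECT
#endif

#define SIZE 8191            /* number of flags, matching i<8191 above */

char flags[SIZE];

/* Hot variables are globals so they can live in the direct page. */
DIRECT int i, k, prime, count;

/* One pass of the sieve; returns the number of primes found. */
int sieve_once(void)
{
    count = 0;

    /* Point (2): the increment is folded into the subscript. */
    for (i = 0; i < SIZE; )
        flags[i++] = 1;

    for (i = 0; i < SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;           /* index i stands for 2i+3 */
            for (k = i + prime; k < SIZE; k += prime)
                flags[k] = 0;            /* strike odd multiples */
            count++;
        }
    }

    assert(count <= SIZE);               /* sanity check only */
    return count;
}
```

Each call to sieve_once() should return 1899, the usual count for this
version of the benchmark.  Timing ten calls in a row corresponds to the
10 iterations mentioned above.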



	Some hints I learned last night about speeding up C functions:
(1) Although the 6809 clearly beats other 8-bitters in stack-frame addressing
for automatic variables, you can run even faster (and a tad shorter) by
re-declaring automatic locals as either DIRECT STATIC in the function body
or DIRECT external and DIRECT global outside of any bodies.

(2) Too many variables done as in (1) will overflow your direct page,
so another hint: Pick the variable that's most critical in a function
and declare that one LAST (or FIRST?).  The idea is to give it a zero offset
from the S-register, so it gets accessed as "0,S" == ",S" which is just
as fast (4 cycles for CHAR) as a DIRECT-page variable.  All other items in the
stack frame will be "n,S" and will take an extra clock cycle, unless n>7
in which case you pay two extra cycles and another byte.  So put other
critical automatics' declarations next to the most critical one.

(3) I don't know to what extent any C compilers take advantage of keeping
pointers in registers (I THINK that Microware C lacks the REGISTER declaration),
but when writing position-independent assembler code, I would rewrite
	for(i=0; i<SIZE; ) flags[i++]=1;
to the equivalent of:
	int *p,*top;	/* use char * instead if flags is a char array */
	top = flags+SIZE;
	p = flags;
	do {*p++ = 1;} while(p < top);

which gives a 3-instruction loop with no LEA's inside it.
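
To convince yourself the pointer loop in (3) does the same job as the
indexed one, something like the following sketch can be compiled on any
machine.  Assumptions: flags is taken to be a char array, and the names
fill_indexed, fill_pointer, fills_agree and CAP are made up for this
illustration.  Note the do-while always stores at least once, so it is
only a fair swap when the array has at least one element.

```c
#include <assert.h>
#include <string.h>

#define CAP 64                 /* hypothetical test buffer size */

/* Indexed form, as in the for-loop above. */
void fill_indexed(char *a, int n)
{
    int i;
    for (i = 0; i < n; )
        a[i++] = 1;
}

/* Pointer form: one store, one compare, one branch per element. */
void fill_pointer(char *a, int n)
{
    char *p = a;
    char *top = a + n;

    assert(n > 0);             /* do-while runs the body at least once */
    do {
        *p++ = 1;
    } while (p < top);
}

/* Returns 1 if both forms leave identical buffers for a given n,
   including one guard byte past the filled region. */
int fills_agree(int n)
{
    char a[CAP + 1];
    char b[CAP + 1];

    assert(n > 0 && n <= CAP);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_indexed(a, n);
    fill_pointer(b, n);
    return memcmp(a, b, sizeof a) == 0 && b[n - 1] == 1 && b[n] == 0;
}
```

Whether the pointer form actually comes out faster depends on what your
compiler does with it, so time both before committing to either.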
Anyway, the point is to adapt your C programming style to the machine
(and compiler) at hand where speed is more important than clarity
(the for-loop seems a lot more clear in its intent, I admit!)  --mike k