knudsen@ihnss.UUCP (11/03/84)
<bitty bitty bitty>

A couple of weeks ago, someone posted benchmarks on micro.cbm for the Sieve of Eratosthenes prime-finder on the 6502, and solicited results from other 6502 machine owners (Atari, Apple, etc). Well, the 6809 is sort of a cross between the 6502 and the PDP-11, so I had to jump right in too. So far I have done only compiled C. The results:

    Commodore 64 C:  28 sec (from the original posting)
    Coco OS-9 C:     21 sec

Bear in mind three things:

(1) The Color Computer clock rate is only 0.895 MHz; I'm sure the C-64 runs faster, so my result is even better than it looks. I'm not saying the 6809 is a superior micro to the other 8-bitters, but lots of other people have already.... (Would someone please mail to me what the C64 clock is?) Yes, it IS legitimate to directly compare 6800, 6502, and 6809 clocks (but not with 8080 types).

(2) I made trivial mods to the posted C code to take advantage of the 6809's auto-increment/decrement instructions, e.g.:

    for(i=0; i<8191; i++) flags[i]=1;

becomes

    for(i=0; i<8191; ) flags[i++]=1;

(3) The Microware/TRS OS-9 C compiler allows global and static variables to be declared DIRECT, meaning ZERO PAGE in 6502-ese. This declaration bought me about 1.5 seconds of realtime over the 10 iterations posted. Of course, 6502-based C compilers should allow this also, but note that the 6809 can move its "zero" page to any page in memory, so each process or routine can have its own.

Some hints I learned last nite about speeding up C functions:

(1) Although the 6809 clearly beats other 8-bitters in stack-frame addressing for automatic variables, you can run even faster (and a tad shorter) by re-declaring automatic locals as either DIRECT STATIC in the function body, or DIRECT external and DIRECT global outside of any bodies.

(2) Too many variables done as in (1) will overflow your direct page, so another hint: pick the variable that's most critical in a function and declare that one LAST (or FIRST?).
The idea is to give it a zero offset from the S register, so it gets accessed as "0,S" == ",S", which is just as fast (4 cycles for a CHAR) as a DIRECT-page access. All other items in the stack frame will be "n,S" and will take an extra clock cycle, unless n>7, in which case you pay two extra cycles and another byte. So put the declarations of other critical automatics next to the most critical one.

(3) I don't know to what extent any C compilers take advantage of keeping pointers in registers (I THINK that Microware C lacks the REGISTER declaration), but when writing position-independent assembler code, I would re-write

    for(i=0; i<SIZE; ) flags[i++]=1;

to the equivalent of:

    int *p,*top;
    top = flags+SIZE;
    p = flags;
    do {*p++ = 1;} while(p < top);

which gives a 3-instruction loop with no LEA's inside it.

Anyway, the point is to adapt your C programming style to the machine (and compiler) at hand when speed is more important than clarity (the for-loop is a lot clearer in its intent, I admit!)

--mike k
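
For anyone who wants to try the two fill loops side by side, here is a minimal sketch in portable C. It is not the code that was timed above. It assumes flags is a char array, as in the commonly circulated version of the benchmark (the post never shows the declaration), so the pointers are char * rather than int *. The names fill_indexed, fill_pointer and sieve_pass are made up for illustration, and the DIRECT storage class is left out because it is specific to the Microware compiler.

```c
#define SIZE 8191            /* number of flags, matching the i<8191 bound above */

static char flags[SIZE];     /* assumed char, one flag per odd number 3, 5, 7, ... */

/* Index form, with the increment folded into the subscript. */
static void fill_indexed(void)
{
    int i;
    for (i = 0; i < SIZE; )
        flags[i++] = 1;
}

/* Pointer form: walk a pointer up to a precomputed end address. */
static void fill_pointer(void)
{
    char *p = flags;
    char *top = flags + SIZE;
    do {
        *p++ = 1;
    } while (p < top);
}

/* One pass of the sieve. use_pointer selects which fill loop runs first.
   Returns the number of primes found (odd primes only; 2 is not counted). */
int sieve_pass(int use_pointer)
{
    int i, k, prime, count = 0;

    if (use_pointer)
        fill_pointer();
    else
        fill_indexed();

    for (i = 0; i < SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;                 /* flag i stands for the odd number 2i+3 */
            for (k = i + prime; k < SIZE; k += prime)
                flags[k] = 0;                  /* strike the odd multiples of prime */
            count++;
        }
    }
    return count;
}
```

Calling sieve_pass(0) and sieve_pass(1) should give the same count (1899 for this array size), which confirms the two fill loops are interchangeable. Whether the pointer form is actually faster depends on the compiler; a modern optimizer will likely generate the same code for both.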