[net.lang.c] Yet more trash about strcpy and strlen on various micros

MCCLUSKEY@JPL-VLSI.ARPA (John McCluskey) (03/14/85)

CC: INFO-MICRO@BRL-VGR.ARPA

     All this talk of strcpy made me curious, so just for the heck of
it, I made up this table of the best possible strcpy speed for each
machine, using the manufacturers' published data sheets.  This assumes
null-terminated strings where you don't know the length of the source
string in advance (which makes quite a difference for the iAPX-286).
These figures further assume no wait states or memory management delays.

		    STRCPY()  BENCHMARK FOR VERY LONG STRINGS.

CPU     | Clock Speed   | cycles/byte   | strcpy speed  | Conditions
--------+---------------+---------------+---------------+----------------------
MC68020 |   16 MHz      |      13       |  1.2 MB/sec   | Cache enabled
--------+---------------+---------------+---------------+----------------------
MC68010 |   10 MHz      |      14       |  714 KB/sec   | 64 KB max string len.
--------+---------------+---------------+---------------+----------------------
iAPX-286|    8 MHz      |      12  **   |  666 KB/sec   | 64 KB max string len.
--------+---------------+---------------+---------------+----------------------
MC68000 | 12.5 MHz      |      22       |  568 KB/sec   |
--------+---------------+---------------+---------------+----------------------
NS32032 |   10 MHz      |      26 (!)   |  384 KB/sec   | same speed as 32016!
--------+---------------+---------------+---------------+----------------------

** The iAPX-286 can block copy memory at blinding speed, but since
   move instructions don't affect the status register, the 286 first
   has to execute a strlen() to get the length of the source string.
   Strlen is 8 cycles/byte and the actual move is 4 cycles/byte, for
   12 cycles/byte in total.  Languages that keep an explicit string
   length, such as Pascal and Fortran, win big on the 286.
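
In C terms, the two-pass copy described in the footnote looks roughly
like the sketch below.  The name strcpy_two_pass is made up for
illustration, and this is portable C rather than the 286 string
instructions themselves; it only shows the shape of the algorithm (scan
for the terminator first, then block-move a known length).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, for illustration only.
 * Pass 1 finds the length (the strlen cost in the footnote);
 * pass 2 is a block move of a known byte count. */
char *strcpy_two_pass(char *dst, const char *src)
{
    size_t n = strlen(src);     /* scan pass: look for the '\0' */
    memcpy(dst, src, n + 1);    /* move pass: copy n bytes plus the '\0' */
    return dst;
}
```

When the length is already known, the first pass disappears and only
the block move remains.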


Also, in the same vein, STRLEN():

		    STRLEN() BENCHMARK FOR VERY LONG STRINGS.

CPU     | Clock Speed   | cycles/byte   | strlen speed  | Conditions
--------+---------------+---------------+---------------+----------------------
MC68020 |   16 MHz      |      12       |  1.3 MB/sec   | Cache enabled
--------+---------------+---------------+---------------+----------------------
iAPX-286|    8 MHz      |       8       |  1.0 MB/sec   | 64 KB max string len.
--------+---------------+---------------+---------------+----------------------
MC68010 |   10 MHz      |      12       |  833 KB/sec   | 64 KB max string len.
--------+---------------+---------------+---------------+----------------------
MC68000 | 12.5 MHz      |      18       |  694 KB/sec   |
--------+---------------+---------------+---------------+----------------------
NS32032 |   10 MHz      |      28 (!)   |  357 KB/sec   | same speed as 32016!
--------+---------------+---------------+---------------+----------------------
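
The speed column in both tables is just the clock rate divided by the
cycles per byte.  A small sketch of that arithmetic, assuming 1 K means
1000 bytes (the function name kb_per_sec is made up here):

```c
/* Hypothetical helper: throughput in thousands of bytes per second,
 * given a clock in MHz and a cost in cycles per byte.
 * (clock_mhz * 1e6 cycles/sec) / cycles_per_byte / 1000 */
double kb_per_sec(double clock_mhz, double cycles_per_byte)
{
    return clock_mhz * 1000.0 / cycles_per_byte;
}
```

For example, kb_per_sec(10.0, 14.0) gives the 68010 strcpy figure, and
kb_per_sec(8.0, 8.0 + 4.0) gives the 286 strcpy figure from its strlen
and move costs.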


     Two things from this survey surprised me: one, that the iAPX-286
makes very good use of clock cycles, and two, that the 32032 and the
32016 have such amazingly slow micro-code for a single-instruction
string move or scan (MOVSB U and SKPSB).

    I have probably made some errors in these tables; if so, send corrections
to MCCLUSKEY@JPL-VLSI.ARPA and I'll retract the erroneous figures.
------

henry@utzoo.UUCP (Henry Spencer) (03/15/85)

>      Two things from this survey surprised me, one, that the iAPX-286
> makes very good use of clock cycles...

Well, it is the third time Intel has implemented the 86 architecture,
so they've had a chance to learn from their botches...

> ... and two, that the 32032 and the
> 32016 have such amazingly slow micro-code for a single instruction
> string move or scan (MOVSB U and SKPSB).

The word is not "amazingly" but "disgustingly".  When I asked him about
it point-blank (after studying the timing numbers in the 32016 manual)
my National rep admitted that the string instructions are actually
slower than tightly-coded multi-instruction sequences!!!  Try it without
the string instructions, preferably with some degree of loop unrolling.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
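
For readers who want to try the suggestion above, here is a minimal C
sketch of a strcpy written as an ordinary multi-instruction loop with
some unrolling.  The name strcpy_unrolled and the unroll factor of four
are assumptions chosen for illustration; on a real 32016 this would be
hand-coded assembly, and whether it actually beats the string
instruction has to be checked against the timing numbers in the manual.

```c
/* Hypothetical name, for illustration only.
 * Byte-at-a-time copy, unrolled four times so the loop branch is
 * paid once per four bytes instead of once per byte. */
char *strcpy_unrolled(char *dst, const char *src)
{
    char *d = dst;

    for (;;) {
        /* Each step copies one byte and stops after copying the '\0'. */
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
        if ((*d++ = *src++) == '\0') break;
    }
    return dst;
}
```

Because every step tests for the terminator, no separate length pass is
needed and any string length works, not just multiples of four.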