[comp.os.cpm] Z80 Algorithms

wittig@gmdzi.UUCP (Georg Wittig) (10/03/89)

The Z80 processor is known to be one of the cheapest processors in the world,
but alas it is also known to be NOT one of the fastest. The more I am
interested in fast assembler coded algorithms to avoid unnecessary waiting
periods.

The algorithms I'm interested in are relatively elementary: 16 bit
multiplication, 16 bit division, 32 bit arithmetics, filling an arbitraray
amount of storage with a given bit pattern, conversion of ascii encoded numbers
to binary and vice versa, conversion from binary to hexadecimal notation, etc.

Well, I do have such algorithms; but I wonder if they can be made faster. Does
there exist a collection of such algorithms I can access? (I don't have access
to anonymous ftp.)

I'm watching this newsgroup since last winter, and I haven't seen one article
on this subject. Or do I watch the wrong newsgroup?	:-|

If there is someone who wants to send me some of those algorithms, please
don't mail them, please post them: I could bet there are some people out there
in netland who are interested, too.  :-)

Waiting patiently ... Thanks in advance,
-- 
Georg Wittig   GMD-Z1.BI   P.O. Box 1240   D-5205 St. Augustin 1 (West Germany)
email: wittig@gmdzi.uucp   phone: (+49 2241) 14-2294
-------------------------------------------------------------------------------
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)

rzh@lll-lcc.UUCP (Roger Hanscom) (10/05/89)

In <1306@gmdzi.UUCP>, Georg Wittig (wittig@gmdzi.UUCP) requests Z80
algorithms for 16- and 32-bit arithmetic, code conversion, etc.

Georg-
   There's a pretty good book on the subject called "Z80 Assembly Language
Routines" by Lance Leventhal and Winthrop Saville (Osborne/McGraw-Hill,
Berkeley, 1983).  I don't know if it's still in print (soft-bound), but
perhaps you could find it in a good library??  It presents assembly language
fragments with descriptive text in chapters with titles such as:  "code
conversion", "string manipulation", "arithmetic", "bit manipulation and
shifts", etc.  I can't vouch for the relative speed of these algorithms,
but it certainly could be a good place to start!  If you are interested
in floating point arithmetic, there's not much here.  You might then try
looking at a dis-assembled BASIC or f.p. math libraries from a Z80 "C"
compiler??  Any help with f.p. out there????

          roger              rzh@lll-lcc.llnl.gov

jurjen@cwi.nl (Jurjen N.E. Bos) (10/10/89)

Maybe somebody is interested in this: I can multiply to unsigned 8-bit numbers
in 158 clockcycles.  If you can beat me, please let me know.  If you are
interested, I'll publish the algorithm.
To make you anxious, I will not tell you the trick right now.
(By the way: it costs less than 1K.)
-- 
|                 | "Never imagine yourself not to be otherwise than what |
| Jurjen N.E. Bos | it might appear to others that what you were or might |
|                 | have been was not otherwise than what you had been    |
|  jurjen@cwi.nl  | would have appeared to them to be otherwise."         |

black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) (10/14/89)

  (Jurjen N.E. Bos) writes:
>Maybe somebody is interested in this: I can multiply to unsigned 8-bit numbers
>in 158 clockcycles.  If you can beat me, please let me know.  If you are
>interested, I'll publish the algorithm.
>To make you anxious, I will not tell you the trick right now.
>(By the way: it costs less than 1K.)

Big deal! Use a 64180 (aka Z180) chip, has the MLT instruction, takes 17
clock cycles, or 1.85 micro seconds.  (By the way: it's a 2-byte instruction)

Jerry Glomph Black, 8-bit terrorist

mwilson@crash.cts.com (Marc Wilson) (10/15/89)

In article <8910132000.AA03993@rom.ecse.rpi.edu> black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) writes:
>
>Big deal! Use a 64180 (aka Z180) chip, has the MLT instruction, takes 17
>clock cycles, or 1.85 micro seconds.  (By the way: it's a 2-byte instruction)

     Since when is the 64180 made by Zilog?  The 64180 is a Hitachi Z80 clone
with some additional instructions.


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Marc Wilson
     ARPA: ...!crash!mwilson@nosc.mil
           ...!crash!pnet01!pro-sol!mwilson@nosc.mil
     UUCP: [ cbosgd | hp-sdd!hplabs | sdcsvax | nosc ]!crash!mwilson
     INET: mwilson@crash.CTS.COM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) (10/16/89)

>Big deal! Use a 64180 (aka Z180) chip, has the MLT instruction, takes 17
>clock cycles, or 1.85 micro seconds.  (By the way: it's a 2-byte instruction)

>> Since when is the 64180 made by Zilog?  The 64180 is a Hitachi Z80 clone
>>with some additional instructions.

Zilog, having flubbed the Z280 project for so many years, was mortally
embarrassed by Hitachi's excellent 64180 chip (and even more excellent 1-chip
version, the 647180).  Thus they made a deal to market the 64180 under the
pseudonym Z180.  The 64180 is far more than a 'Z80 clone'. It does support
the Z80 instruction set, but has 2 asynch and 1 synch serial ports, 1 Meg
address space, 2 16-bit timers, and 4 external interrupt lines. 

Jerry Glomph Black, 8-bit terrorist

dbraun@cadev5.intel.com (Doug Braun ~) (10/16/89)

In article <8910132000.AA03993@rom.ecse.rpi.edu> black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) writes:

>  (Jurjen N.E. Bos) writes:
>>Maybe somebody is interested in this: I can multiply to unsigned 8-bit numbers
>>in 158 clockcycles.  . . .
>
>Big deal! Use a 64180 (aka Z180) chip, has the MLT instruction, takes 17
>clock cycles, or 1.85 micro seconds.  (By the way: it's a 2-byte instruction)
>
>Jerry Glomph Black, 8-bit terrorist


Bigger deal!! Use a Z280 and do *16* bit multiplies (signed AND unsigned)
in 25 or so clcok cycles.  Also divide! (32 bits / 16 bits).
Also 2-byte instructions.  Unless you want to do something like:
	HL := DEHL DIV (IX+2345H)


Doug Braun				Intel Corp CAD
					408 765-4279

 / decwrl \
 | hplabs |
-| oliveb |- !intelca!mipos3!cadev4!dbraun
 | amd    |
 \ qantel /

 or:

 dbraun@cadev4.intel.com

josef@peun11.uucp (Moellers) (10/17/89)

mwilson@crash.cts.com (Marc Wilson) writes:

>     Since when is the 64180 made by Zilog?  The 64180 is a Hitachi Z80 clone
>with some additional instructions.

It was not originally made by Zilog (i.e. developped) but Zilog probably
found it a neat chip (which it is) and it fitted nicely in their range,
so they second-sourced it and named it the Z180.

That's all folks.

		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad
Abt. DX-PC			!USA: mcvax!unido!nixpbe!mollers.pad
Pontanusstrasse				Phone:
D-4790 Paderborn		(+49) 5251 146245
+-----------------------------------------------------------------------+
| "Many that live deserve death. And some that die deserve life.	|
|  Can You give it to them? Then do not be too eager to deal out	|
|  death in judgement"							|
|			Gandalf to Frodo in "The Fellowship of the Ring"|
+-----------------------------------------------------------------------+
		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad

wittig@gmdzi.UUCP (Georg Wittig) (10/17/89)

Someone mailed to me the following memory fill algorithm:

>
>	ld	hl,buffer		; point at buffer
>	ld	de,buffer + 1		; point at next byte
>	ld	bc,count - 1		; number of bytes minus one
>	ld	(hl),xxx		; save the first byte
>	ldir				; replicate through rest of buffer
>
>is the fastest buffer fill I know on the Z80.

There exists a much faster faster algorithm for that. Let's see:

The central statement in your solution is `ldir'. It takes 21 T states per
byte. For 16 bytes this is 336 T states.

Using the push statement is much faster:
	Set D = E = the byte to be filled in;
	let SP point 1 byte after the end of the area to be filled;
	B contains the number of 16 byte blocks to be filled.
Then use "push DE"s, and you're finished very quickly.

	DI			; CP/M must not interrupt, because SP will be
				; misused
	LD	(sp_save),SP	; save current SP value
	LD	SP,HL		; assuming HL points to <end+1>
L:	PUSH	DE		; 8 times, so 16 bytes are filled
	PUSH	DE
	PUSH	DE
	PUSH	DE
	PUSH	DE
	PUSH	DE
	PUSH	DE
	PUSH	DE
	DJNZ	L		; a 16 byte portion has been processed.
	LD	SP,(sp_save)	; restore the SP
	EI			; done

This way you can fill up to 4096 (256*16) bytes. If more bytes are to be
filled, build a loop around it. If the number of the bytes to be filled isn't a
multiple of 16, the bytes 1 to 15 can be filled straight forward with a
traditional algorithm.

The timing of that algorithm: The central loop starts at "L:" and ends with
"DJNZ". "push DE" needs 11 T states; DJNZ needs 13 ones. So for 16 bytes to be
filled you get: 

	8 * 11 + 13 = 101

	101 / 336 = 30 %

Relatively fast, isn't it? And -- it works!

PS: The idea isn't mine, I found it some time ago in a journal. I'm sorry I
don't remember which one it was.

-- 
Georg Wittig   GMD-Z1.BI   P.O. Box 1240   D-5205 St. Augustin 1 (West Germany)
email: wittig@gmdzi.uucp   phone: (+49 2241) 14-2294
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)

nobody@wrgate.WR.TEK.COM (Unpriveleged user) (10/18/89)

>Someone mailed to me the following memory fill algorithm:
>
>>
>>	ld	hl,buffer		; point at buffer
>>	ld	de,buffer + 1		; point at next byte
>>	ld	bc,count - 1		; number of bytes minus one
>>	ld	(hl),xxx		; save the first byte
>>	ldir				; replicate through rest of buffer
>>
>>is the fastest buffer fill I know on the Z80.
>
>There exists a much faster faster algorithm for that. Let's see:
>
>The central statement in your solution is `ldir'. It takes 21 T states per
>byte. For 16 bytes this is 336 T states.
>
>Using the push statement is much faster:
>	Set D = E = the byte to be filled in;
>	let SP point 1 byte after the end of the area to be filled;
>	B contains the number of 16 byte blocks to be filled.
>Then use "push DE"s, and you're finished very quickly.
>
>	DI			; CP/M must not interrupt, because SP will be
>				; misused
	{rest of clever PUSHing algorithm deleted}
From: michaelk@copper.WR.TEK.COM (Michael D. Kersenbrock)
Path: copper!michaelk


The push-algorithm is good for memory fill patterns when the pattern itself
is of a particular variety and is of a fixed length.

The LDIR algorithm works for any pattern of length.  You load the pattern
once, with:

	 hl-> beginning of memory area
	 de-> beginning of memory area + pattern_length
	 bc ==  fill_length - pattern length

You then load your pattern starting at (HL) ONCE with a sequence that gets
it from the source (wherever it comes from).

Then you LDIR it.

If the fill length is less than pattern length, this can be tested when
being loaded manually the "first (and only) time".

This generalization of the 1-byte case (presented at the top of this
article) is a fast way of loading variable-length patterns, AND it can be
interrupted.  Byte count is small too.

Just depends on your needs :-).

Mike Kersenbrock
Tektronix Microprocessor Development Products
michaelk@copper.WR.TEK.COM
Aloha, Oregon

josef@peun11.uucp (Moellers) (10/18/89)

dbraun@cadev5.intel.com (Doug Braun ~) writes:


>Bigger deal!! Use a Z280 and do *16* bit multiplies (signed AND unsigned)
>in 25 or so clcok cycles.  Also divide! (32 bits / 16 bits).
>Also 2-byte instructions.  Unless you want to do something like:
>	HL := DEHL DIV (IX+2345H)


>Doug Braun				Intel Corp CAD
>					408 765-4279

OK, but then You'll loose Z80 compatibility (unless I'm wrong)!
You might as well take the NS32K, that gives You
multiply, divide, remainder in all sorts of data types (signed,
unsigned, float) and sized (byte, word, double, float, long)


		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad
Abt. DX-PC			!USA: mcvax!unido!nixpbe!mollers.pad
Pontanusstrasse				Phone:
D-4790 Paderborn		(+49) 5251 146245
+-----------------------------------------------------------------------+
| "Many that live deserve death. And some that die deserve life.	|
|  Can You give it to them? Then do not be too eager to deal out	|
|  death in judgement"							|
|			Gandalf to Frodo in "The Fellowship of the Ring"|
+-----------------------------------------------------------------------+
		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad

dg@lakart.UUCP (David Goodenough) (10/18/89)

dbraun@cadev5.intel.com (Doug Braun ~) sez:
] black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) writes:
] 
]>  (Jurjen N.E. Bos) writes:
]>>Maybe somebody is interested in this: I can multiply to unsigned 8-bit numbers
]>>in 158 clockcycles.  . . .
]>
]>Big deal! Use a 64180 (aka Z180) chip, has the MLT instruction, takes 17
]>clock cycles, or 1.85 micro seconds.  (By the way: it's a 2-byte instruction)
]>
]>Jerry Glomph Black, 8-bit terrorist
] 
] 
] Bigger deal!! Use a Z280 and do *16* bit multiplies (signed AND unsigned)
] in 25 or so clock cycles.

Yes, but are they plug in replacements for the Z80??

No, I didn't think so.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/19/89)

In article <8910161517.AA06146@rom.ecse.rpi.edu>, black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) writes:
|  pseudonym Z180.  The 64180 is far more than a 'Z80 clone'. It does support
|  the Z80 instruction set, but has 2 asynch and 1 synch serial ports, 1 Meg
|  address space, 2 16-bit timers, and 4 external interrupt lines. 

  So can I get a nice S100 board with one? Maybe even a nice single
board CP/M system to replace the collection of relics I currently run?
Either that or a board which gives me an AT on a single A100 card ;-)
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/19/89)

In article <1335@gmdzi.UUCP>, wittig@gmdzi.UUCP (Georg Wittig) writes:

|  This way you can fill up to 4096 (256*16) bytes. If more bytes are to be
|  filled, build a loop around it. If the number of the bytes to be filled isn't a
|  multiple of 16, the bytes 1 to 15 can be filled straight forward with a
|  traditional algorithm.

  No, you and the count with 17(8) and then jump into the loop so you
only do the excess modulo 16 the first time. If the excess is non-zero
you have to loop one more time, though.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

bobc@attctc.Dallas.TX.US (Bob Calbridge) (10/19/89)

In article <1238@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
` In article <8910161517.AA06146@rom.ecse.rpi.edu>, black@ROM.ECSE.RPI.EDU (Jerry Glomph Black) writes:
` |  pseudonym Z180.  The 64180 is far more than a 'Z80 clone'. It does support
` |  the Z80 instruction set, but has 2 asynch and 1 synch serial ports, 1 Meg
` |  address space, 2 16-bit timers, and 4 external interrupt lines. 
` 
`   So can I get a nice S100 board with one? Maybe even a nice single
` board CP/M system to replace the collection of relics I currently run?
` Either that or a board which gives me an AT on a single A100 card ;-)

What you need is a nice S180 board with one.  Actually, I recall that the
Micro Mint used to have one.  Pretty cheap too.  I don't know if they're
still in business.  Used to be you could find their ads in Byte.  I haven't
read it in so long I can't guarantee it.  It was a single board system and
came, I seem to recall, with either CP/M or a CP/M clone.
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=             I know it's petty..........                                     =
-                  But I have to justify my salary!                           -
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

josef@peun11.uucp (Moellers) (10/20/89)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

>  So can I get a nice S100 board with one? Maybe even a nice single
>board CP/M system to replace the collection of relics I currently run?
>Either that or a board which gives me an AT on a single A100 card ;-)

You CAN get a nice system with the 64180 on it:
The Micromint/Ciarcia's Circuit Cellar SB180(FX)
It features:
	- HD64180 (with the interfaces: 2xserial, timers, dma, mmu)
	- 256 kB memory (FX has 512 kB, externally extendable to 2Megs)
	- floopy controller for 8", 5.25" (and 3.5")
	- parallel interfaces (CENTRONICS AND a 8255)
	- SCSI chip (FX only)
	- EPROM with monitor
You can oder it with the Zsystem (CPM2.2 compatible but A LOT NICER)
costs about $500.

Add to this the GT180:
	- piggy backs onto SB180(FX)
	- high resolution: (standard is 640x480 but I ran it on a
	  Mutlisync II)
	- Uses HD63484 ACRTC -> LINE, RECTANGLE, CIRCLE, ELLIPSE, ...
	- RGBI/TTL or 16 out of 4096 colors analog
Makes You another $500 poorer, but ITS GREAT!!!

Give Micromint a ring: 1-(800)635-3355 or 1-(203)-871-6170

NOTE: I AM IN NO WAY CONNECTED TO MICROMINT APART FROM BEING VERY HAPPY
      WITH THE SB180FX/GT180

		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad
Abt. DX-PC			!USA: mcvax!unido!nixpbe!mollers.pad
Pontanusstrasse				Phone:
D-4790 Paderborn		(+49) 5251 146245
+-----------------------------------------------------------------------+
| "Many that live deserve death. And some that die deserve life.	|
|  Can You give it to them? Then do not be too eager to deal out	|
|  death in judgement"							|
|			Gandalf to Frodo in "The Fellowship of the Ring"|
+-----------------------------------------------------------------------+

		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad

fifi@cosmo.UUCP (A.F.Zinser) (10/25/89)

In article <577@nixpbe.UUCP> josef@peun11.uucp (Moellers) writes:
> 
> [...]
> >Bigger deal!! Use a Z280 and do *16* bit multiplies (signed AND unsigned)
> >in 25 or so clcok cycles.  Also divide! (32 bits / 16 bits).
> >Also 2-byte instructions.  Unless you want to do something like:
> >	HL := DEHL DIV (IX+2345H)
> OK, but then You'll loose Z80 compatibility (unless I'm wrong)!
> You might as well take the NS32K, that gives You
> multiply, divide, remainder in all sorts of data types (signed,
> unsigned, float) and sized (byte, word, double, float, long)

Sorry, the Z280 (Z8000) instruction set is a superset of the Z80
instruction set; it's the same sitiuation as using the HD64180 (Z180)!
As far as I know the NS32K never understands any Z80 instruction :-).

Axel Zinser

                             --***%%%%***--
Axel F. Zinser        ...uunet!mcvax!unido!cosmo!fifi       fifi@cosmo.uucp

josef@peun11.uucp (Moellers) (10/27/89)

fifi@cosmo.UUCP (A.F.Zinser) writes:

>In article <577@nixpbe.UUCP> josef@peun11.uucp (Moellers) writes:
>> 
[quote of my reply stating that if You wanted mul/div, You could just
 as well take a 32K machine]

>Sorry, the Z280 (Z8000) instruction set is a superset of the Z80
>instruction set; it's the same sitiuation as using the HD64180 (Z180)!
>As far as I know the NS32K never understands any Z80 instruction :-).

Sorry again, but the Z280 is neither a Z8000 nor a superset of the Z80
(as far as I know, I have a reference man at home and now I'm at work).
The Z80, Z280 and Z8000 are three completely different CPUs.

So, if You're going to use an incompatible CPU, take a good one (NS32K)!


		Josef Moellers

	paper mail:			e-mail:
c/o Nixdorf Computer AG		USA:  uunet!philabs!linus!nixbur!mollers.pad
Abt. DX-PC			!USA: mcvax!unido!nixpbe!mollers.pad
Pontanusstrasse				Phone:
D-4790 Paderborn		(+49) 5251 146245
+-----------------------------------------------------------------------+
| "Many that live deserve death. And some that die deserve life.	|
|  Can You give it to them? Then do not be too eager to deal out	|
|  death in judgement"							|
|			Gandalf to Frodo in "The Fellowship of the Ring"|
+-----------------------------------------------------------------------+