[comp.arch] no-branch 68000 signum, min, max

chris@mimsy.UUCP (Chris Torek) (06/29/88)

In article <7570@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>(What is signum?)

It has other names, but I prefer signum to avoid confusion with sine, etc.

		  {  -1		x < 0
	signum(x) {   0		x = 0
		  {   1		x > 0

>Not so surprising for min and max (and minmax).  However, they require a
>sign extending shift, so that implementation is not possible on all
>machines.

No, they use clever tricks with the carry bit (these are for unsigned
values).  Indeed, this implementation is not possible on some machines
(VAXen have subtract-with-carry but have no AND instruction; others have
no subtract-with-carry), which is why you want some automatic way of
converting unsigned-min(x,y) into actual code.

>Also, if a branch takes only one cycle (with delay slot),
>you do not gain anything (in general).

On the 68020 the no-branch versions win in terms of total cycles.
min() looks like this:

	| input:  d0=x, d1=y, both unsigned; d2 is scratch
	| output: d0=min(x, y)

		sub.l	d1,d0	| d0 = x-y
		subx.l	d2,d2	| d2 = 111...111 if x<y, else 000...000
		and.l	d2,d0	| d0 = x-y if x<y, else 0
		add.l	d1,d0	| d0 = x-y+y=x if x<y, else y
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

leichter@yale.UUCP (Jerry Leichter) (07/05/88)

It really should be pointed out that these techniques are by no means
new.  I saw them used back around 1971 on a CDC 6600.  On a 6600, branches
were VERY expensive relative to almost anything else.  (The basic unit was
the minor cycle, 100ns.  A Boolean operation was 3 minor cycles; an FP add
was, I think, 8.  A branch was 12 or 14, depending on whether it was taken
or not, plus possible memory latency.)  So techniques like this were very
well developed.  I don't recall ever seeing signum written out this way,
though it's obvious enough, but everyone who knew about good coding practices
knew how to do ABS or MAX with no branches.  (Note that a 6600 is a ones-com-
plement machine, so the code is simpler.)  In fact, the FORTRAN compiler
used these techniques for generating code!

There was also the "take two 60-bit words thought of as 10 6-bit BCD numbers
and add them without a loop" hack which will teach you why there are 26
letters in the English alphabet.  (There have to be.  You reserve 00, then
lay out all the letters, then the digits.  That puts 0 at the only place it
can be to make this trick go through without pre- and post-processing.)
(No, I won't reconstruct the whole thing here.  The trick is to set things
up so that all carries beyond 9 end up setting or clearing a fixed bit in
the 6-bit representation.  A mask extracts those bits, controlling where
the carries have to be "added back in".)
							-- Jerry