[comp.sys.m68k] addq.w #n,sp and a pop quiz

davet@oakhill.UUCP (David Trissel) (12/11/88)

When I put the add.w into the compiler I wondered just how many people
would see it and think that it must be wrong. Craig Ruff had it
right in that the add.w requires 2 less bytes (and is faster) than add.l
for cases where the immediate will fit in a word and where addq cannot
be substituted by the assembler. Specifically the compiler generates this
for popping arguments and so there is never any danger of the immediate 
value being too large. 

Instructions depending on esoteric architectural features such as the above
separate the men from the boys. Here's another one:

There is only one place in the instruction set where a data register is
implicitly sign-extended. Where?

I'll post the answer in a few days (if nobody else does.)

  -- Dave Trissel  Motorola Semiconductor, Austin Texas
if anyone knows the answer to this without 
can answer this one without looking 
This type of architectural usage separate the men from the boys. Here's
another question on a subtlety which I doubt any of you would know without

duggie@Jessica.stanford.edu (Doug Felt) (12/13/88)

In article <1737@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>
> [stuff deleted]
>
>There is only one place in the instruction set where a data register is
>implicitly sign-extended. Where?
>

I believe this instruction is CMPA <ea>,An where a word-length value
in the effective address field (possibly a data register) is
sign-extended before being compared with the address register.

>
>  -- Dave Trissel  Motorola Semiconductor, Austin Texas

Doug Felt
Courseware Authoring Tools Project
Sweet Hall 3rd Floor
Stanford University
duggie@jessica.stanford.edu

mikem@uhccux.uhcc.hawaii.edu (Mike Morton) (12/13/88)

Dave Trissel of Motorola recently wrote:
>There is only one place in the instruction set where a data register is
>implicitly sign-extended. Where?

The phrasing of Dave's question implies that he doesn't count the
extension of a word-sized index in xx(An, Dn.w).  More likely he means
a MOVEM.W to data registers.

[No, Dave, I didn't have to hit the manual for this.  It was burned into
my brain a few years back when I spent an afternoon debugging.  Got any
explanation of WHY this is done?]

Here's some puzzles:
  - what's the fastest way to bump two address regs each by one byte?
  - what's faster than an SCS instruction, but equivalent (except
    for CCR, perhaps)

Forgive me if this has been discussed in this group, but has everyone
heard about the "superoptimizer" discussed in a paper in ASPLOS II?
It does really incredible tricks trying MANY instruction sequences to
see which is the best.  One of my favorites is a four-instruction one
to turn Dn into signum(Dn).  Can anyone solve this?  Perhaps better,
can you do Max(Dx,Dy) or Min in four instructions?

 -- Mike Morton // P.O. Box 11378, Honolulu, HI  96828, (808) 676-6966 HST
      Internet: msm@ceta.ics.hawaii.edu
    (anagrams): Mr. Machine Tool; Ethical Mormon; Chosen Immortal; etc.

johnz@lxviiik.uucp (John Zolnowsky) (12/13/88)

From article <1737@oakhill.UUCP>, by davet@oakhill.UUCP (David Trissel):
> There is only one place in the instruction set where a data register is
> implicitly sign-extended. Where?
> 
> I'll post the answer in a few days (if nobody else does.)
>   -- Dave Trissel  Motorola Semiconductor, Austin Texas

Suggested clarification:  What Dave means is that the data register is
a destination operand rather than a source operand.  In the latter case,
there are ADDA/SUBA Dn,An and the use of a data register as a word-sized
index.

					-John Zolnowsky     sun!johnz

sbigham@dukeac.UUCP (Scott Bigham) (12/13/88)

In article <1737@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:

>There is only one place in the instruction set where a data register is
>implicitly sign-extended. Where?

Perchance in the addressing mode OFF(A0,D3.W)?

						sbigham
-- 
Scott Bigham                         "The opinions expressed above are
Internet sbigham@dukeac.ac.duke.edu   (c) 1988 Hacker Ltd. and cannot be
USENET   sbigham@dukeac.UUCP          copied or distributed without a
...!mcnc!ecsgate!dukeac!sbigham       Darn Good Reason."

jlyle@hpdml93.HP.COM (Jeff Lyle) (12/13/88)

> Instructions depending on esoteric architectural features such as the above
> separate the men from the boys. Here's another one:

Or working code from code with features???

> There is only one place in the instruction set where a data register is
> implicitly sign-extended. Where?
>
>   -- Dave Trissel  Motorola Semiconductor, Austin Texas

The MOVEM.W instruction has the rather NASTY feature of sign extending
its data when the destination is a register.  Yes it IS documented in
the fine print.  No, I did not read the fine print before discovering
this!

  - Jeff Lyle (hpdmlac)

chrisj@cup.portal.com (Christopher T Jewell) (12/13/88)

In article <1737@oakhill.UUCP>, davet@oakhill.UUCP (David Trissel) writes:

>There is only one place in the instruction set where a data register is
>implicitly sign-extended. Where?

Aw come on!  Three cases at least:
1.	ADDA.W	Dn,Am		(also SUBA.W, CMPA.W, MOVEA.W)
2.	d8(xx,Dn.W) addressing	(xx IN {PC, An})
3.	MOVEM.W	 memory to dataregs

I suppose that you intended #3, since the values loaded INTO the DRegs are
actually sign-extended during loading; in the other two cases, the value
taken FROM a DReg is sign-extended, but the reg itself is unchanged.  (The
wording of the question in your pop-quiz was ambiguous, professor! :-) )

Christopher T. Jewell   chrisj@cup.portal.com   sun!cup.portal.com!chrisj
"Just because I tell only the truth doesn't mean I tell the only truth there
is."  Father McAleer in Brad Ferguson's "To Tell the Troof" (Jan '89 F&SF)

paul@taniwha.UUCP (Paul Campbell) (12/13/88)

In article <1737@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
>There is only one place in the instruction set where a data register is
>implicitly sign-extended. Where?

That's easy ... no need to look at the manual - 'movem.w .., <d0>'
on the other hand I've never ever come across a need for this instruction,
I have always guessed it was something some big customer asked Mot. to
put in the 68k early on when everyone thought of it as a 16 bit machine.

	Paul

-- 
Paul Campbell			..!{unisoft|mtxinu}!taniwha!paul (415)420-8179
Taniwha Systems Design, Oakland CA

 	"Read my lips .... no GNU taxes"

tmt@xpiinc.UU.NET (Thomas M. Talpey) (12/14/88)

>> Dave Trissel asks...
>> There is only one place in the instruction set where a data register is
>> implicitly sign-extended. Where?

Movemw to data regs. Is this because of microcode streamlining so the
processor doesn't have to figure out whether it's an address or data
reg regarding extension? The practical reasons seem marginal. Also, why
does movem overshoot by one memory cycle? Inquiring minds want to know.

>> Mike Morton asks:

Hey, you're cheating (I think!).

>>   - what's the fastest way to bump two address regs each by one byte?

	cmpmb	ax@+,ay@+	| 12 cycles

But it adds one memory cycle over two addql's (16 cycles). This may or may
not make it faster.

>>   - what's faster than an SCS instruction, but equivalent (except
>>     for CCR, perhaps)

	subxb	dx,dx		| 4 cycles

But it assumes that the X bit is set with CY, only sometimes true!

>>   - can you do Max(Dx,Dy) or Min in four instructions?

	movl	Dx,result
	cmpl	Dx,Dy
	bles	.+4
	movl	Dy,result

Have you got a better way?

Tom Talpey
tmt@xpiinc.uu.net

bruce@blue.blue.gwd.tek.com (Bruce Robertson) (12/14/88)

> There is only one place in the instruction set where a data register is
> implicitly sign-extended. Where?

I see two, if you are allowing the 68020 instruction set.  The one
you're probably looking for is "moveq".  The other is "bfexts", which
I suppose may be said to be explicit because of the 's' in the mnemonic.
--
--
	Bruce Robertson
	bruce@blue.gwd.tek.com

mikem@amc.UUCP (Mike McGinnis) (12/15/88)

In article <4350@Portia.Stanford.EDU>, duggie@Jessica.stanford.edu (Doug Felt) writes:
> In article <1737@oakhill.UUCP> davet@oakhill.UUCP (David Trissel) writes:
> >
> > [stuff deleted]
> >
> >There is only one place in the instruction set where a data register is
> >implicitly sign-extended. Where?
> >
> 
> I believe this instruction is CMPA <ea>,An where a word-length value
> in the effective address field (possibly a data register) is
> sign-extended before being compared with the address register.
> 
> >
> >  -- Dave Trissel  Motorola Semiconductor, Austin Texas
> 
> Doug Felt
> Courseware Authoring Tools Project

The CMPA instruction implies that the destination operand is an address 
register. 
The only place a data register will be extended to a long is the MOVEQ
instruction. The source operand must be no more than 3 bits (37.5 cents :),
and the value gets placed in the destination operand (data register),
then extended to a long.
Do I win a cupie doll???
	Michael E. McGinnis
	Applied Microsytems Corp.

"You wake up lost in an empty town, wondering why no one else is around"
"Look up to see a giant boy - you've just become his brand new toy..."

mikem@uhccux.uhcc.hawaii.edu (Mike Morton) (12/15/88)

Tom Talpey is right that the CMPM.B does more memory access.  But it *is* the
shortest way, and fewest cycles.  I've never actually used this trick, so I
don't know if it's empirically faster.  [I use a Mac Plus, which has funny
timing anyway...]

I also bow to Tom's point that the Carry and eXtend bits must be the same
for "SUBX.B Dn,Dn" to replace "SCS".  Incidentally, I got this trick from
an article Bill Gates wrote ca 1975 on 8080 (?) hacking, reprinted in
"Programmers at Work".

Counting the cycles in Tom's Max(Dx,Dy) as a function of word or long operands
and whether the branch takes/doesn't.

                                 word               long
	mov 	Dx,result        4                  4
	cmp 	Dx,Dy            4                  6
	bles	.+4                10/8 (taken/not)
	mov 	Dy,result        4                  4
                                 18/20              20/22

The Superoptimizer's solution has no branches at all, so its time is constant
at 16 cycles for word operands (small win) or 32 for longword (big loss).

Anyone care to solve the 4-instruction signum function?

 -- Mike Morton // P.O. Box 11378, Honolulu, HI  96828, (808) 676-6966 HST
      Internet: msm@ceta.ics.hawaii.edu
    (anagrams): Mr. Machine Tool; Ethical Mormon; Chosen Immortal; etc.

gz@spt.entity.com (Gail Zacharias) (12/15/88)

In article <2833@uhccux.uhcc.hawaii.edu> mikem@uhccux.uhcc.hawaii.edu (Mike Morton) writes:
>Anyone care to solve the 4-instruction signum function?

	add Dx,Dx
	subx Dx,Dx
	bls .+4
	 moveq #1,Dx
--
gz@entity.com					...!mit-eddie!spt!gz
	  Unix: when you can't afford the very best.

aglew@mcdurb.Urbana.Gould.COM (12/17/88)

Ref: Superoptimizer -- a look at the smallest program
     Henry Massalin
     ASPLOS II

Provides a number of code sequences that avoid branches.
Branch avoidance becomes important on high performance
processors.

signum(x)
	(x in d0)
	add.l	d0,d0
	subx.l	d1,d1
	negx.l	d0
	addx.l	d1,d1
	(signum x in d1)

abs(x)
	(x in d0)
	move.l	d0,d1
	add.l	d1,d1
	subx.l	d1,d1
	eor.l	d1,d0
	sub.l	d1,d0
	(abs(x) in d0)
Longer but no jumps. Unfortunately, dependencies between nearly every
instruction, but a good code scheduler may be able to interleave
other code here. We have a good code scheduler, don't we, Dave? :-)

Max(X,Y)
	(d0=X,d1=Y)
	sub.l	d1,d0
	subx.l	d2,d2
	or.l	d2,d0
	addx.l	d1,d0
	(d0 = max(X,Y)

Min is similar.

Simultaneous max/min

MaxMin(X,Y)
	(d0=X, d1=Y)
	sub.l	d1,d0
	subx.l	d2,d2
	and.l	d0,d2
	eor.l	d2,d0
	add.l	d1,d0
	add.l	d2,d1
	(d0 = max, d1 = min)

Massalin goes on to present some BCD code fragments, multiplication
by constants, etc.

I don't think that I have done an injustice to Massalin's paper by
presenting these fragments -- many of them were known before his
paper, by people like me who like playing with branch avoiding
code sequences. Some were already well known tricks.
    Massalin's contribution is in having written a program that
finds the best such sequence, with reasonable certainty. Ie. he has
automated the search process that hackers used to do by inspiration.
    Superoptimizer is probably not the sort of thing that you would
put into a production compiler, usually - most code simply does not
benefit that much from it, plus, it is extremely slow. But,
something like superoptimizer should almost definitely be run by
a compiler group that is generating code fragments, for code
generation or libraries, every time the machine parameters change
(like, when a new chip comes out). Unless your machine is simple
enough that the best code sequence is obvious by inspection.

Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
   Motorola Microcomputer Division, Champaign-Urbana Design Center
	   1101 E. University, Urbana, Illinois 61801, USA.
   
My opinions are my own, and are not the opinions of my employer, or
any other organisation. I indicate my company only so that the reader
may account for any possible bias I may have towards our products.

mikem@uhccux.uhcc.hawaii.edu (Mike Morton) (12/17/88)

Gail Zacharias recently contributed this 4-instruction signum function, for
which I've counted cycles.

                    word                 long
    add Dx,Dx       4                    8  ; shift sign bit -> carry
                                            ; set Z bit only for Dx = 0, 0x8000
    subx Dx,Dx      4                    8  ; set Dx = 0xffff if Dx was < 0
                                            ;    (and set C bit)
                                            ; set Dx = 0 if sign Dx was >= 0
                                            ;    (and clear C bit)
                                            ;    (and leave Z bit alone)
    bls .+4               10/8 (taken/not)  ; branch on C bit (Dx < 0)
                                            ;    (Dx now holds -1)
                                            ; ... or on Z bit (Dx = 0, 0x8000)
                                            ;    (Dx now holds 0)
    moveq #1,Dx     4                    4  ; none of the above: return 1
                 ----                 ----
                18/20                26/28

Very elegant!  It uses only one register and is faster than the
Superoptimizer's solution for the longword case (30 cycles).  However,
the SO's solution is faster for the word case (16 cycles), mostly because
it avoids the conditional branch.  Takers?

 -- Mike Morton // P.O. Box 11378, Honolulu, HI  96828, (808) 676-6966 HST
      Internet: msm@ceta.ics.hawaii.edu
    (anagrams): Mr. Machine Tool; Ethical Mormon; Chosen Immortal; etc.

john@frog.UUCP (John Woods) (12/20/88)

In article <1737@oakhill.UUCP>, davet@oakhill.UUCP (David Trissel) writes:
> 
> Instructions depending on esoteric architectural features such as the above
> separate the men from the boys. Here's another one:
> 
When I first glanced at my screen, I read the above as
"separate the men from the bugs."  I was going to post a comment about
such trivia JOINING the men and the bugs, but decided not to.

> There is only one place in the instruction set where a data register is
> implicitly sign-extended. Where?
> 
And when I saw various people's complaints about how they discovered the
non-intuitive solution to this pop quiz the hard way, I decided to go ahead
and post it anyway.

The answer to such silliness seems to be RISC.
-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Go be a `traves wasswort.		- Doug Gwyn

davet@oakhill.UUCP (David Trissel) (12/21/88)

In article <1324@X.UUCP> john@frog.UUCP (John Woods) writes:

>The answer to such silliness seems to be RISC.

Huh? The "leave it to the software" philosophy? No thanks. I happen to know
people trying to "work around" some tough RISC hangups and I have to say that
if RISC compilers are easier to build than CISC compilers it's not 
by much.

Certainly by now the RISC compilers would all be "finished" wouldn't they?
Since the initial implementation would be a breeze all that's left to do
is work on optimizing.

Looks to me like both RISC and CISC have a nearly equal amount of
implementation headaches.

 -- Dave Trissel 

jsan@well.UUCP (Jez San) (12/25/88)

Where can I get the SuperOptimiser from??

From the quoted examples, it sure looks like a neat toy, and I am keen
to feed it something more useful than just Mins and Maxs!

-- Jez.

[no siggy yet... im new to this!]

mikem@uhccux.uhcc.hawaii.edu (Mike Morton) (12/30/88)

Jez San recently wrote:

> Where can I get the SuperOptimiser from??
> From the quoted examples, it sure looks like a neat toy, and I am keen
> to feed it something more useful than just Mins and Maxs!

It's not distributed, to the best of my knowledge.  The reference is in
ASPLOS II:
    "Superoptimizer -- A Look at the Smallest Program", by Henry Massalin

If you know how to order stuff from ACM, the magic info from the first page is:
    (c) 1987 ACM 0-89791-238-1/87/1000-0122 $00.75

I wrote to Massalin a couple of times and he was quite helpful with questions.
I don't know a net address for him, but the address in the paper is:
    Department of Computer Science, Columbia University, New York, NY  10027

 -- Mike Morton // P.O. Box 11378, Honolulu, HI  96828, (808) 676-6966 HST
      Internet: msm@ceta.ics.hawaii.edu
    (anagrams): Mr. Machine Tool; Ethical Mormon; Chosen Immortal; etc.

bga@raspail.UUCP (Bruce Albrecht) (12/31/88)

In article <2899@uhccux.uhcc.hawaii.edu>, mikem@uhccux.uhcc.hawaii.edu (Mike Morton) writes:
> It's not distributed, to the best of my knowledge.  The reference is in
> ASPLOS II:
>     "Superoptimizer -- A Look at the Smallest Program", by Henry Massalin
> 
> If you know how to order stuff from ACM, the magic info from the first page is:
>     (c) 1987 ACM 0-89791-238-1/87/1000-0122 $00.75

What's ASPLOS II?  Some conference proceedings?

mikem@uhccux.uhcc.hawaii.edu (Mike Morton) (12/31/88)

Bruce Albrecht writes:
> What's ASPLOS II?  Some conference proceedings?

Sorry.  It's a conference (#2 was in 1988), short for Architectural Support
for Programming Languages and Operating Systems.  More hardware than software,
if I recall correctly.

It's sort of formal presentations of hackery, but people with advanced
degrees aren't supposed to admit that they do weird stuff like this, let
alone enjoy it.

 -- Mike Morton // P.O. Box 11378, Honolulu, HI  96828, (808) 676-6966 HST
      Internet: msm@ceta.ics.hawaii.edu
    (anagrams): Mr. Machine Tool; Ethical Mormon; Chosen Immortal; etc.