[comp.sys.amiga.tech] Adjusting the stack pointer

john13@garfield.MUN.EDU (John Russell) (12/22/88)

Manx-generated code uses the form

addq.w #(4*N),sp
     ^

to reset the stack pointer after calling a function with N (long) args on the 
stack.  Examples which adhere to the syntax of other Amiga assemblers use

addq.l #(4*N),sp
     ^

Is the Manx format dangerous if the stack is being incremented past a
64k boundary? Is there a reason to use one and not the other?

John
-- 
"Version 2.0 is advertised as supporting cursor keys."
	    -- somewhat left-handed endorsement of a Mac word-processor :-)

dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) (12/23/88)

In article <5047@garfield.MUN.EDU> john13@garfield.MUN.EDU (John Russell) writes:
>Manx-generated code uses the form
>
>addq.w #(4*N),sp
>     ^
>
>to reset the stack pointer after calling a function with N (long) args on the 
>stack.  Examples which adhere to the syntax of other Amiga assemblers use
>
>addq.l #(4*N),sp
>     ^
>
>Is the Manx format dangerous if the stack is being incremented past a
>64k boundary? Is there a reason to use one and not the other?
>
>John

Hey, something I know about! :-)

 Whenever the 68000 does anything with an address register as the destination,
the source is sign-extended to 32 bits before the operation is performed, and
the operation always affects all 32 bits of the register.  This is true, for
example, for add, load, and subtract.  The trick is you need to look in the
68000 manual under ADDA, MOVEA, and SUBA (etc.)  Most assemblers will
automatically change the instruction generated based upon the destination
register.  The book doesn't.
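
To make the 64k question concrete, here is a minimal Python sketch of the
behavior described above.  It is only a model, not 68000 code: the function
names (sign_extend16, adda_w, addq_to_areg) are made up for illustration, and
the address register is assumed to be a plain 32-bit value.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend a 16-bit quantity to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def adda_w(areg, imm16):
    """Model ADDA.W #imm,An: word source is sign-extended, all 32 bits of An change."""
    return (areg + sign_extend16(imm16)) & MASK32

def addq_to_areg(areg, imm):
    """Model ADDQ.W or ADDQ.L #imm,An: imm is 1..8 and the whole register is used."""
    if not 1 <= imm <= 8:
        raise ValueError("ADDQ immediate must be in 1..8")
    return (areg + imm) & MASK32

sp = 0x0001FFFC                    # stack pointer just below a 64k boundary
print(hex(addq_to_areg(sp, 8)))    # 0x20004
print(hex(adda_w(sp, 0x0010)))     # 0x2000c
print(hex(adda_w(sp, 0xFFF0)))     # 0x1ffec (0xFFF0 is -16 after sign extension)
```

In this model the carry propagates past bit 15 in every case, which is why the
word form is not dangerous when the stack pointer crosses a 64k boundary.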

 The Manx generated code is faster (fewer instruction fetches from memory).

"The highly regular structure of the M68000 greatly simplifies the effort
required to write programs in assembly language as well as high level
languages." - M68000 16/32-bit Microprocessor Programmer's Reference 
Manual, fourth edition

-Yea, right.   (Note: sarcasm)

|     // You've heard of CATS and DOGS, I'm from GOATS, Dave Kinzer         |
|    //  Gladly Offering All Their Support!             noao!nud!fbog!dbk   |
|  \X/   "My employer's machine, my opinion."           (602) 897-3085      |

jesup@cbmvax.UUCP (Randell Jesup) (12/23/88)

In article <1667@fbog.UUCP> dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) writes:
>In article <5047@garfield.MUN.EDU> john13@garfield.MUN.EDU (John Russell) writes:
>>Manx-generated code uses the form
>>addq.w #(4*N),sp
>>     ^
vs:
>>addq.l #(4*N),sp
>>     ^
> The Manx generated code is faster (fewer instruction fetches from memory).

	Actually no, since addq can only deal with immediates between 1
and 8.  Of course, this is probably another case of the Manx assembler
output being misleading, and the assembler changes that to an adda.w.
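
As a rough illustration of the 1 to 8 limit, the Python sketch below builds
the ADDQ opcode word for an address register destination.  The helper name
encode_addq_an is hypothetical; the bit layout assumed is the usual ADDQ one
(0101, 3-bit data field with 8 stored as 000, a zero bit, 2-bit size, then
mode 001 and the register number).

```python
SIZE_BITS = {"w": 0b01, "l": 0b10}   # byte size is not allowed for An

def encode_addq_an(imm, size, reg):
    """Return the single 16-bit opcode word for ADDQ.<size> #imm,A<reg>."""
    if not 1 <= imm <= 8:
        raise ValueError("ADDQ immediate must be in 1..8")
    if size not in SIZE_BITS:
        raise ValueError("size must be 'w' or 'l' for an address register")
    if not 0 <= reg <= 7:
        raise ValueError("register number must be in 0..7")
    data = imm & 0b111               # 8 wraps to 000 in the 3-bit field
    return (0b0101 << 12) | (data << 9) | (SIZE_BITS[size] << 6) | (0b001 << 3) | reg

print("0x%04X" % encode_addq_an(4, "w", 7))   # 0x584F  addq.w #4,sp
print("0x%04X" % encode_addq_an(4, "l", 7))   # 0x588F  addq.l #4,sp
print("0x%04X" % encode_addq_an(8, "l", 7))   # 0x508F  addq.l #8,sp
```

Both sizes come out as one opcode word with no extension words, so under this
encoding neither form needs fewer instruction fetches than the other.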

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

ewhac@well.UUCP (Leo L. Schwab) (12/23/88)

In article <5047@garfield.MUN.EDU> john13@garfield.MUN.EDU (John Russell) writes:
>Manx-generated code uses the form
>
>addq.w #(4*N),sp
>     ^
>
>to reset the stack pointer after calling a function with N (long) args on the 
>stack.  Examples which adhere to the syntax of other Amiga assemblers use
>
>addq.l #(4*N),sp
>     ^
>
>Is the Manx format dangerous if the stack is being incremented past a
>64k boundary? Is there a reason to use one and not the other?
>
	It turns out that, for ADDQ, specifying .w over .l doesn't buy you
anything; both are 8 clocks.  However, ADDQ only works for immediate values
from 1 through 8.  Outside that range, you would be inclined to use ADDA.  To use
your notation:

	adda.l	#(4*N),sp	; 16 clocks, 6 bytes

	However, 32K of local variables in a single context is highly
unusual (unless you've got a lot of buffers hanging around, in which case
you should probably be using AllocMem() to get them).  Thus, it is (usually)
safe to say:

	adda.w	#(4*N),sp	; 12 clocks, 4 bytes

	A smart compiler will recognize when the desired offset is greater
than 32K (the offset is signed, remember), and use the ADDA.L when needed.

	It further turns out that you can get even better performance than
that by saying:

	lea	4*N(sp),sp	; 8 clocks, 4 bytes

	This does exactly what the ADDA.W construct above does, only
cheaper.
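
The choice described above can be sketched as a small Python function.  This
is only an illustration of the selection rule, not what Manx or any other
compiler actually does: stack_cleanup is a made-up name, the clock and byte
counts are the 68000 figures quoted in this post, and the 2-byte size for
ADDQ is an added assumption (ADDQ is a single opcode word).

```python
def stack_cleanup(nargs):
    """Pick an instruction to pop nargs long arguments off the stack.

    Returns (instruction text, clocks, bytes) using the figures quoted above.
    """
    nbytes = 4 * nargs
    if nbytes <= 0:
        raise ValueError("need at least one argument to pop")
    if nbytes <= 8:
        # ADDQ handles immediates 1..8 only
        return ("addq.l #%d,sp" % nbytes, 8, 2)
    if nbytes <= 32767:
        # LEA with a signed 16-bit displacement, cheaper than ADDA.W
        return ("lea %d(sp),sp" % nbytes, 8, 4)
    # Offset does not fit in a signed word
    return ("adda.l #%d,sp" % nbytes, 16, 6)

for n in (1, 2, 3, 8192):
    print(n, stack_cleanup(n))
```

For example, two arguments give ("addq.l #8,sp", 8, 2), three give
("lea 12(sp),sp", 8, 4), and 8192 arguments (32768 bytes) fall through to
ADDA.L.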

	The 68000 is full of little non-orthogonal goodies like this.

[Timing information source: MC68000 programming quick-reference card,
available from Motorola.]
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape	INET: well!ewhac@ucbvax.Berkeley.EDU
 \_ -_		Recumbent Bikes:	UUCP: pacbell > !{well,unicom}!ewhac
O----^o	      The Only Way To Fly.	      hplabs / (pronounced "AE-wack")
"Work FOR?  I don't work FOR anybody!  I'm just having fun."  -- The Doctor

peter@sugar.uu.net (Peter da Silva) (12/23/88)

In article <1667@fbog.UUCP>, dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) writes:
> "The highly regular structure of the M68000 greatly simplifies the effort
> required to write programs in assembly language as well as high level
> languages." - M68000 16/32-bit Microprocessor Programmer's Reference 
> Manual, fourth edition

> -Yea, right.   (Note: sarcasm)

Hey, you should try programming on an intel piece of junk some time. Or how
about an 8-bit processor? The only 8-bitter that I know of with a regular
instruction set is the 1802 (the 6809, too, but you demean that by calling
it an 8-bit CPU).

Yes, the 68000 isn't a PDP-11, but neither is it an 80286.
-- 
Peter "Have you hugged your wolf today" da Silva  `-_-'  peter@sugar.uu.net

ewhac@well.UUCP (Leo L. Schwab) (12/25/88)

In article <3142@sugar.uu.net> peter@sugar.uu.net (Peter da Silva) writes:
:In article <1667@fbog.UUCP>, dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) writes:
:> "The highly regular structure of the M68000 greatly simplifies the effort
:> required to write programs in assembly language as well as high level
:> languages." - M68000 16/32-bit Microprocessor Programmer's Reference 
:> Manual, fourth edition
:
:> -Yea, right.   (Note: sarcasm)
:
:The only 8-bitter that I know of with a regular
:instruction set is the 1802 (the 6809, too, but you demean that by calling
:it an 8-bit CPU).
:
	How do you feel about the National Semiconductor 32000 series units?
I've heard they're pretty clean.  I've also heard differing opinions about
the Acorn RISC Machine (ARM) chip?  Any thoughts?


peter@sugar.uu.net (Peter da Silva) (12/27/88)

In article <10121@well.UUCP>, ewhac@well.UUCP (Leo L. Schwab) writes:
> In article <3142@sugar.uu.net> peter@sugar.uu.net (Peter da Silva) writes:
> :The only 8-bitter that I know of with a regular
> :instruction set is the 1802 (the 6809, too, but you demean that by calling
> :it an 8-bit CPU).

> 	How do you feel about the National Semiconductor 32000 series units?

Well, it's not exactly an 8-bitter :->.

I kind of dismissed the 32000 architecture, because it didn't have an
autoincrement indirect addressing mode, which doubles the cost of doing an
indirect-threaded language like Forth. At the time Forth was my bag. The
best Forth machine, BTW, is the PDP-11 or the 6809 (two instructions each for
the FIG-model inner interpreter) followed by the 68000 (three instructions)
and the 1802 (six instructions, but only 1 byte each). (waffles on at length
about Forth, details on request).
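
For readers who have not met indirect threading, here is a minimal Python
sketch of an inner interpreter.  It is a toy model, not the FIG code for any
of these CPUs: the VM class, the three primitives, and the memory layout are
all invented for illustration.  The point is that NEXT is a fetch through the
instruction pointer, an increment, and an indirect jump, so a CPU that cannot
fold the increment into the fetch pays for it separately.

```python
class VM:
    def __init__(self, memory, start):
        self.mem = memory        # flat list of cells
        self.ip = start          # instruction pointer into the thread
        self.stack = []
        self.running = True

    def next(self):
        w = self.mem[self.ip]    # fetch address of the next word's code field
        self.ip += 1             # the step an autoincrement mode does for free
        self.mem[w](self)        # indirect jump through the code field

def do_lit(vm):
    vm.stack.append(vm.mem[vm.ip])   # literal lives in the cell after LIT
    vm.ip += 1

def do_add(vm):
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append(a + b)

def do_halt(vm):
    vm.running = False

def run(memory, start):
    vm = VM(memory, start)
    while vm.running:
        vm.next()
    return vm.stack

# Cells 0..2 are code fields; the thread "LIT 2 LIT 3 ADD HALT" starts at cell 3.
memory = [do_lit, do_add, do_halt, 0, 2, 0, 3, 1, 2]
print(run(memory, 3))   # [5]
```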

No, I haven't looked at it recently. I really can't say much about it.

> I've heard they're pretty clean.  I've also heard differing opinions about
> the Acorn RISC Machine (ARM) chip?  Any thoughts?

I'd say any RISC had damn well better be orthogonal. Otherwise it's a bit of a
risk calling it one.
-- 
Peter "Have you hugged your wolf today" da Silva  `-_-'  peter@sugar.uu.net

dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) (12/28/88)

In article <5047@garfield.MUN.EDU> john13@garfield.MUN.EDU (John Russell) writes:
 >>Manx-generated code uses the form
 >>addq.w #(4*N),sp
 >>     ^
 vs:
 >>addq.l #(4*N),sp
 >>     ^
In article <1667@fbog.UUCP> dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) writes:
 > The Manx generated code is faster (fewer instruction fetches from memory).

In article <5569@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>	Actually no, since addq can only deal with immediates between 1
>and 8.  Of course, this is probably another case of the Manx assembler
>output being misleading, and the assembler changes that to an adda.w.

   You're right, I missed the 'q' part.  But just for humor, check table F-8
in your Programmer's Reference Manual (fourth edition, page 210) where it 
says 4 cycles for a word size, 8 for a long.  This must be a misprint since
both instructions perform the same function for address register destinations.


|    // GOATS - Gladly Offering All Their Support  Dave Kinzer (602)897-3085|
|   //  >> In Hell you need 4MB to Multitask!  <<  uunet!nud!fbog!dbk       |
| \X/   #define policy_maker(name) (name->salary > 3 * dave.salary)         |

dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) (12/28/88)

I write:
>   You're right, I missed the 'q' part.  But just for humor, check table F-8
									  ^^^
As I put my manual away, it flips to page 190 and table D-5 which is for the
68000 not the 68010 that table F-5 is for.  The instruction timings are correct
here.  Is the '010 faster?  Again, the end result for an address register 
destination is the same.

|    // GOATS - Gladly Offering All Their Support  Dave Kinzer (602)897-3085|
|   //  >> In Hell you need 4Mb to Multitask!  <<  uunet!nud!fbog!dbk       |
| \X/   #define policy_maker(name) (name->salary > 3 * dave.salary)         |

dbk@fbog.UUCP (Dave B. Kinzer @ Price Rd. GEG) (12/28/88)

For a 68010, the instructions:
   addq.w  #d,An
   addq.l  #d,An
have equivalent timings, as verified on a 68010 machine here at work, regardless
of what the manual says.

Incidentally, for displacements greater than 8 but less than 32768, a
better instruction is:
   lea.l  d(An),An

This instruction works faster than 'adda.w #d,An'.

