[comp.sys.apple2] ML subroutines

acmfiu@serss0.fiu.edu (ACMFIU) (04/24/91)

i'm curious as to how some of you pass parameters to ML routines in your
ML code. for instance, let's say you had a routine that took two numbers
and returned the answer. how would you pass the two numbers. i've basically
passed values two ways: 1) put them in some dummy variable, 2) if the
subroutine required many parameters, then i'd just pass the address of
where to find the data.

i'm just looking for efficiency and readability. #1 above is definitely
not readable. but #2 is. does anyone pass parameters via the stack?

albert

rhyde@musial.ucr.edu (randy hyde) (04/24/91)

There are many different ways to pass parameters in assembly language.

1) Use the registers.  This is the fastest, shortest, and easiest way to
do it.  For many years you could guarantee that your assembly language
programs ran *much* faster than C by passing parameters in the
registers.  Today, C compilers have wised up and *they've* started
passing parameters in the registers as well.

2) Pass parameters in global (esp direct page on 65xxx) memory
locations.  From a software engineering point of view, one might claim
that using global variables is bad, but if the variables are used only
for these particular parameters, there won't be a problem.  Set up is a
little more tedious and time consuming than registers, but still really
fast.  Note that this technique gets really sticky and inefficient if
your code must be reentrant.

3) Pass parameters in the code stream.  This works really great for
constant parameters (including addresses of variables).  For example, I
have a printf routine you call as follows:

		jsr	printf
		byt	"This is a test of printf,  i=%d, ch=%c\n",0
		adr	i,ch

In this example I've passed the parameters to printf as bytes
immediately following the jsr.  The printf routine pops the return
address which points at the beginning of the parameters.  This is a
convenient and fast way to pass data, but it only works for parameters
which never change (note, however, that the values of I and CH above
could change, but they're not the real parameters here, the addresses of
I and CH are the parameters and they don't change.)  I am assuming here,
of course, that self-modifying code is unacceptable to you.  If you want
to use self-modifying code, you can modify the parameter lists following
the call.  Not recommended though.

4) Pass parameters on the stack.  This is how most HLLs pass parameters.
It is slow, big, and accessing the parameters is inconvenient,
especially on processors like the 65816 which don't have a frame pointer
register.  Nonetheless, there are some advantages to this technique: 
It's easy to understand, easy to verify, and naturally supports
recursion and reentrancy.  It supports any number of parameters (unlike
registers) and is easily expanded.

5)  Pointers to parameter blocks.  This is similar to passing stuff in
the code stream except you pass an explicit pointer rather than using
the return address as the pointer.  Since you're passing a pointer, the
parameter block can be in RAM so this doesn't have the self-modifying
code problem.  This technique is really good if you can anticipate most
of the values you're going to pass to each call.  It works really well
if you pass the same set of parameters to various routines or to various
instances (calls) of the same routine.
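
As a rough sketch of the callee side of method 3, here is a plain 8-bit
6502 routine that prints the zero-terminated string following the JSR.
The name PrintInline, the zero-page pair PTR at $06/$07, and the use of
the Apple II monitor's COUT at $FDED are assumptions made for this
illustration, and directive names such as byt and equ vary by assembler.
Note that on the 65xx the address JSR pushes is that of the JSR's own last
byte, one less than the address of the first parameter byte.

PTR	equ	$06		; assumed free zero-page pair
COUT	equ	$FDED		; Apple II monitor character output

PrintInline
	pla			; low byte of (first parameter byte - 1)
	sta	PTR
	pla			; high byte
	sta	PTR+1
next	inc	PTR		; step to the next inline byte
	bne	fetch
	inc	PTR+1
fetch	ldy	#0
	lda	(PTR),y
	beq	done		; a zero byte ends the string
	ora	#$80		; COUT expects the high bit set
	jsr	COUT
	jmp	next
done	lda	PTR+1		; PTR now holds the terminator's address,
	pha			; which is (next instruction - 1), exactly
	lda	PTR		; what RTS expects to find on the stack
	pha
	rts

A call would then look like this, with execution resuming right after
the zero byte:

	jsr	PrintInline
	byt	"HELLO",0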

toddpw@nntp-server.caltech.edu (Todd P. Whitesel) (04/25/91)

rhyde@musial.ucr.edu (randy hyde) writes:

>4) Pass parameters on the stack.  This is how most HLLs pass parameters.
>It is slow, big, and accessing the parameters is inconvenient,
>especially on processors like the 65816 which don't have a frame pointer
>register.

What the ??

There IS a stack-register-relative addressing mode. Or you can do what
Orca does: hop the stack pointer down to account for your autos, push the
D register, and set D to the bottom of the auto space. This lets you use
EVERYTHING on your autos and arguments (if the autos are larger than 256
bytes then it gets somewhat slower though), and is a lot more convenient
(for large programs) than trying to cram all your globals in the direct
page and use stack relative for autos and arguments.
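
A minimal sketch of the stack-relative mode, assuming native mode with a
16-bit accumulator and a routine called with JSR (Double is a made-up
name for this example):

Double	lda	3,s		; 1,s and 2,s hold the JSR return address
	asl	a		; double the caller's value
	sta	3,s		; hand the result back in the same slot
	rts

The caller pushes the argument and pulls the result itself:

	lda	#21
	pha
	jsr	Double
	pla			; A = 42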

Randy, I sometimes get the distinct feeling that you rag on the 65816
because you're trying to use it as if it were just a 6502 with bigger
registers, and gee things don't work too well that way...

Todd Whitesel
toddpw @ tybalt.caltech.edu

meekins@anaconda.cis.ohio-state.edu (Tim Meekins) (04/25/91)

In article <13845@ucrmath.ucr.edu> rhyde@musial.ucr.edu (randy hyde) writes:
>There are many different ways to pass parameters in assembly language.
>
[snip]

>4) Pass parameters on the stack.  This is how most HLLs pass parameters.
>It is slow, big, and accessing the parameters is inconvenient,
>especially on processors like the 65816 which don't have a frame pointer
>register.  Nonetheless, there are some advantages to this technique: 
>It's easy to understand, easy to verify, and naturally supports
>recursion and reentrancy.  It supports any number of parameters (unlike
>registers) and is easily expanded.
>

Hard to do on a GS? it's quite simple actually if you map the direct page
to the stack. So parameter 1 becomes DP location 0, 2 becomes 2, 3 becomes 4,
etc (assuming words and no local variables and so forth). This remapping
also allows local variables to be used more efficiently. In fact, subroutines
written this way almost always tend to use direct page variables more often than
absolute addressing variables. This large use of the DP more than makes up
for the small overhead of setting up the stack. Although the set up code
may look nasty to the meek, it is quite simple and macros exist for automating
it. For example, my macro library has a macro called 'subroutine' which works
similar to Orca/M's 'subroutine' macro. All you do is define your local
variables using equates then invoke the subroutine macro, listing each
parameter and their lengths. It almost looks like a Pascal procedure
definition.

--
+---------------------------S-U-P-P-O-R-T-----------------------------------+
|/ Tim Meekins                  <<>> Snail Mail:           <<>>  Apple II  \|
|>   meekins@cis.ohio-state.edu <<>>   8372 Morris Rd.     <<>>  Forever!  <|
|\   timm@pro-tcc.cts.com       <<>>   Hilliard, OH 43026  <<>>            /|

MQUINN@UTCVM.BITNET (04/25/91)

On Wed, 24 Apr 91 04:24:38 GMT ACMFIU said:
>
>i'm just looking for efficiency and readability. #1 above is definitely
>not readable. but #2 is. does anyone pass parameters via the stack?

Yes... That's probably the best way to do it.  If you do any toolbox
programming on the GS, that's the way you pass parameters to the tools...
but when passing to your own routines, be absolutely positive you pull
off the same amount you pushed!

>albert

----------------------------------------
  BITNET--  mquinn@utcvm    <------------send files here
  pro-line-- mquinn@pro-gsplus.cts.com

stadler@Apple.COM (Andy Stadler) (04/25/91)

In article <13845@ucrmath.ucr.edu> rhyde@musial.ucr.edu (randy hyde) writes:

>4) Pass parameters on the stack.  This is how most HLLs pass parameters.
>It is slow, big, and accessing the parameters is inconvenient,
>especially on processors like the 65816 which don't have a frame pointer
>register.  Nonetheless, there are some advantages to this technique: 
>It's easy to understand, easy to verify, and naturally supports
>recursion and reentrancy.  It supports any number of parameters (unlike
>registers) and is easily expanded.

Actually, this is a pretty GOOD way to pass parameters.  Not only does it
support recursion and reentrancy (as you mentioned) but it also has the
_advantage_ of being the method used by HLL's.  This is great for two reasons.
First, it makes it much easier to freely mix asm and high-level, with routines
calling each other and not caring what language is used;  Second, since you
have to use this method to call the toolbox, why not use it everywhere for
consistency?

I disagree with your assessment that it's big and slow.  There are two
techniques which make it quite efficient and useable.  The first is to use the
Pascal style of never passing any items greater than 4 bytes long.  If a
parameter is longer than 4 bytes, push a pointer to it.  This is an area where
Pascal has an edge over C because C is always pushing and pulling huge streams
of bytes on and off the stack (especially when working with strings).  Second,
the 65816 -does- have a frame pointer register - the DP register.  Since the
stack and direct page can freely overlap, a procedure can set up a frame pointer
with the following steps:

  1.  save old D (push it)
  2.  calc new D based on stack pointer
  3.  set D to new value

Now the direct page provides quick access to the parameters pushed by the
caller.  In fact there are a couple of bonuses- first, by clever manipulation
of the amount you subtract from D, you can also create local, private  scratch 
storage.  Additionally, because you've placed the parameters in direct page,
any pointers which were passed can immediately be used for indirect accesses.
This is a _distinct_ advantage over the register-based and globals-based
methods you discussed.
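
The three steps above might look like this for a small Pascal-style
function.  This is only a sketch: AddWords, its parameter order, and the
offsets are assumptions for the example.  It expects to be called with JSL
in native mode with 16-bit registers, and it has no locals, so D is simply
set equal to the stack pointer.

;  After the entry code:  1-2 saved D, 3-5 RTL address,
;  6-7 second parameter, 8-9 first parameter, 10-11 function result

AddWords	phd			; 1. save old D
		tsc			; 2. new D is the current stack pointer
		tcd			; 3. set D
		clc
		lda	<8		; first parameter
		adc	<6		; second parameter
		sta	<10		; function result
		lda	<4		; slide the RTL address up over the
		sta	<8		;   4 bytes of parameters
		lda	<3
		sta	<7
		pld			; restore the caller's D
		ply			; discard the 4 bytes now below
		ply			;   the relocated RTL address
		rtl

The caller reserves the result space, pushes the parameters, and pulls
the answer afterwards:

		pea	0000		; space for the result
		pea	1234
		pea	4321
		jsl	AddWords
		pla			; A = 5555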

But wait, you say, this all sounds kinda complicated.  You're right.  But
with a little work, one can create assembler macros which do all this and more,
automatically.  For example, the macros I used for assembly subroutines in
HyperCard IIGS analyze the sizes of parameters, function results, and local
variables, and pick the best optimized code for each combination;  generate
code to manipulate the SP and DP registers;  support optional functions such
as save/restore B register (globals data bank);  and last but _definitely_ not
least, create EQUATES for every single parameter and local variable.  

So how can you get great macros like these?  Tell you what, how about if I
post them!  See next posting after this one....

Andy Stadler
Apple Computer, Inc.

stadler@Apple.COM (Andy Stadler) (04/25/91)

The following is a little gift for you if you are writing reasonably
complex programs in the 16-bit world.  See my previous posting for more
information about -why- you'd want to use these.

The macros, as posted, are written for MPW IIGS.  They don't really use
any fancy macro features and should be easy to port to APW or other assemblers.
In addition, there are a few features which could be easily added depending on
your needs.  Possibilities include support for C-style calls (don't pop the
parameters) and JSR calls (currently wired for JSL and RTL).

I hope these are useful to you in your programming.

Andy Stadler
Apple Computer, Inc.


;---------------------------------------------------------------------
;
;  file  stackframe.macros
;
;  This file contains stack frame code, originally designed by Darryl
;  Lovato and further modified by Andy Stadler.  This code creates
;  "best optimized" entry and exit code for pascal-style calling
;  sequences.
;
;  Created:	19-Apr-88
;  Modified:	See Mod History Below
;  Author:	Andy Stadler
;
;  Copyright (C) Apple Computer, Inc. 1988-1991
;  All Rights Reserved.

;---------------------------------------------------------------------
;
;  Modification History
;
;  19-Apr-88	ADS	Original release
;  20-May-88	ADS	Modified, now there are only 2 cases
;  23-May-88	ADS	Fixed bug in small frame exit case
;  24-May-90	ADS	Added "big locals" support - for strings, etc

;---------------------------------------------------------------------
;
;  Pascal defines a stack frame with the following structure:
;
;	|		|
;	|---------------|
;	|		|
;	|  func result  |	"fsize" 
;	|		|
;	|---------------|
;	|		|
;	|  parameters   |	"psize"
;	|		|
;	|---------------|
;	|		|
;	|      RTL      |	3 bytes
;	|		|
;	|---------------|
;	|		|
;	|  saved D reg	|	2 bytes
;	|		|
;	|---------------|
;	|		|
;	|  local vars	|	"lsize"
;	|		|
;	|---------------|
;   D ->|		|
;	|  big locals	|	"bsize"
;	|		|
;	|---------------|
;  SP ->|		|
;	|		|
;
;
;  These macros generate code to save the old D register, create
;  local variable space (if requested), and point D and the stack
;  pointer to the appropriate stack offsets.  In addition, equates
;  are automatically generated to all variables within the stack
;  frame.
;
;  Using the macros to generate stack frames is quite simple.  The 
;  structure of every procedure/function is as follows:
;
;	ProcName	PROC
;			InitFrame
;			FrameType	flags
;	
;	biglocal1..n	BigLocal	types
;	local1..n	Local		types
;	param1..n	Param		types
;	result		Result		type
;
;			EntryCode
;
;			{ user code }
;
;			ExitCode
;			ENDPROC
;
;  NOTE that the data must be declared in that order, and it is exactly
;  from bottom to top.  In addition, you must list the parameters in
;  exactly backwards order from the pascal source (see example below).
;
;  The following data types are defined:
;
;	Str255		256 bytes
;	Rect		8
;	Handle		4
;	Ptr		4
;	LongInt		4
;	Integer		2
;	Char		2
;	Byte		2
;	Boolean		2
;
;  Any other size variable may be defined simply by giving the number
;  of bytes to reserve.  Note that Char, Byte, and Boolean reserve two
;  bytes apiece;  this follows the Pascal calling conventions.  For local
;  storage, single byte variables may be declared with size 1 (but they may
;  only be stored to in 8 bit mode).  For code efficiency, however, it's
;  usually easier to just store them in two byte spaces.
;
;  The ExitCode routine includes the final RTL instruction;  none is needed.
;
;  Only one FrameType flag is currently defined:  SaveB will cause the data
;  bank register to be saved and restored.  This allows the procedure to access
;  data other than the standard globals page.  If no flags are set, the
;  FrameType instruction is not needed.
;
;  BigLocals should only be used when you have more than about 250 bytes of
;  stack frame (this means everything from the bottom of the locals up to
;  the top of the function result space).  This will most often be caused by use
;  of a pascal string as a local variable.  Because BigLocals are "below" the
;  true direct page, you can not access them directly.  Instead you can use
;  one of two techniques.  You can calculate the address and work with a
;  pointer, or you can take advantage of the dp,X mode, which WRAPS within bank
;  zero, and effectively allows use of negative offsets.  The equates generated
;  for BigLocals are therefore negative numbers.  Here are examples of each
;  technique:
;
;  Pushing a pointer to a BigLocal		Using Negative Indexing
;  -------------------------------		-----------------------
;
;	pea	0000	; push hi word		ldx	#bigloc	; negative #
;	tdc		; calc lo word		lda	0,x
;	clc					lda	2,x	; hardcoded
;	adc	#bigloc	; negative #		lda	10,x	; record
;	pha					lda	20,x	; offsets!
;
;  A nice side effect is that the macros restore the stack pointer, no matter
;  what its state when ExitCode is reached.  This was designed mainly for the
;  pascal compiler so it could compile EXIT(proc) easily;  It can be used
;  similarly from assembly.  You don't need to clean up when done because it
;  will be fixed for you!
;
;---------
;
;  The following example demonstrates the use of the macros.
;
;  FUNCTION MyFunct(address: Ptr; count: Integer) : Handle;
;  VAR
;    myFlag:	Boolean;
;    myWord:	Integer;
;    myRect:	Rect;
;    myArray:	ARRAY[1..10] OF INTEGER;
;    myString:	Str255;
;
;    BEGIN
;      myWord := myRect.top;		{ a few samples to show data access }
;      myWord := address^[count];	
;      MyFunct := NIL;
;    END;
;
;  In assembly:
;
;	MYFUNCT		FUNC			; FUNC and PROC are synonyms
;			InitFrame
;			FrameType	SaveB
;
;	myString	BigLocal	Str255
;	myArray		Local		20	    ; 10 integers is 20 bytes
;	myRect		Local		Rect
;	myWord		Local		Integer
;	myFlag		Local		Boolean	    ; 2 bytes
;	count		Param		Integer
;	address		Param		Ptr
;	theResult	Result		Handle
;
;			EntryCode
;
;			lda	<myRect+top	; all locals are direct page
;			sta	<myWord
;
;			ldy	<count
;			lda	[<address],y	; etc, etc
;			sta	<myWord
;
;			pea	0000		; call string function
;			tdc
;			clc
;			adc	#myString	; negative number....
;			pha
;			_DrawString
;
;			stz	<theResult	; writing to function result
;			stz	<theResult+2
;
;			ExitCode		; (this macro includes the RTL)
;			ENDFUNC			; synonymous with ENDPROC
;
;-------------------------------------------------------------------------

;---------------------------------------------------------------------
;
;  Stack Frame Creation Macros
;
;  The following macros are used to initialize SET variables for
;  subsequent stack frame macros.

		MACRO
		InitFrame

Str255		set	256		; Data type sizes
Rect		set	8
Handle		set	4
Ptr		set	4
LongInt		set	4
Integer		set	2
Char		set	2
Byte		set	2
Boolean		set	2

qBSize		set	0		; # bytes of big locals
qLSize		set	0		; # bytes of locals
qPSize		set	0		; # bytes of parameters
qFSize		set	0		; # bytes of function result
qSaveB		set	0		; 1= save the data bank register

		MEND

;---------------------------------------------------------------------
;
;  Frame Options
;
;  The FrameType macro allows the user to choose options within the
;  stack frame.  The only currently defined option is SaveB which will
;  cause the data bank register to be preserved and restored.

		MACRO
		FrameType	&option
		
		IF &option = 'SaveB' THEN
qSaveB		set	1
		ELSE
		  Error:  Unknown_frame_option
		ENDIF
		
		MEND


;---------------------------------------------------------------------
;
;  Local Storage Macros
;
;  The following macros define labels for local stack frame variables
;  (big and small), stack based parameters, and function results.

		MACRO
&lab		BigLocal	&val
qBSize		set	qBSize+&val
&lab		equ	-qBSize+1

		IF qLSize <> 0 THEN
		  Error:  BigLocal_declaration_after_a_Local
		ENDIF
		IF qPSize <> 0 THEN
		  Error:  BigLocal_declaration_after_a_Param
		ENDIF
		IF qFSize <> 0 THEN
		  Error:  BigLocal_declaration_after_a_Result
		ENDIF
	
		MEND

	

		MACRO
&lab		Local	&val
&lab		equ	qLSize+1
qLSize		set	qLSize+&val

		IF qPSize <> 0 THEN
		  Error:  Local_declaration_after_a_Param
		ENDIF
		IF qFSize <> 0 THEN
		  Error:  Local_declaration_after_a_Result
		ENDIF
	
		MEND

	

		MACRO
&lab		Param		&val		; Stack based parameter

&lab		equ		qLSize+6+qSaveB+qPSize
qPSize		set		qPSize+&val

		IF qFSize<>0 THEN
		  Error:  Param_declaration_after_a_Result
		ENDIF
	
		MEND


		MACRO
&lab		Result		&val

		IF qFSize <> 0 THEN
		  Error: Duplicate_Result
		ENDIF

&lab		equ		qLSize+6+qSaveB+qPSize
qFSize		set		&val
		MEND


;---------------------------------------------------------------------
;
;  Entry Code Generation Macro
;
;  The EntryCode macro generates code to create the stack frame.  This
;  code saves the D and (optionally the B) register, creates space for
;  the local variables, and sets the D register and the Stack Ptr to 
;  just below this area (so the first local variable is at address 1).
;


		MACRO
		EntryCode
		longa	on
		longi	on

		IF	qSaveB = 1 THEN
		phb
		ENDIF
		
		phd

		IF	qLSize <= 8 THEN

Qtemp		SET	qLSize		; small # locals, PHY to make space
		WHILE	Qtemp > 0 DO
		phy
Qtemp		SET	Qtemp - 2
		ENDWHILE
		tsc
		
		ELSE
		tsc			; large # locals, calculate new SP
		sec
		sbc	#qLSize

		ENDIF
		
		tcd			; set the direct page
		
		IF	qBSize = 0 THEN
		
		IF	qLSize > 8 THEN
		tcs			; no bigs, large locals
		ENDIF
		
		ELSE
		
		sec			; split off the stack for bigs
		sbc	#qBSize
		tcs
		
		ENDIF

		MEND
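
;  Worked example (a sketch added for illustration, not part of the original
;  file):  for MYFUNCT above, qSaveB = 1, qLSize = 32 and qBSize = 256, and
;  the declaration macros should have produced these equates:
;
;	myString = -255		myArray = 1		myRect = 21
;	myWord   = 29		myFlag  = 31		count  = 39
;	address  = 41		theResult = 45
;
;  EntryCode should then expand to roughly:
;
;	phb			; SaveB was requested
;	phd
;	tsc			; qLSize > 8, so calculate rather than PHY
;	sec
;	sbc	#32		; room for myArray, myRect, myWord, myFlag
;	tcd			; D sits just below the locals
;	sec
;	sbc	#256		; split off myString below D
;	tcs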


;---------------------------------------------------------------------
;
;  Exit Code Generation Macro
;
;  The following macro generates the appropriate exit code for a
;  subprogram.  The stack pointer, D reg, and optionally B reg are
;  restored, and an RTL is executed.
;

		MACRO
&lab		ExitCode

&lab		longa	on
		longi	on
		
		IF	qPSize <> 0 THEN	; move RTL & B up over parms
		lda	<qLSize+4+qSaveB
		sta	<qLSize+4+qSaveB+qPSize
		lda	<qLSize+3
		sta	<qLSize+3+qPSize
		ENDIF

;  If LSize + PSize <= 12 or LSize <= 2 then do smallframe.  Else do largeframe.

		IF	qLSize+qPSize <= 12 GOTO .SMALLFRAME
		IF	qLSize > 2	    GOTO .LARGEFRAME

;  This frame has a small number of locals, zero or two bytes worth.
;  We can pull the D reg right off the stack, and then either pull or
;  calculate to remove the call parameters.

.SMALLFRAME	tdc
		tcs

Qtemp		SET	qLSize			; use ply's to kill locals
		WHILE	Qtemp > 0 DO
		ply
Qtemp		SET	Qtemp - 2
		ENDWHILE
		
		pld				; recover direct page register
		IF	qPSize <= 8 THEN

Qtemp		SET	qPSize			; small # parms, use ply's
		WHILE	Qtemp > 0 DO
		ply
Qtemp		SET	Qtemp - 2
		ENDWHILE

		ELSE				; large # parms, use clc adc
		clc
		adc	#2+qLSize+qPSize
		tcs
		ENDIF
		GOTO	.DONE


;  This frame has a large amount of stack data.  We use a load to get to the
;  saved D register, and dump the stack data by calculating a new stack pointer.

.LARGEFRAME	ldx	<1+qLSize
		tdc
		clc
		adc	#2+qLSize+qPSize
		tcs
		txa
		tcd
		
.DONE		IF	qSaveB = 1 THEN
		plb
		ENDIF
		rtl

		MEND
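
;  Worked example (a sketch added for illustration, not part of the original
;  file):  for MYFUNCT above, with qSaveB = 1, qLSize = 32 and qPSize = 6,
;  ExitCode takes the .LARGEFRAME path and should expand to roughly:
;
;	lda	<37		; qLSize+4+qSaveB: top two bytes of the RTL address
;	sta	<43		;   moved up by qPSize
;	lda	<35		; qLSize+3: saved B and low byte of the RTL address
;	sta	<41
;	ldx	<33		; saved D register
;	tdc
;	clc
;	adc	#40		; 2+qLSize+qPSize
;	tcs			; SP now sits just below the relocated B
;	txa
;	tcd			; restore the caller's direct page
;	plb
;	rtl			; theResult is left on top of the stack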

psonnek@pro-mansion.cts.com (Patrick Sonnek) (04/26/91)

In-Reply-To: message from acmfiu@serss0.fiu.edu

>i'm curious as to how some of you pass parameters to ML routines in your
>ML code. for instance, let's say you had a routine that took two numbers
>and returned the answer. how would you pass the two numbers. i've basically
>passed values two ways: 1) put them in some dummy variable, 2) if the
>subroutine required many parameters, then i'd just pass the address of
>where to find the data.

>i'm just looking for efficiency and readability. #1 above is definitely
>not readable. but #2 is. does anyone pass parameters via the stack?


The stack works well, as you can pass the entire address to your routine.  the
registers only hold one byte, so you would have to use two registers to pass
the address.  Another thing that will work, is the following line of code


CallSub   JSR   SubRotne
          DW    Parmlist
          .
          .
          .


with this bit of code, you could then check the stack for the return address,
which (plus one, since JSR pushes the address of its own last byte) would
point at the word containing the address of your parameter list.  Your
subroutine would have to add 2 to the address contained in the stack before
issuing the RTS command, or you would return to the DW and your program would
do strange things, or it would die horribly.
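
A minimal sketch of what SubRotne might do on entry, assuming 8-bit 6502
code and two free zero-page pairs (RET and PARM are names and locations
made up for this example):

RET	equ	$06		; assumed free zero-page pair
PARM	equ	$08		; assumed free zero-page pair

SubRotne	pla			; low byte of (address of the DW - 1)
	sta	RET
	pla			; high byte
	sta	RET+1
	ldy	#1
	lda	(RET),y		; low byte of Parmlist
	sta	PARM
	iny
	lda	(RET),y		; high byte of Parmlist
	sta	PARM+1
	clc			; add 2 so RTS resumes after the DW
	lda	RET
	adc	#2
	tax
	lda	RET+1
	adc	#0
	pha			; push high byte first, as JSR does
	txa
	pha
	ldy	#0
	lda	(PARM),y	; first byte of the parameter list
	; the rest of the routine goes here
	rts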



I know this is quick and dirty, but I've got to run. If you've got more
questions, you can e-mail me.

----
ProLine:  psonnek@pro-mansion    Sysop Pro-mansion: 507/726-6181
Internet: psonnek@pro-mansion.cts.com  MCImail:     psonnek
UUCP:     crash!pro-mansion!psonnek  ARPA: crash!pro-mansion!psonnek@nosc.mil
BITNET:   psonnek%pro-mansion.cts.com@nosc.mil
               <<Real programmers don't program in HLL's.>>
               <<HLL's are for wimpy application coders!!>>

alfter@nevada.edu (SCOTT ALFTER) (04/26/91)

In article <8834@crash.cts.com> psonnek@pro-mansion.cts.com (Patrick Sonnek) writes:
>               <<Real programmers don't program in HLL's.>>
>               <<HLL's are for wimpy application coders!!>>
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Does anybody have a copy of the "Real Programmers (TM)" list?

(Yeah, yeah--it isn't really about the II, but I've been looking for
this, and I haven't followed rec.humor lately because I have too much
other crap going on.)

Scott Alfter-----------------------------_/_----------------------------
Call the Skunk Works BBS (702) 896-2676 / v \ 6 PM-6 AM 300/1200/2400
Internet: alfter@uns-helios.nevada.edu (    ( Apple II:
   GEnie: S.ALFTER                      \_^_/ the power to be your best!

fadden@cory.Berkeley.EDU (Andy McFadden) (04/27/91)

In article <51983@apple.Apple.COM> stadler@Apple.COM (Andy Stadler) writes:
>I disagree with your assessment that it's big and slow.  There are two
>techniques which make it quite efficient and useable.  The first is to use the
>Pascal style of never passing any items greater than 4 bytes long.  If a
>parameter is longer than 4 bytes, push a pointer to it.  This is an area where
>Pascal has an edge over C because C is always pushing and pulling huge streams
>of bytes on and off the stack (especially when working with strings).  Second,

Say WHAT?!?

No C program I have ever written (which includes NuLib and the usual
academic projects like a compiler, an object rendering system, etc) has
passed arguments larger than four bytes.

Strings are NOT passed as a string, but as a pointer to the string.  Many
versions of C don't allow entire structures to be passed (APW C is an
exception), but instead only allow pointers to them to be passed.

Whatever your C compiler allows, pushing and pulling huge streams of
stuff off the stack is incredibly inefficient and easy to avoid (heck,
it's easier to avoid it than to do it in the first place).  There are
VERY few cases where you would want to send an entire structure, and in
those cases Pascal wouldn't be able to do it at all, so I'd say C has
an edge over Pascal.

>Andy Stadler
>Apple Computer, Inc.

-- 
fadden@cory.berkeley.edu (Andy McFadden)
..!ucbvax!cory!fadden
fadden@hermes.berkeley.edu (when cory throws up)

rhyde@dimaggio.ucr.edu (randy hyde) (04/27/91)

>> It's quite simple (to access parameters on the stack) if you point the
   direct page at the stack...

True, it's easy, but you give up two important things about the direct page--
the ability to use it as a set of 256 scratch pad registers, and furthermore,
it's doubtful the dp would be page aligned costing you another cycle on each
dp memory access.

I've often wondered why compilers insist on using the dp as a frame pointer
rather than fixing dp (or pseudo-fixing it) and using register allocation
schemes like a RISC machine.  The resulting code would run quite a bit faster.

toddpw@nntp-server.caltech.edu (Todd P. Whitesel) (04/27/91)

rhyde@dimaggio.ucr.edu (randy hyde) writes:

>True, it's easy, but you give up two important things about the direct page--
>the ability to use it as a set of 256 scratch pad registers, and furthermore,
>it's doubtful the dp would be page aligned costing you another cycle on each
>dp memory access.

Randy, now I'm convinced you're thinking like a 6502 assembly programmer. It
is far more advantageous to have the DP move around so you can access the
automatic and argument variables with every addressing mode and instruction
available to the registers. If the dp were fixed, how would you handle local
variables? Save and restore the dp to a software stack? You are NOT going
to convince me to use the stack relative to get at the arguments -- you can
only use XX,s and (XX,s) with the basic Accumulator operations (LDA/STA/CMP/
AND/ORA/EOR/ADC/SBC). The code size involved in the shuffling of stuff
between the dp and stack would be a lot worse than the way it is now, and
I think you will find if you try it (like I have) that the execution time of
16 bit code whose data structures are not entirely in the direct page (my
main examples, an LZW decompressor and DiskCopy checksum calculator, both of
which I worked on within the last few months) easily dominates the cycles lost
to a non-aligned direct page. You literally will not notice the difference in
execution time -- unless the code is VERY intensive with direct page
variables. You are better off not bothering to align the dp unless the code
really needs an extra 10 or 15 percent when written in assembly!! I am not
denying that it ever happens (e.g. animation code, high performance math,
etc.) but for general purpose programming it does not matter that much.

>I've often wondered why compilers insist on using the dp as a frame pointer
>rather than fixing dp (or pseudo-fixing it) and using register allocation
>schemes like a RISC machine.  The resulting code would run quite a bit faster.

Not on a 65816 it wouldn't. There aren't enough registers and you can't get
at the stack as powerfully as you can the dp. This is the single biggest
deficiency of the 65816 when it comes to HLL's, IMHO.

I do agree with better register allocation, though -- Orca's use of the data
bank as a 'globals' bank pointer is idiotic and limits the size of globals
to 64K. Absolute long should be used for global variables, freeing up the
DBR to act in concert with either X or Y as a scratch pointer that can random
access a 64k area located anywhere. Put a tad of lvalue caching in the code
generator and the compiler should be able to generate excellent code for
ptr->struct type situations (which happens a lot when you are using GS/OS and
the toolbox):

	p->h = 5;	p->v = 47;	p->boingptr = NULL;

	pei	p+1
	plb
	plb
	ldx	p
	lda	#5
	sta	|h,x
; would be pei,plb,plb,ldx but the code generator notices DBR/X hasn't changed
	lda	#47
	sta	|v,x
; DBR/X also hasn't changed
	stz	|boingptr,x
	stz	|boingptr+2,x
; can't stz [],y , can we, Mike?

The above code is about as good as a decent assembly programmer could do
given that p is an arbitrary pointer and that *p is smaller than 64K, and
a good code generator for the 65816 could do the same. Orca's use of the
data bank register for faster access to globals has the same effect as
fixing the dp (which Orca doesn't do, thankfully) -- you do make the trivial
code generation examples faster, but the gain is more than offset by the
nontrivial address computations (pointers used as arrays, structs, arrays,
and combinations of the above) which result in really gross code compared
to what they could be if the code generator took better advantage of the
CPU architecture.

I believe the 65816 is ADEQUATE for HLL's but that it does not leave the
compiler many options. Fighting the available instruction set and CPU
architecture is not a good idea, and that's what I see Orca's DBR use doing;
Randy's fixed dp idea would make a lot of sense if functions were generally
large and not called often, but I still think the benefits from an aligned
DP are fairly insignificant for HLL programming. I prefer a system that
handles arbitrarily complex conditions as well as the architecture allows,
since the whole reason I'm using C for GS-specific programming is to
avoid dealing with complex objects (like parameter block structs and arrays
of various objects) in assembly! The routines that get moved to assembly are
the time-critical ones that can be coded in tight assembly, and they use a fairly
simple template I developed for emulating Orca/C's function stack frame model
(except mine preserves the caller's DBR automatically). I modified the template
to align the DP and literally did not notice the difference in execution speed
(I did clock the LZW decompressor: it was approximately 2% faster.)

Todd Whitesel
toddpw @ tybalt.caltech.edu

toddpw@nntp-server.caltech.edu (Todd P. Whitesel) (04/27/91)

Sorry to follow my own article, but I just realized we are misinterpreting
Randy's comments.

[I wrote this]
>Randy, now I'm convinced you're thinking like a 6502 assembly programmer.

This is incorrect, but I realized it as I was skimming the article when it
showed up in nn. I didn't try to cancel the post because I don't want to
retype the gunk about Orca and the DBR trick.

What Randy is thinking like is a RISC machine programmer, more specifically
to emulate a zillion-register RISC machine on the 65816 using an aligned
direct page. What I and I suspect many others on the group thought he was
talking about was using the direct page for globals and scratch, and the
stack relative for automatic variables -- I should have realized it because
I said it myself earlier today in another thread, it's such a ludicrous
idea that it couldn't have been the intended meaning, but when people don't
see the reasonable interpretation right away they too readily assume the
worst...

The idea of using the DP as a RISC machine emulator has LOTS of merit.
For one thing, it would be the only reasonable way to teach GCC the
65816, although I don't think any of us is up to messing with a behemoth
like GCC.

The main reservations I have about the idea are that function enter and
exit could get really nasty, and that there is a hideous problem with
using a fixed direct page for autos -- you can't safely pass pointers
to them because they might be saved/restored by the called function.
I suppose you could have all autos 'officially' on the stack, but that would
be really gross, and the floating dp would still be faster except for floating
point calculations -- but those are usually handed to SANE, and get done on an
aligned dp.

Todd Whitesel
toddpw @ tybalt.caltech.edu

stadler@Apple.COM (Andy Stadler) (04/27/91)

In article <13954@ucrmath.ucr.edu> rhyde@dimaggio.ucr.edu (randy hyde) writes:
>>> Its quite simple (to access parameters on the stack) if you point the
>   direct page at the stack...
>
>True, it's easy, but you give up two important things about the direct page--
>the ability to use it as a set of 256 scratch pad registers, and furthermore,
>it's doubtful the dp would be page aligned costing you another cycle on each
>dp memory access.
>
>I've often wondered why compilers insist on using the dp as a frame pointer
>rather than fixing dp (or pseudo-fixing it) and using register allocation
>schemes like a RISC machine.  The resulting code would run quite a bit faster.

On this point I must respectfully disagree.  First, you -can- get "scratch"
registers.  In the world of high level languages they are called "local
variables."  Rare is the subroutine which actually uses 256 bytes of
parameters, so you just subtract a bit from the stack on your procedure entry,
and use the direct page for parameters AND local variables.  Here's a picture:

        |-----------------|
        | function result |
        |-----------------|
        |       ...       |
        |   parameters    |
        |       ...       |
        |-----------------|
        |       RTL       |
        |-----------------|
        |       PHD       |
        |-----------------|
        |       ...       |
        |   local vars    |
        |       ...       |
        |-----------------|
   DP ->|                 |<- SP

Even with 50 bytes of parameters you'd still have over 200 bytes of scratch
pad storage - and it's all useful as pointers because it's on the direct page.
And, because it's on the stack, it's automatically protected when you call 
other routines, and it's by definition recursive and reentrant.

On your second point, that it's inefficient because of the non-aligned DP, I
will grant you that this is true;  on the other hand, if you have a loop which
is so tight that the one cycle loss is too much, then you shouldn't be in high
level!  But let's be honest here!  How many loops in a large program actually
require that much tightness?  Very few!
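
For what it's worth, here is a minimal sketch of the entry and exit code that
picture implies.  It assumes native mode with a 16-bit accumulator, a routine
called with JSL, and LOCALS as a made-up constant (at least 2) for the number
of bytes of local variables; MyProc is just a placeholder name.

	MyProc	phd			;save caller's D (the PHD slot in the picture)
		tsc			;A = S
		sec
		sbc	#LOCALS		;make room for the local variables
		tcs			;S now sits just below the locals
		tcd			;D = S, so the whole frame is on the direct page
		lda	LOCALS+6	;last parameter pushed (direct page access)
		sta	1		;first local variable
		tsc
		clc
		adc	#LOCALS		;throw the locals away
		tcs
		pld			;restore caller's D
		rtl			;parameters are still on the stack

With D equal to S, offsets 1 through LOCALS are the locals, LOCALS+1 and
LOCALS+2 hold the saved D, LOCALS+3 through LOCALS+5 hold the RTL address, and
the parameters start at LOCALS+6.  The exit code uses A, so a result has to go
back through the function result slot rather than the accumulator.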

Andy Stadler
Apple Computer, Inc.

rhyde@feller.ucr.edu (randy hyde) (04/27/91)

Passing parameters on the (65816) stack is elegant, but it's still slow.
Most routines only access the parameters a few times (do a dynamic analysis
of your code sometime).  Besides the set up involved (saving and loading
DP), you also have to take into account pushing the parameters on the stack
in the first place.

Pushing parameters on the stack is great if
1) you're interfacing to code which requires this (e.g., toolbox or HLLs),
2) you need reentrancy,
3) you need simplicity and consistency.

Simplicity?  Isn't using the stack complicated?  Maybe the first time around
for beginners, but after you get used to it it becomes automatic (even without
macros.)

As I pointed out in a previous post, however, the cost is high.  You have to
give up DP (a great place to pass parms in global locations) and there is a
lot of set up involved.  Imagine someone writing a procedure called in 8-bit
mode to convert a character to upper case (not that I would suggest this;
in-line code would be more appropriate):

Char passed in A:

	ToUpper	cmp	#'a'		;below 'a'?  not lower case
		blt	>0
		cmp	#'z'+1		;above 'z'?  not lower case
		bge	>0
		and	#$5f		;clear bit 5: 'a'..'z' -> 'A'..'Z'
	^0	rts

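Char passed on stack, value returned on stack, no DP setup (called with JSR):

	ToUpper	pha			;save A (one byte in 8-bit mode)
		lda	4,s		;1,s = saved A, 2-3,s = return address, 4,s = char
		cmp	#'a'
		blt	>0
		cmp	#'z'+1
		bge	>0
		and	#$5f
		sta	4,s		;hand the result back in the same stack slot
	^0	pla			;restore A
		rts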

Now what happens when you want to point DP at the stack to get at it?  Even
assuming the processor is in 16-bit mode (so you can easily transfer S->A->D),
there is a lot more work involved.
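
A rough sketch of that direct page version, for comparison.  It assumes an
8-bit accumulator, one byte pushed by the caller, and a JSR call, plus native
mode, where TSC and TCD move all 16 bits no matter how the m flag is set:

	ToUpper	phd			;save caller's D (2 bytes)
		pha			;save A (1 byte)
		tsc			;16-bit copy of S, wipes out the B accumulator
		tcd			;D = S
		lda	6		;1 = saved A, 2-3 = saved D, 4-5 = return, 6 = char
		cmp	#'a'
		blt	>0
		cmp	#'z'+1
		bge	>0
		and	#$5f
		sta	6		;result goes back in the parameter's slot
	^0	pla			;restore A
		pld			;restore caller's D
		rts

That is four more instructions (phd, tsc, tcd, pld) than the stack-relative
version, all of them overhead for a routine that touches its parameter twice.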

Of course, if the routine is relatively complex and eats up a lot of time,
who cares how long it takes to set up and access the parameters?  But for
short routines it can be a significant penalty.  Indeed, one of the main
reasons C compilers do so well on SparcStations and other RISC machines is
because they pass their parameters in registers rather than on the stack
(on the SPARC, the register bank *is* a stack, local to the chip, but that's
another story).

I did not suggest that people *not* use the stack to pass parameters.  There
are a lot of advantages to it.  However, it wouldn't be my first choice for
time critical, short routines.  As long as HLLs pass their parameters in
this fashion, I can always write code that will outperform said stuff,
by a wide margin, in assembly.

ericmcg@pnet91.cts.com (Eric Mcgillicuddy) (04/27/91)

Most of what I pass fits in a register. Bad habits from my '02 days, I could
never figure where the parms were on the stack. For moderate numbers of parms,
I use the stack, the '816 stack ops are wunderbar. For large parms, structs,
arrays (redundant since I treat arrays as structs) I just keep the offset in
the X register and access into the data bank through the B register. 

UUCP: bkj386!pnet91!ericmcg
INET: ericmcg@pnet91.cts.com

rhyde@feller.ucr.edu (randy hyde) (04/27/91)

Concerning that extra cycle on a non-page-aligned dp...

No offense Andy, but if I wanted to write in C, I'd write in C.
A one cycle penalty on a three cycle instruction is 25%.  You should see
how hard compilers work to get a 25% performance improvement.  If
everyone passed all their parameters on the stack and moved DP around as
you suggest to get local variables, they may as well be writing in C; 
well, perhaps not on the 65816, but if you're going to limit yourself to
writing code like the C or Pascal compilers do, there is very little
benefit to using assembly.  You certainly won't get the 5-10x
performance boosts I've been talking about.

Everyone has been jumping on me about telling people not to use the
stack as a parameter passing mechanism.  I did no such thing!  I simply
said it's bulkier and slower than passing in registers, in DP, etc... 
It IS!  I guess people have gotten so used to toolbox calls that they've
forgotten other ways to pass parameters!  If you can, passing your
parameters in A, X, and Y is always more convenient (and faster!) than
on the stack.  If you organize your program properly, using the
variables in dp is better.  If you must use a stack architecture, why
not set up two stacks?  Use the hardware stack for return addresses,
saving registers, etc., and set up a second stack (using DP as the
stack pointer) in another area of memory.  This is much easier than
messing with the stack itself (although you don't get the benefit of PH?
to set up parms).  A Forth system on the GS used this scheme.

Ultimately, *your* choice of parameter passing mechanism depends on the
application at hand.  The stack is the appropriate vehicle for many
procedures and functions.  However, it's not a good idea to use the
stack as default without thinking the problem through first.  There may
be a better way.
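
To make the two-stack idea concrete, here is one rough way it could look.
This is only a sketch under a few assumptions (native mode, 16-bit registers,
a parameter stack in bank 0 that grows downward, made-up names PPush and
PAdd), not the code of any particular Forth:

	PPush	tay			;park the value to push in Y
		tdc			;A = D, the parameter stack pointer
		sec
		sbc	#2		;open up one word
		tcd
		sty	0		;direct page offset 0 is the new top of stack
		rts

	PAdd	clc
		lda	0		;top word
		adc	2		;plus the word under it
		sta	2		;sum replaces the lower word
		tdc
		clc
		adc	#2		;drop the old top word
		tcd
		rts

The hardware stack only ever sees return addresses here, so the parameters
stay at fixed offsets from D no matter what gets pushed or pulled.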

toddpw@nntp-server.caltech.edu (Todd P. Whitesel) (04/27/91)

rhyde@feller.ucr.edu (randy hyde) writes:

>A one cycle penalty on a three cycle instruction is 25%.  You should see
>how hard compilers work to get a 25% performance improvement.
>...
>writing code like the C or Pascal compilers do, there is very little
>benefit to using assembly.  You certainly won't get the 5-10x
>performance boosts I've been talking about.

With Orca/C, you can get about 2x if the code is simple, 4-8x if it uses
lots of arrays or structs. I found that I got 10x improvement on my LZW
decompressor by writing the code in 'smart' assembly, by translating each
group of lines and optimizing the register usage between them, and writing
the critical construction loop very tightly. I got about 2% on top of that
by aligning the direct page, and it took me a while to figure out how to
modify the enter/exit code to properly snap the direct page as well as
make temporary copies of the function arguments in case they were not in
reach of the aligned DP. I think the code is an appropriate example --
it uses lots of 16 bit DP variables -- and I have to conclude that aligning
the direct page wasn't worth the effort in this case.

I am not trying to say that Randy is wrong -- this is the only example I
am really familiar with. It's just that I can't think of any _application_
code examples for which the DP alignment would make a significant difference
compared to everything else.

Todd Whitesel
toddpw @ tybalt.caltech.edu

stadler@Apple.COM (Andy Stadler) (04/27/91)

In article <13084@pasteur.Berkeley.EDU> fadden@cory.Berkeley.EDU writes:
>
>No C program I have ever written (which includes NuLib and the usual
>academic projects like a compiler, an object rendering system, etc) has
>passed arguments larger than four bytes.
>
>Strings are NOT passed as a string, but as a pointer to the string.  Many
>versions of C don't allow entire structures to be passed (APW C is an
>exception), but instead only allow pointers to them to be passed.

1.  No C program -you- have ever written moved structures - the only thing
    that implies is that you understand how compilers work!  There are a lot
    of people out there who don't.  Many of them read this newsgroup, and just
    because you know enough to design parameter calls efficiently doesn't mean
    that everybody else does.

2.  Consider your exception - it happens to be the #1 C compiler used on the
    GS.  I would consider that to be the norm.  Large structures on the stack
    are a sad truth which we must consider on the GS.

Andy Stadler
Apple Computer, Inc.

gtephx (Brian Campbell) (04/30/91)

In article <13845@ucrmath.ucr.edu>, rhyde@musial.ucr.edu (randy hyde) writes:
> 4) Pass parameters on the stack.  This is how most HLLs pass parameters.
> It is slow, big, and accessing the parameters is inconvenient,
> especially on processors like the 65816 which don't have a frame pointer
> register.  Nonetheless, there are some advantages to this technique: 

I thought that the 65816 had a stack addressing mode that allows one to
"random" access values on the stack (which is not present
on the 6502).  It looks like this: (i,S) which I take to mean, access the
word at offset i relative to the Stack pointer.  So an immediate value of 0,
means access the last word pushed, 2 means access the previous word pushed,
etc.  I've not had any experience with this mode -- but am I right or wrong?
(Of course, it would access a byte with the m flag = 1).
BTW, if all of this is true, in an assembly routine, to by-pass the JSL
return address, would you typically use offsets of 3,5,7... to get word
parameters n,n-1,n-2,... where n is the number of data words pushed?

meekins@anaconda.cis.ohio-state.edu (Tim Meekins) (04/30/91)

In article <1991Apr29.203359.250@...!asuvax!gtephx> campbellb@...!asuvax!gtephx (Brian Campbell) writes:
>I thought that the 65816 had a stack addressing mode that allows one to
>"random" access values on the stack (which is not present
>on the 6502).  It looks like this: (i,S) which I take to mean, access the
>word at offset i relative to the Stack pointer.  So an immediate value of 0,
>means access the last word pushed, 2 means access the previous word pushed,
>etc.  I've not had any experience with this mode -- but am I right or wrong?
>(Of course, it would access a byte with the m flag = 1).
>BTW, if all of this is true, in a assembly routine, to by-pass the JSL
>return address, would you typically use offsets of 3,5,7... to get word
>parameters n,n-1,n-2,... where n is the number of data words pushed?

You've almost got it. The stack pointer points to where the next byte will
go, NOT where the last byte is. To access the last byte pushed, use 1,s,
NOT 0,s. Other than that, I believe you're on track. Also, if you map
the DP to the stack (TCS,TCD), loading from Direct Page location $01 will
access the last byte pushed.
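
A small sketch of those offsets in use, assuming native mode with 16-bit
registers and a routine called with JSL (AddTwo is a made-up name):

		pea	$1234		;first parameter
		pea	$5678		;second parameter (the last word pushed)
		jsl	AddTwo
		ply			;discard the two parameter words,
		ply			;leaving the sum ($68AC) in A
		;... rest of the caller ...

	AddTwo	lda	4,s		;1,s through 3,s hold the JSL return address
		clc
		adc	6,s		;the word pushed before that one
		rtl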
--
+---------------------------S-U-P-P-O-R-T-----------------------------------+
|/ Tim Meekins                  <<>> Snail Mail:           <<>>  Apple II  \|
|>   meekins@cis.ohio-state.edu <<>>   8372 Morris Rd.     <<>>  Forever!  <|
|\   timm@pro-tcc.cts.com       <<>>   Hilliard, OH 43026  <<>>            /|

alfter@nevada.edu (SCOTT ALFTER) (05/01/91)

In article <1991Apr29.203359.250@...!asuvax!gtephx> campbellb@...!asuvax!gtephx (Brian Campbell) writes:
>I thought that the 65816 had a stack addressing mode that allows one to
>"random" access values on the stack (which is not present
>on the 6502).  It looks like this: (i,S) which I take to mean, access the
>word at offset i relative to the Stack pointer.

Not present on the 6502?  You could always do a TSX followed by a
LDA $100,X to do that.  (Want even more random access?  The stack is
in a fixed location; just grab a byte anywhere from $100 to $1FF.)
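
Spelled out, and remembering that S on the 6502 also points one byte below
the last byte pushed, a sketch (Sub is a made-up name):

		pha			;push a one-byte parameter
		jsr	Sub
		pla			;caller removes it again
		;... rest of the caller ...

	Sub	tsx			;X = S
		lda	$0103,x		;$0101,x and $0102,x are the return address
		rts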

Scott Alfter-----------------------------_/_----------------------------
Call the Skunk Works BBS (702) 896-2676 / v \ 6 PM-6 AM 300/1200/2400
Internet: alfter@uns-helios.nevada.edu (    ( Apple II:
   GEnie: S.ALFTER                      \_^_/ the power to be your best!

gtephx (Brian Campbell) (05/02/91)

In article <114343@tut.cis.ohio-state.edu>, meekins@anaconda.cis.ohio-state.edu (Tim Meekins) writes:
> In article <1991Apr29.203359.250@...!asuvax!gtephx> campbellb@...!asuvax!gtephx (Brian Campbell) writes:
> >BTW, if all of this is true, in a assembly routine, to by-pass the JSL
> >return address, would you typically use offsets of 3,5,7... to get word
> >parameters n,n-1,n-2,... where n is the number of data words pushed?
> 
> You've almost got it. The stack pointer points to where the next byte will
> go, NOT where the last byte is. To access the last byte pushed, use 1,s,
> NOT 0,s. Other than that, I believe you're on track. Also, if you map
> the DP to the stack (TCS,TCD), loading from Direct Page location $01 will
> access the last byte pushed.

Therefore, (1,s), (2,s), and (3,s) point to the RTL address, and offsets of
4,6,8,... will point to parameters n,n-1,n-2,...  A handy way to access
parameters, if they are simple word values.  If they are pointers, then you
would have to move them into the direct page to use the (zaddr) or [zaddr]
addressing modes to get to the data.  The DP method can be more convenient
(unless you also need to frequently access a DP page as "registers").  BTW,
don't you mean TSC?
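
A sketch of the DP method for a pointer parameter, assuming native mode with
16-bit registers, a 4-byte pointer pushed by the caller, and a JSL call
(GetWord is a made-up name):

	GetWord	phd			;save caller's D
		tsc
		tcd			;D = S: 1-2 saved D, 3-5 RTL address, 6-9 pointer
		lda	[6]		;fetch the word the pointer refers to
		pld			;restore caller's D (A is not touched)
		rtl

For a 2-byte pointer into the current data bank there is also the (i,S),Y
mode, which dereferences the pointer where it sits on the stack:

		ldy	#0
		lda	(4,s),y		;pointer at 4,s, just above the RTL address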

gtephx (Brian Campbell) (05/02/91)

In article <1991Apr30.170428.17156@nevada.edu>, alfter@nevada.edu (SCOTT ALFTER) writes:
> In article <1991Apr29.203359.250@...!asuvax!gtephx> campbellb@...!asuvax!gtephx (Brian Campbell) writes:
> >on the 6502).  It looks like this: (i,S) which I take to mean, access the
> >word at offset i relative to the Stack pointer.
> Not present on the 6502?  You could always do a TSX followed by a
> LDA $100,X to do that.  (Want even more random access?  The stack is
> in a fixed location; just grab a byte anywhere from $100 to $1FF.)

True, but what I meant was an addressing mode supported in a single
instruction, and *relative* to the stack pointer.  The TSX, LDA $100,X is a
neat trick though, although it takes 4 bytes vs. 2 for an LDA (i,S) on the
65816.  Grabbing a byte via LDA $1xx is rarely useful, because you generally
need to access relative (the R word again) to the Stack pointer.