[comp.sys.amiga.tech] A68k and Manx AS

brett@umd5.umd.edu (Brett Bourbin) (04/29/89)

	After putting up with Manx's assembler for over a year now, I 
decided to try out the PD A68k assembler.  Besides having to add END
statements to all my modules, everything assembled without problems. 

	There was one difference, though: the object code from
A68k was 30% _smaller_ than that from Manx's AS.  Is there something
that AS includes in its .o files that A68k does not?  This, coupled
with the fact that I can use BLink instead of LN, has made me switch
my assembly environment.

	Just wanted to pass on some information I found out.  8^)

a218@mindlink.UUCP (Charlie Gibbs) (05/01/89)

In article <4784@umd5.umd.edu> brett@umd5.umd.edu (Brett Bourbin) writes:
 <...>
>        There was one difference, though: the object code from
>A68k was 30% _smaller_ than that from Manx's AS.  Is there something
>that AS includes in its .o files that A68k does not?  This, coupled
>with the fact that I can use BLink instead of LN, has made me switch
>my assembly environment.

     I've put the following optimization routines into A68k:

        - Bcc is converted to Bcc.S if possible (for backward
            references only - forward references are too hard :-).

        - Backward references within the current CODE section
            are converted to PC-relative if possible (again,
            forward references get ugly).

        - ADD, SUB, and MOVE instructions are converted to
            ADDQ, SUBQ, and MOVEQ if possible.

        - Any operand of the form 0(An) is converted to (An)
            (except for MOVEP, which must have a displacement).
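     As a rough illustration (not A68k's actual code), the quick-form
rule in the third item can be sketched as a predicate on the immediate
operand, using the usual 68000 encoding limits:

```python
# Hypothetical sketch of the quick-form check described above.
# Assumes standard 68000 limits: ADDQ/SUBQ immediates are 1..8,
# MOVEQ holds a sign-extended byte (and only targets a data
# register with .L size, a caveat a real assembler must also check).

def quick_form(mnemonic, immediate):
    """Return the quick variant of the instruction, or None."""
    if mnemonic in ("ADD", "SUB") and 1 <= immediate <= 8:
        return mnemonic + "Q"
    if mnemonic == "MOVE" and -128 <= immediate <= 127:
        return "MOVEQ"
    return None
```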

     Some people have complained that this mucks up jump tables,
so my next version will likely have a switch allowing optimization
to be disabled (or I'll add support for operands such as (xxxx).L,
which A68k currently can't handle).  I'm impressed that it shrank
your object code by 30%.  Thanks for the testimonial.

Charlie_Gibbs@mindlink.UUCP
"Programmers who write small modules have short attention spans."

jesup@cbmvax.UUCP (Randell Jesup) (05/04/89)

In article <221@mindlink.UUCP> a218@mindlink.UUCP (Charlie Gibbs) writes:
>        - Bcc is converted to Bcc.S if possible (for backward
>            references only - forward references are too hard :-).
>
>        - Backward references within the current CODE section
>            are converted to PC-relative if possible (again,
>            forward references get ugly).

	Do you convert BSR to BSR.S?  JSR to BSR(.S) might also be useful
when you plan to link with SMALLCODE.

	ADD.L	#<16 bit number>,An should be converted to ADD.W.  Ditto
for SUB, etc.
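	The word form is safe here because operations on an address
register sign-extend a word operand to 32 bits, so .W gives the same
result whenever the immediate fits in 16 signed bits.  A hypothetical
size-picker (illustration only, not from either assembler):

```python
# Sketch of choosing the smallest add-to-address-register encoding.
# Word operands to An are sign-extended, so .W is safe for any
# immediate in -32768..32767; ADDQ packs 1..8 into the opcode itself.

def add_an_form(value):
    if 1 <= value <= 8:
        return "ADDQ"     # immediate encoded in the opcode, shortest
    if -32768 <= value <= 32767:
        return "ADD.W"    # sign-extended word immediate, saves a word
    return "ADD.L"        # full 32-bit immediate
```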

	Forward optimization is REAL nice.  I HATE to have to sprinkle .S's
into my code, then remove the ones that fail!

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

a218@mindlink.UUCP (Charlie Gibbs) (05/04/89)

In article <6755@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:

>        Do you convert BSR to BSR.S?  JSR to BSR(.S) might be useful when
>you plan to link with SMALLCODE, also.

     A68k converts suitable backward references for BSR to BSR.S, as
well as Bcc and BRA (it's too stupid to tell them apart :-).  I agree
about the JSR->BSR conversion.  I haven't done anything about SMALLCODE
yet because I CAN'T FIND ANY DOCUMENTATION!  If anyone knows anything
about SMALLCODE, SMALLDATA, hunk_drelocXX, etc., could you PLEASE either
give me the information or tell me where to find it!

>        ADD.L   #<16 bit number>,An should be converted to ADD.W. Ditto
>for SUB, etc.

     Yup.  I'll file that one under the "wild optimizations" heading
in my "to do" file.  Maybe I could go all the way to ADDQ, etc.

>        Forward optimization is REAL nice....

     It's also REAL ugly.  But Bruce Dawson has given me suggestions
as to how I might do it for Bcc/BRA/BSR without completely rewriting
everything or taking another pass through the source code, so I may
have something Real Soon Now.  Other forward optimizations will have
to wait until I either get another viable suggestion or an inspiration
from out of the blue.

>Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

Charlie_Gibbs@mindlink.UUCP
"The terms 'undefined' and 'unspecified' have been given definitions."
        -- revision to the IEEE 1003.1 POSIX Standard

doug@xdos.UUCP (Doug Merritt) (05/05/89)

In article <221@mindlink.UUCP> a218@mindlink.UUCP (Charlie Gibbs) writes:
>        - Bcc is converted to Bcc.S if possible (for backward
>            references only - forward references are too hard :-).

There were some assemblers on the PDP-11 (long ago, V6 Unix) that did
the forward references, too.  It's problematic because it apparently
requires an indefinite number of passes (or the equivalent), but when
your text space is only 64K I guess it seems worth it.
	Doug
-- 
Doug Merritt			Member, Crusaders for a Better Tomorrow
{pyramid,apple}!xdos!doug	Professional Wildeyed Visionary

"Of course, I'm no rocket scientist" -- Randell Jesup, Capt. Boinger Corps

rokicki@polya.Stanford.EDU (Tomas G. Rokicki) (05/06/89)

>>        - Bcc is converted to Bcc.S if possible (for backward
>>            references only - forward references are too hard :-).

>There were some assemblers on the PDP 11 (long ago, V6 Unix) that did
>the forward references, too. Problematic due to an apparent necessity for
>an indefinite number of passes or the equivalent, but when your text memory
>space is only 64K I guess it may seem worth it.

If you have memory available, it's not *that* difficult.  Simply maintain
a list of `candidate' branches during the first pass.  A branch is a
candidate if it's not already short and there isn't too much unmovable
code between it and its target.  Now, simply run multiple passes over
your branch list in *memory*, changing any appropriate ones to short,
until no more can be changed.  Then go back and update your symbol table.
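A minimal sketch of that loop, under a simplified model (byte offsets
measured with every branch in long form, 2 bytes saved per shortened
branch, short range rounded to 127 -- real 68000 displacements are
measured from PC+2, so the exact limits differ slightly):

```python
# Multi-pass branch shortening over an in-memory candidate list.
# branches is a list of (source, target) byte offsets; shortening
# one branch saves `saving` bytes, which can pull other branches
# into short range, so iterate to a fixed point.

def shorten_branches(branches, saving=2, short_range=127):
    short = [False] * len(branches)
    changed = True
    while changed:
        changed = False
        for i, (src, dst) in enumerate(branches):
            if short[i]:
                continue
            lo, hi = min(src, dst), max(src, dst)
            # Bytes saved by already-shortened branches whose
            # instructions lie between this branch and its target.
            saved = sum(saving for j, (s, _) in enumerate(branches)
                        if j != i and short[j] and lo <= s < hi)
            if abs(dst - src) - saved <= short_range:
                short[i] = True
                changed = True
    return short
```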

tom

a218@mindlink.UUCP (Charlie Gibbs) (05/08/89)

In article <8979@polya.Stanford.EDU>
rokicki@polya.Stanford.EDU (Tomas G. Rokicki) writes:

>If you have memory available, it's not *that* difficult.  Simply maintain
>a list of `candidate' branches during the first pass.  A branch is a
>candidate if it's not already short and there isn't too much unmovable
>code between it and its target.  Now, simply run multiple passes over
>your branch list in *memory*, changing any appropriate ones to short,
>until no more can be changed.  Then go back and update your symbol table.

     This sounds similar to a suggestion given to me by Bruce Dawson of
CygnusEd Pro fame.  Actually, it wouldn't even need that much memory;
since short branches can only span 128 bytes, I could kick candidates
off the end of the table once I've gone more than that far without
determining that they can be converted to short.  The largest I'd have
to make the table (assuming a whole bunch of consecutive forward Bcc
instructions) is 32 entries, since candidates each take 4 bytes.
I'd probably want another table of pointers to symbol table entries
that would have to be adjusted, though.  That one could need up to 64
entries, assuming a bunch of consecutive labeled 2-byte instructions.
Still small potatoes, memory-wise.

     Of course, if I hit a SECTION directive I'd have to throw the
table away, so cases like

                SECTION myprog,CODE
                BRA     foo
                SECTION myprog,DATA
                DC.B    'gratuitous data section'
                SECTION myprog,CODE
        foo:

wouldn't get optimized, but I can't see this happening too often.

     The one fly in the ointment is CNOP.  Here's a case that would
break things:

                CNOP    0,4
                BRA     foo
                CNOP    0,4
        foo:

In this case, changing the BRA to BRA.S would not cause foo to move
because of the second CNOP.  So I can't just adjust symbol table
entries.  What I could do, though, is to jump back in the source
file and reprocess things starting at the Bcc that has to be adjusted.
But then there's the problem of going into and out of INCLUDEs,
macros, etc.   Maybe I should just wipe the table if I encounter a
CNOP, like I'll have to do with SECTION.

     Thanks for helping me think things through.  Just writing this
message has helped put things straight in my mind.

>tom

Charlie_Gibbs@mindlink.UUCP
"The terms 'undefined' and 'unspecified' have been given definitions."
        -- revision to the IEEE 1003.1 POSIX Standard

fnf@estinc.UUCP (Fred Fish) (05/14/89)

In article <8979@polya.Stanford.EDU> rokicki@polya.Stanford.EDU (Tomas G. Rokicki) writes:
>
>>>        - Bcc is converted to Bcc.S if possible (for backward
>>>            references only - forward references are too hard :-).
>
>>There were some assemblers on the PDP 11 (long ago, V6 Unix) that did
>>the forward references, too. Problematic due to an apparent necessity for
>>an indefinite number of passes or the equivalent, but when your text memory
>>space is only 64K I guess it may seem worth it.
>
>If you have memory available, it's not *that* difficult.  Simply maintain
>a list of `candidate' branches during the first pass.  A branch is a
>candidate if it's not already short and there isn't too much unmovable
>code between it and its target.  Now, simply run multiple passes over
>your branch list in *memory*, changing any appropriate ones to short,
>until no more can be changed.  Then go back and update your symbol table.

Based on my experience writing an experimental linker for the 88K at
Motorola, which was capable of doing some object code optimizations at
link time, including inserting and deleting instructions, I would
recommend initially assuming that ALL branches are short, then changing
the ones that don't meet the criteria for short branches.  The main
reason is that if you are only expanding the code, then once a branch
target moves out of range of a short branch it will never move back
into range, so you can examine each branch independently of all the
others.  If instead you try to shrink the code by converting long
branches to short ones, there may be situations where examining each
branch individually appears to yield no more conversions (terminating
the loop), even though shortening several branches simultaneously
would allow their targets to move into range.  In practice, two passes
over the code are usually sufficient to find all cases; in rare cases
it takes three.  I can't recall ever seeing more than three passes.
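Under a simplified model (byte offsets measured with every branch
assumed short, 2 bytes of growth per lengthened branch, short range
rounded to 127), the expand-only approach might be sketched like this;
because the code only grows, a branch that goes long never needs
re-checking, and the loop settles in a couple of passes:

```python
# Assume every branch short, then lengthen the ones that don't fit.
# Growth is monotone: a branch that goes long never comes back, so
# each pass only ever flips short -> long, and the loop terminates.

def expand_branches(branches, growth=2, short_range=127):
    long_ = [False] * len(branches)
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for i, (src, dst) in enumerate(branches):
            if long_[i]:
                continue
            lo, hi = min(src, dst), max(src, dst)
            # Bytes added by already-lengthened branches in between.
            added = sum(growth for j, (s, _) in enumerate(branches)
                        if j != i and long_[j] and lo <= s < hi)
            if abs(dst - src) + added > short_range:
                long_[i] = True
                changed = True
    return long_, passes
```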

-Fred
-- 
# Fred Fish, 1835 E. Belmont Drive, Tempe, AZ 85284,  USA
# 1-602-491-0048           asuvax!{nud,mcdphx}!estinc!fnf

dbk@teroach.UUCP (Dave Kinzer) (05/14/89)

In article <87@estinc.UUCP> fnf@estinc.UUCP (Fred Fish) writes:
>In article <8979@polya.Stanford.EDU> rokicki@polya.Stanford.EDU (Tomas G. Rokicki) writes:
>>
   The subject is optimizing 68000 branch instructions for forward branches.

   Tom explains a possible technique.

   Fred suggests assuming short branches then converting to long as needed
since once they are converted to long they will not need to be checked
again.


   I'm afraid this falls into the pick-your-poison category.  If you
assume all branches are long, then once you start going back through
the code shortening them, a branch that has been shortened need not be
looked at again.  It will never fall back into the domain of long
branches (i.e., we are only shrinking the code).  It's the same thing,
only backwards.

   Assuming long first has at least one advantage: you need not run
the optimizer if you don't want to take the time (probably minimal,
but at least you have a choice).



|    // GOATS - Gladly Offering All Their Support  Dave Kinzer (602)897-3085|
|   //  >> In Hell you need 4Mb to Multitask!  <<  uunet!mcdphx!teroach!dbk |
| \X/   #define policy_maker(name) (name->salary > 3 * dave.salary)         |

dbk@teroach.UUCP (Dave Kinzer) (05/22/89)

In article <10891@mcdphx.phx.mcd.mot.com> (I) incorrectly write:
>   The subject is optimizing 68000 branch instructions for forward branches.
>
>   Fred suggests assuming short branches then converting to long as needed
>since once they are converted to long they will not need to be checked
>again.

  I go on to suggest that assuming long works out just as well.  Fred
Fish sent mail explaining that there may exist cases (multiple branches
near the end of short-branch range) that cannot be shortened when
considered individually, but can be when considered as a group.  It is
therefore easier to assume them all short and lengthen only those that
need it.  I stand corrected.
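  Fred's case can be made concrete with a hypothetical pair (not from
real code): a forward branch and a backward branch that cross each
other, each a byte or two out of short range.  Offsets are measured
with both branches in long form; shortening a branch whose instruction
lies inside the other's span saves it 2 bytes, with the short range
rounded to 127:

```python
# A forward branch at offset 10 to 139 (distance 129) and a backward
# branch at 135 to 7 (distance 128).  Each instruction lies inside
# the other's span, so shortening either saves the other 2 bytes --
# but neither fits short on its own.

branches = [(10, 139), (135, 7)]

def fits_short(i, short):
    src, dst = branches[i]
    lo, hi = min(src, dst), max(src, dst)
    saved = sum(2 for j, (s, _) in enumerate(branches)
                if j != i and short[j] and lo <= s < hi)
    return abs(dst - src) - saved <= 127

# Examined one at a time from the all-long state, neither shortens...
one_at_a_time = [fits_short(0, [False, False]),
                 fits_short(1, [False, False])]

# ...but assuming both short first, each check passes.
as_a_group = [fits_short(0, [True, True]),
              fits_short(1, [True, True])]
```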

   More evidence that no matter how well meaning, Usenet advice may be 
worth exactly what you pay for it.  ;-)

|    // GOATS - Gladly Offering All Their Support  Dave Kinzer (602)897-3085|
|   //  >> In Hell you need 4Mb to Multitask!  <<  uunet!mcdphx!teroach!dbk |
| \X/   #define policy_maker(name) (name->salary > 3 * dave.salary)         |