[comp.sys.atari.st] More about extra colours

unpowell@csvax.liv.ac.uk (03/18/88)
     I have recently been following the news items by Wayne Knapp 
about the writing of a routine (that will be made public domain) 
to allow 512 color pictures to be displayed. He has been 
requesting ideas from people about the fastest way to update the 
palette registers. There have been a couple of "followups" to 
this, two of them from Wayne himself (I liked his routine for 
getting in sync with the display, quite neat). From what I have 
gathered, it seems that there will be a long list of words in 
memory which will be shoved into the palette registers as fast 
as possible.
     It has been suggested that the move multiple registers 
instruction (movem) be used. Quite right: this is the fastest way 
of moving memory about on the ST, eg

     lea     $8240.w,a0
     lea     colorlist,a1
     movem.l (a1)+,d0-d7      ;repeat these two instructions
     movem.l d0-d7,(a0)       ;as many times as possible
     ...

This will move a block of 16 words from memory into the 16 
palette registers. Although this is the fastest way to update the 
palette registers, it has a problem: nothing is written to the 
palette while the data registers are being reloaded from memory, 
so the colours change in bursts of 16 with a pause in between. 
This complicates any program that has to update the "colorlist" 
(as I shall call it), and after all, the colorlist must be 
created by some program at some time. It would be much simpler 
to alter the colorlist if the palette registers were being 
updated at a regular pace. But then again, Wayne may be looking 
for the fastest way rather than the simplest.
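The burst pattern can be illustrated with a rough Python model of the movem loop above. This is only a sketch under stated assumptions. The cycle figures are the 68000 base timings for MOVEM.L: 12+8n clock periods for (An)+ to registers and 8+8n for registers to (An), here with n = 8. ST wait states and loop overhead are ignored. The names `movem_bursts` and `pack_longs` are made up for illustration.

```python
# Rough model of the repeated pair
#     movem.l (a1)+,d0-d7
#     movem.l d0-d7,(a0)      ; a0 -> palette registers
# Assumption: 68000 base timings only, no ST wait states or loop overhead.

N_LONGS = 8                      # d0-d7: eight 32 bit registers
PALETTE_WORDS = 2 * N_LONGS      # 8 longwords = 16 palette words


def movem_load_cycles(n: int) -> int:
    """movem.l (An)+,<n regs>: 12 + 8n clock periods."""
    return 12 + 8 * n


def movem_store_cycles(n: int) -> int:
    """movem.l <n regs>,(An): 8 + 8n clock periods."""
    return 8 + 8 * n


def pack_longs(words):
    """Pair 16 bit words into longwords, high word first (the 68000 is big-endian)."""
    return [(words[i] << 16) | words[i + 1] for i in range(0, len(words), 2)]


def movem_bursts(colour_list):
    """Feed colour_list through the palette 16 words at a time.

    Returns (cycle_when_burst_finished, palette_contents) for every burst.
    No palette write happens during the load half of each pair, which is
    why the colours change in bursts rather than at a steady rate.
    """
    if len(colour_list) % PALETTE_WORDS:
        raise ValueError("colour list must be a multiple of 16 words")
    palette = [0] * PALETTE_WORDS
    cycle = 0
    history = []
    for start in range(0, len(colour_list), PALETTE_WORDS):
        regs = pack_longs(colour_list[start:start + PALETTE_WORDS])
        cycle += movem_load_cycles(N_LONGS)      # reading the list: palette idle
        for i, value in enumerate(regs):         # writing d0-d7 into the palette
            palette[2 * i] = value >> 16
            palette[2 * i + 1] = value & 0xFFFF
        cycle += movem_store_cycles(N_LONGS)
        history.append((cycle, tuple(palette)))
    return history


if __name__ == "__main__":
    colours = list(range(32))                    # 32 arbitrary test words
    for finished, pal in movem_bursts(colours):
        print(finished, [f"{c:03x}" for c in pal[:4]], "...")
```

With a 32 word list, the model shows the first 16 colours in place after about 148 clock periods and the next 16 after about 296. No palette writes happen during each 76 clock period load.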
     Now that I have attracted your attention with the above 
(interesting?) item, I would like to mention something else (not 
quite as interesting?): the absolute short addressing mode. This 
fantastic little addressing mode seems to be neglected by nearly 
every ST programmer I have known. I don't know why; even the TOS 
programmers didn't use it. Instead of a 32 bit field to specify 
the address (as in absolute long), absolute short uses a 16 bit 
field that is sign extended to 32 bits, which means that 
instructions using this addressing mode are shorter and, of 
course, faster! This gives faster access to the first 32K of 
memory ($0-$7fff: the exception vectors etc.) and the last 32K of 
memory ($ffff8000-$ffffffff: the input/output locations). You 
tell the assembler whether you want absolute long or absolute 
short by terminating the address with ".L" or ".W" respectively. 
Absolute long is usually the default, so if you want absolute 
short you must ask for it by adding the ".W". Have a look at the 
following instructions, their bit patterns in memory and their 
execution times:-

     move.l  $40,d0           2039 0000 0040   20 clock periods     
     move.b  $ffff8260,d0     1039 ffff 8260   16 clock periods

     You can see that the second word of both patterns is really 
redundant: it is just the sign extension of the third word (0000 
for $0040, ffff for $8260). If the above two instructions are 
converted to absolute short...

     move.l  $40.w,d0         2038 0040        16 clock periods
     move.b  $8260.w,d0       1038 8260        12 clock periods

     A word is saved in each instruction, as are four clock 
periods. So when you are accessing the first or last 32K of 
memory, it is much more efficient to use absolute short.
NB When using symbols with absolute short, you must usually place 
the symbol name in brackets eg

     screen_resolution   equ  $8260

     move.b  (screen_resolution).w,d0   1038 8260
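The two tables of bit patterns can be reproduced with a small Python encoder for `move.<size> <address>,Dn`. This is a sketch for illustration only. The function names are hypothetical, and only this one instruction form is handled. `addr` is taken to be the full 32 bit address the CPU will use, so the resolution register is passed as $ffff8260 rather than $8260.

```python
# Sketch: encode  move.<size> <address>,Dn  using absolute long or absolute short.

SIZE_BITS = {"b": 0b01, "w": 0b11, "l": 0b10}  # MOVE size field (bits 13-12)
ABS_SHORT, ABS_LONG = 0b000, 0b001             # register field used with mode 7


def sign_extend16(word: int) -> int:
    """What the 68000 does with an absolute short address: copy bit 15 upwards."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if word & 0x8000 else word


def fits_abs_short(addr: int) -> bool:
    """True for $00000000-$00007fff and $ffff8000-$ffffffff."""
    addr &= 0xFFFFFFFF
    return sign_extend16(addr) == addr


def st_bus_address(addr: int) -> int:
    """The ST's 68000 has only 24 address lines, so the top byte is ignored."""
    return addr & 0xFFFFFF


def encode_move_abs_to_dn(size: str, addr: int, dn: int = 0, short: bool = False):
    """Return the instruction words for  move.<size> addr,d<dn>."""
    addr &= 0xFFFFFFFF
    # bits 11-9: destination register, 8-6: destination mode (000 = Dn),
    # 5-3: source mode (111 = absolute/immediate group)
    opword = (SIZE_BITS[size] << 12) | (dn << 9) | (0b000 << 6) | (0b111 << 3)
    if short:
        if not fits_abs_short(addr):
            raise ValueError(f"${addr:x} is outside the first/last 32K")
        return [opword | ABS_SHORT, addr & 0xFFFF]
    return [opword | ABS_LONG, addr >> 16, addr & 0xFFFF]


if __name__ == "__main__":
    for size, addr in [("l", 0x40), ("b", 0xFFFF8260)]:
        for short in (False, True):
            words = encode_move_abs_to_dn(size, addr, short=short)
            print(" ".join(f"{w:04x}" for w in words))
```

Run as a script, it prints the four bit patterns shown in the tables. The ST uses only 24 address lines, so $ffff8260 and $ff8260 refer to the same register. `st_bus_address` models that masking.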

     Here are a few more time and space saving instructions that I 
use regularly, and that many other people seem to be unaware 
of:-

1.   When clearing all 32 bits of a data register use

     moveq   #0,d0       4 clock periods     and not
     clr.l   d0          6 clock periods

2.   The fastest way to clear all 32 bits of an address  register 
     is 

     sub.l   a0,a0       8 clock periods

     Although it is possible to clear an address register with

     lea     0.w,a0      8 clock periods

     in just the same time, the former takes up only 1 word in 
     memory, while the latter takes 2.

3.   When adding to an address register, use

     lea     <16 bit displacement>(a0),a0    8 clock periods
     eg
     lea     29000(a0),a0
     lea     -346(a0),a0

     whenever possible. Although this method only allows a 16 bit 
     number  to be added to the address register (values in   the 
     range -32768 to +32767),  it is considerably faster than the 
     equivalent

     add.l   #29000,a0   16 clock periods
     sub.l   #346,a0     16 clock periods

     Remember that the 16 bit value is sign extended to 32 bits 
     before the addition is performed, so the upper word of the 
     address register is updated too (a 16 bit value of $8000 or 
     more counts as negative). The same is true of "add.w" and 
     "sub.w" when the destination is an address register: the 
     68000 sign extends the word operand and then operates on all 
     32 bits, exactly like lea, only more slowly. eg

   a0 before |    Instruction        | length (words) | clock periods |  a0 after
  -----------+-----------------------+----------------+---------------+----------
    $6fff0   |  lea     $200(a0),a0  |       2        |       8       |   $701f0
    $6fff0   |  add.w   #$200,a0     |       2        |      12       |   $701f0
    $6fff0   |  add.l   #$200,a0     |       3        |      16       |   $701f0

     As you can see, all three give the same result, but the load 
     effective address is never longer than the add and is always 
     faster.
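A small Python model of the address register arithmetic can be used to check the results in the lea/add comparison. It is a sketch. The function names are hypothetical, and only the resulting values are modelled, not the timing. It follows the documented 68000 behaviour that add.w to an address register (ADDA.W) sign extends its word operand and updates all 32 bits. For $6fff0 + $200, it therefore gives $701f0 for lea, add.w and add.l alike.

```python
# Sketch: results of adding a constant to a 68000 address register.
# Only the values are modelled; timing is not.

MASK32 = 0xFFFFFFFF


def sext16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed word (-32768..32767)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lea_disp(an: int, disp: int) -> int:
    """lea disp(An),An: disp must fit in a signed 16 bit field."""
    if not -32768 <= disp <= 32767:
        raise ValueError("displacement out of range for lea d16(An)")
    return (an + disp) & MASK32


def adda_w(an: int, imm: int) -> int:
    """add.w #imm,An: the word is sign extended, then all 32 bits are added."""
    return (an + sext16(imm)) & MASK32


def adda_l(an: int, imm: int) -> int:
    """add.l #imm,An: a full 32 bit add."""
    return (an + imm) & MASK32


if __name__ == "__main__":
    a0 = 0x6FFF0
    for name, after in [("lea     $200(a0),a0", lea_disp(a0, 0x200)),
                        ("add.w   #$200,a0", adda_w(a0, 0x200)),
                        ("add.l   #$200,a0", adda_l(a0, 0x200))]:
        print(f"${a0:x}  {name:<20}  ${after:x}")
    # A word of $8000 or more is negative once sign extended:
    print(f"${adda_w(0x10000, 0x8000):x}")   # $10000 - $8000
```

The last line shows why the range is limited to -32768..+32767. A word such as $8000 is treated as a negative number, so it subtracts from the register.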

     I hope the above information proves useful to you. If you 
have any comments or find any inaccuracies in the above, I'd be 
glad to hear from you.

                     Mark Powell

********************************************************************************

 "...there's no success   JANET unpowell@uk.ac.lis.csvax
  like failure and        UUCP  {backbone}!mcvax!ukc!mupsy!lis-cs!unpowell
  failure's no success    ARPA  unpowell%csvax.lis.ac.uk@nss.cs.ucl.ac.uk
  at all..." B.Dylan

********************************************************************************