unpowell@csvax.liv.ac.uk (03/18/88)
I have recently been following the news items by Wayne
Knapp about writing a routine (to be made public domain) that
allows 512 color pictures to be displayed. He has been requesting
ideas from people about the fastest way to update the palette
registers. There have been a few "followups" to this, a couple
coming from himself (I liked Wayne's routine for getting in sync
with the display, quite neat). From what I have gathered, there
will be a long list of words in memory which will be shoved into
the palette registers as fast as possible.
It has been suggested that the move multiple registers
instruction be used. Quite true, this is the fastest way of
moving memory about on the ST, eg
        lea     $8240.w,a0
        lea     colorlist,a1
        movem.l (a1)+,d0-d7     ;repeat these two instructions
        movem.l d0-d7,(a0)      ;as many times as possible
        ...
This will move a block of 16 words from memory into the 16
palette registers. Although this is the fastest way to update
the palette registers, it has a problem. There is a delay between
bursts while the data registers are being reloaded, so the
palette is not updated at an even rate. This will create problems
for a program that has to update the "colorlist" (as I shall call
it), because after all this colorlist must at some time be
created by some program. It would be much simpler to alter the
colorlist if the palette registers were being updated at a
regular pace. But then again he may be looking for the fastest
way and not the simplest way.
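For anyone wanting to put numbers on this, here is a rough count
for one pass over the 16 palette registers. It is only a sketch:
it assumes the standard 68000 timings with no wait states, so
check it against the Motorola timing tables before relying on it.
        movem.l (a1)+,d0-d7     ;12+8n = 76 clock periods (n=8)
        movem.l d0-d7,(a0)      ; 8+8n = 72 clock periods (n=8)
                                ;total 148 for 16 palette words
Compare that with copying the same 16 words one long at a time:
        move.l  (a1)+,(a0)+     ;20 clock periods, 2 palette words
        ...                     ;eight of these = 160 clock periods
The second version also needs a0 pointing back at $8240 before
each pass, which costs a little more. So the movem pair wins, but
its 16 palette writes arrive in one burst after the 76 clock
periods spent loading d0-d7.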
Now that I have attracted your attention with the above
(interesting?) item, I would like to mention something else (not
quite as interesting?): the absolute short addressing mode. This
fantastic little addressing mode seems to be neglected by nearly
every ST programmer I have known. I don't know why; even the TOS
programmers didn't use it. Instead of having a 32 bit field to
specify the address (as in absolute long), absolute short uses a
sign extended 16 bit field, which means that instructions using
this addressing mode are one word shorter and, of course, faster!
It gives access to the last 32K of the address space (the
input/output locations) and the first 32K of memory (the
exception vectors etc.) at increased speed. Your wish to use
either absolute long or absolute short is communicated to the
assembler by terminating the address with a ".L" or ".W",
respectively. Absolute long is usually the default, so if you
want absolute short you must specify it by adding the ".W". Have
a look at the following instructions, their patterns in memory
and their execution times:-
        move.l  $40,d0              2039 0000 0040  20 clock periods
        move.b  $ffff8260,d0        1039 ffff 8260  16 clock periods
You can see that the second word in both of the patterns is
really redundant. If the above two instructions are converted to
absolute short...
        move.l  $40.w,d0            2038 0040       16 clock periods
        move.b  $8260.w,d0          1038 8260       12 clock periods
A word is saved in each instruction, as are four clock
periods. So you can see, when you are accessing the first or last
32K of memory it is much more efficient to use absolute short.
NB When using symbols with absolute short, you must usually place
the symbol name in brackets, eg
screen_resolution equ $8260
        move.b  (screen_resolution).w,d0    1038 8260
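To make the sign extension concrete, here is how a few 16 bit
fields expand into full addresses. This is just an illustrative
sketch; the $7ffc address and the palette value $0777 are my own
picks for the example.
        move.l  $0040.w,d0      ;field $0040 -> address $00000040
        move.l  $7ffc.w,d0      ;field $7ffc -> address $00007ffc
        move.b  $8260.w,d0      ;field $8260 -> address $ffff8260
        move.w  #$0777,$8240.w  ;field $8240 -> address $ffff8240
Any field with its top bit set lands in the last 32K, so an
address such as $00008260 cannot be reached with ".W" and needs
absolute long.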
A few other time and space saving instructions that I
regularly use, and that lots of other people seem to be unaware
of, are:-
1. When clearing all 32 bits of a data register use
        moveq   #0,d0           4 clock periods and not
        clr.l   d0              6 clock periods
2. The fastest way to clear all 32 bits of an address register
is
        sub.l   a0,a0           8 clock periods
Although it is possible to clear an address register with
        lea     0.w,a0          8 clock periods
in just the same time, the former takes up only 1 word in
memory, while the latter takes 2.
3. When adding to an address register, use
        lea     <16 bit displacement>(a0),a0    8 clock periods
ie
        lea     29000(a0),a0
        lea     -346(a0),a0
whenever possible. Although this method only allows a 16 bit
number to be added to the address register (values in the
range -32768 to +32767), it is considerably faster than the
equivalent
        add.l   #29000,a0       16 clock periods
        sub.l   #346,a0         16 clock periods
Remember that the 16 bit displacement is sign extended to 32
bits before the addition is performed, and so the upper word
of the address register will be updated. The same goes for
"add.w" when the destination is an address register: the
source word is sign extended and all 32 bits of the register
take part in the addition, but it needs 12 clock periods, eg
a0 before  | Instruction      | length (words) | clock periods | a0 after
-----------+------------------+----------------+---------------+----------
$6fff0     | lea $200(a0),a0  | 2              | 8             | $701f0
$6fff0     | add.w #$200,a0   | 2              | 12            | $701f0
$6fff0     | add.l #$200,a0   | 3              | 16            | $701f0
As you can see, all three give the same result, but the load
effective address is the quickest. It is also more versatile
than the add, since the result can go to a different address
register.
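Putting a couple of these together, here is a small sketch of a
loop that steps down a screen one line at a time. The label
"screen", the 160 byte line length and the 200 line count are
assumptions for the example (a low resolution ST screen), not
anything from Wayne's routine.
        lea     screen,a0       ;hypothetical label for the screen buffer
        moveq   #0,d0           ;clear d0, 4 clock periods
        move.w  #200-1,d1       ;dbra runs until d1 reaches -1
nextline:
        move.l  d0,(a0)         ;clear the first long of this line
        lea     160(a0),a0      ;step to the next line, 8 clock periods
        dbra    d1,nextline
The "lea 160(a0),a0" does the job of "add.l #160,a0" in half the
time and one word less.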
I hope the above information proves useful to you. If you
have any comments or find any inaccuracies in the above, I'd be
glad to hear from you.
Mark Powell
********************************************************************************
"...there's no success    JANET  unpowell@uk.ac.lis.csvax
like failure and          UUCP   {backbone}!mcvax!ukc!mupsy!lis-cs!unpowell
failure's no success      ARPA   unpowell%csvax.lis.ac.uk@nss.cs.ucl.ac.uk
at all..."  B.Dylan
********************************************************************************