unpowell@csvax.liv.ac.uk (03/18/88)
I have recently been following the news items by Wayne Knapp about the
writing of a routine (that will be made public domain) to allow 512 color
pictures to be displayed. He has been requesting ideas from people about
the fastest way to update the palette registers. There have been a couple
of "followups" to this, a couple coming from himself (I liked Wayne's
routine for getting in sync with the display, quite neat).

From what I have gathered, it seems that there will be a long list of
words in memory which will be shoved into the palette registers as fast as
possible. It has been suggested that the move multiple registers
instruction be used. Quite true, this is the fastest way of moving memory
about on the ST, eg

        lea     $8240.w,a0
        lea     colorlist,a1
        movem.l (a1)+,d0-d7     ;repeat these two instructions
        movem.l d0-d7,(a0)      ;as many times as possible
        ...

This will move a block of 16 words from memory into the 16 palette
registers.

Although this is the fastest way to update the palette registers, it has a
problem. There is a delay while the data registers are being reloaded, so
the palette registers are written in bursts rather than at a regular pace.
This will create problems for a program that has to update the "colorlist"
(as I shall call it), because after all this colorlist must at some time be
created by some program. It would be much simpler to alter this colorlist
if the palette registers were being updated at a regular pace. But then
again he may be looking for the fastest way and not the simplest way.

Now that I have attracted your attention with the above (interesting?)
item, I would like to mention something else (not quite as interesting?),
the absolute short addressing mode. This fantastic little addressing mode
seems to be quite neglected by nearly every ST programmer I have known. I
don't know why; even the TOS programmers didn't use it.
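
A small sketch for anyone who has not met sign extension before. Absolute
short keeps only 16 bits of address in the instruction, and the processor
widens them to a full address by copying bit 15 into the upper bits. The
same widening can be watched in a data register with ext.l. The four
values below are my own picks to show the edges of the reachable ranges;
the code only touches data registers, not the hardware.

```asm
; Sketch only: ext.l widens a word the same way the 68000 widens an
; absolute short address (bit 15 is copied into bits 16-31).
        move.w  #$0040,d0
        ext.l   d0              ; d0 = $00000040  (low memory)
        move.w  #$7fff,d1
        ext.l   d1              ; d1 = $00007fff  (top of the first 32K)
        move.w  #$8000,d2
        ext.l   d2              ; d2 = $ffff8000  (bottom of the last 32K)
        move.w  #$8260,d3
        ext.l   d3              ; d3 = $ffff8260  (an I/O location)
```

So a 16 bit field of $0000-$7fff lands in the first 32K and $8000-$ffff
lands in the last 32K; anything in between needs absolute long.
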
Instead of having a 32 bit field to specify the address (as in absolute
long), absolute short uses a sign extended 16 bit field, which means that
instructions using this addressing mode are shorter and, of course, faster!
This allows access to the last 32K of the address space (the input/output
locations) and the first 32K (the exception vectors etc.) at increased
speed.

You tell the assembler whether you want absolute long or absolute short by
terminating the address with ".L" or ".W" respectively. Absolute long is
usually the default, so if you want absolute short you must ask for it by
adding the ".W".

Have a look at the following instructions, their patterns in memory and
their execution times:-

        move.l  $40,d0          2039 0000 0040  20 clock periods
        move.b  $ffff8260,d0    1039 ffff 8260  16 clock periods

You can see that the second word in both of the patterns is really
redundant. If the above two instructions are converted to absolute short...

        move.l  $40.w,d0        2038 0040       16 clock periods
        move.b  $8260.w,d0      1038 8260       12 clock periods

A word is saved in each instruction, as are four clock periods. So you can
see that when you are accessing the first or last 32K of the address space
it is much more efficient to use absolute short.

NB When using symbols with absolute short, you must usually place the
symbol name in brackets, eg

screen_resolution       equ     $8260

        move.b  (screen_resolution).w,d0        1038 8260

A few other time and space saving instructions that I regularly use, and
that lots of other people seem to be ignorant of, are:-

1. When clearing all 32 bits of a data register use

        moveq   #0,d0           4 clock periods

and not

        clr.l   d0              6 clock periods

2. The fastest way to clear all 32 bits of an address register is

        sub.l   a0,a0           8 clock periods

Although it is possible to clear an address register with

        lea     0.w,a0          8 clock periods

in just the same time, the former takes up only 1 word in memory, while
the latter takes 2.

3.
When adding to an address register, use

        lea     <16 bit displacement>(a0),a0    8 clock periods

ie

        lea     29000(a0),a0
        lea     -346(a0),a0

whenever possible. Although this method only allows a 16 bit number to be
added to the address register (values in the range -32768 to +32767), it
is considerably faster than the equivalent

        add.l   #29000,a0       16 clock periods
        sub.l   #346,a0         16 clock periods

Remember that the 16 bit value added is sign extended to 32 bits before the
addition is performed, and so the upper word of the address register will
be updated. The same is true of "add.w" with an address register as the
destination: the source word is sign extended and all 32 bits of the
register take part, so it gives the same answer as "add.l", but at 12
clock periods it is still slower than the lea, eg

a0 before  | Instruction           | length (words) | a0 after
-----------+-----------------------+----------------+----------
$6fff0     | lea    $200(a0),a0    |       2        | $701f0
$6fff0     | add.w  #$200,a0       |       2        | $701f0
$6fff0     | add.l  #$200,a0       |       3        | $701f0

As you can see, all three give the same result, and the load effective
address is the quickest way of getting it.

I hope the above information proves useful to you. If you have any comments
or find any inaccuracies in the above I'd be glad to hear from you.

Mark Powell

********************************************************************************
"...there's no success      JANET  unpowell@uk.ac.lis.csvax
  like failure and          UUCP   {backbone}!mcvax!ukc!mupsy!lis-cs!unpowell
  failure's no success      ARPA   unpowell%csvax.lis.ac.uk@nss.cs.ucl.ac.uk
  at all..."  B.Dylan
********************************************************************************