csuwr@warwick.ac.uk (Derek Hunter) (05/16/91)
I was trying to cut down the number of labels my C compiler produces
(having finally allowed the thing to access globals beyond the 4095 range),
and I (re)invented this:

You can do   Ldr Rn,VERY_FAR_AWAY    ; with:

        Ldr Rn,[PC]
        Ldr Rn,[PC,Rn]
        DCD VERY_FAR_AWAY-P%    ; if V_F_A precedes this code

        Ldr Rn,[PC]
        Ldr Rn,[PC,-Rn]         ; (You /can/ do -Rn, can't you?)
        DCD P%-VERY_FAR_AWAY    ; if it doesn't

 . . . and they are still relative addresses cunningly enough.
(In fact, I think they exceed the addressing space!)

32 bit immediate constants can be read with

        Ldr Rn,[PC]
        Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
        DCD number OR &F0000000

or equivalent (but Bic impresses people, because no-one knows
what it does, and those shift-28s are luvverly).

(If you didn't guess, the DCDs are all -ve and the top nybble is set to &F,
making the DCD value pretend to be an op code with the NeVer condition set,
so the processor can run through it without so much as an `undefined
instruction', unless someone's been *really* dense and told it to complain
on UndefinedNV. Blast! That's potentially spoiled my whole strategy! Is it OK
for top /byte/ = &FF?)

I don't know whether this is well known and/or despised, but it seemed like
quite a good trick at the time.

My main point of interest is this: On an ARM 3, would the DCD be read into
cache in an s cycle during the final stage of the Ldr Rn,[PC], or does it
take an n cycle all of its very own?

 Is this nice on an ARM 3's cache?

 Is it nice at all?

 Was this the intentional use of NV?

 Has Acorn used it?

 Is the UndefinedNV really a problem?

 Will this latter be supported in future releases of the hardware?

- For a full list of questions pertaining to Life, the Universe and rugby
  socks, write to oracle@iuvax.cs.indiana.edu with `help' in the subject line.
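The arithmetic behind the Ldr/Bic/DCD constant trick can be checked with a small model. The following is only a sketch in Python, not anything from the post itself: the function names are hypothetical, and it models the 32-bit values only, not the processor.

```python
MASK32 = 0xFFFFFFFF

def encode_nv_constant(number):
    """Return (dcd_word, bic_mask) for the Ldr/Bic/DCD sequence (hypothetical helper)."""
    number &= MASK32
    dcd_word = number | 0xF0000000           # DCD number OR &F0000000: top nybble &F = NV
    bic_mask = ((number >> 28) ^ 15) << 28   # (( number >> 28 ) EOR 15) << 28
    return dcd_word, bic_mask

def apply_bic(value, mask):
    """Model Bic Rn,Rn,#mask: clear in value every bit that is set in mask."""
    return value & ~mask & MASK32

for n in (0x00000000, 0x12345678, 0x80000001, 0xFFFFFFFF):
    dcd, mask = encode_nv_constant(n)
    print(f"&{n:08X}: DCD &{dcd:08X}, Bic mask &{mask:08X}, result &{apply_bic(dcd, mask):08X}")
```

The Bic mask only ever occupies the top nybble, so it always fits the ARM's rotated 8-bit immediate field.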
nbvs@cl.cam.ac.uk (Nicko van Someren) (05/16/91)
In article <+|Q_L||@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:
>My main point of interest is this: On an ARM 3, would the DCD be read into
> cache in an s cycle during the final stage of the Ldr Rn,[PC], or does it
> take an n cycle all of its very own?
>
> Is this nice on an ARM 3's cache?
>
> Is it nice at all?
>
> Was this the intentional use of NV?
>
> Has Acorn used it?
>
> Is the UndefinedNV really a problem?
>
> Will this latter be supported in future releases of the hardware?

Indeed, on an ARM3 the constant will get cached because the cache always
reads in 'lines' of four words. Since, by the time the LDR instruction is
being executed, the other end of the pipeline will be being loaded with the
NeVer instruction, the data must be in the cache and no extra external
cycles will happen on an ARM3.

IMHO it is nice on a cache, nice at all, a useful use of NV but I don't know
if Acorn have used it. I think if you read the chip spec carefully you will
find aborts can only occur on instructions with valid condition codes.

Having said that, I bet they will remove the NV option on future chips just
to wind us up. After all, if you look at the statistics compilers never use
that option so they might as well take it out!

Nicko

+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
+-----------------------------------------------------------------------------+
Gavin.Flower@comp.vuw.ac.nz (Gavin Flower) (05/17/91)
I have read in several places that the condition code NV is *NOT* forward
compatible. Specifically, that Acorn have reserved the right to reuse it for
something else at some future stage! To the best of my knowledge it is still
OK with ARM3.

****could someone at Acorn confirm this, also could they suggest what it
might be earmarked for?

-Gavin

--
The main "user" of well brought up, and educated, children is the community
at large. So if you really believe in "user pays", charge the correct users -
stop overloading parents with financial penalties.
******* These comments have no known correlation with dept. policy! *******
kers@hplb.hpl.hp.com (Chris Dollin) (05/17/91)
Derek Hunter gives a cunning trick for loading far globals and big constants:

   You can do   Ldr Rn,VERY_FAR_AWAY    ; with:

        Ldr Rn,[PC]
        Ldr Rn,[PC,Rn]
        DCD VERY_FAR_AWAY-P%    ; if V_F_A precedes this code

I'd generate

        LDR Rn, nnn[PC]         (or however the s*d assembler notates it)
        LDR Rm, mmm[Rn]

where nnn[PC] holds a pointer to the item you need. Of course this item is
reusable for each far reference (so long as you stay ``near enough'' in the
code). It's not relocatable; if it were, you'd have to do LDR Rm,[PC,Rn] as
Derek does. However, if you *can* get away with fixed (or self-updating)
code, then the former method allows you to group far globals together and
access them using the one pointer (perhaps one per procedure).

Derek adds:

   32 bit immediate constants can be read with

        Ldr Rn,[PC]
        Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
        DCD number OR &F0000000

I'd definitely use an out-of-line constant in a table at the procedure head.
It reduces the code to one instruction at the cost of a store reference
(which I think Derek's costs in any case, but I'm not familiar enough with
the timing details), and the constant is sharable.

Of course, if you're prepared to spend 3 instructions on building the
constant, then anything composable from 3 8-bit fields can be done by:

        MOV Rn, #XXX
        ORR Rn, Rn, #YYY
        ORR Rn, Rn, #ZZZ

with XXX, YYY, ZZZ being suitable components of the full value. Do Acorn's
object formats permit this to be expressed when the full value is some
constant not known until link time?

--
Regards, Kers.      | "You're better off not dreaming of the things to come;
Caravan:            |  Dreams are always ending far too soon."
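What counts as a "suitable component" here is fixed by the ARM immediate field: an 8-bit value rotated right by an even number of bit positions. The following Python sketch (helper names are hypothetical, not from any assembler) checks whether three candidate fields are each encodable and together make up the full value.

```python
MASK32 = 0xFFFFFFFF

def rotate_left(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit quantity rotated right by an even amount."""
    value &= MASK32
    # Undo every possible even rotation and see whether 8 bits are enough.
    return any(rotate_left(value, rot) < 0x100 for rot in range(0, 32, 2))

def check_three_fields(full, xxx, yyy, zzz):
    """True if MOV #xxx / ORR #yyy / ORR #zzz is encodable and builds full."""
    fields = (xxx, yyy, zzz)
    return all(is_arm_immediate(f) for f in fields) and (xxx | yyy | zzz) == (full & MASK32)

print(check_three_fields(0x00ABCDEF, 0x00AB0000, 0x0000CD00, 0x000000EF))  # three byte fields
print(is_arm_immediate(0x000001FE))  # 8 bits wide, but at an odd bit position
```

A value such as &12345678 has significant bits in all four bytes, so it cannot be built from three such fields with MOV and ORR alone.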
athomas (Alasdair Thomas) (05/17/91)
Derek Hunter (csuwr@warwick.ac.uk) asks:

> Was this the intentional use of NV?
> Is the UndefinedNV really a problem?

The intention is to redefine the NV class of instructions in the future to
enhance ARM's instruction set - there are no firm plans yet as to how that
instruction space will be redefined, but if you wish your code to run on
future generations of ARM, you should _not_ use the NV instructions in your
code.

[Note: It is recommended that the instruction "MOV R0,R0" be used as a
general purpose NOP.]

Whilst on the topic of programming the ARM, the document below summarises
the instruction sequences to be avoided.

--
Alasdair Thomas
Advanced RISC Machines Ltd.

********************************************************************

IMPORTANT RULES FOR ARM CODE WRITERS
====================================

Date: 17/5/91                                             Issue: 2.5

Every effort has been made to ensure that the information in this document
is true and correct at the date of issue. Products described in this
document, however, are subject to continuous development and improvements
and Advanced RISC Machines Ltd (and other contributors) reserve the right to
change their specifications at any time. Advanced RISC Machines Ltd cannot
accept liability for any loss or damage arising from the use of any
information or particulars in this document.

================
= Introduction =
================

The ARM processor family uses Reduced Instruction Set Computer (RISC)
techniques to maximise performance; as such, the instruction set allows some
instructions and code sequences to be constructed that will give rise to
unexpected (and potentially erroneous) results. These cases must be avoided
by all machine code writers and generators if correct program operation
across the whole range of ARM processors is to be obtained.

In order to be upwards compatible with future versions of the ARM processor
family NEVER use any of the undefined instruction formats: both those shown
in the manual as "Undefined" which the processor traps AND those which are
not shown in the manual and which don't trap (for example a Multiply
instruction where bit 5 or 6 of the instruction is set). In addition the
"NV" (never executed) instruction class should not be used [It is
recommended that the instruction "MOV R0,R0" be used as a general purpose
NOP].

This document lists the instruction code sequences to be avoided. It is
*STRONGLY* recommended that you take the time to familiarise yourself with
these cases because some will only fail under particular circumstances which
may not arise during testing.

============================================
= Instructions and code sequences to avoid =
============================================

The instructions and code sequences are split into a number of categories.
Each category starts with a recommendation or warning, and indicates which
of the two main ARM variants (ARM2, ARM3) it applies to. The text then goes
on to explain the conditions in more detail and to supply examples where
appropriate.

Unless a program is being targeted SPECIFICALLY for a single version of the
ARM processor family, all of these recommendations should be adhered to.

1) TSTP/TEQP/CMPP/CMNP: Changing mode
-------------------------------------

####################################################################
# When the processor's mode is changed by altering the mode bits   #
# in the PSR using a data processing operation, care must be taken #
# not to access a banked register (R8-R14) in the following        #
# instruction. Accesses to the unbanked registers (R0-R7,R15) are  #
# safe.                                                            #
####################################################################
# Applicability: ARM2                                              #
####################################################################

The following instructions are affected, but note that mode changes can only
be made when the processor is in a non-user mode:-

        TSTP Rn,<Op2>
        TEQP Rn,<Op2>
        CMPP Rn,<Op2>
        CMNP Rn,<Op2>

These are the only operations that change all the bits in the PSR (including
the mode bits) without affecting the PC (thereby forcing a pipeline refill
during which time the register bank select logic settles).

e.g. Assume processor starts in Supervisor mode in each case:-

a) TEQP PC,#0
   MOV  R0,R0            SAFE: NOP added between mode change and access
   ADD  R0,R1,R13_usr          to a banked register (R13_usr).

b) TEQP PC,#0
   ADD  R0,R1,R2         SAFE: No access made to a banked register

c) TEQP PC,#0
   ADD  R0,R1,R13_usr    *FAILS*: Data NOT read from Register R13_usr!

The safest default is always to add a NOP (e.g. MOV R0,R0) after a mode
changing instruction; this will guarantee correct operation regardless of
the code sequence that follows it.

2) LDM/STM: Forcing transfer of the user bank (Part 1)
------------------------------------------------------

###################################################################
# Don't use write back when forcing user bank transfer in LDM/STM #
###################################################################
# Applicability: ARM2,ARM3                                        #
###################################################################

For STM instructions the S bit is redundant as the PSR is always stored with
the PC whenever R15 is in the transfer list. In user mode programs the S bit
is ignored, but in other modes it has a second interpretation. S=1 is used
to force transfers to take values from the user register bank instead of the
current register bank. This is useful for saving the user state on process
switches.

Similarly, in LDM instructions the S bit is redundant if R15 is not in the
transfer list.
In user mode programs, the S bit is ignored, but in non-user mode programs
where R15 is not in the transfer list, S=1 is used to force loaded values to
go to the user registers instead of the current register bank.

In both cases where the processor is in a non-user mode and transfer to/from
the user bank is forced by setting the S bit, write back of the base will
also be to the user bank though the base will be fetched from the current
bank. Therefore don't use write back when forcing user bank transfer in
LDM/STM.

e.g. In all cases, the processor is assumed to be in a non-user mode and
<Rlist> is assumed not to include R15:-

   STMxx Rn!,<Rlist>     SAFE: Storing non-user registers with write back to
                               the non-user base register

   LDMxx Rn!,<Rlist>     SAFE: Loading non-user registers with write back to
                               the non-user base register

   STMxx Rn,<Rlist>^     SAFE: Storing user registers, but no base write-back

   STMxx Rn!,<Rlist>^    *FAILS*: Base fetched from non-user register, but
                                  written back into user register

   LDMxx Rn!,<Rlist>^    *FAILS*: Base fetched from non-user register, but
                                  written back into user register

3) LDM: Forcing transfer of the user bank (Part 2)
--------------------------------------------------

######################################################################
# When loading user bank registers with an LDM in a non-user mode,   #
# care must be taken not to access a banked register (R8-R14) in the #
# following instruction. Accesses to the unbanked registers          #
# (R0-R7,R15) are safe.                                              #
######################################################################
# Applicability: ARM2,ARM3                                           #
######################################################################

Because the register bank switches from user mode to non-user mode during
the first cycle of the instruction following an "LDM Rn,<Rlist>^", an
attempt to access a banked register in that cycle may cause the wrong
register to be accessed.

e.g.
In all cases, the processor is assumed to be in a non-user mode and <Rlist>
is assumed not to include R15:-

   LDM Rn,<Rlist>^
   ADD R0,R1,R2          SAFE: Access to unbanked registers after LDM^

   LDM Rn,<Rlist>^
   MOV R0,R0             SAFE: NOP inserted before banked register used
   ADD R0,R1,R13_svc           following an LDM^

   LDM Rn,<Rlist>^
   ADD R0,R1,R13_svc     *FAILS*: Accessing a banked register immediately
                                  after an LDM^ returns the wrong data!

   ADR   R14_svc, saveblock
   LDMIA R14_svc, {R0 - R14_usr}^
   LDR   R14_svc, [R14_svc,#15*4]   *FAILS*: Banked base register (R14_svc)
   MOVS  PC, R14_svc                         used immediately after the LDM^

   ADR   R14_svc, saveblock
   LDMIA R14_svc, {R0 - R14_usr}^
   MOV   R0,R0                      SAFE: NOP inserted before banked
   LDR   R14_svc, [R14_svc,#15*4]         register (R14_svc) used
   MOVS  PC, R14_svc

NOTE: The ARM2 and ARM3 processors *usually* give the expected result, but
      cannot be guaranteed to do so under all circumstances. Therefore this
      code sequence should be avoided in future.

4) SWI/Undefined Instruction trap interaction
---------------------------------------------

######################################################################
# Care must be taken when writing an undefined instruction handler   #
# to allow for an unexpected call from a SWI instruction.            #
# The erroneous SWI call should be intercepted and redirected to the #
# software interrupt handler                                         #
######################################################################
# Applicability: ARM2                                                #
######################################################################

The implementation of the CDP instruction on ARM2 causes a Software
Interrupt (SWI) to take the Undefined Instruction trap if the SWI was the
next instruction after the CDP.

e.g.
   SIN F0,F1
   SWI &11       *FAILS*: ARM2 will take the undefined instruction trap
                          instead of software interrupt trap.

All Undefined Instruction handler code should check the failed instruction
to see if it is a SWI, and if so pass it over to the software interrupt
handler.
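
The SWI check recommended for the Undefined Instruction handler comes down to a test on bits 27:24 of the failed instruction, which are all ones for every SWI. A minimal sketch of that decision in Python follows; the handler names are hypothetical placeholders, and a real handler would of course be written in ARM code.

```python
def is_swi(instruction):
    """True if a 32-bit ARM instruction word is a SWI (bits 27:24 = 1111)."""
    return ((instruction >> 24) & 0xF) == 0xF

def route_undefined_trap(instruction):
    """Pick which handler should deal with the instruction that trapped."""
    if is_swi(instruction):
        return "software_interrupt_handler"    # hypothetical handler name
    return "undefined_instruction_handler"     # hypothetical handler name

# SWI &11 with the AL (always) condition assembles to &EF000011.
print(route_undefined_trap(0xEF000011))
# A coprocessor operation has bits 27:24 = 1110, so it is not a SWI.
print(route_undefined_trap(0xEE000000))
```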

5) Undefined instruction/Prefetch abort trap interaction
--------------------------------------------------------

######################################################################
# Care must be taken when writing the Prefetch abort trap handler to #
# allow for an unexpected call due to an undefined instruction       #
######################################################################
# Applicability: ARM2,ARM3                                           #
######################################################################

When an undefined instruction is fetched from the last word of a page, where
the next page is absent from memory, the undefined instruction will cause
the undefined instruction trap to be taken, and the following (aborted)
instructions will cause a prefetch abort trap. One might expect the
undefined instruction trap to be taken first, then the return to the
succeeding code will cause the abort trap. In fact the prefetch abort has a
higher priority than the undefined instruction trap, so the prefetch abort
handler is entered _before_ the undefined instruction trap, indicating a
fault at the address of the undefined instruction (which is in a page which
is actually present). A normal return from the prefetch abort handler (after
loading the absent page) will cause the undefined instruction to execute and
take the trap correctly. However the indicated page is already present, so
the prefetch abort handler may simply return control, causing an infinite
loop to be entered.

Therefore, the prefetch abort handler should check whether the indicated
fault is in a page which is actually present. If so, the above condition
must be present and so control should be passed to the undefined instruction
handler.
This will restore the expected sequential nature of the execution sequence;
a normal return from the undefined instruction handler will cause the next
instruction to be fetched (which will abort), the prefetch abort handler
will be reentered (with an address pointing to the absent page), and
execution can proceed normally.

========================
= Other points to note =
========================

This section highlights some obscure cases of ARM operation which should be
borne in mind when writing code.

1) Use of R15
-------------

*************************************************************************
* WARNING: When the PC is used as a destination, operand, base or shift *
*          register, different results will be obtained depending on    *
*          the instruction and the exact usage of R15                   *
*************************************************************************
* Applicability: ARM2,ARM3                                              *
*************************************************************************

Full details of the value derived from or written into R15+PSR for each
instruction class are given in the datasheet. Care must be taken when using
R15 because small changes in the instruction can yield significantly
different results.

e.g. Consider data operations of the type:-

        <opcode>{cond}{S} Rd,Rn,Rm
   or   <opcode>{cond}{S} Rd,Rn,Rm,<shiftname> Rs

a) When R15 is used in the Rm position, it will give the value of the PC
   together with the PSR flags.

b) When R15 is used in the Rn or Rs positions, it will give the value of the
   PC without the PSR flags (PSR bits replaced by zeros).

        MOV R0,#0
        ORR R1,R0,R15   ; R1:=PC+PSR (bits 31:26,1:0 reflect PSR flags)
        ORR R2,R15,R0   ; R2:=PC     (bits 31:26,1:0 set to zero)

NOTE: The relevant instruction description in the ARM datasheets should be
      consulted for full details of the behaviour of R15.
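
The bit positions quoted in the R15 example can be turned into masks. The sketch below, in Python with hypothetical names, only models which bits of the combined PC+PSR word are seen in each operand position; it does not model the pipeline offset of the PC value itself.

```python
PC_MASK = 0x03FFFFFC    # bits 25:2 hold the program counter
PSR_MASK = 0xFC000003   # bits 31:26 and 1:0 hold the PSR flags and mode

def read_r15(r15_with_psr, position):
    """Value seen when R15 is a data operation operand in the given position."""
    if position == "Rm":
        return r15_with_psr             # PC together with the PSR bits
    if position in ("Rn", "Rs"):
        return r15_with_psr & PC_MASK   # PSR bits replaced by zeros
    raise ValueError("position must be 'Rm', 'Rn' or 'Rs'")

r15 = 0xA0008013                        # made-up word: N and C set, SVC mode, PC=&8010
print(f"ORR R1,R0,R15 gives &{read_r15(r15, 'Rm'):08X}")
print(f"ORR R2,R15,R0 gives &{read_r15(r15, 'Rn'):08X}")
```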

2) STM: Inclusion of the base in the register list
--------------------------------------------------

***********************************************************************
* WARNING: In the case of a STM with writeback that includes the base *
*          register in the register list, the value of the base       *
*          register stored depends upon its position in the register  *
*          list                                                       *
***********************************************************************
* Applicability: ARM2,ARM3                                            *
***********************************************************************

During a STM, the first register is written out at the start of the second
cycle of the instruction. When writeback is specified, the base is written
back at the end of the second cycle. A STM which includes storing the base
with the base as the first register to be stored will therefore store the
unchanged value, whereas with the base second or later in the transfer
order, it will store the modified value.

e.g.
        MOV   R5,#&1000
        STMIA R5!,{R5-R6}   ; Stores value of R5=&1000

        MOV   R5,#&1000
        STMIA R5!,{R4-R5}   ; Stores value of R5=&1008

3) MUL/MLA: Register restrictions
---------------------------------

****************************************************
* Given  MUL Rd,Rm,Rs                              *
*   or   MLA Rd,Rm,Rs,Rn                           *
*                                                  *
* Then   Rd & Rm must be different registers       *
*        Rd must not be R15                        *
****************************************************
* Applicability: ARM2,ARM3                         *
****************************************************

Due to the way that Booth's algorithm has been implemented, certain
combinations of operand registers should be avoided. (The assembler will
issue a warning if these restrictions are overlooked.)

The destination register (Rd) should not be the same as the Rm operand
register, as Rd is used to hold intermediate values and Rm is used
repeatedly during the multiply. A MUL will give a zero result if Rm=Rd, and
a MLA will give a meaningless result.

The destination register (Rd) should also not be R15.
R15 is protected from modification by these instructions, so the instruction
will have no effect, except that it will put meaningless values in the PSR
flags if the S bit is set.

All other register combinations will give correct results, and Rd, Rn and Rs
may use the same register when required.

4) LDM/STM: Address Exceptions
------------------------------

************************************************************************
* WARNING: Illegal addresses formed during a LDM or STM operation will *
*          not cause an address exception                              *
************************************************************************
* Applicability: ARM2,ARM3                                             *
************************************************************************

Only the address of the first transfer of a LDM or STM is checked for an
address exception; if subsequent addresses over- or under-flow into illegal
address space they will be truncated to 26 bits but will not cause an
address exception trap.

e.g. Assume processor is in a non-user mode & MEMC being accessed:-
     {these examples are very contrived}

        MOV   R0,#&04000000  ; R0=&04000000
        STMIA R0,{R1-R2}     ; Address exception reported
                             ; (base address illegal)

        MOV   R0,#&04000000
        SUB   R0,R0,#4       ; R0=&03FFFFFC
        STMIA R0,{R1-R2}     ; No address exception reported
                             ; (base address legal)
                             ; code will overwrite data at address &00000000

NOTE: The exact behaviour of the system depends upon the memory manager to
      which the processor is attached; in some cases, the wraparound may be
      detected and the processor aborted.
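
The wraparound in the second example can be reproduced with a few lines of arithmetic. This is only a sketch in Python under the rule as stated above (first address checked, later addresses truncated to 26 bits); the names are hypothetical and the memory manager's behaviour is not modelled.

```python
ADDRESS_MASK = 0x03FFFFFF   # 26-bit address space

class AddressException(Exception):
    """Stands in for the processor's address exception trap."""

def stmia_addresses(base, register_count):
    """Word addresses written by STMIA base,{...} under the stated rule."""
    if base & ~ADDRESS_MASK & 0xFFFFFFFF:
        raise AddressException(f"base &{base:08X} is outside the 26-bit space")
    # Only the first address was checked; the rest are silently truncated.
    return [(base + 4 * i) & ADDRESS_MASK for i in range(register_count)]

print([f"&{a:08X}" for a in stmia_addresses(0x03FFFFFC, 2)])
try:
    stmia_addresses(0x04000000, 2)
except AddressException as trap:
    print("address exception:", trap)
```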

5) LDC/STC: Address Exceptions
------------------------------

************************************************************************
* WARNING: Illegal addresses formed during a LDC or STC operation will *
*          not cause an address exception (affects LDF/STF)            *
************************************************************************
* Applicability: ARM2,ARM3                                             *
************************************************************************

The coprocessor data transfer operations act like STM and LDM with the
processor generating the addresses and the coprocessor supplying/reading the
data. As with LDM/STM, only the address of the first transfer of a LDC or
STC is checked for an address exception; if subsequent addresses over- or
under-flow into illegal address space they will be truncated to 26 bits but
will not cause an address exception trap.

Note that the floating point LDF/STF instructions are forms of LDC & STC!

e.g. Assume processor is in a non-user mode & MEMC being accessed:-
     {these examples are very contrived}

        MOV   R0,#&04000000  ; R0=&04000000
        STC   CP1,CR0,[R0]   ; Address exception reported
                             ; (base address illegal)

        MOV   R0,#&04000000
        SUB   R0,R0,#4       ; R0=&03FFFFFC
        STFD  F0,[R0]        ; No address exception reported
                             ; (base address legal)
                             ; code will overwrite data at address &00000000

NOTE: The exact behaviour of the system depends upon the memory manager to
      which the processor is attached; in some cases, the wraparound may be
      detected and the processor aborted.

6) LDC: Data transfers to a coprocessor fetch more data than expected
---------------------------------------------------------------------

***************************************************************************
* Data to be transferred to a coprocessor with the LDC instruction should *
* never be placed in the last word of an addressable chunk of memory, nor *
* in the word of memory immediately preceding a read-sensitive memory     *
* location                                                                *
***************************************************************************
* Applicability: ARM3                                                     *
***************************************************************************

Due to the pipelining introduced into the ARM3 coprocessor interface, an LDC
operation will cause one extra word of data to be fetched from the internal
cache or external memory by ARM3 and then discarded; if the extra data is
fetched from an area of external memory marked as cacheable, a whole line of
data will be fetched and placed in the cache.

A particular case in point is that an LDC whose data ends at the last word
of a memory page will load and then discard the first word (and hence the
first cache line) of the next page. A minor effect of this is that it may
occasionally cause an unnecessary page swap in a virtual memory system. The
major effect of it is that (whether in a virtual memory system or not), the
data for an LDC should never be placed in the last word of an addressable
chunk of memory: the LDC will attempt to read the immediately following
non-existent location and thus produce a memory fault.

e.g. Assume processor is in a non-user mode, FPU hardware attached and MEMC
     being accessed:-
     {this example is very contrived}

        MOV  R13,#&03000000  ; R13=Address of I/O space
        STFD F0,[R13,#-8]!   ; Store F.P. register 0 at top of physical
                             ; memory (two words of data transferred)
        LDFD F1,[R13],#8     ; Load F.P. register 1 from top of physical
                             ; memory but THREE words of data are
                             ; transferred, and the third access will read
                             ; from I/O space which may be read sensitive!

                        *** BEWARE ***
john@acorn.co.uk (John Bowler) (05/17/91)
In article <+|Q_L||@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:
>I was trying to cut down the number of labels my C compiler produces,
> (having finally allowed the thing to access globals beyond the 4095 range),
> and I (re)invented this:
>
>You can do Ldr Rn,VERY_FAR_AWAY ; with:
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,Rn]
>       DCD VERY_FAR_AWAY-P%    ; if V_F_A preceeds this code
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,-Rn]         ; (You /can/ do -Rn, can't you?)

Yes, this is valid.

>       DCD P%-VERY_FAR_AWAY    ; if it doesn't
>
> . . . and they are still relative addresses cunningly enough.
> (In fact, I think they exceed the addressing space!)

This is three instructions to read a given memory location with an offset of
up to +/- 28 bits (4 bits set to F to give the NV condition code). How
about:-

        ADD Rn, PC, #x LSL 12   ; PC without PSR bits, 8 bit constant
        ADD Rn, Rn, #y LSL 20
        LDR Rn, [Rn, #z]        ; 12 bit offset

which gives a 28 bit (positive) PC offset; similarly, using SUB will allow
generation of a negative offset. This sequence of instructions has the
advantage that it is faster (it avoids one LDR) on non-ARM3 machines. Also,
some offsets can be represented more efficiently - in particular an
arbitrary 20 bit offset only requires two instructions.

Notice that *neither* approach allows link time relocation - apparently
Derek's algorithm would do this, but, in practice, the compiler would not
know whether the value was positive or negative, so could not generate the
correct LDR instruction. (This is all because of a deficiency in the current
AOF and A.OUT object module formats, which do not allow the appropriate
relocation forms for the instructions which would be needed.)

>
>32 bit immediate constants can be read with
>
>       Ldr Rn,[PC]
>       Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
>       DCD number OR &F0000000
>
> or equivalent, (but Bic impresses people, because no-one knows
> what it does, and those shift-28s are luvverly).

Again, three instructions will generate any +/-24 bit constant (obviously),
and (additionally) a very large number of the others. (Possibly even all of
them, given that there are 36 immediate value bits in the three
instructions, plus quite a lot of bits corresponding to the selection of
different alu instruction types).

>My main point of interest is this: On an ARM 3, would the DCD be read into
> cache in an s cycle during the final stage of the Ldr Rn,[PC], or does it
> take an n cycle all of its very own?

The cache is mixed instruction + data, so the LDR causes no memory access
other than that which occurs as a result of the instruction loading.

> Is this nice on an ARM 3's cache?

Yes.

> Is it nice at all?

Hum. See below.

> Was this the intentional use of NV?

I don't think so. We have used NV when patching binaries (to remove
instructions we don't want :-)) and have recommended its use as a NOOP
(after processor mode changes for example).

> Has Acorn used it?

Not in this way as far as I know.

> Is the UndefinedNV really a problem?

Currently anythingNV is ignored; for example co-processors don't see the
instruction, no instruction decoding takes place (I think...)

> Will this latter be supported in future releases of the hardware?

NV instructions are hardly ever used. There is a lot of pressure on the
instruction space; there are very few slots left in it, yet NV accounts for
1/16 of all the instructions in the ARM instruction set! It seems to me that
it is very likely that future developments will use NV instructions in some
way, which would cause the above to cease to work. Given that the actual
advantage of the suggested code is very small (at most one extra instruction
for some very rarely used 32 bit constants) it is probably worth avoiding.

John Bowler (jbowler@acorn.co.uk)
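
The ADD/ADD/LDR sequence suggested in this post splits a positive PC-relative offset into an 8-bit field at bit 12, an 8-bit field at bit 20 and a 12-bit LDR offset. The Python sketch below shows one way a code generator might do the split. The function names and output syntax are hypothetical, and it assumes (as on ARM2/ARM3) that the PC reads as the address of the instruction using it plus 8.

```python
def split_pc_offset(offset):
    """Split a 28-bit positive offset into (x, y, z) = (8, 8, 12) bit fields."""
    if not 0 <= offset < (1 << 28):
        raise ValueError("offset must be a non-negative 28-bit value")
    z = offset & 0xFFF           # 12-bit LDR immediate offset
    x = (offset >> 12) & 0xFF    # 8-bit constant used as #x LSL 12
    y = (offset >> 20) & 0xFF    # 8-bit constant used as #y LSL 20
    return x, y, z

def far_load_sequence(first_instruction_address, target, reg="Rn"):
    """Instruction text loading the word at target; zero-valued ADDs are dropped."""
    pc_value = first_instruction_address + 8   # assumed pipeline offset of the PC
    x, y, z = split_pc_offset(target - pc_value)
    lines, base = [], "PC"
    for part in (x << 12, y << 20):
        if part:
            lines.append(f"ADD {reg}, {base}, #&{part:X}")
            base = reg
    lines.append(f"LDR {reg}, [{base}, #&{z:X}]")
    return lines

for line in far_load_sequence(0x8000, 0x8008 + 0x123456):
    print(line)
```

Dropping the ADDs whose field is zero gives the two-instruction form mentioned for 20 bit offsets.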
hughesmp@vax1.tcd.ie (05/18/91)
In article <+|Q_L||@warwick.ac.uk>, csuwr@warwick.ac.uk (Derek Hunter) writes:
> I was trying to cut down the number of labels my C compiler produces,
> (having finally allowed the thing to access globals beyond the 4095 range),
> and I (re)invented this:
>
> You can do Ldr Rn,VERY_FAR_AWAY ; with:
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,Rn]
>       DCD VERY_FAR_AWAY-P%    ; if V_F_A preceeds this code
>
>       Ldr Rn,[PC]
>       Ldr Rn,[PC,-Rn]         ; (You /can/ do -Rn, can't you?)
>       DCD P%-VERY_FAR_AWAY    ; if it doesn't

        Ldr - 4 cycles
        Ldr - 4 cycles
        Nop - 1 cycle
        --------------
              9 cycles

Another problem - I'm not sure, but should some lines be...

        DCD V_F_A-P%  --->  DCD V_F_A-P%-4 ?
        DCD P%-V_F_A  --->  DCD P%+4-V_F_A ?

As the R15 would be pointing at the instruction following the DCD? (I may be
wrong, it could work; I'm not sure)

It is also slow - much better is

        Add Rn,Pc,#(within 4096 of address) \ This is a multi-instruction add -
                                            \ several adds to make the full value
        Ldr Rn,[Rn,#(the error margin)]

Worst case here is realistically 6 cycles, possibly 7. This can be
implemented for the BASIC assembler as a FN, but it is impossible (without
<n>-pass assembly that may never terminate) for the assembler to work out
the optimum number of Adds because it may not know the destination on
Pass 1, but it must take up the instruction space then... Thus the FN must
be implemented as:

        FNldr(n,a,o) - LDR Rn,a taking up an additional o instructions.

> 32 bit immediate constants can be read with
>
>       Ldr Rn,[PC]
>       Bic Rn,Rn,# (( number >> 28 ) EOR 15) << 28
>       DCD number OR &F0000000

        Ldr - 4 cycles
        Bic - 1 cycle
        Nop - 1 cycle
        --------------
              6 cycles

It is faster to do...

        MOV Rn,#x AND &FF
        ORR Rn,Rn,#x AND &FF00
        ORR Rn,Rn,#x AND &FF0000
        ORR Rn,Rn,#x AND &FF000000

- 4 cycles, and there won't be the problems you speculate on, in possible
future CPUs...

Again with this, you can optimise it further if you know certain bits of
your number will be 0; it might be faster to do...

        MOV Rn,#x AND &F000000F:ORR Rn,Rn,#x AND &FF00  \ 2 cycles

depending on your numbers; write a FN that will work out the optimal code,
if you are using the BASIC assembler - then you can just say FNmov(n,x) and
it will do the fastest possible implementation... Such a FN would be fairly
trivial to implement.

Incidentally, we have a BASIC library which implements all these FNs,
including FNadr (same limitations as FNldr apply), FNsp (assign space;
equivalent to ]P%+=sp:[OPT pass), FNfi (assigns space to load in the given
file, and loads the file in on pass 2), and a fair few others methinks...
We'll post them to c.s.a. if there would be any interest...

One limitation: because there is no standard 'pass' instruction, we assume
it is called 'pass' - this is easily changed (although really, you should
follow our example ;-) - a function FNpass is used, which we have as...

        DEFFNpass=pass

Just change the pass to whatever you use...

Merlin,
--SICK--
You suffer... But why?
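
The byte-at-a-time MOV/ORR scheme, with zero bytes skipped, is easy to sketch. The following Python function is a hypothetical stand-in for the FNmov described in this post (the poster's BASIC library is not reproduced here). It only considers byte-aligned fields, so it will not find a shorter sequence that relies on a rotated field straddling bit 31 and bit 0.

```python
def mov_sequence(x, reg="Rn"):
    """MOV/ORR lines building the 32-bit constant x one non-zero byte at a time."""
    x &= 0xFFFFFFFF
    # x AND &FF, x AND &FF00, x AND &FF0000, x AND &FF000000, minus the zero ones.
    chunks = [x & (0xFF << shift) for shift in (0, 8, 16, 24)]
    chunks = [chunk for chunk in chunks if chunk]
    if not chunks:
        return [f"MOV {reg},#0"]
    lines = [f"MOV {reg},#&{chunks[0]:X}"]
    lines += [f"ORR {reg},{reg},#&{chunk:X}" for chunk in chunks[1:]]
    return lines

for value in (0x12345678, 0x00AB00CD, 0x0000FF00):
    print(f"&{value:08X}:", " : ".join(mov_sequence(value)))
```

Each line is a single-cycle data operation, so the length of the returned list is also the cycle count in the poster's terms.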
csuwr@warwick.ac.uk (Derek Hunter) (05/18/91)
In article <+|Q_L||@warwick.ac.uk> csuwr@warwick.ac.uk (I) write . . .
. . . about these 32 bit loads.
I'm afraid I got it a bit wrong: the DCD for the extended loads should
add four to P% before anything is done to it,
i.e.  DCD VERY_FAR_AWAY-(P%+4)
or    DCD (P%+4)-VERY_FAR_AWAY
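The corrected offsets can be checked with a little arithmetic. The sketch below is Python, not assembler, and the function names are made up for the purpose. It assumes that R15 reads as the address of the current instruction plus 8 (as on ARM2 and ARM3), and it ignores the PSR bits and address wrap-around.

```python
PIPELINE_OFFSET = 8  # assumed: R15 reads as instruction address + 8


def dcd_value(target, code_addr, subtract=False):
    """Word to place in the DCD for the three-word sequence at code_addr.

    code_addr + 0 : Ldr Rn,[PC]         ; PC = code_addr + 8, i.e. the DCD
    code_addr + 4 : Ldr Rn,[PC,Rn]      ; or [PC,-Rn] when subtract is True
    code_addr + 8 : DCD ...             ; P% is code_addr + 8 here
    """
    p = code_addr + 8
    return (p + 4) - target if subtract else target - (p + 4)


def loaded_address(code_addr, dcd, subtract=False):
    """Address the second Ldr actually reads from."""
    pc_at_second_ldr = code_addr + 4 + PIPELINE_OFFSET
    return pc_at_second_ldr - dcd if subtract else pc_at_second_ldr + dcd


# Target before the code: [PC,Rn] with a negative DCD.
before = dcd_value(0x8100, 0x9000)
# Target after the code: [PC,-Rn], again with a negative DCD.
after = dcd_value(0x20000, 0x9000, subtract=True)
```

With these numbers both DCD words come out negative, as the NV trick needs, and loaded_address returns the intended target in each case. Using the uncorrected VERY_FAR_AWAY-P% instead lands four bytes past the target.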
In case the NV code has less precedence than an illegal instruction code,
(sorry, I haven't tested it yet), you can set the top byte to 0xFF which
I think takes it into the SWI range of instructions, where there should
be no possibility of illegal instructions existing, merely lots of never
executed undefined SWIs. (Not the same thing at all, really).
Of course you then have to make the obvious modification to the Bic in
that case, correcting the top byte rather than just the top nybble, but
for current memory levels, the extended offset addressing shouldn't foul
up. By the time it does, we'll be into RISC OS 27. (Any rumours available
about /that/ one Acorn? :-)
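The change to the Bic mask for the 32 bit constant load can be worked through as follows. This is a Python sketch of the arithmetic only; encode_constant and decode_constant are names invented here, and top_bits selects between forcing the top nybble (4) and the top byte (8) to ones.

```python
MASK32 = 0xFFFFFFFF


def encode_constant(number, top_bits=4):
    """Return (dcd_word, bic_mask) for the Ldr / Bic / DCD sequence.

    dcd_word is the constant with its top `top_bits` bits forced to 1;
    bic_mask is the immediate for Bic that clears the bits which were
    zero in the original number.
    """
    shift = 32 - top_bits
    ones = (1 << top_bits) - 1
    number &= MASK32
    dcd_word = number | (ones << shift)
    bic_mask = ((number >> shift) ^ ones) << shift
    return dcd_word, bic_mask


def decode_constant(dcd_word, bic_mask):
    """What Rn holds after Ldr Rn,[PC] followed by Bic Rn,Rn,#bic_mask."""
    return dcd_word & ~bic_mask & MASK32
```

For number = &12345678 the nybble version gives DCD &F2345678 with a Bic of &E0000000, and the byte version gives DCD &FF345678 with a Bic of &ED000000. In both cases the mask lies within the top eight bits of the word, so it fits the usual 8-bit rotated immediate.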
- I was typing from memory, sorry I got it wrong folks.
- Derek Hunter csuwr@cu.warwick.ac.uk
csuwr@warwick.ac.uk (Derek Hunter) (05/18/91)
I seem to be posting quite a lot at the moment. This is probably my last
thing on this topic: it was nice to have had an original idea; shame it
wasn't an optimal solution and won't work with proposed machines.

As far as knowing which Ldr to produce, all my globals come before any
program code, so they're all -ve offset.

Is there as yet any notes on a correction to the aof deficiency? Is there
any information further to the PRMs in this area I could grab hold of?

For the moment, I'm going to leave this system in my compiler because it
works with the current machines, and because the BASIC C compiler is going
to be retired once I've rewritten it in C (without all the design features
(read: `design bugs') that I `forget' to mention in my Final Year Project
Report), and I can snaffle John's suggestions into this latter.

------

If anyone's interested in the final compiler, it should code to m/c or
Basic assembly as RQ'd (I haven't got the ARM assembler) with a linker
which will work out what needs recompiling a la `make', using the
#includes (and #pragmas for the extra info it'll probably need when
someone out there points out why .h and .o timestamps are not sufficient).

It will also have a nicer interface to RMA moduling than Norcroft's
(unless I've been misled about how nasty that is), and will make those
interrupt surrounds for the routines that you're all itching to write much
easier. I also want to WIMP it. (It currently has a prettier swi interface
too, but that might have to change when I do register colouring. Either
that, or a file with the in/output register information.)

I doubt it will ever be as efficient/fast as Norcroft's thing, but that
can only improve, and it /will/ only be about 1/3..1/4 as much
(Shareware).

- I hope sh/w plugs don't transgress NETiquette.
- Derek Hunter csuwr@cu.warwick.ac.uk
I thought I might and I tried, but I stopped when I found out I couldn't.
john@acorn.co.uk (John Bowler) (05/21/91)
In article <1991May16.200200.24636@comp.vuw.ac.nz> gavin@comp.vuw.ac.nz (Gavin Flower) writes:
>
>Have read in several places,
> that the condition code NV
>is *NOT* forward compatible.
>
>Specifically that Acorn have
>reserved the right to reuse
>it for something else, at
>some future stage!

I was careful not to say this, only to point out that it is a very obvious
thing to want to reuse; however, Alasdair Thomas's message seems
unambiguous to me.

>To the best of my knowledge
>it is still ok with ARM3.
>****could someone at Acorn
>confirm this, also could
>they suggest what it might
>be earmarked for?

NV works as specified on ARM2 and ARM3! Whether and how NV is (re)used is
up to ARM Ltd (*not* Acorn Computers Ltd, although I would hope Acorn has
some influence :-). The body of Alasdair's message seems to correspond to
the rules which we attempt to follow when generating code which will work
on all processors. Certainly I (personally) would suggest that you attempt
to follow those rules both within hand-written ARM assembler and within
compiler generated code.

In article <7~R_RA`@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:
> Is there as yet any notes on a correction to the aof deficiency?

The deficiency would require extensions to AOF to allow use of a PC
relative symbol value (currently only allowed in the special context of
B/BL relocations, if I remember correctly) and the insertion of bits
derived from such a value into bit fields in a word. I don't see this
being fixed on its own - I would anticipate a more widespread review of
AOF. Obviously, since I am commenting on this, I don't know about any
current developments in this area :-).

John Bowler (jbowler@acorn.co.uk)