krueger@tms390.Micro.TI.COM (Steve Krueger) (02/01/90)
In article <207@rangkom.MY>, norazman@rangkom.MY (Noor Azman Wathooth) writes:
>
> SPARC & R2000 Exception Return
> ------------------------------
>
> The SPARC manual gives the following instruction sequence for exception return:
>         jmpl %17, %0    (old PC --> trapped instruction)
>         rett %18        (old nPC)
> (Note: In this case the trapped instruction will be re-executed)
>
> My question is when does rett restores the S-field of the PSR from PS?
>
> However, looking at the pipestage activities, the return
> instruction is being fetched when rett is in the decode stage. Does this
> mean that rett is treated specially from other instructions and is recognized
> very early in its decode stage (or late in its fetch stage)?
> (I guess this is more of implementation related question - perhaps one of the
> SPARC designers can help me on this)
>
> Thanks.

I'm not quite a SPARC designer, but I am quite familiar with both SPARC
and pipelined processor design. Besides, some of my friends are SPARC
designers :-)

This is all from the ISP descriptions in the SPARC Architecture Manual
(version 7, which seems to have been widely distributed by Sun).

As you said, the sequence for returning from a trap is (from SAMv7, pg 99):

        jmpl %17, %0    ! old PC
        rett %18        ! old nPC
or
        jmpl %18, %0    ! old nPC
        rett %18 + 4    ! old nPC + 4

The SPARC Architecture Manual has an appendix with an ISP description of
the architecture which helps to clarify the meaning of the RETT
instruction. Implementations may have significantly different timing but
should adhere to the meaning in the ISP description.

So let's look at the two-instruction sequence JMPL, RETT. Leaving out lots
of stuff and concentrating on only the most relevant part, the ISP for
JMPL is (SAMv7, pg 147):

        PC <- nPC;
        nPC <- jump_address;

Then instruction fetch happens (SAMv7, pg 126):

        next;
        instruction <- memory_read(addr_space, PC);

Here "next" approximately means "clock". At this point PC is the address
of the RETT instruction. "addr_space" is a macro to select address space
8 or 9 based on the S bit. Here we're in a trap handler, so S must be 1,
selecting address space 9.

        next;
        dispatch_instruction

"dispatch_instruction" is a macro which decodes the instruction and gets
to its routine, in this case RETT. The relevant parts of RETT are
(SAMv7, pg 147):

        lots of checking left out here
        PC <- nPC;
        nPC <- address;
        CWP <- new_cwp;
        S <- pS

Then we have instruction fetch again, which does:

        next;
        instruction <- memory_read(addr_space, PC);

This time addr_space is set based on the new S (which was pS during the
execution of the trap handler), so we get the correct instruction on
returning from the trap handler.

This is the behavior specified in the ISP of Appendix C of the SPARC
Architecture Manual. Pipelined machines will have some difficulty
implementing this faithfully, but must do so if they are to be
architecture compliant.

Now, looking at an implementation for a simple 4-stage pipeline, the
timing of the two-instruction return sequence and the two instructions
returned to might look like:

JMPL          |fetch|read |exec |write|
RETT                |fetch|read |exec |write|
tr-PC                     |fetch|read |exec |write|
tr-nPC                          |fetch|read |exec |write|

Clearly, the S <- pS must be executed before the fetch of tr-PC so that
it gets the right address space. Also, JMPL must have computed its jump
address before the fetch of tr-PC (and RETT must have computed its jump
address before the fetch of tr-nPC).

If the implementor decides to compute the jump address in the exec
stage, then the new PC isn't available until at least one clock later
than shown above. Further, if the store into PC is handled the same as
other register stores, it will be performed in the write stage, so the
timing looks like:

JMPL          |fetch|read |exec |write|
RETT                |fetch|read |exec |write|
bubble(nop)               |nop  |nop  |
tr-PC                                 |fetch|read |exec |write|
tr-nPC                                      |fetch|read |exec |write|

Does this allow the fetch of tr-PC to get the restored S bit?
Maybe, but not if the S bit is updated in the same way as other registers
during the write stage. In this case, an additional bubble is required to
wait for the updated S bit. The next instruction would not be allowed to
fetch until the RETT had completed its write stage. The timing becomes:

JMPL          |fetch|read |exec |write|
RETT                |fetch|read |exec |write|
bubble(nop)               |nop  |nop  |nop  |
tr-PC                                       |fetch|read |exec |write|
tr-nPC                                            |fetch|read |exec |write|

Now, that's three pipeline bubbles and that's not really very fast. So at
the cost of a little cleverness and complexity, how fast could RETT be
with a four-stage pipeline?

I suggest two tricks: forward the new PC from JMPL's exec stage directly
to the fetch stage of tr-PC, and restore S during the read stage of RETT.
Then only a single pipeline bubble is needed after RETT, and that one is
due to the JMPL. Then the timing looks like:

JMPL          |fetch|read |exec |write|
RETT                |fetch|read |exec |write|
bubble(nop)               |nop  |
tr-PC                           |fetch|read |exec |write|
tr-nPC                                |fetch|read |exec |write|

Restoring S during the read stage doesn't hurt any previously started
instructions, since they have all calculated any memory addresses by the
end of RETT's read stage. The only problem I see which must be dealt with
is the possibility of a previous instruction trapping or of the RETT
instruction trapping. Again there is a solution (trick) which costs a
little complexity: let the early restore affect only the copy of S used
for fetching, and defer the setting of the "real" S until the write stage
of the RETT instruction, where trapping is handled normally.

BTW, the Cypress 7C601 User's Guide shows both JMPL and RETT as two-cycle
instructions (page 2-31).

Aren't pipelines fun!

Your mileage may vary. A different pipeline than the one I have imagined
here will need different fixes and patches to get a high performance
solution, but the issues I have considered here are usually the ones that
have to be considered.
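
The bubble counts above can be reproduced with a small toy model. This is
only a sketch under assumptions that are not in the SPARC manual: it uses
the imagined 4-stage pipeline from this post, it assumes that a value
produced during some stage can first be used by an instruction fetch in
the following clock, and the names (bubbles_after_rett, pc_stage, s_stage)
are made up for illustration.

```python
# Toy timing model for the JMPL/RETT return sequence on a 4-stage pipeline.
# Assumption (not from the SPARC manual): a value produced during some stage
# can first be used by an instruction fetch in the following clock.

# Offset of each stage, in clocks, from the clock in which the
# instruction is fetched.
STAGE_OFFSET = {"fetch": 0, "read": 1, "exec": 2, "write": 3}

JMPL_FETCH = 1        # clock in which JMPL is fetched
RETT_FETCH = 2        # RETT follows one clock later
IDEAL_TRPC_FETCH = 3  # clock in which tr-PC is fetched if nothing stalls


def bubbles_after_rett(pc_stage, s_stage):
    """Return the number of bubbles between RETT and tr-PC.

    pc_stage: stage of JMPL in which the jump address reaches the PC.
    s_stage:  stage of RETT in which S <- pS takes effect for fetching.
    """
    # First clock in which a fetch can use the new PC and the new S bit.
    pc_usable = JMPL_FETCH + STAGE_OFFSET[pc_stage] + 1
    s_usable = RETT_FETCH + STAGE_OFFSET[s_stage] + 1
    # tr-PC cannot be fetched before both are usable.
    trpc_fetch = max(IDEAL_TRPC_FETCH, pc_usable, s_usable)
    return trpc_fetch - IDEAL_TRPC_FETCH


if __name__ == "__main__":
    cases = [
        ("PC stored in write, S restored in exec", "write", "exec"),
        ("PC stored in write, S restored in write", "write", "write"),
        ("PC forwarded from exec, S restored in read", "exec", "read"),
    ]
    for label, pc_stage, s_stage in cases:
        count = bubbles_after_rett(pc_stage, s_stage)
        print(f"{label}: {count} bubble(s)")
```

Under these assumptions the three cases give 2, 3 and 1 bubbles, matching
the last three timing diagrams. The first case assumes S takes effect no
later than RETT's exec stage, which is the "maybe" case discussed above.
Getting zero bubbles in this model needs the PC by JMPL's read stage and
S by RETT's fetch stage, which is why the first diagram is so hard to build.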

- Steve Krueger         krueger@micro.ti.com
SPARC Applications
Texas Instruments, Houston Texas USA