[comp.arch] SPARC & R2000 Exception Return

krueger@tms390.Micro.TI.COM (Steve Krueger) (02/01/90)

In article <207@rangkom.MY>, norazman@rangkom.MY (Noor Azman Wathooth) writes:
>
>                       SPARC & R2000 Exception Return
>      	              ------------------------------
> 
> The SPARC manual gives the following instruction sequence for exception return:
> 	       	    jmpl  %17, %0    (old PC --> trapped instruction)
> 		    rett  %18        (old nPC)
> 	(Note: In this case the trapped instruction will be re-executed)
> 
> My question is: when does rett restore the S-field of the PSR from PS?
>
> However, looking at the pipestage activities, the return
> instruction is being fetched when rett is in the decode stage.  Does this
> mean that rett is treated specially from other instructions and is recognized
> very early in its decode stage (or late in its fetch stage)?
> (I guess this is more of implementation related question - perhaps one of the
> SPARC designers can help me on this)
> 
> Thanks.

I'm not quite a SPARC designer but have high familiarity with both
SPARC and pipelined processor design.  Besides, some of my friends are
SPARC designers :-)

This is all from the ISP descriptions in the SPARC Architecture Manual
(version 7, which seems to have been widely distributed by Sun).

As you said, the sequence for returning from a trap is (from SAMv7,
pg 99):

	jmpl	%17, %0		! old PC
	rett	%18		! old nPC

or

	jmpl	%18, %0		! old nPC
	rett	%18 + 4		! old nPC + 4

The first sequence re-executes the trapped instruction; the second
continues with the instruction after it.

The SPARC Architecture Manual has an appendix with an ISP description
of the architecture which helps to clarify the meaning of the RETT
instruction.  Implementations may have significantly different timing
but should adhere to the meaning in the ISP description.  So let's
look at the two-instruction sequence JMPL, RETT.

Leaving out lots of stuff and concentrating on only the most relevant
part, the ISP for JMPL is (SAMv7, pg 147):

	PC <- nPC;
	nPC <- jump_address;

Then Instruction fetch happens (SAMv7, pg 126):

	next;
	instruction <- memory_read(addr_space, PC);

Here "next" approximately means "clock".  Here PC is address of the
RETT instruction.  "addr_space" is a macro to select address space 8
or 9 based on the S bit.  Here we're in a trap handler so it must be
1, making address space 9.
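
A tiny C sketch of that selection (the constant names are mine, not
the manual's):

	#include <stdio.h>

	/* Instruction address space selection, as done by the ISP's
	 * addr_space macro: supervisor code fetches from space 9,
	 * user code from space 8. */
	#define ASI_USER_INSN	8
	#define ASI_SUPER_INSN	9

	static int addr_space(int s_bit)
	{
		return s_bit ? ASI_SUPER_INSN : ASI_USER_INSN;
	}

	int main(void)
	{
		printf("S=1 -> space %d, S=0 -> space %d\n",
		       addr_space(1), addr_space(0));
		return 0;
	}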

	next;
	dispatch_instruction

dispatch_instruction is a macro which decodes the instruction and
branches to the routine for it, in this case RETT.  The relevant
parts of RETT are (SAMv7, pg 147):

(lots of checking left out here)

	PC <- nPC;
	nPC <- address;
	CWP <- new_cwp;
	S <- pS;

Then we have instruction fetch again which does:

	next;
	instruction <- memory_read(addr_space, PC);

This time addr_space is set based on the new S (which was pS during
the execution of the trap handler) so we get the correct instruction
on returning from the trap handler.
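
To make the ordering concrete, here is a rough sequential C model of
just this much of the ISP (the addresses, the printout and all the
simplifications are mine; it is a sketch of the ordering, not of the
real ISP):

	#include <stdio.h>

	/* A minimal sequential model of the JMPL/RETT return sequence.
	 * The checks, the CWP update and the register file are left out;
	 * only PC, nPC and the S bit are modeled. */

	static unsigned PC, nPC;	/* program counters */
	static int S, pS;		/* supervisor bit and its saved copy */

	static void fetch(void)
	{
		printf("fetch from address space %d, address 0x%x\n",
		       S ? 9 : 8, PC);
	}

	static void jmpl(unsigned target)	/* jmpl target, %0 */
	{
		PC = nPC;
		nPC = target;
	}

	static void rett(unsigned target)	/* rett target */
	{
		PC = nPC;
		nPC = target;
		S = pS;			/* this is where S comes back */
	}

	int main(void)
	{
		unsigned old_pc = 0x1000, old_npc = 0x1004; /* saved at trap time */

		S = 1;  pS = 0;			/* user code trapped into the handler */
		PC = 0x4000;  nPC = 0x4004;	/* the jmpl/rett pair in the handler */

		jmpl(old_pc);	/* PC now points at the RETT */
		fetch();	/* still space 9: the RETT is fetched in supervisor mode */
		rett(old_npc);	/* PC now points at the trapped instruction */
		fetch();	/* space 8: fetched with the restored S */
		return 0;
	}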

This is the behavior specified in the ISP of Appendix C of the SPARC
Architecture Manual.  Pipelined machines will have some difficulty in
implementing this faithfully but must if they are to be architecture
compliant.  

Now, looking at an implementation for a simple 4-stage pipeline, the
timing of the two-instruction return sequence and the two
instructions returned to might look like:

JMPL      |fetch|read |exec |write|
RETT            |fetch|read |exec |write|
tr-PC                 |fetch|read |exec |write|
tr-nPC                      |fetch|read |exec |write|

Clearly, the S <- pS must be executed before the fetch of tr-PC so
that it gets the right address space.  Also, JMPL must have computed
its jump address before the fetch of tr-PC (and RETT must have
computed its jump address before the fetch of tr-nPC).  If the
implementor decides to compute the jump address in the exec stage,
then the new PC isn't available until at least one clock later than
shown above.  Further, if the store into PC is handled the same way
as other register stores, it will be performed in the write stage, so
the timing looks like:

JMPL      |fetch|read |exec |write|
RETT            |fetch|read |exec |write|
bubble(nop)           |nop  |nop  |
tr-PC                             |fetch|read |exec |write|
tr-nPC                                  |fetch|read |exec |write|

Does this allow the fetch of tr-PC to get the restored S bit?  Maybe,
but not if the S bit is updated in the same way as other registers
during the write stage.  In this case, an additional bubble is
required to wait for the updated S bit.  The next instruction would
not be allowed to fetch until the RETT had completed its write stage.
The timing becomes:

JMPL      |fetch|read |exec |write|
RETT            |fetch|read |exec |write|
bubble(nop)           |nop  |nop  |nop  |
tr-PC                                   |fetch|read |exec |write|
tr-nPC                                        |fetch|read |exec |write|


Now, that's three pipeline bubbles, and that's not really very fast.
So at the cost of a little cleverness and complexity, how fast could
RETT be with a four-stage pipeline?  I suggest two tricks: forward
the new PC from JMPL's exec stage directly to the fetch stage of
tr-PC, and restore S during the read stage of RETT.  Then only a
single pipeline bubble is needed after RETT, and that one is due to
the JMPL.  The timing looks like:

JMPL      |fetch|read |exec |write|
RETT            |fetch|read |exec |write|
bubble(nop)           |nop  |
tr-PC                       |fetch|read |exec |write|
tr-nPC                            |fetch|read |exec |write|
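
For what it's worth, the bubble counts in these scenarios follow from
a little arithmetic.  A back-of-the-envelope C sketch, under my own
simplifying assumption that a value produced at the end of a stage
can be used by a fetch in the next cycle:

	#include <stdio.h>

	/* Four-stage pipeline: fetch=1, read=2, exec=3, write=4.
	 * JMPL is fetched in cycle 1 and RETT in cycle 2, so tr-PC would
	 * ideally be fetched in cycle 3.  Its fetch must wait both for
	 * JMPL's target address and for the S bit restored by RETT. */

	static int bubbles(int jmpl_target_stage, int s_restore_stage)
	{
		int pc_ready = 1 + jmpl_target_stage;	/* first cycle the target is usable */
		int s_ready  = 2 + s_restore_stage;	/* first cycle the new S is usable */
		int fetch    = pc_ready > s_ready ? pc_ready : s_ready;

		return fetch > 3 ? fetch - 3 : 0;
	}

	int main(void)
	{
		/* PC and S both handled like ordinary writes (write stage) */
		printf("write/write: %d bubbles\n", bubbles(4, 4));
		/* PC written in the write stage, S ready by RETT's exec stage */
		printf("write/exec:  %d bubbles\n", bubbles(4, 3));
		/* target forwarded from JMPL's exec, S restored in RETT's read */
		printf("exec/read:   %d bubbles\n", bubbles(3, 2));
		return 0;
	}

This reproduces the three-, two- and one-bubble timings drawn above.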

Restoring S during the read stage doesn't hurt any previously started
instructions, since they have all calculated any memory addresses by
the end of RETT's read stage.  The only problem I see which must be
dealt with is the possibility of a previous instruction trapping or of
the RETT instruction trapping.  Again there is a solution (a trick)
which costs a little complexity: have the read stage update only the
copy of S used for fetching, and defer setting the "real" S until the
write stage of the RETT instruction, where trapping is handled
normally.
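
A rough C sketch of that last trick (the structure and all the names
are mine, just to show the idea of a shadow copy of S, not any real
implementation):

	#include <stdio.h>

	/* The fetch stage uses a speculative copy of S, updated as soon as
	 * RETT reaches its read stage; the architectural S in the PSR is
	 * only committed in RETT's write stage.  If RETT (or an earlier
	 * instruction) traps before that, the fetch copy is reloaded from
	 * the still-unchanged architectural S. */

	struct psr_state {
		int arch_S;	/* architectural S bit in the PSR */
		int pS;		/* previous S, saved when the trap was taken */
		int fetch_S;	/* copy used by the fetch stage */
	};

	static void rett_read_stage(struct psr_state *p)
	{
		p->fetch_S = p->pS;	/* fetches see the restored mode early */
	}

	static void rett_write_stage(struct psr_state *p)
	{
		p->arch_S = p->pS;	/* commit the "real" S bit */
	}

	static void squash_rett(struct psr_state *p)
	{
		p->fetch_S = p->arch_S;	/* a trap undoes the speculation */
	}

	int main(void)
	{
		struct psr_state p = { 1, 0, 1 };	/* in the handler: S=1, pS=0 */

		rett_read_stage(&p);	/* fetch stage now sees S=0 */
		printf("after read stage:  fetch_S=%d arch_S=%d\n", p.fetch_S, p.arch_S);

		squash_rett(&p);	/* pretend the RETT trapped: back to S=1 */
		printf("after squash:      fetch_S=%d arch_S=%d\n", p.fetch_S, p.arch_S);

		rett_read_stage(&p);	/* retry the return... */
		rett_write_stage(&p);	/* ...and this time commit PSR.S = 0 */
		printf("after write stage: fetch_S=%d arch_S=%d\n", p.fetch_S, p.arch_S);
		return 0;
	}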

BTW, the Cypress 7C601 User's Guide shows both JMPL and RETT as
two-cycle instructions (page 2-31).

Aren't pipelines fun!  Your mileage may vary.  A different pipeline
than the one I have imagined here will need different fixes and
patches to get a high-performance solution, but the issues raised
here are usually the ones that have to be considered.

	- Steve Krueger			krueger@micro.ti.com
	  SPARC Applications
	  Texas Instruments, Houston Texas USA