[comp.sys.m88k] Is the FPSR interlocked with the FPU pipe?

fsset@bach.lerc.nasa.gov (Scott E. Townsend) (04/25/91)

This sounds too strange to be true, but is it possible for the FPSR to
return 'too fresh' data?  Or put another way, why should the following two
code fragments behave differently?

#1
	fdiv.ddd r8,r2,r4
	fldcr	 r12,fcr62	; fcr62 == FPSR
	bb1	 0,r12,@L21	; AFINX bit

#2
	fdiv.ddd r8,r2,r4
	tb1	 0,r0,0		; trap not taken, but system 'synced'
	fldcr	 r12,fcr62
	bb1	 0,r12,@L21

This code is buried a bit, so I don't have the exact different results,
but the behaviour _is_ different.  Is it possible the fldcr gets whatever
is 'current' rather than the result of the fdiv? (which will take a while)

Please show me I'm sniffing glue!

kenton@abyss.zk3.dec.com (Jeff Kenton OSG/UEG) (04/25/91)

In article <1991Apr24.200412.7483@eagle.lerc.nasa.gov>,
fsset@bach.lerc.nasa.gov (Scott E. Townsend) writes:
|> This sounds too strange to be true, but is it possible for the FPSR to
|> return 'too fresh' data?  Or put another way, why should the following two
|> code fragments behave differently?
|> 
|> #1
|> 	fdiv.ddd r8,r2,r4
|> 	fldcr	 r12,fcr62	; fcr62 == FPSR
|> 	bb1	 0,r12,@L21	; AFINX bit
|> 
|> #2
|>         fdiv.ddd r8,r2,r4
|> 	tb1	 0,r0,0		; trap not taken, but system 'synced'
|>         fldcr    r12,fcr62
|>         bb1      0,r12,@L21
|> 
|> This code is buried a bit, so I don't have the exact different results,
|> but the behaviour _is_ different.  Is it possible the fldcr gets whatever
|> is 'current' rather than the result of the fdiv? (which will take a while)
|> 

Yes -- that is exactly what's happening.  As a matter of fact, most of the
status bits in the FPSR are set by exception handlers, not hardware.  The
imprecise exceptions (inexact, which AFINX accumulates, is one) are not
detected by the next cycle, which is what your example #1 is expecting.

-----------------------------------------------------------------------------
==	jeff kenton		Consulting at kenton@decvax.dec.com        ==
==	(617) 894-4508			(603) 881-0011			   ==
-----------------------------------------------------------------------------

hamilton@siberia.rtp.dg.com (Eric Hamilton) (04/25/91)

In article <1991Apr24.200412.7483@eagle.lerc.nasa.gov>, fsset@bach.lerc.nasa.gov (Scott E. Townsend) writes:
|> This sounds too strange to be true, but is it possible for the FPSR to
|> return 'too fresh' data?  Or put another way, why should the following two
|> code fragments behave differently?
|> 
|> #1
|> 	fdiv.ddd r8,r2,r4
|> 	fldcr	 r12,fcr62	; fcr62 == FPSR
|> 	bb1	 0,r12,@L21	; AFINX bit
|> 
|> #2
|>         fdiv.ddd r8,r2,r4
|> 	tb1	 0,r0,0		; trap not taken, but system 'synced'
|>         fldcr    r12,fcr62
|>         bb1      0,r12,@L21
|> 
|> This code is buried a bit, so I don't have the exact different results,
|> but the behaviour _is_ different.  Is it possible the fldcr gets whatever
|> is 'current' rather than the result of the fdiv? (which will take a while)

Yes.

These two code fragments behave differently, and for exactly the
reason that you suspect.  In fragment #1 the fdiv instruction has started
but not completed when the fldcr executes.  The floating point imprecise
exceptions (overflow, underflow, and inexact) are signalled only when the
operation completes, so code fragment #1 reads the FPSR prematurely.

The trap-not-taken drains the pipelines (at the cost of waiting
sixty-odd cycles for the fdiv to complete) so that code fragment
#2 will show the effect of any imprecise exceptions provoked by the
fdiv.

Note that any attempt to use r8 or r9 (the register pair that receives
the double-precision result) will also wait for the fdiv to complete:
there will be a scoreboard hold.  The oddity is that the FPSR, which is
also a "destination" register for the fdiv, is not interlocked with the
floating point pipe.
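
A sketch of that contrast, reusing the registers from the fragments above.
r10 is an arbitrary scratch register chosen for illustration, not taken
from the original code:

	fdiv.ddd r8,r2,r4
	or	 r10,r0,r8	; reads r8: scoreboard hold until the fdiv writes it

	fdiv.ddd r8,r2,r4
	fldcr	 r12,fcr62	; no hold: issues at once and may read stale status

Only the tb1 sync from fragment #2 is confirmed in this thread to make the
FPSR read reliable; whether a scoreboard hold on r8 is enough by itself is
not established here.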

marvin@oakhill.sps.mot.com (Marvin Denman) (04/26/91)

In article <1991Apr24.200412.7483@eagle.lerc.nasa.gov> fsset@bach.lerc.nasa.gov (Scott E. Townsend) writes:
>This sounds too strange to be true, but is it possible for the FPSR to
>return 'too fresh' data?  Or put another way, why should the following two
>code fragments behave differently?
>
>#1
>	fdiv.ddd r8,r2,r4
>	fldcr	 r12,fcr62	; fcr62 == FPSR
>	bb1	 0,r12,@L21	; AFINX bit
>
>#2
>        fdiv.ddd r8,r2,r4
>	tb1	 0,r0,0		; trap not taken, but system 'synced'
>        fldcr    r12,fcr62
>        bb1      0,r12,@L21
>

On the 88100 there is no built-in interlock between floating point instructions
and reads or writes to the FPSR.  The "safe" way of reading or modifying the
FPSR is to sync the processor first, just as your example #2 did.  Future
implementations of the 88000 architecture may or may not need this
synchronization, but it should never cause you any problems other than the
cost of an additional instruction.  Note that if this were an fstcr to the
FPSR, your value might be overwritten when the fdiv completes.  I looked
briefly through the 88100 User's Manual and could not find this behavior
documented, but I will continue to look.
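
A minimal sketch of the write case, assuming the goal is simply to clear the
FPSR after the divide (r0 always reads as zero; this is an illustration, not
code from the manual):

	fdiv.ddd r8,r2,r4
	tb1	 0,r0,0		; trap never taken; waits for the fdiv to complete
	fstcr	 r0,fcr62	; clear the FPSR only after the fdiv has updated it

Without the tb1, the fstcr could complete first, and the status from the
fdiv could then land on top of the freshly cleared register.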

-- 
Marvin Denman
Motorola 88000 Design
cs.utexas.edu!oakhill!marvin