[comp.arch] Changing IEEE rounding modes on the fly

dik@cwi.nl (Dik T. Winter) (06/21/91)

The issue about rounding modes and whether changing them takes a long time or
not is obviously a bit more intricate than I thought at first.

In a previous posting I gave timings (from the manual) for the 88100, but
apparently the fp pipelines must be flushed before a change to the fp
control register is performed.  So I was wrong, and the 88100 does not
meet the factor 3 criterion.  (Alas, this is not documented, and the flush
must be programmed!)

What disturbs me is that this not only holds for the change of rounding mode,
but also for the way trapping on abnormal (inf, NaN) results is handled.  It
becomes for instance very cumbersome to say: "do not trap on overflow for
the next three instructions", although it can be very valid in the algorithm.
(E.g. a place where you can expect an infinity because of division by zero,
but where you know that the next few instructions will handle it.)
Also the 80x87 does not handle this correctly in my opinion.

Another issue is precision control as it is present on the 80x87 and the
6888[12].  (Here it is possible to indicate that a result from a single
operation should be rounded to a specific precision rather than the internal
double extended precision.)

There ought to be a distinction between status bits and control bits.  When
executing an instruction in a pipeline the control bits ought to be taken
along.  (If you look at the i860 you will see that the pipe already carries
a lot of information beyond the operands.  I think that carrying the control
information would not take excessive space.)  If the control information is
taken along in the pipe it is easy to change the control word without any
need to flush pipelines, as long as the change is synchronized with the
issue of FP instructions.  Also, changing the control information should
never raise a trap (unlike the 80x87, where unmasking an exception that was
raised earlier while masked results in a trap).  The reason is that if you
mask an exception for only a few instructions, unmasking it afterwards
should not result in a trap for the intermediate operations!  You knew the
exceptions might happen.

The next question is, should a masked exception be noted in the sticky
exception status bits?  IEEE tells me yes.  The reason is clear, you may
want to run a piece of code at full speed and check exceptions afterwards
(although I doubt that full speed can be reached if exceptions do occur).
So to satisfy IEEE needs, yes, exceptions should be noted.  (Anyhow,
exceptions that are noted but not trapped should be seen as a help in
debugging, not as an indication that something is wrong.)

Another question is how to do this if the FP unit is a co-processor (as is
effectively the case on the 88100).  Clearly the setting of the FP control
register ought to be a function of the co-processor; in that way it is
possible to ensure that the setting of the register is not executed out
of order with respect to other FP instructions.

An alternative is of course, as David Hough said, to have rounding mode as
part of the instruction (like Motorola does with precision control on the
68040).  Yes, it helps in this case, but not in the masking of exceptions
etc.  Having single instructions to change fields in a control register
would not be as fast as encoding the mode in the instruction, but would
help in more cases (e.g. exceptions).

A disadvantage of all this is of course that the FP pipes must be made
some bits wider, but I really do not think that is a problem.  We can
question whether it is possible to do it in an upward compatible way
in future implementations of current processors.  (SPARC tells us that
a change in the FP SR does not take effect until some cycles afterwards.)
I think it is possible.  Define a new instruction (executing in the FPU)
that sets/clears some designated fields of the SR/CR.  There is no
conflict in the specs.  But aren't we getting a bit CISCY now?  I think
not (oh and: this is all valid for CISC processors too).
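
What such a set/clear-fields instruction computes can be sketched in a few
lines.  The register layout below is purely hypothetical and matches no
real processor; the point is only that one operation with a clear mask and
a set mask covers rounding mode and trap enables alike.

```c
#include <stdint.h>

/* Hypothetical control-register layout, for illustration only. */
enum {
    FCR_RM_SHIFT      = 0,
    FCR_RM_MASK       = 0x3 << FCR_RM_SHIFT,  /* 2-bit rounding mode field */
    FCR_TRAP_INVALID  = 1 << 2,
    FCR_TRAP_DIVZERO  = 1 << 3,
    FCR_TRAP_OVERFLOW = 1 << 4
};

/* Models one "update fields" instruction: clear the bits in `clear`,
   then set the bits in `set`. */
uint32_t fcr_update(uint32_t cr, uint32_t clear, uint32_t set)
{
    return (cr & ~clear) | set;
}

/* Change only the rounding-mode field; trap enables are left alone. */
uint32_t fcr_set_rounding(uint32_t cr, uint32_t rm)
{
    return fcr_update(cr, FCR_RM_MASK, (rm << FCR_RM_SHIFT) & FCR_RM_MASK);
}
```

Masking overflow traps for a few instructions is then
fcr_update(cr, FCR_TRAP_OVERFLOW, 0), with the mirror-image call afterwards.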

It has been argued that you might be able to lump a whole lot of code
together such that you can reduce the number of settings of the rounding
mode.  E.g. the loop
	for(i=0;i<n;i++) a[i] = b[i] + c[i];
(where a, b and c are intervals and + is the interval addition) might
be split into two loops, one to calculate the lower bounds and one to
calculate the upper bounds.  This is true in a number of cases, but fails
on:
	for(i=0;i<n;i++) a[i] = (b[i] + c[i]) * d[i];
because for instance calculating the lower bound in a multiplication can
involve both the lower bounds and the upper bounds of the operands.

All this is of course moot if fp operations are not pipelined!
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

marvin@oakhill.sps.mot.com (Marvin Denman) (06/21/91)

In article <3751@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>In a previous posting I gave timings (from the manual) for the 88100, but
>apparently the fp pipelines must be flushed before a change to the fp
>control register is performed.  So I was wrong, and the 88100 does not
>meet the factor 3 criterion.  (Alas, this is not documented, and the flush
>must be programmed!)

This is not true for the 88100.  The floating point control register can be
changed on the fly with no problems.  The only flush necessary is before
reading or writing the status register.  The fp control register will only
affect instructions issued after it is modified because its information
is carried through the pipeline with it.  I'm sure because I designed it that
way.  

-- 
Marvin Denman
Motorola 88000 Design
cs.utexas.edu!oakhill!marvin

dik@cwi.nl (Dik T. Winter) (06/22/91)

In article <1991Jun21.160606.4247@oakhill.sps.mot.com> marvin@bushwood.UUCP (Marvin Denman) writes:
 > In article <3751@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
 > >In a previous posting I gave timings (from the manual) for the 88100, but
 > >apparently the fp pipelines must be flushed before a change to the fp
 > >control register is performed.  So I was wrong, and the 88100 does not
 > >meet the factor 3 criterion.  (Alas, this is not documented, and the flush
 > >must be programmed!)
 > 
 > This is not true for the 88100.  The floating point control register can be
 > changed on the fly with no problems.  The only flush necessary is before
 > reading or writing the status register.  The fp control register will only
 > affect instructions issued after it is modified because its information
 > is carried through the pipeline with it.  I'm sure because I designed it that
 > way.  
 > 
Thank you for the clarification.  That is exactly the way I want it.
Surprisingly I got a message from somebody of the 88open consortium that
stated the following:
 > Yes, the "official" way to flush the pipe is to do something like
 > a "tb1 0,#r0,xxx" instruction.  The software standards at the ABI
 > (SVR4.0) level require this to be done for the fpsetround(),
 > fpsetsticky(), and fpsetmask() routines.  The equivalent Object
 > Compatibility Standard (OCS, i.e. SVR3+POSIX+ANSI C) routines
 > contain the note that
 >         "For consistent and portable application behavior, an
 >      application should ensure that all previously initiated
 >      floating point instructions have completed prior to invoking
 >      these functions."
Now, how did this come into the ABI/OCS?  Why the required flush for
fpsetround() etc.?  Is 88open assuming that this behaviour cannot be
carried over to future implementations?
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl

mac@gold.kpc.com (Mike McNamara) (06/25/91)

	In the Ardent Titan, using the BIT fp chips, we supported
on-the-fly, fully pipelined rounding mode changes.  We did this by making
the change-rounding-mode action an instruction, rather than a register
write. This greatly facilitates interval arithmetic, intrinsics, and what
have you, as discussed to death in this thread.
	However, since the changing of rounding modes is a pipelined
instruction, rather than a register write, the operating system has to
start up the vector unit while in kernel mode, and insert an
instruction into the stream if it is necessary to change rounding
modes upon context switch. 
	Luckily A) you could read the current rounding mode from a
status register, and only insert this instruction if you needed to
switch, and B) most programs use round to nearest.

	-mac
--
+-----------+-----------------------------------------------------------------+
|mac@kpc.com| Increasing Software complexity lets us sell Mainframes as       |
|           | personal computers. Carry on, X windows/Postscript/emacs/CASE!! |
+-----------+-----------------------------------------------------------------+

hrubin@pop.stat.purdue.edu (Herman Rubin) (06/25/91)

In article <MAC.91Jun24115900@gold.kpc.com>, mac@gold.kpc.com (Mike McNamara) writes:
> 
> 	In the Ardent titan, using the BIT fp chips, we supported on
> the fly, fully pipelined rounding mode changes.  We did this by making
> the change rounding mode action an instruction, rather than a register
> write. This greatly facilitates interval arithmetic, intrinsics, and what
> have you, as discussed to death in this thread.
> 	However, since the changing of rounding modes is a pipelined
> instruction, rather than a register write, the operating system has to
> start up the vector unit while in kernel mode, and insert an
> instruction into the stream if it is necessary to change rounding
> modes upon context switch. 
> 	Luckily A) you could read the current rounding mode from a
> status register, and only insert this instruction if you needed to
> switch, and B) most programs use round to nearest.

So we need another instruction to do a trivial switch.  If rounding mode
is being changed frequently, as it must be in scalar interval arithmetic,
the "obvious" answer is to put it in the instruction, where it can be
decoded in parallel with the much slower execution.  Alternatively, have
a fair number of status registers, with the choice of the status register
part of the instruction.

No matter how much pipelining takes place, unless there is an unused slot
in a pipe running in parallel, it takes time.  But in that case, the 
efficiency is not being used, anyhow.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)