[comp.arch] argument reduction in elementary transcendental functions

dgh@validgh.com (David G. Hough on validgh) (05/16/91)

A couple of issues have been confused in recent comp.arch discussion.
One is whether argument reduction for elementary transcendental functions
involves simultaneous computation of a quotient (preferably in integer
format) and a remainder; another is whether instructions for that purpose
should be included in computer instruction set architectures.

Observe that the Motorola 68881 and 68882
implement exactly this operation in frem and fmod, 
but the quotient is probably never
used because these chips also implement fsin, fcos, and fetox.
The integer
quotient is returned as a signed bitfield in the fpsr status register.

The significance of the underlying mathematical operation can scarcely
be doubted.  As far as I know all software that attempts to compute
sin(x) for even modestly large x, or that attempts to compute exp(x)
over a full range,
with any pretense of competitive accuracy and speed, has to directly
or indirectly exploit formulas like

	sin(x) = sin( m * 2 * pi + n * pi/k + r)
	       = {+ or -}{sin or cos}(r)
and
	exp(x) = exp(n * loge(b) + r)
	       = b**n * exp(r)

where k is typically 2 and b is typically the base of floating-point arithmetic,
usually 2.  For sin,

	n = [x / (pi/2)] % 4
	r =  x % (pi/2) 

where the [] denote rounding to the nearest signed integer value,
so that in this case |r| <= pi/4.
n is used to select one of four cases in a switch statement, and so is
appropriately an integer.  For exp,

	n =  [x / loge(b)] 
	r =  x % loge(b)

Since n is going more or less directly to the exponent field of the result,
it is also appropriately an integer, and a fairly small one.  I'm not aware of
any common floating-point formats that need more than 16 signed bits of n -
larger x would overflow or underflow.
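
In C, for x of modest magnitude, the two reductions look something like
the sketch below.  The constants are just pi/2 and loge(2) rounded to
double, the function names are arbitrary, and where a real implementation
would evaluate a polynomial on the reduced range this sketch simply calls
the library sin, cos, and exp to stay short.

#include <math.h>

/* sin(x) via n = [x/(pi/2)] mod 4, r = x rem pi/2, |r| <= pi/4 */
double sin_reduced(double x)
{
    const double pio2 = 1.57079632679489661923;  /* pi/2 rounded to double */
    double q = floor(x / pio2 + 0.5);  /* nearest integer quotient, as a double */
    double r = x - q * pio2;           /* accurate only while q * pio2 rounds off little */
    int    n = (int)fmod(q, 4.0);      /* low two bits select the case */

    if (n < 0)
        n += 4;
    switch (n) {
    case 0:  return  sin(r);
    case 1:  return  cos(r);
    case 2:  return -sin(r);
    default: return -cos(r);
    }
}

/* exp(x) via n = [x/loge(2)], r = x rem loge(2), result = 2**n * exp(r) */
double exp_reduced(double x)
{
    const double ln2 = 0.69314718055994530942;   /* loge(2) rounded to double */
    double q = floor(x / ln2 + 0.5);
    double r = x - q * ln2;            /* |r| <= loge(2)/2 */
    int    n = (int)q;                 /* small integer, headed for the exponent field */

    return ldexp(exp(r), n);           /* b**n * exp(r) with b = 2 */
}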

These operations of producing an integer quotient and a corresponding
remainder are pretty fundamental, but since they have not been commonly
implemented, people have learned to obtain equivalent, or more or less
equivalent, results by indirect means.  If an integral approximation n to
[x/y] can be obtained somehow, then a corresponding remainder
r = x - y * n may be constructed, provided y * n can be computed exactly.
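
One way to meet that proviso, for n of modest size, is to split y into a
short head whose product with n is exact plus a small tail, and subtract
the two products separately.  A sketch, with the splitting done at run time
using the usual 2**27 + 1 trick; note that it still reduces only against the
double-precision rounding of y itself, which is the separate precision
question discussed below.

#include <math.h>

/* r = x - y*n with y*n effectively exact; good for |n| up to about 2**26. */
double remainder_by_parts(double x, double y, double n)
{
    const double splitter = 134217729.0;  /* 2**27 + 1 */
    double t    = splitter * y;
    double head = t - (t - y);         /* high ~26 bits of y */
    double tail = y - head;            /* the rest; head + tail == y exactly */

    /* n * head is exact (few enough significant bits between them);
       n * tail contributes rounding error only around the last bit of r. */
    return (x - n * head) - n * tail;
}

With n obtained as [x/y] by whatever means, remainder_by_parts(x, y, n)
then plays the role of x % y in the formulas above.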

The computer architecture issues relate to language issues,
the latency of remainder operations,
and what precision to use for pi/2 and loge(b).

In principle compilers could recognize adjacent uses of x % y and x / y
and exploit appropriately whatever instructions are available, although not
many could readily be adapted to recognize a situation in which an integer
quotient and a floating-point remainder are both desired.  ANSI-C has some
quotient-remainder functions, but they are not quite right for this
application.
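
The operation the application really wants has a shape more like the
following; the name and the emulation from ANSI-C pieces here are purely
illustrative of the mismatch: fmod delivers a truncated remainder rather
than the round-to-nearest one used above, the quotient has to be
reconstructed by a separate division that is trustworthy only for modest x,
and nothing delivers the two results together.

#include <math.h>

/* Hypothetical interface: floating remainder plus the low quotient bits. */
double fp_reduce(double x, double y, int *n)
{
    double r = fmod(x, y);                /* truncated remainder, sign of x */
    double q = floor((x - r) / y + 0.5);  /* reconstruct the quotient;
                                             reliable only for modest x */

    /* convert to the round-to-nearest convention, |r| <= y/2 */
    if (r > 0.5 * y) {
        r -= y;  q += 1.0;
    } else if (r < -0.5 * y) {
        r += y;  q -= 1.0;
    }

    *n = (int)fmod(q, 4.0);               /* only the low quotient bits matter */
    if (*n < 0)
        *n += 4;
    return r;
}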

Since remainder operations for large magnitude x may take hundreds or
thousands of cycles, interrupt latency may suffer or the remainder hardware
must be designed to be interruptible in mid-instruction.  Neither of these
ideas sits well in the RISC universe.  However, it is perfectly feasible to
define a partial remainder, as in the original 8087, which computes for some
maximum number of cycles and indicates whether the remainder operation is
still incomplete.  I had specified such a partial remainder in early drafts
of the SPARC architecture but later removed it, once it became apparent that
it would not be implemented any time soon, and because of the second issue
taken up below.
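
Modeled in C, a bounded partial remainder and the loop around it might look
like the sketch below; the per-step limit and the helper names are just for
illustration, the point being that each step retires a bounded number of
quotient bits and the machine can take interrupts between steps.

#include <math.h>

#define STEP_BITS 32   /* assumed bound on quotient bits retired per step */

/* One bounded step of x rem y.  Returns 1 when |x| < |y| (complete), else 0. */
static int partial_rem(double *x, double y)
{
    int ex, ey;
    frexp(*x, &ex);
    frexp(y,  &ey);
    if (ex - ey > STEP_BITS) {
        /* retire only the leading STEP_BITS quotient bits this time around;
           reducing modulo y scaled by a power of two preserves x rem y */
        *x = fmod(*x, ldexp(y, ex - ey - STEP_BITS));
        return 0;
    }
    *x = fmod(*x, y);
    return 1;
}

double full_rem(double x, double y)
{
    while (!partial_rem(&x, y))
        ;                              /* interrupts can be taken between steps */
    return x;
}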

What precision for pi/2 and loge(b) is appropriate?  The 68881 and 68882
fsin uses an approximation carried to 66 significant bits to compute
extended-precision results with 64 significant bits.  That defers, but does
not eliminate, the onset of gradual loss of significant bits in computed
sin(x) for x close to even multiples of pi, as described in the March 1981
issue of IEEE Computer.
More recently I have come around to the view that the standard elementary
trigonometric, logarithmic, and exponential functions should suffer no greater
uncertainty from argument reduction than from the subsequent approximation on
the reduced range.  That means arguments should be reduced with values of pi/2
and loge(b) that are as accurate as necessary (how accurate depends on the size
of the maximum exponent), which implies a somewhat different approach.
I have sometimes advocated correctly-rounded transcendental functions, an even
more demanding requirement, one that takes the reduced result and a bound on
its uncertainty as input to the calculation on the reduced range.
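
A back-of-the-envelope version of that requirement: pi/2 carried to P bits
has a relative error near 2**-P, reducing an argument near 2**E multiplies
the corresponding absolute error by a quotient near 2**E, and the reduced
argument itself can be very small, so P has to cover roughly the maximum
exponent plus the working precision plus some slack for how close x can land
to a multiple of pi.  The slack figure in the sketch is an assumption, not a
computed worst case.

#include <stdio.h>

int main(void)
{
    const int max_exponent = 1024;  /* IEEE double exponent range */
    const int precision    = 53;    /* IEEE double significand bits */
    const int slack        = 20;    /* allowance for x landing very close to a
                                       multiple of pi; an assumption, not a
                                       computed worst case */

    /* error in the reduced argument is about 2**(E - P) for x near 2**E and
       pi/2 carried to P bits; P must cover every representable E plus the
       working precision plus the slack */
    printf("bits of pi/2 wanted for double: roughly %d\n",
           max_exponent + precision + slack);
    return 0;
}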

The book by Cody and Waite summarizes the best thinking of the mid-1970's
when it was written, just prior to the advent of IEEE arithmetic.
The ongoing series of papers by Cody's colleague Peter Tang, published in
ACM TOMS, is the best example of current thinking about elementary
transcendental functions that are almost, though for efficiency's sake not
quite, correctly rounded.
-- 

David Hough

dgh@validgh.com		uunet!validgh!dgh	na.hough@na-net.ornl.gov