guy@rlgvax.UUCP (Guy Harris) (12/13/84)
> > Discussion of the 68881 doing single-precision and double-precision
> > arithmetic at the same speed, and how this could mean that C's
> > current rules for floating-point computation might not be so bad...
>
> Besides, it's rather easy to show that DMR (I assume) wasn't
> thinking of the 68881 when he decided to restrict evaluation to
> double [:-)].  He MAY, of course, have assumed -- as nearly all of us
> have finally come to realize -- that doing things in hardware gets
> easier as time goes by, but extending that general observation to a
> prediction of a monolithic FPU with an 80-bit path would surely have
> required a pretty phenomenal crystal ball.

The machine where C originally appeared was the PDP-11/45; it had a
floating point unit which didn't do double-precision multiplies or
divides as fast as single-precision ones, although it could do
single-precision and double-precision adds and subtracts at the same
speed.  None of this should be surprising; both the FP11-A and the
MC68881 have a data path wide enough for double-precision arithmetic,
which means adds and subtracts should take the same time.  However,
unless it uses some clever parallel multiplier, multiplies and divides
can be made faster in single precision because they have to
examine/generate fewer bits.  In addition, single-precision loads and
stores were faster than double-precision ones, because the FP11-A had
to do half as many memory fetches (16-bit memory fetches, at that).
So single-precision and double-precision arithmetic were definitely
not equally fast at the time.

Now, looking at the MC68881 times (in microseconds):

INSTRUCTION    Reg-to-Reg    --------Memory to Reg--------
                             Single    Double    Extended
FMOVE (in)        1.5          3.0       3.4       3.3

Note that double-precision operations are slower, but only slightly.
The operation may not be dominated by memory fetch time, which might
explain this, but the timings didn't give any indication of how fast
the 68020 or 68881 could fetch data from the memory it was using in
these tests.

FADD/FSUB         2.8          4.3       4.6       4.5

The same extra .3 microseconds for double-precision operations
shouldn't be too surprising, if it's assumed that the extra .4
microseconds for the double-precision FMOVE is due to the time
required to fetch the extra 32 bits.

FSGLMUL           3.1          4.6       ---       ---
FSGLDIV           3.8          5.3       ---       ---
FMUL              4.0          5.5       5.8       5.7
FDIV              5.9          7.4       7.7       7.6

I'm assuming FSGL{MUL,DIV} are single-precision multiply/divide
instructions and F{MUL,DIV} are their double-precision equivalents
(i.e., FSGL... produce single-precision results and F... produce
double-precision results).  Note that there *is* a significant
difference in these timings.  The transcendental functions didn't
seem to have single-precision variants, so no comparison is available.

So the conclusion that "there is essentially NO PENALTY IN USING
DOUBLE vs SINGLE PRECISION in the basic floating point operations"
isn't really justified here.  Double-precision operations that
require memory references are slower, because they require more
fetches (or stores), and double-precision operations that require
serial processing of the bits of the mantissa (like multiplies
without parallel multipliers, or divides) are slower because there
are more bits to process.  This is the same situation as prevailed in
the days of the FP11-A (one of the first coprocessors; the whole FPU
coprocessor architecture used by the chips out there seems to be the
FP11-A idea on a chip), and it prevails for the same reason.
The actual reasons for doing all computations in double precision, if
I remember Dennis' comments correctly, were twofold:

	1) it gave the computation better precision, and

	2) it was a royal pain to generate code for the DEC FP11
	   instruction set to do mixed-precision calculations, because
	   the precision of an operation wasn't indicated by the
	   opcode but by a mode bit which could be set to "single" or
	   "double" by special instructions.

It *can* be done - DEC's Fortran IV-Plus did so; from a look at the
code it generated (or, at least, that the version I worked with
generated), it made the simple assumption that the mode was unknown
at the entry and exit points of a basic block, so that it only had to
keep track of the mode within the block.  I presume that
floating-point computation, or the speed thereof, was not considered
sufficiently important in the work being done in the group where the
compiler was written to make it worth doing it "right".  I can't say
I disagree with that conclusion, although if one plans to use C for
heavy numerical work the conclusion definitely changes - in fact, the
conclusion that the "f77" compiler doesn't need to do real
optimization changes as well.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy