guy@rlgvax.UUCP (Guy Harris) (12/13/84)
> > Discussion of the 68881 doing single-precision and double-precision
> > arithmetic at the same speed, and how this could mean that C's
> > current rules for floating-point computation might not be so bad...
>
> Besides, it's rather easy to show that DMR (I assume) wasn't
> thinking of the 68881 when he decided to restrict evaluation to
> double [:-)].  He MAY, of course, have assumed -- as nearly all of us
> have finally come to realize -- that doing things in hardware gets
> easier as time goes by, but extending that general observation to a
> prediction of a monolithic FPU with an 80-bit path would surely have
> required a pretty phenomenal crystal ball.

The machine where C originally appeared was the PDP-11/45; it had a
floating point unit which didn't do double-precision multiplies or
divides as fast as single-precision ones, although it could do
single-precision and double-precision adds and subtracts at the same
speed.  None of this should be surprising; both the FP11-A and the
MC68881 have a data path wide enough for double-precision arithmetic,
which means adds and subtracts should take the same time.  However,
unless it uses some clever parallel multiplier, multiplies and divides
can be made faster in single precision because they have to
examine/generate fewer bits.  In addition, single-precision loads and
stores were faster than double-precision ones, because the FP11-A had
to do half as many memory fetches (16-bit memory fetches, at that).
So single-precision and double-precision arithmetic were definitely
not equally fast at the time.

Now, looking at the MC68881 times (in microseconds):

INSTRUCTION    Reg-to-Reg    --------Memory to Reg--------
                             Single    Double    Extended
FMOVE (in)        1.5          3.0       3.4       3.3

Note that double-precision operations are slower, but only slightly.
The operation may not be dominated by memory fetch time, which might
explain this, but the timings didn't give any indication of how fast
the 68020 or 68881 could fetch data from the memory it was using in
these tests.

FADD/FSUB         2.8          4.3       4.6       4.5

The same extra .3 microseconds for double-precision operations
shouldn't be too surprising, if it's assumed that the extra .4
microseconds for the double-precision FMOVE is due to the time
required to fetch the extra 32 bits.

FSGLMUL           3.1          4.6       ---       ---
FSGLDIV           3.8          5.3       ---       ---
FMUL              4.0          5.5       5.8       5.7
FDIV              5.9          7.4       7.7       7.6

I'm assuming FSGL{MUL,DIV} are single-precision multiply/divide
instructions and F{MUL,DIV} are their double-precision equivalents
(i.e., FSGL... produce single-precision results and F... produce
double-precision results).  Note that there *is* a significant
difference in these timings.  The transcendental functions didn't
seem to have single-precision variants, so no comparison is available.

So the conclusion that "there is essentially NO PENALTY IN USING
DOUBLE vs SINGLE PRECISION in the basic floating point operations"
isn't really justified here.  Double-precision operations that
require memory references are slower, because they require more
fetches (or stores), and double-precision operations that require
serial processing of the bits of the mantissa (like multiplies
without parallel multipliers, or divides) are slower because there
are more bits to process.  This is the same situation as prevailed in
the days of the FP11-A (one of the first coprocessors; the whole FPU
coprocessor architecture used by the chips out there seems to be the
FP11-A idea on a chip), and it prevails for the same reason.
The actual reasons for doing all computations in double precision, if
I remember Dennis' comments correctly, were twofold:

	1) it gave the computation better precision, and

	2) it was a royal pain to generate code for the DEC FP11
	   instruction set to do mixed-precision calculations, because
	   the precision of an operation wasn't indicated by the
	   opcode but by a mode bit which could be set to "single" or
	   "double" by special instructions.

It *can* be done - DEC's Fortran IV-Plus did so; from a look at the
code it generated (or, at least, that the version I worked with
generated), it made the simple assumption that the mode was unknown
at the entry and exit points of a basic block, so that it only had to
keep track of the mode within the block.  I presume that
floating-point computation, or the speed thereof, was not considered
sufficiently important in the work being done in the group where the
compiler was written to make it worth doing it "right".  I can't say
I disagree with that conclusion, although if one plans to use C for
heavy numerical work the conclusion definitely changes - in fact, the
conclusion that the "f77" compiler doesn't need to do real
optimization changes as well.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy