[net.unix-wizards] float/double in C

dmr (09/09/82)

Several people have correctly quoted the manual as calling for evaluation
of expressions on (single-precision) floats in double precision.  The
rule was made for 3 reasons.

1) To make certain that floating-point function arguments and return values
   were always double (thus avoiding multiple library routines and the
   constant need for casts).

2) Because the PDP-11 makes it horribly difficult to mix modes otherwise
   (yes, I admit it).

3) Because it is numerically more desirable to evaluate single-precision
   expressions in double precision, then truncate the result.
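Reason 1 can still be seen in ISO C as the default argument promotions. A float passed where no prototype supplies a parameter type (an old-style declaration, or the "..." part of a variadic function) arrives as a double. The following is a minimal sketch that assumes an ISO C compiler. `sum_floats` is a hypothetical helper, not a library routine.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums n floating-point arguments.
   A float passed through "..." is promoted to double by the caller,
   so it must be fetched with va_arg(ap, double), never va_arg(ap, float). */
double sum_floats(size_t n, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, n);
    for (size_t i = 0; i < n; i++)
        total += va_arg(ap, double);   /* the float arrived as a double */
    va_end(ap);
    return total;
}
```

Suppose the caller declares `float b = 1.5f, c = 2.25f;`. Then `sum_floats(2, b, c)` returns 3.75, even though the callee never sees a float.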

These are in order of importance.  Now, the people who actually use C for
numerical work seem to feel that on the VAX, at least, they would gladly
give up reason 3 in favor of increasing the speed of evaluation of expressions
involving single-precision numbers.  I am inclined to look kindly on this
reasoning, providing that the first reason above (which is paramount) is
observed.  That is, if one wants to compile
	double a; float b,c;
	a = b+c;
into
	addf3	b,c,r0
	cvtfd	r0,a
one has committed a very venial sin.  However, the sin would be mortal
if one got the wrong answer from sin(b) where b was declared "float."
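The cost of the "venial sin" can be made concrete in C with the two strategies for `a = b + c`. The function names below are hypothetical, and the VAX instructions are mirrored only in comments. The sketch assumes a compiler that honors the rounding of a value stored into a float variable, as ISO C requires.

```c
/* "Venial sin" strategy: add in single precision, then widen the result
   (the addf3 / cvtfd sequence).  Storing the sum in a float forces it
   to be rounded to single precision. */
double add_single_then_widen(float b, float c)
{
    float r0 = b + c;          /* addf3 b,c,r0 */
    return (double)r0;         /* cvtfd r0,a   */
}

/* Strategy the manual calls for: widen both operands, add in double. */
double add_in_double(float b, float c)
{
    return (double)b + (double)c;
}
```

Take b = 1.0f and c = 1e-8f. The single-precision sum rounds back to exactly 1.0, while the double-precision sum keeps the small term. For many operand pairs (1.5f and 2.25f, for instance) the two strategies agree exactly, and this is why the trade is tempting for speed.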

The real problem with the VAX compiler that was originally complained
about is that through a bug it had the worst of both worlds.  It did
the arithmetic in single precision and also generated useless conversion
instructions that made the whole calculation slower than if it had been
double in the first place.  I understand that this bug has been fixed
in recent versions.

		Dennis Ritchie

P.S.  I am quite aware that reasons 1,2,3 above can also be adduced
in a discussion of shorts vs. longs.  That is a whole other story.

P.P.S.  The sinning pun above actually was unintended-- I noticed it only
during proofreading.

swatt (09/09/82)

Note that the float==>double (always) rule, in its guise of char==>int
(always), hurts the 8-bit micros as well (remember them?).  The 8080 is
a particularly nasty example.  Someone at John Fluke in Seattle
modified the Whitesmiths 8080 compiler to treat type "char" as a
real data type, so that operations involving only chars were NOT
converted to int.  This of course had the effect of adding an extra
width of argument passed on the stack, but the gains in efficiency more
than made up for the nuisance.  I think this is what prompted Bill
Plaugher's comment that "8-bit micros are below C level".

On the VAX, you have short int==>int (always) as well.
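The char==>int and short==>int rules survive in ISO C as the integer promotions. An expression built only from chars or shorts has type int, and its arithmetic is done in int. The checks below are a small sketch. They assume an ISO C compiler on a machine where char and short are narrower than int, as on the VAX and on most current machines. The function names are hypothetical.

```c
#include <stddef.h>

/* Two chars are promoted to int before the addition,
   so the sum has type int. */
size_t size_of_char_sum(void)
{
    char x = 1, y = 2;
    return sizeof(x + y);      /* sizeof reports the type; nothing is evaluated */
}

/* The same promotion applies to short operands. */
size_t size_of_short_sum(void)
{
    short x = 1, y = 2;
    return sizeof(x + y);
}

/* Because the addition happens in int, 100 + 100 does not wrap,
   even though 200 does not fit in a signed char. */
int signed_char_sum(void)
{
    signed char x = 100, y = 100;
    return x + y;
}
```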

I have often thought a reasonable rule could be:

	In mixed type expressions, all sub-expressions are converted to
	the type with the greatest precision before the expression is
	evaluated.  If all sub-expressions are of the same type, the
	evaluation is carried out without conversions, if convenient
	for the host machine.  In no case is the precision of the
	result any LESS than the precision of any of the
	sub-expressions.
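The proposed rule can be read as a function from operand types to evaluation type. The following is a minimal C sketch. The enum, its ordering by precision, and `eval_type` are illustrative assumptions. The "if convenient" clause is modeled as a flag, with a fallback type for machines that cannot do the narrower arithmetic directly.

```c
#include <stddef.h>

/* Hypothetical ranking of arithmetic types, least to most precise. */
enum ctype { T_CHAR, T_SHORT, T_INT, T_LONG, T_FLOAT, T_DOUBLE };

/* Type in which an expression with operand types ops[0..n-1] (n >= 1)
   is evaluated under the proposed rule:
   - mixed types: everything is converted to the most precise operand type;
   - a single type throughout: evaluate in that type if the machine can do
     so conveniently (native_ok), otherwise in the wider fallback type.
   The result is never less precise than any operand. */
enum ctype eval_type(const enum ctype *ops, size_t n,
                     int native_ok, enum ctype fallback)
{
    enum ctype widest = ops[0];
    int mixed = 0;

    for (size_t i = 1; i < n; i++) {
        if (ops[i] != ops[0])
            mixed = 1;
        if (ops[i] > widest)
            widest = ops[i];
    }
    if (!mixed && !native_ok && fallback > widest)
        return fallback;       /* e.g. float arithmetic done in double on a PDP-11 */
    return widest;
}
```

Consider `a = b + c` with a double and b, c float. The operands {T_DOUBLE, T_FLOAT, T_FLOAT} give T_DOUBLE. If all three are float, the operands give T_FLOAT, or T_DOUBLE when float arithmetic is not convenient.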

Thus, in the example Dennis Ritchie gave:

	double a; float b,c;

	a = b + c;

The rule would force conversion of "b" and "c" to double before the
addition and assignment.  However if "a" were also of type float, no
conversion would take place.
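ISO C's usual arithmetic conversions later came close to this rule for the types of expressions. The sum float + float has type float, and only a double operand makes the expression double. The standard still permits float operations to be evaluated in a wider format; see FLT_EVAL_METHOD in <float.h>. The check below is a sketch that assumes an ISO C compiler on which double is wider than float. The function names are hypothetical.

```c
#include <stddef.h>

/* All-float version of "b + c": no conversion, the type is float. */
size_t size_of_float_sum(void)
{
    float b = 1.0f, c = 2.0f;
    return sizeof(b + c);      /* sizeof reports the type only */
}

/* Mixed version: b is converted to double before the addition. */
size_t size_of_mixed_sum(void)
{
    double a = 0.0;
    float b = 1.0f;
    return sizeof(a + b);
}
```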

The difficulties with the PDP-11 could be handled under the
"if convenient ..." clause.

The problems with library routines, I submit, are NOT best handled by
limiting the number of types functions can return, but by knowing
exactly what each function actually does return.  I think the rules
for the linker do not imply a reference if one just declares:

	double	sin();

but doesn't actually use the function (I may be wrong).  If this is
the case, good practice should encourage people to include a file
declaring the types of the functions they are getting from the library.
Someone (I think Mark Horton) suggested having function types default
to "void" instead of "int".
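That practice, expressed in ISO C terms, is sketched below. The declarations are placed at the top of the file, where a header would normally supply them. They state each function's exact argument and return types. With them in scope, the compiler converts a float argument to double where the function expects double, and it knows that a function returns float rather than defaulting to int. `half_f` and `scale_d` are hypothetical functions, not library routines.

```c
/* --- what a hypothetical "mylib.h" would declare --- */
float  half_f(float x);              /* returns float, not int */
double scale_d(double x, double k);  /* takes and returns double */

/* --- definitions, normally compiled into the library --- */
float half_f(float x)
{
    return x / 2.0f;
}

double scale_d(double x, double k)
{
    return x * k;
}

/* Caller: with the declarations in scope, b is converted to double
   for scale_d, and half_f's result is used as a float. */
double use_declared(float b)
{
    float h = half_f(b);
    return scale_d(b, 3.0) + h;
}
```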

Note that you HAVE to correctly declare pointers to floating-point
types because of the issue of correct storage width.  I think on some
machines the address of a float type is not the same as the address of
a double type, so that if you pass the address of a double type and
call it the address of a float type, you will get incorrect results
(if this isn't true now, you can bet it will be sooner or later).
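The width problem can be shown without undefined behavior by copying bytes rather than dereferencing a mistyped pointer. The sketch below assumes IEEE formats with a 4-byte float and an 8-byte double, as on most current machines. It reads only the first sizeof(float) bytes of a double, which is what a float pointer to that double would see. The original value is not recovered. `first_float_of_double` is a hypothetical name.

```c
#include <string.h>

/* Simulates reading a double through a float pointer: copy only the
   first sizeof(float) bytes of the double into a float.  memcpy keeps
   this well defined, unlike dereferencing (float *)&d. */
float first_float_of_double(double d)
{
    float f;
    memcpy(&f, &d, sizeof f);
    return f;
}
```

On a little-endian IEEE machine, `first_float_of_double(1.0)` yields 0.0f. On a big-endian one, it yields 1.875f. Neither is 1.0.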

Dennis also mentioned a bug fix involving unnecessary double<==>float
conversions; I think a similar bug also degrades int<==>unsigned short
operations on the VAX -- look at driver code that manipulates UNIBUS
registers.  Is this bug easily identified and fixed?

	- Alan S. Watt
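The int<==>unsigned short case arises in the read-modify-write of a 16-bit device register. In C, each operation on an unsigned short is defined to take place in int, and the result is narrowed back on assignment. A compiler that handles these conversions poorly can therefore emit extra instructions for a single line of driver code. In the sketch below, the register is simulated by an ordinary variable. The register and bit names and values are hypothetical, not real UNIBUS definitions.

```c
/* Hypothetical control/status register bits (illustrative values only). */
#define CSR_GO  0x0001
#define CSR_IE  0x0040   /* interrupt enable */

/* Stand-in for a memory-mapped 16-bit register; a real driver would
   point at the device's bus address instead. */
static volatile unsigned short fake_csr;

/* Set interrupt enable and start the device: fake_csr is promoted to
   int, ORed in int, then converted back to unsigned short. */
void start_device(void)
{
    fake_csr |= CSR_IE | CSR_GO;
}

/* Clear GO: ~CSR_GO is computed in int (every bit set except bit 0),
   and the AND result is narrowed back to 16 bits on assignment. */
void stop_device(void)
{
    fake_csr &= ~CSR_GO;
}

unsigned short read_csr(void)
{
    return fake_csr;
}
```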