[comp.lang.c] Need info on IEEE quad format

shankar@hpclscu.HP.COM (Shankar Unni) (09/09/88)

I need some info on IEEE floating point representation limits to construct
a <float.h> file for ANSI C. I already have the info for single and double
floats:

    #define	FLT_RADIX	2
    #define	FLT_ROUNDS	1   /* sort of: *.5 rounds -> nearest EVEN */

    #define	FLT_MANT_DIG	24
    #define	FLT_EPSILON	1.19209290E-07
    #define	FLT_DIG		6
    #define	FLT_MIN_EXP	-126
    #define	FLT_MIN		1.17549435E-38
    #define	FLT_MIN_10_EXP	-37
    #define	FLT_MAX_EXP	127
    #define	FLT_MAX		3.40282347E+38
    #define	FLT_MAX_10_EXP	38

    #define	DBL_MANT_DIG	53
    #define	DBL_EPSILON	2.2204460492503131E-16
    #define	DBL_DIG		15
    #define	DBL_MIN_EXP	-1022
    #define	DBL_MIN		2.225073858507201E-308
    #define	DBL_MIN_10_EXP	-307
    #define	DBL_MAX_EXP	1023
    #define	DBL_MAX		1.797693134862315e+308
    #define	DBL_MAX_10_EXP	308

The information I need is (for quad-precision (128-bit) floats):

    #define	LDBL_MANT_DIG	113
    #define	LDBL_EPSILON	??	/* 1.0 + EPSILON != 1.0 */
    #define	LDBL_DIG	??	/* decimal digits of precision */
    #define	LDBL_MIN_EXP	-16382
    #define	LDBL_MIN	??	/* smallest *normalized* quad */
    #define	LDBL_MIN_10_EXP ??	/* about -4930, no? */
    #define	LDBL_MAX_EXP	16383
    #define	LDBL_MAX	??	/* largest quad */
    #define	LDBL_MAX_10_EXP ??	/* ~~ 4931? */

The magnitude of the smallest de-normalized quad would also be useful...

Could some kind soul *VERIFY* the above figures that I've filled in, and
fill in those that I haven't?
--
Many many thanx in advance,
-----------------------------------------------------------------------------
Shankar Unni				     allegra)
Mail: shankar@hpda.HP.COM		UUCP: ucbvax)!hplabs!hpda!shankar
AT&T: (408) 447-5797			      decwrl)
-----------------------------------------------------------------------------

henry@utzoo.uucp (Henry Spencer) (09/11/88)

In article <660016@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes:
>I need some info on IEEE floating point representation limits to construct
>a <float.h> file for ANSI C. I already have the info for single and double
>floats...
>The information I need is (for quad-precision (128-bit) floats)...

Uh, *what* quad-precision floats?  If you are thinking of IEEE 754, which
is what people usually mean when they say "IEEE floating point", it defines
single and double formats, and puts some minimum requirements on an otherwise
implementation-specific "extended" format.  Extended format does not have to
be 128 bits; for example, on the Motorola floating-point chips (and, I think,
on the Intel ones too) it is 80 bits.  The current ANSI C draft gives a full
list of the values for the contents of <float.h> for IEEE floating point.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

turner@sdti.UUCP (Prescott K. Turner) (09/19/88)

In article <660016@hpclscu.HP.COM>, shankar@hpclscu.HP.COM (Shankar Unni)
writes:
>I need some info on IEEE floating point representation limits to construct
>a <float.h> file for ANSI C. I already have the info for single and double
>floats:
>...
Here are some improvements to your single and double values:
     #define	FLT_ROUNDS	1 /* is consistent with tie-breaking */
                                  /* to nearest even significand */
    
     #define	DBL_MIN_EXP	-1021 /* Because C uses a different */
     #define	DBL_MAX_EXP	1024  /* (inferior) model for floating */
     #define	FLT_MIN_EXP	-125  /* point numbers from IEEE 754, its */
     #define	FLT_MAX_EXP	128   /* MIN_EXP and MAX_EXP values are */
                                      /* different. */

     #define	DBL_MIN		2.2250738585072014e-308 /* more accurate */
     #define	DBL_MAX		1.7976931348623157e+308 /* than the latest */
                                                        /* draft C standard */

>The information I need is (for quad-precision (128-bit) floats):
The draft C standard does not provide the figures for IEEE quad-precision
because the IEEE 754 standard prescribes only lower limits for range and
precision of a 'double extended' format.  I will attempt to fill in your
table, based on the quad format which appeared in an early draft of the IEEE
standard, and which is supported by Intel coprocessors. 

     #define	LDBL_MANT_DIG	112  /* no hidden bit */
     #define	LDBL_EPSILON	3.851859888774471706111955885169855E-34L
     #define	LDBL_DIG	33
     #define	LDBL_MIN_EXP	-16381
     #define	LDBL_MIN        3.362103143112093506262677817321753E-4932L
     #define	LDBL_MIN_10_EXP -4931
     #define	LDBL_MAX_EXP	16384
     #define	LDBL_MAX	1.1897314953572317650857593266280069E+4932L
     #define	LDBL_MAX_10_EXP 4932

>The magnitude of the smallest de-normalized quad would also be useful...
     #define    LDBL_DENORM_MIN 1E-4965L
No need for lots of digits here, because the smallest denormalized number has
only 1 bit of precision.

Caveat: The IEEE standard has strict requirements on decimal-to-binary
conversion for single and double, but even there it permits a little slack in
converting the _MAX and _MIN constants.  You're lucky if you have a
decimal-to-binary conversion routine which will convert the above
representations of LDBL_MAX and LDBL_MIN to the appropriate binary values.
You could even get overflow.  If there is a problem, it's more important that
the constants convert correctly than that they themselves be accurate.

Note that the C standard permits the macro names to be defined as expressions.
Here's an idea for what might work:
     #define    FLT_MAX         (ldexp(1-6E-8, FLT_MAX_EXP))
     #define    FLT_MIN         (ldexp(0.5, FLT_MIN_EXP))
     #define    DBL_MAX         (ldexp(1-1E-16, DBL_MAX_EXP))
     #define    DBL_MIN         (ldexp(0.5, DBL_MIN_EXP))
     #define    LDBL_MAX        (ldexpl(1-2E-34L,LDBL_MAX_EXP))
     #define    LDBL_MIN        (ldexpl(0.5L, LDBL_MIN_EXP))
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (508) 443-5779
UUCP:...genrad!mrst!sdti!turner