shankar@hpclscu.HP.COM (Shankar Unni) (09/09/88)
I need some info on IEEE floating point representation limits to construct
a <float.h> file for ANSI C. I already have the info for single and double
floats:

#define FLT_RADIX        2
#define FLT_ROUNDS       1   /* sort of: *.5 rounds -> nearest EVEN */

#define FLT_MANT_DIG     24
#define FLT_EPSILON      1.19209290E-07
#define FLT_DIG          6
#define FLT_MIN_EXP      -126
#define FLT_MIN          1.17549435E-38
#define FLT_MIN_10_EXP   -37
#define FLT_MAX_EXP      127
#define FLT_MAX          3.40282347E+38
#define FLT_MAX_10_EXP   38

#define DBL_MANT_DIG     53
#define DBL_EPSILON      2.2204460492503131E-16
#define DBL_DIG          15
#define DBL_MIN_EXP      -1022
#define DBL_MIN          2.225073858507201E-308
#define DBL_MIN_10_EXP   -307
#define DBL_MAX_EXP      1023
#define DBL_MAX          1.797693134862315e+308
#define DBL_MAX_10_EXP   308

The information I need is (for quad-precision (128-bit) floats):

#define LDBL_MANT_DIG    113
#define LDBL_EPSILON     ??      /* 1.0 + EPSILON != 1.0 */
#define LDBL_DIG         ??      /* decimal digits of precision */
#define LDBL_MIN_EXP     -16382
#define LDBL_MIN         ??      /* smallest *normalized* quad */
#define LDBL_MIN_10_EXP  ??      /* about -4930, no? */
#define LDBL_MAX_EXP     16383
#define LDBL_MAX         ??      /* largest quad */
#define LDBL_MAX_10_EXP  ??      /* ~~ 4931? */

The magnitude of the smallest de-normalized quad would also be useful...

Could some kind soul *VERIFY* the above figures that I've filled in, and
fill in those that I haven't?
--
Many many thanx in advance,
-----------------------------------------------------------------------------
Shankar Unni                                  allegra)
Mail: shankar@hpda.HP.COM             UUCP:   ucbvax)!hplabs!hpda!shankar
AT&T: (408) 447-5797                          decwrl)
-----------------------------------------------------------------------------
henry@utzoo.uucp (Henry Spencer) (09/11/88)
In article <660016@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes:
>I need some info on IEEE floating point representation limits to construct
>a <float.h> file for ANSI C. I already have the info for single and double
>floats...
>The information I need is (for quad-precision (128-bit) floats)...

Uh, *what* quad-precision floats? If you are thinking of IEEE 754, which
is what people usually mean when they say "IEEE floating point", it defines
single and double formats, and puts some minimum requirements on an
otherwise implementation-specific "extended" format. Extended format does
not have to be 128 bits; for example, on the Motorola floating-point chips
(and, I think, on the Intel ones too) it is 80 bits.

The current ANSI C draft gives a full list of the values for the contents
of <float.h> for IEEE floating point.
--
NASA is into artificial        | Henry Spencer at U of Toronto Zoology
stupidity. - Jerry Pournelle   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
turner@sdti.UUCP (Prescott K. Turner) (09/19/88)
In article <660016@hpclscu.HP.COM>, shankar@hpclscu.HP.COM (Shankar Unni) writes:
>I need some info on IEEE floating point representation limits to construct
>a <float.h> file for ANSI C. I already have the info for single and double
>floats:
>...

Here are some improvements to your single and double values:

#define FLT_ROUNDS   1      /* is consistent with tie-breaking   */
                            /* to nearest even significand       */

#define DBL_MIN_EXP  -1021  /* Because C uses a different        */
#define DBL_MAX_EXP  1024   /* (inferior) model for floating     */
#define FLT_MIN_EXP  -125   /* point numbers from IEEE 754, its  */
#define FLT_MAX_EXP  128    /* MIN_EXP and MAX_EXP values are    */
                            /* different.                        */

#define DBL_MIN      2.2250738585072014e-308  /* more accurate   */
#define DBL_MAX      1.7976931348623157e+308  /* than the latest */
                                              /* draft C standard */

>The information I need is (for quad-precision (128-bit) floats):

The draft C standard does not provide the figures for IEEE quad-precision
because the IEEE 754 standard prescribes only lower limits for range and
precision of a 'double extended' format. I will attempt to fill in your
table, based on the quad format which appeared in an early draft of the
IEEE standard, and which is supported by Intel coprocessors.

#define LDBL_MANT_DIG    112    /* no hidden bit */
#define LDBL_EPSILON     3.851859888774471706111955885169855E-34L
#define LDBL_DIG         33
#define LDBL_MIN_EXP     -16381
#define LDBL_MIN         3.362103143112093506262677817321753E-4932L
#define LDBL_MIN_10_EXP  -4931
#define LDBL_MAX_EXP     16384
#define LDBL_MAX         1.1897314953572317650857593266280069E+4932L
#define LDBL_MAX_10_EXP  4932

>The magnitude of the smallest de-normalized quad would also be useful...

#define LDBL_DENORM_MIN  1E-4965L

No need for lots of digits here, because the smallest denormalized number
has only 1 bit of precision.

Caveat: The IEEE standard has strict requirements on decimal-to-binary
conversion for single and double, but even there it permits a little slack
in converting the _MAX and _MIN constants. You're lucky if you have a
decimal-to-binary conversion routine which will convert the above
representations of LDBL_MAX and LDBL_MIN to the appropriate binary values.
You could even get overflow. If there is a problem, it's more important
that the constants convert correctly than that they themselves be accurate.

Note that the C standard permits the macro names to be defined as
expressions. Here's an idea for what might work:

#define FLT_MAX   (ldexp(1-6E-8, FLT_MAX_EXP))
#define FLT_MIN   (ldexp(0.5, FLT_MIN_EXP))
#define DBL_MAX   (ldexp(1-1E-16, DBL_MAX_EXP))
#define DBL_MIN   (ldexp(0.5, DBL_MIN_EXP))
#define LDBL_MAX  (ldexpl(1-2E-34L, LDBL_MAX_EXP))
#define LDBL_MIN  (ldexpl(0.5L, LDBL_MIN_EXP))
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (508) 443-5779
UUCP:...genrad!mrst!sdti!turner
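
One way to check the ldexp-based expressions suggested above is to compare them with the constants shipped in the local compiler's <float.h>. The sketch below does that for the float and double pairs only. It assumes an IEEE 754 single/double implementation with round-to-nearest arithmetic; the helper names are hypothetical, and the casts to float are an addition made here so the results can be compared at float precision. The long double pair is left out because its result depends on which extended format the platform provides.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helpers wrapping the suggested macro bodies. */

/* (1 - 6E-8) * 2^128 lies between the two largest finite floats and is
   nearer the larger one, so rounding to float is expected to yield FLT_MAX. */
static float flt_max_expr(void)
{
    return (float)ldexp(1 - 6E-8, FLT_MAX_EXP);
}

/* 0.5 * 2^FLT_MIN_EXP is exactly the smallest normalized float. */
static float flt_min_expr(void)
{
    return (float)ldexp(0.5, FLT_MIN_EXP);
}

/* 1 - 1E-16 is expected to round to the largest double below 1.0. */
static double dbl_max_expr(void)
{
    return ldexp(1 - 1E-16, DBL_MAX_EXP);
}

/* 0.5 * 2^DBL_MIN_EXP is exactly the smallest normalized double. */
static double dbl_min_expr(void)
{
    return ldexp(0.5, DBL_MIN_EXP);
}
```

Comparing each helper's result with FLT_MAX, FLT_MIN, DBL_MAX and DBL_MIN using == from a small main() shows whether the expressions reproduce the library's values on a given machine. On most systems the program needs to be linked with the math library.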