pdbain@wateng.UUCP (Peter Bain) (09/12/84)
The IEEE floating point standard also uses binary normalization. One benefit of this is that, since you know the first bit of a normalized mantissa will be a 1, you can leave it out, giving an extra bit of precision. The standard uses special exponent values to express exceptional conditions such as zero, unnormalized numbers (this allows "soft underflow"), not-a-number, overflow, etc. I compared 32 bit IEEE to 32 bit IBM, and found that IEEE is better in the average and worst case, and as good in the best case.
-peter
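To make the hidden bit and the reserved exponent values concrete, here is a minimal Python sketch that pulls apart a 32 bit IEEE single. The function names (decode_float32, classify, significand, ibm_effective_bits) are made up for illustration and are not from the post. The last helper assumes the IBM format in question is the System/360 style single with a 24 bit hexadecimal-normalized fraction, where only the leading hex digit is guaranteed nonzero.

```python
import struct

def decode_float32(x):
    """Round x to IEEE single and return (sign, biased_exponent, stored_fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 bit biased exponent
    fraction = bits & 0x7FFFFF       # 23 stored fraction bits
    return sign, exponent, fraction

def classify(x):
    """Name the class selected by the special exponent values 0 and 255."""
    _, exponent, fraction = decode_float32(x)
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 255:
        return "infinity" if fraction == 0 else "nan"
    return "normal"

def significand(x):
    """24 bit significand of a normal number, with the hidden leading 1 restored."""
    _, exponent, fraction = decode_float32(x)
    if 0 < exponent < 255:
        return (1 << 23) | fraction
    return fraction  # zero and denormals have no hidden bit

def ibm_effective_bits(leading_hex_digit):
    """Significant bits in a 24 bit hex-normalized fraction (assumed IBM single).

    The leading hex digit is 1..15, so it can carry up to 3 leading zero bits.
    """
    return 24 - (4 - leading_hex_digit.bit_length())

print(decode_float32(1.0))   # (0, 127, 0): the leading 1 is not stored
print(classify(2.0 ** -149)) # smallest denormal, i.e. "soft underflow"
print([ibm_effective_bits(d) for d in (1, 2, 4, 8)])
```

Under these assumptions the IEEE significand always carries 24 bits for normal numbers, while the hex-normalized fraction carries between 21 and 24 depending on its leading digit.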
jlg@lanl-a.UUCP (09/19/84)
<eaten>
It should be remembered that floating point numbers are a subset of the rationals, not the reals. Therefore, given the real line, only certain points are exactly representable in a given floating point number scheme. If S(b,n,M) is the set of floating point numbers representable with n base b digits between 1/M and M, then |S(b,n,M)| is the size of this set. Note that exponent range limits are excluded from this definition. S can be regarded as a density function, i.e. the number of elements within a given range.

A good definition of 'equivalent precision' for two floating point systems would require that the two systems have the same density. So define two systems as giving 'equivalent precision' if:

              S(d,m,M)
     lim     ---------- = 1
    M->inf    S(b,m,M)

Matula [1] has shown that this is equivalent to:

    m = n log_d(b) + log_d[ d(b-1) / (b(d-1)) ] - log_d(log_d(b))

where log_d is the logarithm to base d. Therefore, 24 bit binary is equivalent to about 6.27 hexadecimal digits. Alternatively, 24 bits = 7.49 decimal digits, and 6 digits hex = 7.16 decimal digits. IEEE clearly comes out superior to IBM (for example). Restricting the exponent range has no effect on this comparison; an application which needs a wider range than provided by IEEE single would still be better off with binary normalized arithmetic (use IEEE extended formats).

[1] D. W. Matula, A Formalization of Floating-Point Numeric Base Conversion, IEEE Transactions on Computers, vol. C-19, No. 8, August 1970, pp. 681-692.
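The three figures quoted above can be checked by evaluating Matula's formula directly. The following is a small Python sketch; the function name equivalent_digits is an illustrative choice, not something from the paper or the post.

```python
import math

def equivalent_digits(n, b, d):
    """Digits m in base d giving the same density as n digits in base b.

    Evaluates m = n*log_d(b) + log_d[d(b-1)/(b(d-1))] - log_d(log_d(b)).
    """
    def log_d(x):
        return math.log(x) / math.log(d)

    return n * log_d(b) + log_d(d * (b - 1) / (b * (d - 1))) - log_d(log_d(b))

print(f"{equivalent_digits(24, 2, 16):.2f}")  # 24 bits in hex digits     -> 6.27
print(f"{equivalent_digits(24, 2, 10):.2f}")  # 24 bits in decimal digits -> 7.49
print(f"{equivalent_digits(6, 16, 10):.2f}")  # 6 hex digits in decimal   -> 7.16
```

Note that 24 bits and 6 hex digits occupy the same storage, yet the binary format comes out about a quarter of a hex digit ahead.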
jlg@lanl-a.UUCP (09/19/84)
<eat>
Sorry, the first equation in my last note should have been:

              S(d,m,M)
     lim     ---------- = 1
    M->inf    S(b,n,M)

With n digits base b and m digits base d.
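To see what the counts in this limit look like for real systems, here is a rough Python sketch that enumerates S(b,n,M) by brute force for small n and compares it with a closed form. It assumes normalized numbers only (leading digit nonzero) and counts positive values; the names count_floats and count_closed are invented for this example.

```python
from fractions import Fraction

def count_floats(b, n, M):
    """Brute-force count of normalized n digit base b values x with 1/M <= x <= M."""
    lo, hi = Fraction(1, M), Fraction(M)
    k = 0
    while b ** k < M:          # smallest k with b**k >= M
        k += 1
    bound = k + n              # exponents outside [-bound, bound] cannot land in range
    values = set()
    for e in range(-bound, bound + 1):
        scale = Fraction(b) ** e
        for s in range(b ** (n - 1), b ** n):   # integer significand, leading digit nonzero
            x = s * scale
            if lo <= x <= hi:
                values.add(x)
    return len(values)

def count_closed(b, n, k):
    """Closed form for M = b**k: 2k full exponent intervals plus the endpoint M."""
    return 2 * k * (b - 1) * b ** (n - 1) + 1

print(count_floats(2, 8, 16), count_closed(2, 8, 4))    # 8 bits binary, M = 16
print(count_floats(16, 2, 16), count_closed(16, 2, 1))  # 2 hex digits, M = 16

# Limiting ratio for 24 bits against 6 hex digits, taking M = 16**k = 2**(4k):
ratio = Fraction(4 * (2 - 1) * 2 ** 23, (16 - 1) * 16 ** 5)
print(ratio)  # 32/15
```

The limiting ratio of 32/15 rather than 1 says that 24 bit binary is roughly twice as dense as 6 digit hex, which agrees with 24 bits being worth about 6.27 hex digits rather than 6.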