[net.arch] binary normalization

pdbain@wateng.UUCP (Peter Bain) (09/12/84)

The IEEE floating point standard also uses binary normalization.
One benefit of this is that, since you know the first bit of the mantissa
will be a 1, you can leave it out, giving an extra bit of precision.
The standard uses special exponent values to express exceptional conditions
such as zero, unnormalized numbers (this allows "soft underflow"),
not-a-number, overflow, etc. I compared 32 bit IEEE to 32 bit IBM, and found
that IEEE is better in the average and worst case, and as good in the
best case.
	-peter
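
The hidden bit described above can be seen by pulling a single-precision
value apart. This is only a sketch, assuming the 32 bit layout of 1 sign
bit, 8 exponent bits and 23 stored fraction bits; the names decode_single
and significand are made up for illustration and come from no library.

```python
import struct

def decode_single(x):
    """Split x, rounded to IEEE single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # the 23 fraction bits actually stored
    return sign, exponent, fraction

def significand(exponent, fraction):
    """Return the full 24-bit significand of a normalized number."""
    if exponent in (0, 255):
        # Special exponent values: zero/unnormalized, or infinity/NaN.
        raise ValueError("not a normalized number, no hidden bit")
    return (1 << 23) | fraction      # put back the implied leading 1

sign, exponent, fraction = decode_single(0.75)
print(sign, exponent, hex(fraction), hex(significand(exponent, fraction)))
# prints: 0 126 0x400000 0xc00000
```

Only 23 bits are stored, but the significand carries 24 bits of precision
because the leading 1 is implied by normalization.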

jlg@lanl-a.UUCP (09/19/84)

<eaten>

It should be remembered that floating point numbers are a subset of the
rationals, not the reals.  Therefore, given the real line, only certain
points are exactly representable in a given floating point number scheme.

If S(b,n,M) is the set of floating point numbers representable with n base
b digits between 1/M and M, then |S(b,n,M)| is the size of this set.  Note
that exponent range limits are excluded from this definition.  |S| can be
regarded as a density function, i.e. the number of elements within a given
range.  A good definition of 'equivalent precision' for two floating point
systems would require that the two systems have the same density.  So
define two systems as giving 'equivalent precision' if (writing S for |S|):

                         S(d,m,M)
                  lim   ---------- = 1
                 M->inf  S(b,m,M)

Matula [1] has shown that this is equivalent to:

         m = n log_d(b) + log_d[ d(b-1) / (b(d-1)) ] - log_d(log_d(b))

(log_d denotes the logarithm to base d.)

Therefore, 24 bit binary is equivalent to about 6.27 hexadecimal digits.
Alternatively, 24 bits = 7.49 decimal digits, and 6 digits hex = 7.16 decimal
digits.  IEEE clearly comes out superior to IBM (for example).  Restricting
the exponent range has no effect on this comparison; an application
which needs a wider range than provided by IEEE single would still be
better off with binary normalized arithmetic (use IEEE extended formats).
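
The figures above can be checked by evaluating the equivalence formula
directly. This is a minimal sketch; the function name equivalent_digits is
made up for illustration, and it assumes the formula is read as
m = n log_d(b) + log_d[d(b-1)/(b(d-1))] - log_d(log_d(b)).

```python
import math

def equivalent_digits(n, b, d):
    """Digits m in base d with the same density as n digits in base b."""
    def log_d(x):
        return math.log(x) / math.log(d)
    return (n * log_d(b)
            + log_d(d * (b - 1) / (b * (d - 1)))
            - log_d(log_d(b)))

print(round(equivalent_digits(24, 2, 16), 2))  # 24 bits in hex digits:    6.27
print(round(equivalent_digits(24, 2, 10), 2))  # 24 bits in decimal:       7.49
print(round(equivalent_digits(6, 16, 10), 2))  # 6 hex digits in decimal:  7.16
```

With b = d the two correction terms vanish and the function returns n
unchanged, which is a quick sanity check on the reading of the formula.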

[1] D. W. Matula, A Formalization of Floating-Point Numeric Base
    Conversion, IEEE Transactions on Computers, vol. C-19, No. 8,
    August 1970, pp. 681-692.

jlg@lanl-a.UUCP (09/19/84)

<eat>

Sorry, the first equation in my last note should have been:

                        S(d,m,M)
                 lim  ------------ = 1
                M->inf  S(b,n,M)

With n digits base b and m digits base d.