[comp.arch] floating point formats -- usage of small floats

wilson@carcoar.Stanford.EDU (Paul Wilson) (03/05/90)

I'm looking for info that would be useful in implementing short
floating point numbers for Lisp/Smalltalk/etc.

The basic idea is that I want short (1-word) floating point numbers to
fit in a single 32-bit word, *with* a runtime tag included.  My plan is
to store the number shifted left a few bits, with a tag in the lower bits.
Having the number fit in a word is a big advantage, especially if you've
got a generational garbage collector.  (They have to be careful about
pointer creation, in ways that can needlessly slow down programs that
allocate a lot of floats.)

Of course, this means I have to sacrifice a few bits of a standard 32-bit
float to make it fit.

I figure that I can just lop a couple of bits off of the exponent part, and
leave the fraction part alone.  That way, I sacrifice range of floats
but not precision.  If a float result goes out of that range, I'll make
a full-sized floating point object on the heap, and use a tagged
pointer to it.  This is analogous to the "fixnum" (immediate short int) /
"bignum" (long, heap-allocated) strategy used for integers.
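A sketch of the overflow test that drives the fixnum/bignum-style fallback described above (the 6-bit exponent width, IEEE-style biasing, and helper name are my own illustrative assumptions, not Wilson's design):

```python
import struct

def fits_short(x, exp_bits=6):
    """Would x survive in a short-float format with exp_bits exponent
    bits (IEEE-style bias, zero/max exponent codes reserved)?  If not,
    the runtime falls back to a full-size, heap-allocated float."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    e = (bits >> 23) & 0xFF              # biased 8-bit exponent, bias 127
    if e == 0:
        return (bits & 0x7FFFFFFF) == 0  # only +/-0.0; subnormals get boxed
    if e == 0xFF:
        return False                     # infinities and NaNs get boxed
    emax = (1 << (exp_bits - 1)) - 1     # e.g. 31 for a 6-bit exponent
    return -(emax - 1) <= (e - 127) <= emax
```

With 6 exponent bits, everyday values like 1.0 or 3.14 stay immediate, while 1e30 (unbiased exponent 99) or 1e-12 would be heap-allocated.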

Now the question:

How far can I carry this?  Can I take five bits out of the exponent
and still express most numbers most people would use?  (How many bits
do IEEE 32-bit floats use for the expt in the first place?)  If not
5, can I do 4, 3, or 2?

Has anybody done any studies of the distribution of floating point values
for real programs?  My guess is that most programs don't generate *any*
floating point values outside of a fairly narrow range, so that most 
programs would not require *any* big floats.   (I'm not looking for Lisp
program data here -- I wouldn't want to assume that people only want
to run the kinds of programs traditionally written in Lisp.  C and Fortran
data would be fine.)

And if I've made any stupid assumptions, such that this is a bad idea
anyway, please gently clue me in.  (I know close to nothing about
computer arithmetic.)


Thanks,

   Paul


Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

jkenton@pinocchio.encore.com (Jeff Kenton) (03/05/90)

From article <1990Mar5.031003.12107@Neon.Stanford.EDU>, by wilson@carcoar.Stanford.EDU (Paul Wilson):
> I'm looking for info that would be useful in implementing short
> floating point numbers for Lisp/Smalltalk/etc.
> 
> The basic idea is that I want short (1-word) floating point numbers to
> fit in one 32-bit words, *with* a runtime tag included.  My plan is
> . . .
> 
> Of course, this means I have to sacrifice a few bits of a standard 32-bit
> float to make it fit.
> 

Standard 32-bit floats have 1 sign bit, 8 exponent bits and 23 (+1) bits of
mantissa.  This gives an exponent range of 10**-38 to 10**+38, and a
precision of about 7 decimal digits.

For each bit of exponent you drop you halve the exponent range -- 4-bit
exponents only leave a range of about .004 to 511.  Each four bits of mantissa
you drop will lose you slightly more than 1 decimal digit.

Furthermore, anything you do will mean that the FP hardware won't work with
your number format.  And, no, you couldn't fake it, truncate it or trick it
to get "approximately right" results from a chain of calculations "most of
the time".  Your users would shoot you.



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at jkenton@pinocchio.encore.com	 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

aglew@oberon.csg.uiuc.edu (Andy Glew) (03/06/90)

I don't have any references on distribution of floating point numbers
(and hope you'll be able to point us to them).

However, a few thoughts on "smaller-than-word" floating point:

- My first reaction was that you should use one of the 20 or 24 bit
floating point formats that are used by a number of DSPs - say the
AT&T chip.

- Second, it occurred to me that you could sacrifice the low order bit(s)
of the mantissa, in much the same way that many LISP systems use the
least significant bit of the integer format as a tag. If you used LSB=0
as the tag for your <32 bit tagged FP value, you wouldn't need any masking
operations on the operands before dispatching the FP operation.  However,
you would need a cleanup afterwards, probably implementing correct rounding
in software.  Trouble is, the LSB=0 tag would conflict with the most convenient
tag value for integers.  A little bit of masking for <32 bit FP operations
might be acceptable if it permitted the dynamically more frequent integer
operations to have no tag manipulation overhead. 
    You could still use the native FP hardware.  A machine like the 88K,
where FP and integers share the same register file, might be preferable for
you.  A separate FP register set will usually require a bit more footwork
in order to perform the "integer" tag masking operations on the FP value.
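Glew's LSB=0 trick can be illustrated like this (a sketch; the single tag bit and helper names are my assumptions): forcing the low mantissa bit to zero makes the tagged word simultaneously a valid IEEE single bit pattern, so operands need no masking before being handed to the FP hardware -- only a post-operation cleanup.

```python
import struct

def make_lsb0_float(x):
    """Tag a float by forcing its low mantissa bit to 0.  The result is
    both a recognizable immediate (LSB == 0) and a directly usable IEEE
    single bit pattern -- no pre-op masking required."""
    return struct.unpack('<I', struct.pack('<f', x))[0] & ~1 & 0xFFFFFFFF

def as_float(word):
    """Reinterpret the 32-bit word as an IEEE single, tag bit and all."""
    return struct.unpack('<f', struct.pack('<I', word))[0]
```

The cost, as Glew notes, is giving the most convenient integer tag to floats, plus software rounding after each operation.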

--
Andy Glew, aglew@uiuc.edu

khoult@bbn.com (Kent Hoult) (03/06/90)

In article <11298@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>From article <1990Mar5.031003.12107@Neon.Stanford.EDU>, by wilson@carcoar.Stanford.EDU (Paul Wilson):
>> I'm looking for info that would be useful in implementing short
>> floating point numbers for Lisp/Smalltalk/etc.
>> 
>> The basic idea is that I want short (1-word) floating point numbers to
>> fit in one 32-bit words, *with* a runtime tag included.  My plan is
>
>For each bit of exponent you drop you cut the range in half -- 4 bit exponents
>only leave a range of .004 to 511.  Each four bits of mantissa you drop will
>lose you slightly more than 1 decimal digit.
>
>Furthermore, anything you do will mean that the FP hardware won't work with
>your number format.  And, no, you couldn't fake it, truncate it or trick it
>to get "approximately right" results from a chain of calculations "most of
>the time".  Your users would shoot you.

Not quite true. I've built a lisp machine where we did exactly this for
short floats. Even in a truncated format, they can be very useful for certain
things (graphics is one of the best examples).

As long as it is clearly documented what you lose and gain by using the short
format (which is definitely not IEEE anymore) there isn't a problem.  Users
who really want true IEEE will select the IEEE format.  But the shorts are
good for lots of other stuff.

The format we used was to truncate the bottom 6 bits of the mantissa and
shift the word right 6 bits.  This left room for a 6-bit tag while still
having just over 5 decimal digits of accuracy (2^17).  Conversion was then
trivial in either direction (short->single is a 6-bit left shift, and
single->short is a 6-bit right shift and OR in the datatype, plus beware
of NaNs).
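As I read it, Hoult's conversions come down to a pair of shifts; the specific 6-bit tag value below is invented for illustration, but the shift distance is his:

```python
import struct

TAG_SHIFT = 6
SHORT_TAG = 0b101010 << 26   # hypothetical 6-bit type tag in the top bits

def single_to_short(x):
    """single -> short: a 6-bit right shift (truncating the low mantissa
    bits) and OR in the datatype tag.  A real system would first check
    for NaNs, whose payload bits can be corrupted by the truncation."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return (bits >> TAG_SHIFT) | SHORT_TAG

def short_to_single(word):
    """short -> single: a 6-bit left shift; the tag falls off the top
    and the truncated mantissa bits come back as zeros."""
    bits = (word << TAG_SHIFT) & 0xFFFFFFFF
    return struct.unpack('<f', struct.pack('<I', bits))[0]
```

Values whose low 6 mantissa bits are already zero (0.5, -2.25, ...) round-trip exactly; others lose only the last bit or so beyond the ~5 usable decimal digits.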


Kent Hoult
TEL: (617) 873-4385     ARPA: khoult@bbn.com

ok@goanna.oz.au (Richard O'keefe) (03/06/90)

In article <1990Mar5.031003.12107@Neon.Stanford.EDU>,
 wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
> [ wants to implement short-floats; thinks it would be cool to take bits
>   out of the exponent; wants to know how many ]

> I figure that I can just lop a couple of bits of of the exponent part, and
> leave the fraction part alone.  That way, I sacrifice range of floats
> but not precision.

The snag with that is that you sacrifice *all* the precision at the
ends of your ranges.  The IEEE-754 formats are very carefully balanced;
there are actually rules that tell you what would be good sizes for the
fields.  Obviously, a short-float cannot be IEEE-754, but you *could*
try to see how many of the constraints listed in IEEE-854 you can
satisfy.

To give you some idea of the kind of thinking that is involved:
suppose you have N effective mantissa bits (counting the hidden bit).
You would like nextafter(1.0,/*in the direction of*/ 2.0) - 1.0
(that is, the next number just bigger than 1.0, minus 1.0) to be
representable; that's your epsilon.  So you want 2**(1-N) to be
representable, and it's a general rule that the exponent range should
be roughly symmetric, so you want the range to be at least 1-N..N-1
which means you need approx ceil(log2(N)+1) exponent bits.  For N=24,
that means an absolute minimum of 6 exponent bits.  More than that,
you'd really like an epsilon's worth of epsilon, so you want at least
ceil(log2(N)+2) exponent bits, and now we're just 1 short of 754's
8 exponent bits.
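O'Keefe's two rules of thumb reduce to one-liners (my own encoding of them, not anything taken from IEEE-854 itself):

```python
from math import ceil, log2

def min_exp_bits(n):
    """Absolute minimum exponent bits so that epsilon = 2**(1-N) is
    representable with a roughly symmetric exponent range:
    ceil(log2(N) + 1)."""
    return ceil(log2(n) + 1)

def comfortable_exp_bits(n):
    """One more bit of headroom -- room for 'an epsilon's worth of
    epsilon': ceil(log2(N) + 2)."""
    return ceil(log2(n) + 2)
```

For N = 24 effective mantissa bits these give 6 and 7 bits respectively, matching the "absolute minimum of 6" and "just 1 short of 754's 8" above.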

It is much better to steal bits out of the significand field.
There are several Prolog systems that do it; however that approach
is losing its popularity.  It was ok when you were just entering
co-ordinates for graphics, but for serious calculation anything
less than the usual 32 bits is unsafe in unskilled hands.

If you do decide to steal bits out of the significand field,
make a conscious design decision about whether you want to preserve
the "graceful underflow" property of the IEEE standards or not.  If
you do, you'd have quite a reasonable system, but the conversions
are not as simple.  If you don't, the conversions are simpler, but
the resulting arithmetic system isn't much like IEEE.

That's what most people have talked about.
But it's *not* what Paul Wilson *asked* about.
His proposal is:

> If a float result goes out of that range, I'll make
> a full-sized floating point object on the heap, and use a tagged
> pointer to it.

This is a really neat idea.  I like it a LOT.  He's not actually
proposing to change the properties of the arithmetic system at all,
merely the representation in memory.  (Rather like the VAX tiny-float
immediate operands.)

Presumably DEC picked the range of their tiny-floats based on
some expectation of the range of constants in programs; if your
coding covers that, you're doing at least as well as the VAX.

A suggestion:  why not scan through CALGO (ACM Collected Algorithms)
and issues of JRSS Series C and such things to see what some typical
ranges are like.  I've had ranges of 10**9 or more in my code, but so what?

weaver@weitek.WEITEK.COM (03/07/90)

In article <1990Mar5.031003.12107@Neon.Stanford.EDU> wilson@carcoar.Stanford.EDU (Paul Wilson) writes:
>I'm looking for info that would be useful in implementing short
>floating point numbers for Lisp/Smalltalk/etc.
>
>The basic idea is that I want short (1-word) floating point numbers to
>fit in one 32-bit words, *with* a runtime tag included.  My plan is
>to store the number shifted left a few bits, at a tag in the lower bits.

Someone has already mentioned the standard IEEE-854. This is a useful
document that describes how to implement a floating point system, 
with your own selection of the number of bits used for exponent and
fraction. (In contrast, IEEE-754 describes actual formats as well
as aspects such as rounding covered in IEEE-854.)

I believe that if, rather than reducing the size of the exponent, you
reduce the size of the fraction, you can more easily use the existing
floating point support. Supposing you use the least significant bits 
for tag, to execute a floating point operation you would:

1. copy operands.
2. clear least significant bits that were used for tags.
3. add, etc., with pre-existing hardware/software.
4. check for exceptions and round to your precision (lots of work).
5. add new tag. 

It seems to me (without a great deal of study) that if your base
system already does IEEE-754, you could use it, following the steps
above, to implement IEEE-854 with your choice of fraction size.