[comp.lang.c] how to write hash function for double

thomasw@hpcupt1.cup.hp.com (Thomas Wang) (06/19/91)

I want to write a hash function for C double floating point number.
I can either convert the double to string, then hash the string value.
Or I can just scramble the bits contained in the double number.

The question is whether there can be two different bit patterns of double
that represented the same double number?

double a;
double b;

...

if ((a == b) && (memcmp(&a, &b, sizeof(double)) != 0))
{
   printf("help!\n"); /* can this ever happen in Ansi-C ? */
}


 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/24/91)

In article <67790003@hpcupt1.cup.hp.com>, thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes:
> The question is whether there can be two different bit patterns of double
> that represented the same double number?

It depends on which machine you are using.  On a machine with IEEE arithmetic,
do you want to consider -0.0 and +0.0 "the same"?  (== must, but they behave
differently.)  On an IBM mainframe, there are a great many bit patterns
corresponding to the name number.  On a VAX (except for there being two
different kinds of "double") you are ok.

Converting to text is not a good idea; it is relatively expensive and it
isn't always as accurate as it should be.

-- 
I agree with Jim Giles about many of the deficiencies of present UNIX.

kremer@cs.odu.edu (Lloyd Kremer) (06/25/91)

In article <67790003@hpcupt1.cup.hp.com> thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes:
>
>The question is whether there can be two different bit patterns of double
>that represented the same double number?
>

It would sure help if you specified what architecture you have since that is
the place where floating math is implemented.

In the PC/Intel world where floating point is performed using an 80x87 chip
or software equivalent, doubles are almost universally implemented as IEEE 754
long reals.

In this scenario every bit pattern represents a unique number with the
following exceptions:

	0.0 and -0.0 are different bit patterns, but represent the same
	mathematical entity (zero).  Most compilers convert -0.0 to 0.0
	before storing the value, so you're probably OK here.

	Positive and negative infinity are different patterns, but are
	mathematically indistinguishable assuming you're using the "projective"
	model of infinity.  Since the 80x87 chips use the projective model by
	default, this shouldn't be a problem either.

	NaN (not a number) values can be encoded in many different ways, but
	if you're throwing NaN's around in a C program you've got big problems
	already, so don't worry about it.

In summary, I think you'll be safe assuming a one-to-one correspondence between
reasonable numbers and bit patterns in the Intel world.

					Lloyd Kremer
					Hilton Systems, Inc.
					kremer@cs.odu.edu

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/26/91)

In article <67790003@hpcupt1.cup.hp.com>, thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes:

> I want to write a hash function for C double floating point number.
> I can either convert the double to string, then hash the string
> value.  Or I can just scramble the bits contained in the double
> number.

It depends on what properties you want the hash to have, I suppose.  Do
you require that two numerically equal numbers must have the same hash
value, or do you need identical hash values only for identical
representations, instead of for identical values?

> The question is whether there can be two different bit patterns of
> double that represented the same double number?

Certainly.  On a VAX, for example, any bit-pattern with the sign and
exponent fields full of 0 bits represents an exact 0.0 value.  (There
are 2^23 - 8388608 - such bit-patterns.  And that's just for
single-float.)

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu