[comp.std.c] Hash functions on pointers

conor@lion.inmos.co.uk (Conor O'Neill) (01/24/91)

I am curious as to whether it is possible to write a _portable_ _ANSI C_
hash function from pointers to integers.
In this sense, portable means that it should work and be vaguely efficient
on different machines; I do not mean that a given pointer should map to
the same integer on different machines (indeed, since pointer representations
may be different, this would be impossible).

What I require is a function which maps a pointer onto a number between
0 and N-1. The function may be (indeed, would be expected to be) many-to-one.

Obviously the vacuous
int hash (void *ptr) { return 0; }
isn't really good enough, even though it could be considered a hash function.

The problem is that the effect of casting a pointer to an integer is
implementation defined; thus an implementation could, for example, always
return 99.

ANSI standard, section 3.3.4:
``A pointer may be converted to an integral type. The size of integer required
and the result are implementation-defined. If the space provided is not
long enough, the behaviour is undefined''

This seems to indicate that there must exist an integral type (which may be a
different type for each implementation) to which a pointer may be converted.
(This does not imply that information may not be lost).
Is there any way of determining what this type is?
Do the other parts of the standard indicate that long int
would always be enough? Or maybe unsigned long int?
Neither seems clear to me.
Ideally there would have been defined something like ptrint_t,
by analogy with size_t, which would be an integral type which is `big enough'.
Presumably `integral type' means a standard type; is it permissible
to require an extended type such as long long int if that were supported?

The standard continues:
``An arbitrary integer may be converted to a pointer. The result is
implementation-defined. (Footnote 42).''

This seems to be less relevant for the suggested hash function, except
for the footnote (which does not constitute part of the standard):

``Footnote 42: The mapping functions for converting a pointer to an integer
or an integer to a pointer are intended to be consistent with the addressing
structure of the execution environment.''

This indicates that a `quality of implementation' factor is that
there is some sensible relationship between the `value' of the pointer
and the integral value; otherwise an implementation could satisfy the
standard by always returning 99 when casting a pointer to an integer.

Given all the above, is the following hash function legal and portable?
(I.e. will it have `undefined' effect in any implementation?)
(Assume that N is less than INT_MAX).

int hash (void *ptr)
{
  unsigned long int x = (unsigned long int)ptr;
  return x % N;
}

---
Conor O'Neill, Software Group, INMOS Ltd., UK.
UK: conor@inmos.co.uk		US: conor@inmos.com
"It's state-of-the-art" "But it doesn't work!" "That is the state-of-the-art".

diamond@jit345.swstokyo.dec.com (Norman Diamond) (01/25/91)

In article <13983@ganymede.inmos.co.uk> conor@inmos.co.uk () writes:
>I am curious as to whether it is possible to write a _portable_ _ANSI C_
>hash function from pointers to integers.

Strictly conforming, I am not sure.  Portable to every imaginable
implementation, yes.

In response to a different topic, it is not entirely certain whether a
pointer to an arbitrary object, cast to type (char *), actually points to
the first byte of the object and can be dereferenced, or not.  However, in
every imaginable implementation, of course it can be.

So, suppose your pointer is
  sometype *p;
Do a
  char *p_p;
  for (p_p = (char *) &p; p_p < (char *) &p + sizeof p; p ++) {
    some computation on *p_p, one of the bytes in p;
  }

At first glance, the ampersands in the for statement look like a mistake,
but of course they aren't.

>The problem is that the effect of casting a pointer to an integer is
>implementation defined; thus an implementation could, for example, always
>return 99.

Yes.  Or 0  :-)

>ANSI standard, section 3.3.4:
>``A pointer may be converted to an integral type. The size of integer required
>and the result are implementation-defined. If the space provided is not
>long enough, the behaviour is undefined''
>This seems to indicate that there must exist an integral type (which may be a
>different type for each implementation) to which a pointer may be converted.
>(This does not imply that information may not be lost).

Yes, this wording bugs me too.  If "the space provided" means the space
provided by the implementation-defined size of integer, then what is the
meaning of the implementation defining that size?  If "the space provided"
means the space provided by the user, then why even bother to include the
third sentence; the first two sentences do not state that a conversion to
the wrong size can be done at all.

>Is there any way of determining what this type is?

When it's implementation-defined, the implementation has to agree to tell
you -- in a manual, maybe a header file, maybe a telephone number, etc.

>Do the other parts of the standard indicate that long int
>would always be enough? Or maybe unsigned long int?

No.

>Ideally there would have been defined something like ptrint_t,
>by analogy with size_t, which would be an integral type which is `big enough'.

Of course.

>Presumably `integral type' means a standard type; is it permissible
>to require an extended type such as long long int if that were supported?

Presumptions don't go very far.  Good luck.

>... a `quality of implementation' factor is that
>there is some sensible relationship between the `value' of the pointer
>and the integral value; otherwise an implementation could satisfy the
>standard by always returning 99 when casting a pointer to an integer.

Of course.

>Given all the above, is the following hash function legal and portable?
>(Assume that N is less than INT_MAX).
>int hash (void *ptr)
>{
>  unsigned long int x = (unsigned long int)ptr;
>  return x % N;
>}

Whether it's legal or not depends on interpreting the standard.
Portable, probably, though I think my suggestion was more portable.

>(I.e. will it have `undefined' effect in any implementation?)

Will, probably not.  Allowed to, yes.
--
Norman Diamond       diamond@tkov50.enet.dec.com
If this were the company's opinion, I wouldn't be allowed to post it.