[mod.compilers] Query about token mapping

johnl@ima.UUCP (Compilers mailing list) (09/05/86)

I would like to know if anyone has heard of the following algorithm, or
knows how to write it, or has any ideas about how it could be written.

Say you have N numbers, {X}, which you know ahead of time, and you want
to assign them the integers {1-N} in any order, but in a unique fashion.
Numbers not in {X} can be assigned anything, although it would be nice if
they were assigned to, say 0, or N+1.  This is nice for parse tables, or for
case statements where the {X} cover a large spread and the compiler uses
an if-else tree.  What I want is a program which writes the above program.
The subroutine:

alg(Y)
    for( i = {1-N} )
	if( Y = X[i] ) return i


is of order N operations, probably 3 or 4 N.  Better is a binary tree:

alg(Y)
    if( Y = X[N/2] ) return N/2
    else
    if( Y < X[N/2] )
	if( Y = X[N/4] ) return N/4
	else
	if( Y < X[N/4] )
	    ...
    else
    if( Y > X[N/2] )
	if( Y = X[3N/4] ) return 3N/4
	...

which is of order 2logN operations, but longer.  Also, there is the lookup
table of size max{X}.  Yuk.

None of these are optimal.  Now we get to the meat of the problem.  I bet
that there is an algorithm which will find a set of binary operations,
such as binary and, or, not, xor, etc, which, when handed a Y, will
return {1-N}.  This would be a black box with about logN operations, and
would be optimal for both speed and space.  For example:

X = { 0, 3, 5 }.  then

alg(Y)
    return X & 3

will map ( 0 -> 0 ), ( 3 -> 3 ), ( 5 -> 1 ).  2 was skipped, but oh well.

Anyway, any ideas?


Brian Sutin
Department of Astrophysical Sciences/Princeton University
{ princeton, ... }!astrovax!sutin

[This problem is known as perfect hashing.  The traditional application is
taking a token from a program you're compiling and seeing if it's one of the
reserved words.  More formally, a perfect hash function for a set of tokens
maps exactly one of the tokens into each of a set of hash buckets.  Computing
perfect hash functions is hard -- it takes minutes of computer time on
something like a PDP-11.  Once you have the hash function, using it is O(1)
because you take a token, hash it, compare it to the token that belongs in the
bucket hashed to, and either it's that token or it's not in your set.

There has been a fair amount written on both the form of perfect hash functions
and methods of computing them, typically by fitting some sort of polynomial.
Do any of you readers know of good references?  I can't find any right now.
-John]
-- 
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima

Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

johnl@ima.UUCP (Compilers mailing list) (09/08/86)

Cc:


>There has been a fair amount written on both the form of perfect hash functions
>and methods of computing them, typically by fitting some sort of polynomial.
>Do any of you readers know of good references?  I can't find any right now.

There is a very good (but short) article on reciprocal hashing (a form
of perfect hashing) by G. Jaeschke in Comm ACM, 24,12 (Dec. 1981).  The
article explains advantages of using the reciprocal method to obtain
perfect hash functions and briefly mentions other methods.  References
from this article that may also be useful are: (edited)

1. Cichelli, R.J. Minimal perfect hash functions made simple.
Comm. ACM, 23, 1 (Jan 1980), 17-19.
2. Jaescjke, G. Minimal Perfect Hashing. Tech Rept. TR 80.02.0003
IBM Heidelberg Scientific Center, Niederlassung Heideberg, Germany.
3. Jaeschke, G., Osterburg, G. On Cichelli's minimal perfect hash
functions method.  Comm. ACM, 23, 12 (Dec 1980), 728-729.
4. Knott, G.D. Hashing Functions, Computer J., 18 (Aug 1975), 265-278
5. Sprugnoli, R. Perfect hashing functions: a single probe retrieving
method for static sets. Comm. ACM, 20, 11 (Nov 1977), 841-850.

Paul Guthrie
ihnp4!ihdev!pdg
-- 
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima

Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request