ian@loral.UUCP (Ian Kaplan) (11/27/84)
A little background regarding this posting:
A month or two ago there was some discussion in this news group
regarding mimimal perfect hasing functions. For readers who are not
familiar with this arcane topic, a perfect hasing function is a
hash function which will always return the item in the hash table (or
the fact that it is not there) in one probe. A minimal perfect hashing
function produces a hash table which has no "holes". In a
previous article I posted a list of article references discussing
minimal perfect hasing functions (MPH). Send me a note if you would
like a copy.
MPHs are classically of interest to people who work on compilers and
data bases. For example it is possible to produce a MPH table for the
reserved words in Pascal. Such a table occupies a minimal amount of
space, and is very fast. On a Pascal compiler I worked on it noticably
increased the speed of the compiler. (The compiler previously used a
binary tree search).
One of the most recent articles on MPH algorithms is
Chang, C. C. "The study of an ordered minimal perfect hashing
scheme" in the Comm. of the ACM, V27, 4 (April 1984) 384-387
I found Mr. Chang's article very difficult and I did not entirely
understand it. Mr. Chang reported an algorithm which would construct a
MPH table without huristic search (which is used by other algorithms).
This looked really interesting. Since reading Mr. Chang's article I
have been hoping to find someone wiser to explain it to me. Well in this
months issue of the CACM (Nov. 1984, pgs 1156-1157) there was some
discussion of Mr. Chang's algorithm. In response to a letter from Paul
Pritchard of Cornell University (a wise man), Mr. Chang wrote:
"Finally, I never claimed that my minimal perfect hashing function is
practical. In fact, I have been searching through the literature in
vain for a practical minimal perfect hashing function. If someone
knows of any practical minimal perfect hashing function, I hope
they'll bring it to my attention."
So if any of you out there are still trying to figure out this algorith,
you can now move on to more productive things. I think that two things
can be learned from the above paragraph:
1. A general MPH function probably does not exist.
2. It is amazing what some achademics will publish and what referees
will let by. What was the point of Chang's article? Why did a "flag
ship" journal like the CACM publish it? Perhaps the referees did not
understand the article either.
For those of you who have little interst in this arcane topic, I hope
that you have not read this far. Since the original discuassion began
here and there is not an obviously better news group, I submitted it
here.
Ian Kaplan
Loral Data Flow Group
Loral Instrumentation
ucbvax!sdcsvax!sdcc6!loral!ian
8401 Aero Dr.
San Diego, CA
92123
(619) 560-5888 x4812 gvcormack@watdaisy.UUCP (Gordon V. Cormack) (11/28/84)
You may (or may not) be interested in a paper by myself titled
"Practical perfect hashing" to appear the the next issue of
"The Computer Journal".
I am not claiming that this method is the best for a compiler, as
it requires 3 divisions per access. Indeed, it is not clear to me
why there exists this great desire for a perfect hash function in
this case. One can construct an ordinary hash table that performs
extremely well. I also see little need for a minimal function,
as the number of keywords is quite small.
If anyone can't wait for the above article to appear, send me a
message and I will send you a pre-print.
ABSTRACT
A practical method is presented that permits
retrieval from a table in constant time. The
method is suitable for large tables and consumes,
in practice, O(n) space for n table elements. In
addition, the table and the hashing function can
be constructed in O(n) expected time. Variations
of the method that offer different compromises
between storage usage and update time are present-
ed.
Gordon V. Cormack
Department of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1
gvcormack@watdaisy.uucp
gvcormack%watdaisy@waterloo.csnet