[net.lang.pascal] Minimal Perfect Hashing Functions

ian@loral.UUCP (Ian Kaplan) (11/27/84)

  A little background regarding this posting:

    A month or two ago there was some discussion in this news group
    regarding mimimal perfect hasing functions.  For readers who are not 
    familiar with this arcane topic, a perfect hasing function is a 
    hash function which will always return the item in the hash table (or 
    the fact that it is not there) in one probe.  A minimal perfect hashing 
    function produces a hash table which has no "holes".  In a 
    previous article I posted a list of article references discussing
    minimal perfect hasing functions (MPH).  Send me a note if you would 
    like a copy.

    MPHs are classically of interest to people who work on compilers and
    data bases.  For example it is possible to produce a MPH table for the
    reserved words in Pascal.  Such a table occupies a minimal amount of
    space, and is very fast.  On a Pascal compiler I worked on it noticably
    increased the speed of the compiler.  (The compiler previously used a
    binary tree search).

  One of the most recent articles on MPH algorithms is

    Chang, C. C. "The study of an ordered minimal perfect hashing 
    scheme" in the Comm. of the ACM, V27, 4 (April 1984) 384-387

  I found Mr. Chang's article very difficult and I did not entirely
  understand it.  Mr. Chang reported an algorithm which would construct a
  MPH table without huristic search (which is used by other algorithms).
  This looked really interesting.   Since reading Mr. Chang's article I
  have been hoping to find someone wiser to explain it to me.  Well in this
  months issue of the CACM (Nov. 1984, pgs 1156-1157) there was some
  discussion of Mr. Chang's algorithm.  In response to a letter from Paul
  Pritchard of Cornell University (a wise man), Mr. Chang wrote:

    "Finally, I never claimed that my minimal perfect hashing function is
     practical.  In fact, I have been searching through the literature in
     vain for a practical minimal perfect hashing function.  If someone
     knows of any practical minimal perfect hashing function, I hope
     they'll bring it to my attention."

  So if any of you out there are still trying to figure out this algorith,
  you can now move on to more productive things.  I think that two things
  can be learned from the above paragraph: 
  
  1.  A general MPH function probably does not exist.

  2.  It is amazing what some achademics will publish and what referees 
      will let by.  What was the point of Chang's article?  Why did a "flag
      ship" journal like the CACM publish it?  Perhaps the referees did not
      understand the article either.

  For those of you who have little interst in this arcane topic, I hope
  that you have not read this far.  Since the original discuassion began
  here and there is not an obviously better news group, I submitted it
  here.

			     Ian Kaplan
			     Loral Data Flow Group
			     Loral Instrumentation

			     ucbvax!sdcsvax!sdcc6!loral!ian

			     8401 Aero Dr.
			     San Diego, CA
			     92123
			     (619) 560-5888 x4812 

gvcormack@watdaisy.UUCP (Gordon V. Cormack) (11/28/84)

You may (or may not) be interested in a paper by myself titled
"Practical perfect hashing" to appear the the next issue of
"The Computer Journal".

I am not claiming that this method is the best for a compiler, as
it requires 3 divisions per access.  Indeed, it is not clear to me
why there exists this great desire for a perfect hash function in
this case.  One can construct an ordinary hash table that performs
extremely well.  I also see little need for a minimal function,
as the number of keywords is quite small.

If anyone can't wait for the above article to appear, send me a 
message and I will send you a pre-print.

                          ABSTRACT

          A  practical method is presented that permits
     retrieval  from  a  table  in  constant time.  The
     method  is suitable for large tables and consumes,
     in  practice, O(n) space for n table elements.  In
     addition,  the  table and the hashing function can
     be  constructed in O(n) expected time.  Variations
     of  the  method  that  offer different compromises
     between storage usage and update time are present-
     ed.

Gordon V. Cormack
Department of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1

gvcormack@watdaisy.uucp
gvcormack%watdaisy@waterloo.csnet