ian@loral.UUCP (Ian Kaplan) (11/27/84)
A little background regarding this posting: A month or two ago there was some discussion in this news group regarding mimimal perfect hasing functions. For readers who are not familiar with this arcane topic, a perfect hasing function is a hash function which will always return the item in the hash table (or the fact that it is not there) in one probe. A minimal perfect hashing function produces a hash table which has no "holes". In a previous article I posted a list of article references discussing minimal perfect hasing functions (MPH). Send me a note if you would like a copy. MPHs are classically of interest to people who work on compilers and data bases. For example it is possible to produce a MPH table for the reserved words in Pascal. Such a table occupies a minimal amount of space, and is very fast. On a Pascal compiler I worked on it noticably increased the speed of the compiler. (The compiler previously used a binary tree search). One of the most recent articles on MPH algorithms is Chang, C. C. "The study of an ordered minimal perfect hashing scheme" in the Comm. of the ACM, V27, 4 (April 1984) 384-387 I found Mr. Chang's article very difficult and I did not entirely understand it. Mr. Chang reported an algorithm which would construct a MPH table without huristic search (which is used by other algorithms). This looked really interesting. Since reading Mr. Chang's article I have been hoping to find someone wiser to explain it to me. Well in this months issue of the CACM (Nov. 1984, pgs 1156-1157) there was some discussion of Mr. Chang's algorithm. In response to a letter from Paul Pritchard of Cornell University (a wise man), Mr. Chang wrote: "Finally, I never claimed that my minimal perfect hashing function is practical. In fact, I have been searching through the literature in vain for a practical minimal perfect hashing function. If someone knows of any practical minimal perfect hashing function, I hope they'll bring it to my attention." So if any of you out there are still trying to figure out this algorith, you can now move on to more productive things. I think that two things can be learned from the above paragraph: 1. A general MPH function probably does not exist. 2. It is amazing what some achademics will publish and what referees will let by. What was the point of Chang's article? Why did a "flag ship" journal like the CACM publish it? Perhaps the referees did not understand the article either. For those of you who have little interst in this arcane topic, I hope that you have not read this far. Since the original discuassion began here and there is not an obviously better news group, I submitted it here. Ian Kaplan Loral Data Flow Group Loral Instrumentation ucbvax!sdcsvax!sdcc6!loral!ian 8401 Aero Dr. San Diego, CA 92123 (619) 560-5888 x4812
gvcormack@watdaisy.UUCP (Gordon V. Cormack) (11/28/84)
You may (or may not) be interested in a paper by myself titled "Practical perfect hashing" to appear the the next issue of "The Computer Journal". I am not claiming that this method is the best for a compiler, as it requires 3 divisions per access. Indeed, it is not clear to me why there exists this great desire for a perfect hash function in this case. One can construct an ordinary hash table that performs extremely well. I also see little need for a minimal function, as the number of keywords is quite small. If anyone can't wait for the above article to appear, send me a message and I will send you a pre-print. ABSTRACT A practical method is presented that permits retrieval from a table in constant time. The method is suitable for large tables and consumes, in practice, O(n) space for n table elements. In addition, the table and the hashing function can be constructed in O(n) expected time. Variations of the method that offer different compromises between storage usage and update time are present- ed. Gordon V. Cormack Department of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1 gvcormack@watdaisy.uucp gvcormack%watdaisy@waterloo.csnet