[comp.lang.c++] Perfect hash function for g++ reserved words

bds@lzaz.ATT.COM (B.SZABLAK) (09/21/88)

In article <8809192248.aa03503@PARIS.ICS.UCI.EDU>, schmidt@siam.ics.uci.EDU ("Douglas C. Schmidt") writes:
...
> p.s. Suggestions for improvement of the basic approach are always welcome.
...
> + /* Contains the offset into the word_table for each of
> +    the reserved words.  This table is indexed by the
> +    hash function value.  Some entries are dummies, needed
> +    since the function is perfect, but not minimum. */

An old trick when using this technique is to use a sparse table (i.e. far
from minimum) in order to reduce hash collisions of variable names with
keyword tokens. This will result in executing the following loop a fewer
number of times:

+             while (str_start < str_end) 
+               {
+                 if (*str++ != word_table[str_start++]) 
+                   {
+                     return (NULL); /* bail if no match */
+                   }
+               }

A perfect minimal table would always require a check
to insure that you have the keyword and not a variable name, but
by having a sparse table many of those checks will be avoided as variable
names will likely hash to an unused table entry.

You'll have to experiment to find the optimal table size.