[comp.lang.c] Perfect hashing table/function

gdtltr@vax1.acs.udel.edu (Gary D Duzan) (01/03/89)

   Does anyone have a perfect hashing table/function generated for the C
keywords that you would be willing to let me use? I am trying to figure out how
to generate my own, but I am having trouble. Any help would be appreciated.

					Gary Duzan
[I have seen a lot of perfect hash functions come and go over the years,
particularly in Sigplan, but have the impression that the traditional
technique of putting the keywords in the symbol table is liable to give you
better performance.  Consider the relative frequency in typical programs of,
say, the keyword "enum" and the variable name "p".  -John]
[From Gary D Duzan <gdtltr@vax1.acs.udel.edu>]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

compilers-sender@ima.ima.isc.com (01/04/89)

In article <3114@ima.ima.isc.com> Gary D Duzan <gdtltr@vax1.acs.udel.edu> writes:
>[perfect hash function for C keywords]

Try anonymous ftp to ics.uci.edu (192.5.19.1).  Get
pub/perfect-tar.1.0.Z  (I haven't tried this for awhile,
so I don't know that it's still there).

This is an implementation of "perfect" hashing thanks to
Douglas C. Schmidt  (schmidt@BONNIE.ICS.UCI.EDU), and I
believe that it is used for the hash table for keywords in
the GNU C compiler.

	;-D on  ( Perfect Gnhashing )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
[Don't ask me to FTP stuff, I don't have FTP access.  Sorry.  -John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

cosell@BBN.COM (Bernie Cosell) (01/04/89)

In article <3116@ima.ima.isc.com> pardo@june.cs.washington.edu (David Keppel) writes:
}>[perfect hash function for C keywords]
}Try anonymous ftp to ics.uci.edu (192.5.19.1).  Get
}pub/perfect-tar.1.0.Z  (I haven't tried this for awhile,
}so I don't know that it's still there).

Yes, it is still there.  Its name is pub/perfect.tar.1.1.Z.  Don't forget to
FTP it in image mode.
__
Bernie Cosell
BBN Sys & Tech, Cambridge, MA 02238
cosell@bbn.com
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

schmidt@siam.ics.uci.edu (Doug Schmidt) (01/06/89)

In article <3114@ima.ima.isc.com> Gary D Duzan <gdtltr@vax1.acs.udel.edu> writes:
>
>   Does anyone have a perfect hashing table/function generated for the C
>keywords that you would be willing to let me use? I am trying to figure out how
>to generate my own, but I am having trouble. Any help would be appreciated.
>
>					Gary Duzan

The following code should do the trick.  If anyone is interested in
obtaining the perfect hash function generator program that created the
code below try anonymous ftp to ics.uci.edu ( 182.195.0.1 ), in the
~ftp/pub directory.  There are 2 versions available there:

perfect.tar.1.1.Z ( an older, functional, if inelegant solution written 
                    for GNU C )
gperf.tar.1.0.Z ( a far more elegant solution, written for GNU G++
                  1.31 or 1.32 )

If you don't have access to ftp, send me a note and I'll mail you a
shar file copy.

Doug Schmidt

--------------------------------( reserved.c )------------------------------

static char * wordlist [ ] =
{
	 "", "", "",
	 "int", 
	 "goto", 
	 "short", 
	 "struct", 
	 "",
	 "register", 
	 "auto", 
	 "break", 
	 "switch", 
	 "if", 
	 "for", 
	 "long", 
	 "float", 
	 "sizeof", 
	 "typedef", 
	 "", "",
	 "union", 
	 "return", 
	 "", "",
	 "char", 
	 "const", 
	 "static", 
	 "", "",
	 "enum", 
	 "while", 
	 "",
	 "do", 
	 "volatile", 
	 "void", 
	 "",
	 "signed", 
	 "default", 
	 "unsigned", 
	 "", "", "", "", "", "", "",
	 "extern", 
	 "", "",
	 "case", 
	 "", "", "",
	 "continue", 
	 "else", 
	 "", "", "", "", "", "",
	 "double", 
};

#define MIN_WORD_SIZE 2
#define MAX_WORD_SIZE 8
#define MIN_HASH_VALUE 3
#define MAX_HASH_VALUE 61
/*
32 keywords
59 is the maximum key range
*/

static int hash ( str,len )
register char	*str;
register int	len;
{
	static int cval [ ] = {
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,  61,  61,  61,
		  61,  61,  61,  61,  61,  61,  61,   5,   5,  20,
		  30,  25,  10,   0,   5,   0,  61,   0,  10,   0,
		  15,   0,  61,  61,   0,   0,   0,   0,   0,   0,
		  61,  61,  61,  61,  61,  61,  61,  61,
	};
	return ( len + cval [ str [ 0 ] ] + cval [ str [ len - 1 ] ] );
}

int in_word_set ( str,len )
register char *str;
register int len;
{
	if ( len <= MAX_WORD_SIZE && len >= MIN_WORD_SIZE ) {
		register int key = hash ( str,len );

		if ( key <= MAX_HASH_VALUE && key >= MIN_HASH_VALUE ) {
			register char *s = wordlist [ key ];
			return ( *s == *str && ! strcmp ( str + 1, s + 1 ) );
		}
	}
	return ( 0 );
}

------------------------------( test.c )------------------------------

/*
   Tests the generated perfect has function.
   The -v option prints diagnostics as to whether a word is in 
   the set or not.  Without -v the program is useful for timing.
*/ 
  
#include <stdio.h>

#define MAX_LEN 80

int main ( int argc, char *argv [ ] ) {
   int  verbose = ( argc > 1 ) ? 1 : 0;
   char buf [ MAX_LEN ];

   while ( gets ( buf ) ) {
      if ( in_word_set ( buf, strlen ( buf ) ) && verbose ) {
         printf ( "in word set %s\n", buf );
      }
      else if ( verbose ) {
         printf ( "NOT in word set %s\n", buf );
      }
   }

   return ( 0 );
}
--
schmidt@ics.uci.edu (ARPA) |   Per me si va nella citta' dolente.
office: (714) 856-4043     |   Per me si va nell'eterno dolore.
                           |   Per me si va tra la perduta gente.
                           |   Lasciate ogni speranza o voi ch'entrate.

schmidt@siam.ics.uci.edu (Doug Schmidt) (01/06/89)

In article <3182@paris.ics.uci.edu> I <schmidt@siam.ics.uci.edu> write:

|>The following code should do the trick.  If anyone is interested in
|>obtaining the perfect hash function generator program that created the
|>code below try anonymous ftp to ics.uci.edu ( 182.195.0.1 ), in the
|>~ftp/pub directory.  There are 2 versions available there:

Oops, dyslexia strikes again.  The internet number is actually:

128.195.0.1

Doug
--
schmidt@ics.uci.edu (ARPA) |   Per me si va nella citta' dolente.
office: (714) 856-4043     |   Per me si va nell'eterno dolore.
                           |   Per me si va tra la perduta gente.
                           |   Lasciate ogni speranza o voi ch'entrate.