michael@orcisi.UUCP (Michael Herman) (05/04/87)
This is a re-posting of the README, foo and foo.c files for the code originally posted by keith@seismo quite some time ago (I had reposted the code itself a few months ago). # This is a shell archive. Remove anything before this line, then # unpack it by saving it in a file and typing "sh file". (Files # unpacked will be owned by you and have default permissions.) # # This archive contains: # README foo foo.example echo x - README cat > "README" << '//E*O*F README//' History: On 10/7/83 C. Havener (charlie@genrad.UUCP) put a perfect hash function finding program on the net. On 11/4/84 R. Dietrich (robertd@tektronix.UUCP) put a Pascal version of the same program on the net. Both of these programs were based on "More on Minimal Perfect Hash Tables," Colorado State University Technical Report, April 1981, by Cook, Curtis R. and Oldehoeft, R. R., and "Minimal Perfect Hash Functions Made Simple" by Richard J. Cichelli - Comm. of ACM Jan 1980. These programs had the following limitations, some of which they shared, some of which they didn't: -- limited number of strings (40 or 110) per hash table. -- slow to extremely slow run-time (large amounts of both recursion and parallel array processing). -- a requirement that the hash strings be in a certain order. -- a fixed hashing algorithm. -- inability to handle strings that hashed to identical values. In a recent program I was doing, I had to hash approximately 500 different keywords, preferably in one access. To do this, I ended up rewriting the above programs. The attached version has fixed some of the limitations. -- no keyword limit; I've used it to hash up to about 2400 entries. -- it's several orders of magnitude faster. -- it's fairly indifferent to the order the strings are presented in. -- you can specify the keys to be hashed. -- it handles strings that hash to identical values. The basic algorithm is: hash = the length of the string plus the sum of all of the associated values of the key characters. The program assigns values to the characters it is using for hashing until a set of the values gives each keyword a unique value. For example: If the string was "thekid" and the key characters were 't' and 'd', with 't' having a value of 5 and 'd' having a value of 8, the hash value of "thekid" would be 19; 8 + 5 + strlen("thekid"); It takes two arguments -- -d -- print out some interesting debugging information. -k -- use the following characters of the string in the hashing algorithm. If the program was called as "perf -k137" and the string "abcdefgh" was entered, the characters 'a', 'c', and 'g' would be used by the hashing scheme. To make this easier, the program prints out a routine at the end of its run that, when passed a string, will return the correct hash value. Also, the special character '$' tells the program to use the last character of every string. Default, if the k flag isn't specified, is "-k1$", i.e. the first and last character of every string. followed by a filename containing the keywords. Ex: suppose there's a file "foo" containing the keywords: ====================================================================== //E*O*F README// echo x - foo cat > "foo" << '//E*O*F foo//' political man can have as his aim the realization of freedom //E*O*F foo// echo x - foo.example cat > "foo.example" << '//E*O*F foo.example//' ====================================================================== the command "perf foo" produces the following: ====================================================================== 52 -- political 44 -- man ** the keywords are listed 45 -- can with their hash values. 33 -- have 19 -- as 39 -- his 14 -- aim 26 -- the 61 -- realization 50 -- of 43 -- freedom perf: 11 keywords. ****** static long ** followed by a routine that hash(str) will return that value if register char *str; passed the keyword as an argument. { static int cval[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 10, 25, 0, 19, 0, 0, 0, 27, 11, 30, 23, 16, 0, 20, 17, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, }; register long hval; register int len; switch(hval = (long)(len = strlen(str))) { default: hval += cval[(int)str[0]]; hval += cval[(int)str[len - 1]]; case 0: return(hval); } } //E*O*F foo.example// exit 0