[comp.lang.c] hash function for text in C

cpshelley@violet.waterloo.edu (cameron shelley) (10/17/90)

(Greeting(time == morning? Goodmorning: (
            time == afternoon? Goodafternoon:
            Goodevening)
         )
);

  I have been (futily :<) experimenting with hash functions to 
hash english words into a large array.  No book I can find around
here gives the topic more than cursory discussion.  Does anyone
out there know of a good, simple one?  They must exist, and I'm
getting a little tired of mine... (er, function that is, I can't
afford another data structures book right now :).

  I'll be happy to post mine if anyone needs a good laugh, btw.

				Cam

--
      Cameron Shelley        | "Saw, n.  A trite popular saying, or proverb. 
cpshelley@violet.waterloo.edu|  So called because it makes its way into a
    Davis Centre Rm 2136     |  wooden head."
 Phone (519) 885-1211 x3390  |				Ambrose Bierce

macphee@convex.COM (Scott C. Mac Phee) (10/17/90)

In article <1990Oct16.184951.513@watdragon.waterloo.edu> cpshelley@violet.waterloo.edu (cameron shelley) writes:
>
>  I have been (futily :<) experimenting with hash functions to 
>hash english words into a large array.  No book I can find around
>here gives the topic more than cursory discussion.  Does anyone
>out there know of a good, simple one?  They must exist, and I'm
>getting a little tired of mine... (er, function that is, I can't
>afford another data structures book right now :).

Here is some code I've been using.  It is an adapted version of some
hashing algorithms I've seen pass through this newsgroup.

Hope it helps...

===========================================================================

/*
 * ============================================================================
 *                     HASH TABLE FUNCTIONS
 * ============================================================================
 */

#define HASH_TABLE_SIZE 1000
static	int	eventhash[HASH_TABLE_SIZE] ;
static	int	hash_entries,			/* number in table...	*/
		collisions ;			/* count of collisions	*/

int
init_hash_table()
{
register int i ;

    hash_entries = collisions = 0 ;

    for (i=0; i<HASH_TABLE_SIZE;) eventhash[i++] = 0 ;

}

/*
 * calc_hash_value.c
 *
 * calculate a hash table location from the given name.
 *
 */

int
calc_hash_value(name)
char    *name ;
{
register    int hash ;

    for (hash=0; *name;)
        hash = (hash<<2) ^ *name++ ;

    return(abs(hash % HASH_TABLE_SIZE)) ;

}


--
===============================================================================
CONVEX Computer Corporation (CVX)                  Richardson, Texas
Scott C. Mac Phee (..uunet.uu.net!convex.com!macphee)   214-497-4772
===============================================================================

macphee@convex.COM (Scott C. Mac Phee) (10/17/90)

In article <107251@convex.convex.com> macphee@convex.COM (Scott C. Mac Phee) writes:
>In article <1990Oct16.184951.513@watdragon.waterloo.edu> cpshelley@violet.waterloo.edu (cameron shelley) writes:
>>
>>  I have been (futily :<) experimenting with hash functions to 
>>hash english words into a large array.  No book I can find around
>>here gives the topic more than cursory discussion.  Does anyone
>>out there know of a good, simple one?  They must exist, and I'm
>>getting a little tired of mine... (er, function that is, I can't
>>afford another data structures book right now :).
>
>Here is some code I've been using.  It is an adapted version of some
>hashing algorithms I've seen pass through this newsgroup.
>
>Hope it helps...
>
>===========================================================================
>
>/*
> * ============================================================================
> *                     HASH TABLE FUNCTIONS
> * ============================================================================
> */
>
>#define HASH_TABLE_SIZE 1000
>static	int	eventhash[HASH_TABLE_SIZE] ;
>static	int	hash_entries,			/* number in table...	*/
>		collisions ;			/* count of collisions	*/



Sorry... amended code...

the hash table is an array of pointers to a structure, not int.

ex: 
struct event {
	char	*	name ;
	struct  event	* next ;
	} eventhash[HASH_TABLE_SIZE] ;

If a collision occurs use the hashed entry as the head of a linked
list and walk the linked list until the end is found, then add
the entry at the end.

I pulled the code out and then added the declarations without looking
at the original code.  Ten minutes later I realized I'd messed up.

Is today Monday, again?

--
===============================================================================
CONVEX Computer Corporation (CVX)                  Richardson, Texas
Scott C. Mac Phee (..uunet.uu.net!convex.com!macphee)   214-497-4772
===============================================================================

chris@mimsy.umd.edu (Chris Torek) (10/18/90)

The following implements expanding chained hash tables for strings and
numbers.  This is unmodified from a special-purpose program (having to
do with reading a large number of files with name=id pairs and doing
various operations on them).  Chain lengths are not monitored; instead,
the table is invariably expanded whenever it becomes 2/3 full.  Entries
may be neither removed nor modified.

An essentially-unrelated function called `string' maintains a string
pool such that string(x) == string(y) iff strcmp(x,y)==0 (the program
typically stores each name several times, and often needs to test for
name equality, so this helps there).

: Run this shell script with "sh" not "csh"
PATH=/bin:/usr/bin:/usr/ucb:/etc:$PATH
export PATH
all=false
if [ x$1 = x-a ]; then
	all=true
fi
echo Extracting hash.h
sed 's/^X//' <<'//go.sysin dd *' >hash.h
X/*
X * $Id: hash.h,v 1.1 90/09/24 23:58:38 chris Exp $
X *
X * Hash table entries keep track of name (or id) = value pairs.
X * Values are simply pointers.  Hash tables themselves are named
X * (for debugging).
X */
X
Xstruct	hashtab;
X
X/*
X * The `ins' functions return nonzero if the old value existed,
X * without changing the value.
X * The iterate functions calls a given function with name,value
X * or id,value pairs.
X */
Xstruct	hashtab *ht_new();	/* given name, create new hash table */
Xchar	*ht_nget();		/* given table and name, get value */
Xchar	*ht_iget();		/* given table and ID, get value */
Xint	ht_nins();		/* given table and name, insert new value */
Xint	ht_iins();		/* given table and id, insert new value */
Xvoid	ht_niterate();		/* given table and function, iterate */
Xvoid	ht_iiterate();		/* given table and function, iterate */
X
X/*
X * Some things that ought not to be here, but are anyway.
X * goodnumber() takes a number of objects and a size and returns
X * a new number of objects, such that malloc(goodnumber(n,size)*size)
X * calls malloc with a `good' size value (resulting in less wasted
X * memory).  emalloc is malloc with program-abort on out-of-memory.
X * string() makes a `read-only' copy of a string, reusing the previous
X * copy if any.
X */
Xint	goodnumber();		/* given n & sizeof, return new n */
Xchar	*emalloc();		/* malloc, exit on error */
Xchar	*string();		/* make an `ideal' copy of a string */
//go.sysin dd *
if [ `wc -c < hash.h` != 1415 ]; then
	made=false
	echo error transmitting hash.h --
	echo length should be 1415, not `wc -c < hash.h`
else
	made=true
fi
if $all; then
	echo '	Changing owner to bin'
	chown bin hash.h
else
	echo '	Original owner was bin'
fi
if $made; then
	chmod 444 hash.h
	echo -n '	'; ls -ld hash.h
fi
echo Extracting hash.c
sed 's/^X//' <<'//go.sysin dd *' >hash.c
X#ifndef lint
Xstatic char rcsid[] = "$Id: hash.c,v 1.2 90/10/02 13:32:23 chris Exp $";
X#endif
X
X/*
X * Hash table routines.
X */
X
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include "hash.h"
X
X/*
X * Hash table entries keep track of name=value pairs.
X * The names may be numeric IDs instead (by having a null name).
X */
Xstruct hashent {
X	struct	hashent *h_next;/* next in chain */
X	int	h_hash;		/* hash value or ID */
X	char	*h_name;	/* name (null if from numeric ID) */
X	char	*h_value;	/* value */
X};
X
Xstruct hashtab {
X	int	ht_size;	/* size (power of 2) */
X	int	ht_mask;	/* == ht_size - 1 */
X	int	ht_used;	/* number of entries used */
X	int	ht_lim;		/* when to expand */
X	struct	hashent **ht_tab;/* base of table */
X	char	ht_name[1];	/* table name; actually larger */
X};
X
Xextern char *progname;
X
Xchar *
Xemalloc(n)
X	size_t n;
X{
X	register char *p = malloc(n);
X
X	if (p == NULL) {
X		(void) fprintf(stderr, "%s: out of memory\n", progname);
X		exit(1);
X	}
X	return (p);
X}
X
X/* round up to next multiple of y, where y is a power of 2 */
X#define	ROUND(x, y) (((x) + (y) - 1) & ~((y) - 1))
X
X/* compute a `good' number of objects to allocate via malloc */
Xint
Xgoodnumber(n, s)
X	int n;
X	size_t s;
X{
X
X	/* 16384 is a guess at a good page size for malloc */
X	/* 32 is a guess at malloc's overhead */
X	return ((int)((ROUND(n * s + 32, 16384) - 32) / s));
X}
X
X/*
X * Make a new hash table.
X */
Xstruct hashtab *
Xht_new(name)
X	char *name;
X{
X	register struct hashtab *ht;
X	register struct hashent **h;
X	register int n;
X
X	ht = (struct hashtab *)emalloc(sizeof *ht + strlen(name));
X	ht->ht_tab = h = (struct hashent **)emalloc(128 * sizeof *h);
X	ht->ht_size = 128;
X	ht->ht_mask = 127;
X	for (n = 128; --n >= 0;)
X		*h++ = NULL;
X	ht->ht_used = 0;
X	ht->ht_lim = (128 * 2) / 3;
X	(void) strcpy(ht->ht_name, name);
X	return (ht);
X}
X
X/*
X * Expand an existing hash table.
X */
Xstatic void
Xht_expand(ht)
X	register struct hashtab *ht;
X{
X	register int n = ht->ht_size * 2, i;
X	register struct hashent *p, **h, **oldh, *q;
X
X	h = (struct hashent **)emalloc(n * sizeof *h);
X	for (i = n; --i >= 0;)
X		*h++ = NULL;
X	h -= n;
X	oldh = ht->ht_tab;
X	n--;
X	for (i = ht->ht_size; --i >= 0;) {
X		for (p = *oldh++; p != NULL; p = q) {
X			q = p->h_next;
X			p->h_next = h[p->h_hash & n];
X			h[p->h_hash & n] = p;
X		}
X	}
X	free((char *)ht->ht_tab);
X	ht->ht_tab = h;
X	ht->ht_mask = n;
X	ht->ht_size = ++n;
X	ht->ht_lim = (n * 2) / 3;
X}
X	
X/*
X * Make a new hash entry.  Its h_next will be NULL.
X */
Xstatic struct hashent *
Xnewhashent(hash, name, value)
X	int hash;
X	char *name, *value;
X{
X	static struct hashent *hfree;
X	register struct hashent *h;
X	register int n, nalloc;
X
X	if ((h = hfree) != NULL)
X		hfree = h->h_next;
X	else {
X		nalloc = goodnumber(2, sizeof *h);	/* need at least 2 */
X		hfree = h = (struct hashent *)emalloc(nalloc * sizeof *h) + 1;
X		for (n = nalloc - 2; --n >= 0; h++)
X			h->h_next = h + 1;
X		h->h_next = NULL;
X		h -= nalloc - 1;
X	}
X	h->h_next = NULL;
X	h->h_hash = hash;
X	h->h_name = name;
X	h->h_value = value;
X	return (h);
X}
X
X#define	HASH(str, h, p) \
X	for (p = str, h = 0; *p;) h = (h << 5) - h + *p++
X
X/*
X * Look up a name=value.
X */
Xchar *
Xht_nget(ht, name)
X	register struct hashtab *ht;
X	char *name;
X{
X	register char *p;
X	register int hash;
X	register struct hashent *h;
X
X	HASH(name, hash, p);
X	p = name;
X	for (h = ht->ht_tab[hash & ht->ht_mask]; h != NULL; h = h->h_next)
X		if (h->h_hash == hash && h->h_name != NULL &&
X		    strcmp(h->h_name, p) == 0)
X			return (h->h_value);
X	return (NULL);
X}
X
X/*
X * Look up an ID=value.
X */
Xchar *
Xht_iget(ht, id)
X	register struct hashtab *ht;
X	register int id;
X{
X	register struct hashent *h;
X
X	for (h = ht->ht_tab[id & ht->ht_mask]; h != NULL; h = h->h_next)
X		if (h->h_hash == id && h->h_name == NULL)
X			return (h->h_value);
X	return (NULL);
X}
X
X/*
X * Insert (do not clobber) a name=value.
X * Return zero on success.
X */
Xint
Xht_nins(ht, name, value)
X	register struct hashtab *ht;
X	char *name, *value;
X{
X	register char *p;
X	register int hash;
X	register struct hashent *h, **hp;
X
X	HASH(name, hash, p);
X	p = name;
X	for (hp = &ht->ht_tab[hash & ht->ht_mask]; (h = *hp) != NULL;
X	     hp = &h->h_next)
X		if (h->h_hash == hash && h->h_name != NULL &&
X		    strcmp(h->h_name, p) == 0)
X			return (-1);
X	*hp = newhashent(hash, name, value);
X	if (++ht->ht_used > ht->ht_lim)
X		ht_expand(ht);
X	return (0);
X}
X
X/*
X * Insert (do not clobber) an ID=value.
X * Return zero on success.
X */
Xint
Xht_iins(ht, id, value)
X	register struct hashtab *ht;
X	register int id;
X	char *value;
X{
X	register struct hashent *h, **hp;
X
X	for (hp = &ht->ht_tab[id & ht->ht_mask]; (h = *hp) != NULL;
X	     hp = &h->h_next)
X		if (h->h_hash == id && h->h_name == NULL)
X			return (-1);
X	*hp = newhashent(id, (char *)NULL, value);
X	if (++ht->ht_used > ht->ht_lim)
X		ht_expand(ht);
X	return (0);
X}
X
X/*
X * Stash a copy of a string away; it will never be freed.
X */
Xstatic char *
Xpoolstr(s)
X	char *s;
X{
X	register char *p;
X	register size_t l = strlen(s) + 1;
X	static char *poolp;
X	static size_t nleft;
X
X	if (nleft < l)
X		poolp = emalloc(nleft = goodnumber(l, 1));
X	bcopy(s, p = poolp, l);
X	poolp += l;
X	return (p);
X}
X
X/*
X * Generate a single unique copy of the given string.
X */
Xchar *
Xstring(s)
X	char *s;
X{
X	register char *p;
X	register int hash;
X	register struct hashent *h, **hp;
X	static struct hashtab *ht;
X
X	if (ht == NULL)
X		ht = ht_new("strings");
X	HASH(s, hash, p);
X	p = s;
X	for (hp = &ht->ht_tab[hash & ht->ht_mask]; (h = *hp) != NULL;
X	     hp = &h->h_next)
X		if (h->h_hash == hash && strcmp(h->h_name, p) == 0)
X			return (h->h_name);
X	*hp = h = newhashent(hash, poolstr(s), (char *)NULL);
X	if (++ht->ht_used > ht->ht_lim)
X		ht_expand(ht);
X	return (h->h_name);
X}
X
X/*
X * Call fn on all the name=value pairs.
X */
Xvoid
Xht_niterate(ht, fn)
X	struct hashtab *ht;
X	register void (*fn)();
X{
X	register struct hashent *h, **hp;
X	register int n;
X
X	for (n = ht->ht_size, hp = ht->ht_tab; --n >= 0;)
X		for (h = *hp++; h != NULL; h = h->h_next)
X			if (h->h_name != NULL)
X				(*fn)(h->h_name, h->h_value);
X}
X
X/*
X * Call fn on all the id=value pairs.
X */
Xvoid
Xht_iiterate(ht, fn)
X	struct hashtab *ht;
X	register void (*fn)();
X{
X	register struct hashent *h, **hp;
X	register int n;
X
X	for (n = ht->ht_size, hp = ht->ht_tab; --n >= 0;)
X		for (h = *hp++; h != NULL; h = h->h_next)
X			if (h->h_name == NULL)
X				(*fn)(h->h_hash, h->h_value);
X}
//go.sysin dd *
if [ `wc -c < hash.c` != 6291 ]; then
	made=false
	echo error transmitting hash.c --
	echo length should be 6291, not `wc -c < hash.c`
else
	made=true
fi
if $all; then
	echo '	Changing owner to bin'
	chown bin hash.c
else
	echo '	Original owner was bin'
fi
if $made; then
	chmod 444 hash.c
	echo -n '	'; ls -ld hash.c
fi
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

gaynor@paul.rutgers.edu (Silver) (10/18/90)

You must supply definitions of uint8 (an 8-bit unsigned int), uint16
(similarly), and uint32 (likewise).

----------------------------------- hash.h ------------------------------------
/* This is an implementation of Peter K. Pearson's hash algorithm as         */
/* presented in "Fast Hashing of Variable-Length Text Strings", ACM 6/90.    */
/* The conclusion of his article is cited below.                             */
/*                                                                           */
/*   The main advantages of the hashing function presented here are:         */
/*   1. No restriction is placed on the length of the text string.           */
/*   2. The length of the text string does not need to be known beforehand.  */
/*   3. Very little arithmetic is performed on each character being hashed.  */
/*   4. Similar strings are not likely to collide.                           */
/*   5. Minimal, perfect hashing functions can be built in this form.        */
/*  [6. It's average distribution is quite random.]                          */
/*                                                                           */
/*   Its principal disadvantages are:                                        */
/*   1. Output value ranges that are not powers of 2 are somwehat more       */
/*      complicated to provide.                                              */
/*   2. More auxiliary memory (the 256-byte table T) is required by this     */
/*      hashing function than by many traditional functions.                 */
/*  [3. I blew 50 cents copying the article and then had to type it in.]     */
/*                                                                           */
/* Elaborating on advantage 5 above, by fatootsing with the permutation      */
/* table T, one can make hash8 perform perfect minimal hashes.  The author   */
/* provides the following guidelines for construction of such a table:       */
/*                                                                           */
/*   [C[N] are the characters of the string being hashed and h[N] is the Nth */
/*    iteration of the loop.]                                                */
/*                                                                           */
/*   1. A table T was consructed by pseudorandom permutation of the integers */
/*      (0 ... 255).                                                         */
/*   2. One by one, the desired values were assigned to the words in the     */
/*      list.  Each assignment was effected by exchanging two elemnts in the */
/*      table.                                                               */
/*   3. For each word, the first candidate considered for exchange was       */
/*      T[h[n-1] ^ C[n]], the last table element referenced in the           */
/*      computation of the hash function for that word.                      */
/*   4. A table element could not be exchanged if it was referenced during   */
/*      the hashing of a previously assgined word or if it was referenced    */
/*      earlier in the hashing of the same word.                             */
/*   5. If the necessary exchange was forbidden by Rule 4, attention was     */
/*      shifted to the previously referenced table element,                  */
/*      T[[h[n-2]] ^ C[n-1]].                                                */
/*                                                                           */
/*   The procedure is not always successful.  For example, using the ascii   */
/*   character codes, if the word "a" hashes to 0 and the word "i" hashes to */
/*   15, it turns out that the word "in" must hash to 0.  Initial attempts   */
/*   to map Knuth's 31 words onto the integers (0 ... 30) failed for exactly */
/*   this reason.  The shift to the range (1 ... 31) was an ad hoc tactic to */
/*   circumvent this problem.                                                */

/* Hash s into an 8, 16, or 32 bit unsigned integer.  The larger-sized       */
/* values are generated by applying hash8 in parallel.                       */
extern uint8  hash8(uint8 * s);
extern uint16 hash16(uint8 * s);
extern uint32 hash32(uint8 * s);

/* The hash permutation table.                                               */
extern uint32 * hashpermutation;
----------------------------------- hash.c ------------------------------------
#include "hash.h"

static uint8 permutation[256] =
{
 51, 140, 233,  27, 118, 125, 170, 138, 119, 132, 174,  97,  25, 110,   1,  14,
 65,  36,  40, 188,  73, 173,   7,  30,  68,  56, 169, 234, 107, 177, 197,  87,
 28, 210, 186,  67,   2,  15, 115,  48, 223, 148, 211,  57, 190, 104, 213,  49,
144, 172, 147, 124, 157, 238, 167, 183,  78,  75,  58,  22,  70, 103, 181,  12,
254,  41, 198, 168,  46,  79, 241, 156,  83, 128,  66,  60,  86, 141, 161, 176,
221,  54, 192, 252, 116,  95, 206,  35,  88, 133, 154, 250, 237, 253,  85, 178,
 93, 159, 155,  42,   9,  89,   3,  61, 201, 158, 106,  82, 240, 255, 218, 102,
189,   8,  33,   4, 145,  16, 150,  26,  99, 100, 195, 175,  34,  50,  80, 166,
194, 195, 164,  29, 134, 105,  55, 143, 122, 130, 245, 208,  72,  77,  64, 121,
139, 232, 191, 108, 228, 137,  59,  74,  11, 126, 171,   5, 242, 101, 239, 193,
112, 113,  98,  21, 207, 225, 151, 251,  92,  91,  17, 127,  20,  81,  24,   6,
 43, 196, 204, 247, 212, 224, 220,  94,  32,  13, 187, 199, 214,  18, 226,  84,
 71, 231, 165,  19, 202, 217,  90, 129, 136, 153, 182, 111, 244,  45, 236, 249,
109,  47, 180, 205, 215, 160,  53, 162, 114, 246, 179,  62, 227,  96, 142, 230,
184, 146, 117,  39,  69,  37,  23,  63,  52, 216,   0, 135, 149,  31,  38,  44,
209, 120,  76, 203, 229, 123, 131, 152,  10, 219, 243, 248, 235, 222, 200, 163
};

uint8 hash8(register uint8 * s)
  {register static uint8 h;
   if (*s)
       {h = permutation[*s];
        while (*++s)
          h ^= permutation[h ^ *s];
        return(h);}
     else
       {return((uint8)0);}

uint16 hash16(register uint8 * s);
  {register uint8 h1;
   register uint8 h2;
   if (*s)
       {h1 = permutation[*s];
        h2 = permutation[*s + 1];
        while (*++s)
          {h1 ^= permutation[h1 ^ *s];
           h2 ^= permutation[h2 ^ *s];}
        return((uint16)((h1<<8) & h2));}
     else
       {return((uint16)0);}}

uint32 hash32(register uint8 * s);
  {register uint8 h1;
   register uint8 h2;
   register uint8 h3;
   register uint8 h4;
   if (*s)
       {h1 = permutation[*s]
        h2 = permutation[*s + 1];
        h2 = permutation[*s + 2];
        h2 = permutation[*s + 3];
        while (*++s)
          {h1 ^= permutation[h1 ^ *s];
           h2 ^= permutation[h2 ^ *s];
           h3 ^= permutation[h3 ^ *s];
           h4 ^= permutation[h4 ^ *s];}
        return((uint32)((h1<<24) & (h2<<16) & (h3<<8) & h4));}
     else
       {return((uint32)0);}}

ron@sco.COM (Ron Irvine) (10/20/90)

I have used many hash functions.  Each one has advantages
and disadvantages.  Here is a powerful but simple one.
It simply does a crc16 on the character string.  The crc16 is
a great hash function since it is designed to produce a
unique code for different character strings.  So for "reasonable"
text it gives a good distribution of hash numbers (a big problem
to solve in general).  The crc16 function I list below uses nibbles
to build the crc from each byte, this is faster than a bit by bit and
less room than a 256 entry table needed for a byte by byte (fits
in cache).  So, it is fast (no multiply, just shifts and masks),
well behaved and simple.  If you have a VAX you could even use the
"crc" instruction instead of the crc16 function - what a CISC.
To use, do a crc16 on the string and the truncate the 16 bit
result to the size you need (this should be a power of 2).


/*	crc16 - crc16 routine
 *
 *	R.K. Irvine
 *
 *	This routine returns the crc16 for the string pointed
 *	to by "in".
 *	crc16 is given by:  x^16 + x^15 + x^2 + 1
 */
unsigned short
crc16(register char *in) {
	register unsigned int	n, crc;

	static unsigned short crc16l[] = {
		0x0000,0xc0c1,0xc181,0x0140,
		0xc301,0x03c0,0x0280,0xc241,
		0xc601,0x06c0,0x0780,0xc741,
		0x0500,0xc5c1,0xc481,0x0440,
	};

	static unsigned short crc16h[] = {
		0x0000,0xcc01,0xd801,0x1400,
		0xf001,0x3c00,0x2800,0xe401,
		0xa001,0x6c00,0x7800,0xb401,
		0x5000,0x9c01,0x8801,0x4400,
	};

	crc = 0;
	while (n = (unsigned char)(*in++)) {	
		n ^= crc;
		crc = crc16l[n&0x0f] ^ crc16h[(n>>4)&0x0f] ^ (crc>>8);
	}
	return(crc);
}

unhd (Jonathan W Miner) (10/24/90)

In article <1990Oct16.184951.513@watdragon.waterloo.edu> cpshelley@violet.waterloo.edu (cameron shelley) writes:
>  I have been (futily :<) experimenting with hash functions to 
>hash english words into a large array.  No book I can find around
>here gives the topic more than cursory discussion.  Does anyone
>out there know of a good, simple one?  They must exist, and I'm
>getting a little tired of mine... (er, function that is, I can't
>afford another data structures book right now :).

I am not an expert on hash functions, but can suggest a couple of methods:
     1. Add the ascii values and 'mod' by the size of your array.
     2. Or, a small modification: add only certain characters of each
        string. Say 1,3,5,7 and 11 and 'mod' by the size of the array.

Both those methods, and many similar ones involve handling collisions in
some manner.  This can be messy (or so I'm told).  To skip this problem, 
use a dynamic hashing function.  This is refered to Fagin's Approach.  It
avoid's collisions by allocating space as needed and modifying the hash
function.

E-Mail if you want more info on this.

-----------------------------------------------------------------
Jonathan Miner        | I don't speak for UNH, and UNH does not 
jwm775@unhd.unh.edu   | speak for me! 
(603)868-3416         | Rather be downhill skiing... or hacking... 
-- 
-----------------------------------------------------------------
Jonathan Miner        | I don't speak for UNH, and UNH does not 
jwm775@unhd.unh.edu   | speak for me! 
(603)868-3416         | Rather be downhill skiing... or hacking...