[comp.unix.wizards] How to increase wordlist size in spell?

rsk@s.cc.purdue.edu (Frozen Wombat) (01/13/88)

I've recently tripled the size of the wordlist used by spell, and now
I'm running into problems with it; I strongly suspect that I'll have
to change some of the constants used to hash the wordlist, but I haven't
been able to nail down precisely what to do.

Here is what I believe to be the relevant code from spell.h:

#else (if unix is defined)
#define SHIFT	4
#define TABSIZE 25000	/*(int)(400000/(1<<shift))--pdp11 compiler deficiency*/
short	tab[TABSIZE];
#endif
long	p[] = { 399871, 399887, 399899, 399911, 399913,
	399937, 399941, 399953, 399979, 399983, 399989, };

#define	NP	(sizeof(p)/sizeof(p[0]))
#define	NW	30

/*
* Hash table for spelling checker has n bits.
* Each word w is hashed by k different (modular) hash functions, hi.
* The bits hi(w), i=1..k, are set for words in the dictionary.
* Assuming independence, the probability that no word of a d-word
* dictionary sets a particular bit is given by the Poisson formula
* P = exp(-y)*y**0/0!, where y=d*k/n.
* The probability that a random string is recognized as a word is then
* (1-P)**k.  For given n and d this is minimum when y=log(2), P=1/2,
* whence one finds, for example, that a 25000-word dictionary in a
* 400000-bit table works best with k=11.
*/

Now, the old spelling list had just under 25000 words; the new one
has just under 80000.  I figured that changing TABSIZE would do the
trick, but it doesn't appear to work.  Then I noticed that the table
p[] seems to be composed of primes close to 400000--and the 400000
comes into the picture because of the comment on TABSIZE.  I think,
but am not sure, that I'll have to come up with some new primes...but
honestly, I'm very confused.  I don't really understand the long comment
in the source, which is unfortunate, 'cause I suspect that the key
I need is there.

Can anyone explain this to me in a little more detail and/or tell me
what I need to do?  The users are gathered outside my office with torches
and things are getting ugly...

Thanks,
Rich Kulawiec, rsk@s.cc.purdue.edu, s.cc.purdue.edu!rsk