rsk@s.cc.purdue.edu (Frozen Wombat) (01/13/88)
I've recently tripled the size of the wordlist used by spell, and now I'm running into problems with it; I strongly suspect that I'll have to change some of the constants used to hash the wordlist, but I haven't been able to nail down precisely what to do. Here is what I believe to be the relevant code from spell.h: #else (if unix is defined) #define SHIFT 4 #define TABSIZE 25000 /*(int)(400000/(1<<shift))--pdp11 compiler deficiency*/ short tab[TABSIZE]; #endif long p[] = { 399871, 399887, 399899, 399911, 399913, 399937, 399941, 399953, 399979, 399983, 399989, }; #define NP (sizeof(p)/sizeof(p[0])) #define NW 30 /* * Hash table for spelling checker has n bits. * Each word w is hashed by k different (modular) hash functions, hi. * The bits hi(w), i=1..k, are set for words in the dictionary. * Assuming independence, the probability that no word of a d-word * dictionary sets a particular bit is given by the Poisson formula * P = exp(-y)*y**0/0!, where y=d*k/n. * The probability that a random string is recognized as a word is then * (1-P)**k. For given n and d this is minimum when y=log(2), P=1/2, * whence one finds, for example, that a 25000-word dictionary in a * 400000-bit table works best with k=11. */ Now, the old spelling list had just under 25000 words; the new one has just under 80000. I figured that changing TABSIZE would do the trick, but it doesn't appear to work. Then I noticed that the table p[] seems to be composed of primes close to 400000--and the 400000 comes into the picture because of the comment on TABSIZE. I think, but am not sure, that I'll have to come up with some new primes...but honestly, I'm very confused. I don't really understand the long comment in the source, which is unfortunate, 'cause I suspect that the key I need is there. Can anyone explain this to me in a little more detail and/or tell me what I need to do? The users are gathered outside my office with torches and things are getting ugly... Thanks, Rich Kulawiec, rsk@s.cc.purdue.edu, s.cc.purdue.edu!rsk