[comp.unix.questions] Deleting Entries in "spell" Dictionary

nels@astrovax.UUCP (02/13/87)

A misspelling has crept into the dictionary used by the spell program
here (spell erroneously accepts a misspelling of a famous astronomer's
name; somebody here got an angry letter from her when the misspelling
appeared in a paper of his!).  Is there an easy way to remove it?

Unfortunately, we do NOT have an up-to-date ASCII list of all the words
added to the spell database.

Nels Anderson                             Princeton University, Astrophysics

UUCP:     {allegra,akgua,cbosgd,decvax,
	   ihnp4,noao,philabs,princeton,topaz}!astrovax!nels
ARPANET:  nels%astrovax@rutgers.edu
BITNET:   6070106@pucc -- or, if your mailer doesn't mind, 
          nels@astrovax.princeton.edu

boykin@custom.UUCP (02/13/87)

In article <843@astrovax.PRINCETON.EDU>, nels@astrovax.PRINCETON.EDU (Nels Anderson) writes:
> 
> A misspelling has crept into the dictionary used by the spell program here
> ... Is there an easy way to remove it?
> Unfortunately, we do NOT have an up-to-date ASCII list of all the words
> added to the spell database.

The only guaranteed way of doing this is by having the original list.
However, if you're not using SVR2 or later, then you can remove the word.
SVR2 uses a Huffman encoding, for which I don't know of a way to
remove an entry.  Other versions of spell use a 50K bitwise hash table.
For each word in the original list there are 12 (?) bits set within the
table.  Removing a single word involves knowing which 12 bits are set
and writing a program (or using a debugger) to reset those bits.
One way of finding this out is to create a new database with
only that one word in it.  For PC/SPELL under DOS the program is SPINSHST;
I don't remember the name of the UNIX program or its syntax, so check
the UNIX manuals for details.  Anyway, create the new database, dump it
out to see which bits are set, and clear those bits within the original
database.
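
As a rough illustration of the "dump and clear" step, here is a minimal
Python sketch.  It assumes both databases are raw bitmaps of identical
size with no header, which may well not match the real on-disk format of
any particular spell; the function names set_bits and clear_bits are made
up for this example.

```python
def set_bits(table: bytes) -> list:
    """List the indices of all set bits (bit 0 = low bit of byte 0)."""
    return [i * 8 + b
            for i, byte in enumerate(table)
            for b in range(8)
            if (byte >> b) & 1]


def clear_bits(original: bytes, single_word: bytes) -> bytes:
    """Return a copy of `original` with every bit cleared that is set
    in `single_word` (the table built from the one bad word)."""
    if len(original) != len(single_word):
        raise ValueError("tables must be the same size")
    return bytes(o & ~s & 0xFF for o, s in zip(original, single_word))


# Tiny two-byte tables, invented purely for illustration.
original = bytes([0b10110110, 0b00001111])
single_word = bytes([0b00100100, 0b00000001])

print(set_bits(single_word))               # bits the bad word sets
print(clear_bits(original, single_word))   # original minus those bits
```

With real files you would read both tables with open(path, "rb").read(),
and write the result to a new file rather than over the original.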

That's the good news; now for the bad news!  While you just removed that
one entry, there is a definite possibility you've removed other words as well!
For a word to be considered in the database, all 12 of its bits
must be set, but two words can share some of those bits.  Clearing a
shared bit removes both words, so you will probably find that spell now
flags a number of correctly spelled words as misspellings!
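
The collateral damage is easy to see in a toy model.  The Python sketch
below is only an illustration: the BitTable class, the 32-bit table size,
and the bit positions are all invented, and each word gets 3 bits rather
than 12 to keep it short.  A real spell computes the positions by hashing
the word.

```python
class BitTable:
    """Toy bitwise hash table: a word is present if all its bits are set."""

    def __init__(self, size):
        self.bits = [False] * size

    def add(self, positions):
        for p in positions:
            self.bits[p] = True

    def contains(self, positions):
        return all(self.bits[p] for p in positions)

    def remove(self, positions):
        # Clearing bits is the only way to "remove" a word, and it
        # cannot tell which other words also rely on those bits.
        for p in positions:
            self.bits[p] = False


# Invented bit positions; WORD_A and WORD_B happen to share bit 5.
WORD_A = (1, 5, 9)      # the misspelling we want gone
WORD_B = (5, 12, 20)    # a valid word that overlaps with it
WORD_C = (2, 7, 30)     # a valid word with no overlap


def demo():
    table = BitTable(32)
    for word in (WORD_A, WORD_B, WORD_C):
        table.add(word)
    table.remove(WORD_A)
    return (table.contains(WORD_A),
            table.contains(WORD_B),
            table.contains(WORD_C))


print(demo())  # (False, False, True): WORD_B was lost along with WORD_A
```

Only the word with no bits in common with the removed one survives, which
is why rebuilding from a word list is the safer route.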

My recommendation would be to start over again and maintain the list
of words you need.  Start over with a reasonable list of known valid words
(the distribution tape is a reasonable place to start!).  SPELL keeps a
log of misspelled words; look at it regularly and rebuild your database
when people complain.

Good luck!

Joe Boykin
Custom Software Systems
{necntc, frog}!custom!boykin