daulys@aristotle.ils.nwu.edu (03/23/90)
From: Vitas Daulys <daulys@aristotle.ils.nwu.edu> This is a general request going out in search of any available spell-checker or spell-correction routines. Either English or Spanish words would be optimum and LISP is the preferred language, but any variations of these are acceptable. Thanks
jpalmucc@bbn.com (Jeff Palmucci) (03/23/90)
Here is a spell checker that I wrote for a natural language system ~3 years ago. It allows you to build a dictionary through the add-spelling function, and retrieve close matches through find-similar. A close word it one with ONE of the following errors: added character, deleted character, replaced character, transposed characters. It finds these words by hashing on (<first half of the word> . length) and (length . <second half of the word>). Since we are only looking for words 1 error away from the possible misspelling, one of the halfs will be correct, and should lead to the correct word. The code is not particularly efficient, and it stores the entire dictionary in memory. It works great on a Symbolics, which have about 170mb swapping space standard, but may be too costly for large dictionaries on smaller machines. If you are using a Symbolics, dictionary disk saves can be done quickly with sys:write-forms-to-file. Other machines may have to build it from scratch each time you power up. The indexing idea came from an undergraduate bachelors thesis at MIT, forgot which one. Jeff --------------------- I make no guarantees or restrictions on the following code: (defun get-search-keys (sym) "Returns the hash codes that the word should be listed under." (let* ((word (symbol-name sym)) (word-length (length word)) (up-word (string-upcase word)) (prefix-base (subseq up-word 0 (1- (ceiling (/ word-length 2))))) (suffix-base (subseq up-word (- word-length -1 (ceiling (/ word-length 2)))))) ; if the word is too short for a key, return the null (cond ((< word-length 3) `((,word-length . ""))) ((evenp word-length) (list (cons word-length prefix-base) (cons suffix-base word-length))) (t (list (cons word-length prefix-base) (cons suffix-base word-length) (cons word-length (subseq prefix-base 0 (1- (length prefix-base)))) (cons (subseq suffix-base 1) word-length)))))) (defun get-index-keys (sym) "Returns the hash codes that the word may be found under" (let* ((word (symbol-name sym)) (word-length (length word)) (up-word (string-upcase word)) (prefix-base (subseq up-word 0 (1- (ceiling (/ word-length 2))))) (suffix-base (subseq up-word (- word-length -1 (ceiling (/ word-length 2)))))) ; if the word is too short for a key, return the null (cond ((= word-length 1) '((1 . "") (2 . ""))) ((= word-length 2) '((1 . "") (2 . "") (3 . ""))) ((oddp word-length) (list (cons (1- word-length) (subseq prefix-base 0 (1- (length prefix-base)))) (cons word-length prefix-base) (cons (1+ word-length) prefix-base) (cons (subseq suffix-base 1) (1- word-length)) (cons suffix-base word-length) (cons suffix-base (1+ word-length)))) (t (list (cons (1- word-length) prefix-base) (cons word-length prefix-base) (cons (1+ word-length) prefix-base) (cons suffix-base (1- word-length)) (cons suffix-base word-length) (cons suffix-base (1+ word-length))))))) (defun make-spelling-table () "Creates a table that is used to store the most likely word matches in" (make-hash-table :test #'equal :size 256 :rehash-size 1.5 :rehash-threshold .75)) (defun add-spelling (sym table &optional &key (longword nil)) "Adds a word to a specific spelling table" (mapc #'(lambda (k) (push sym (gethash k table))) (get-search-keys sym))) (defun find-similar (sym table) "Returns a list of all similarp words in table" (remove-duplicates (mapcan #'(lambda (x) (if (similarp (symbol-name sym)(symbol-name x)) (list x))) (remove-duplicates (mapcan #'(lambda (x) (copy-list (gethash x table))) (get-index-keys sym)) :test #'equal)))) (defun similarp (word1 word2) "Are the two words similar enough to be considered spelling replacements." (or (equal word1 word2) (do ((l1 (coerce word1 'list) (cdr l1)) (l2 (coerce word2 'list) (cdr l2))) ((neq (car l1) (car l2)) (or (equal (cdr l1) l2) (equal l1 (cdr l2)) (equal (cdr l1) (cdr l2)) (and (eq (car l1) (cadr l2)) (eq (cadr l1) (car l2)) (equal (cddr l1) (cddr l2))))))))