[comp.unix.questions] What does "spell" do wrong?

jaap@chromo.ucsc.edu (Jacob Wilbrink) (05/12/89)

I've been wondering what the program "spell" does, since it
seems to make very many errors. Some examples of words it thinks
are spelled correctly are

utomsrr
mgdesou
aneorxx

How does "spell" ever derive those words by adding inflections,
prefixes and suffixes to words in the dictionary?

Please e-mail responses to jaap%chromo@ucscc.ucsc.edu
or cmph000@ucscj.ucsc.edu.

Thanks.

irf@kuling.UUCP (Bo Thidé) (05/13/89)

In article <7084@saturn.ucsc.edu> jaap@chromo.UUCP (Jacob Wilbrink) writes:
>I've been wondering what the program "spell" does, since it
>seems to make very many errors. Some examples of words it thinks
>are spelled correctly are
>
>utomsrr
>mgdesou
>aneorxx

All these words are caught as misspelled by the HP-UX version of spell(1).


   ^   Bo Thidé--------------------------------------------------------------
  | |       Swedish Institute of Space Physics, S-755 91 Uppsala, Sweden
  |I|    [In Swedish: Institutet för RymdFysik, Uppsalaavdelningen (IRFU)]
  |R|  Phone: (+46) 18-403000.  Telex: 76036 (IRFUPP S).  Fax: (+46) 18-403100 
 /|F|\        INTERNET: bt@irfu.se       UUCP: ...!uunet!sunic!irfu!bt
 ~~U~~ -----------------------------------------------------------------sm5dfw


rowland@hpavla.HP.COM (Fred Rowland) (05/17/89)

I tried those three "words" in sentences, in case there was something
odd about checking them in isolation, and spell caught them all.

I'm using HP's HP-UX too.

Fred Rowland
Avondale Division

decot@hpisod2.HP.COM (Dave Decot) (05/17/89)

>I've been wondering what the program "spell" does, since it
>seems to make very many errors. Some examples of words it thinks
>are spelled correctly are
>
>utomsrr
>mgdesou
>aneorxx

What drug were you on when you put these "misspellings" in your document? :-)

Actually, most implementations of spell(1) use a hashed dictionary of
correct words.  These words apparently happened to hash to the same
place as real words.
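A minimal sketch of why a hashed dictionary accepts some nonsense strings. It assumes a deliberately tiny table (1,024 slots) so that collisions are easy to find, and it uses CRC-32 from Python's zlib purely as an arbitrary hash; no real spell(1) is claimed to work with these exact choices. The names slot, build_table, looks_correct and find_false_accept are hypothetical.

```python
import itertools
import string
import zlib

TABLE_BITS = 10  # deliberately tiny (1,024 slots) so collisions are easy to find

def slot(word: str) -> int:
    """Map a word to one of 2**TABLE_BITS slots (CRC-32 is an arbitrary choice here)."""
    return zlib.crc32(word.encode("ascii")) % (1 << TABLE_BITS)

def build_table(words):
    """Mark the slot of each dictionary word; the words themselves are discarded."""
    bits = bytearray(1 << TABLE_BITS)  # one byte per "bit", for readability
    for w in words:
        bits[slot(w)] = 1
    return bits

def looks_correct(bits, word: str) -> bool:
    """A marked slot means 'real word' OR 'collision'; the table cannot tell which."""
    return bits[slot(word)] == 1

def find_false_accept(bits, dictionary, length=4):
    """Search letter strings for one not in the dictionary that is accepted anyway."""
    for letters in itertools.product(string.ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        if candidate not in dictionary and looks_correct(bits, candidate):
            return candidate
    return None

dictionary = {"cat", "dog", "house", "spell", "word", "table", "hash", "bit"}
bits = build_table(dictionary)
print(find_false_accept(bits, dictionary))  # some 4-letter string that "passes"
```

Every real word is always accepted; the cost of storing only bits is that some strings never added to the table land on a marked slot too, which is exactly the kind of miss described above.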

Dave Decot

cowan@marob.MASA.COM (John Cowan) (05/18/89)

In article <1007@kuling.UUCP> irf@kuling.UUCP (Bo Thidé) writes:
>In article <7084@saturn.ucsc.edu> jaap@chromo.UUCP (Jacob Wilbrink) writes:
>>I've been wondering what the program "spell" does, since it
>>seems to make very many errors. Some examples of words it thinks
>>are spelled correctly are
>>
>>utomsrr
>>mgdesou
>>aneorxx
>
>All these words are caught as misspelled by the HP-UX version of spell(1).
>

My version of 'spell' catches them also.  However, in defense of the program,
it is not designed to be 100% reliable.  'Spell' uses a hashing scheme.
Each word is stripped of prefixes and suffixes, and the resulting base form
is hashed and looked up in a bit table.  If the bit is 0, the word is 
certainly misspelled; if the bit is 1, the word is assumed correct.  There
are 30,000 1-bits in a 2^27-bit table (134,217,728 bits), so the probability
that a misspelled word is falsely accepted is 30,000/2^27, roughly 1/4000.

According to Doug McIlroy, the author of 'spell', a typical document contains
no more than 20 misspelled words.  Therefore, fewer than 1% of documents (at
most about 20/4000, or 1 in 200) contain a misspelled word that is not reported.
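A quick check of this arithmetic, assuming each misspelled word is an independent lookup with the per-word false-positive chance from the bit table (30,000 ones among 2^27 bits). The 20-word figure is the one attributed to McIlroy above; the independence assumption is ours.

```python
ONES = 30_000          # 1-bits in the table (figure from the post)
TABLE_SIZE = 2 ** 27   # bits in the table

p_word = ONES / TABLE_SIZE   # chance one misspelled word lands on a 1-bit
misspelled = 20              # misspellings per typical document

# Probability that at least one of the misspellings goes unreported,
# treating the lookups as independent.
p_doc = 1 - (1 - p_word) ** misspelled

print(f"per word:     1 in {1 / p_word:,.0f}")
print(f"per document: {p_doc:.2%}")
```

This prints about 1 in 4,474 per word and about 0.45% per document, so 1% is a comfortable upper bound.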

Source:  Jon Bentley, _Programming Pearls_, ISBN 0-201-10331-1.

kucharsk@uts.amdahl.com (William Kucharski) (05/18/89)

Spell(1) in Amdahl's UTS 2.0 catches the three words as well.
-- 
					William Kucharski

ARPA: kucharsk@uts.amdahl.com
UUCP: ...!{ames,decwrl,sun,uunet}!amdahl!kucharsk

Disclaimer:  The opinions expressed above are my own, and may not agree with
	     those of any other sentient being, not to mention those of my 
	     employer.  So there.