[comp.bugs.sys5] Non-word "accreditate" in /usr/dict/words

geoff@desint.UUCP (Geoff Kuenning) (03/10/88)

While working on ispell (yes, it's coming, but not real soon) I stumbled
across a non-word in my /usr/dict/words file.  The verb form of
"accreditation" is "accredit," despite the best efforts of some people
to change it.  If you're a stickler for accuracy, like me, edit
/usr/dict/words to change that entry to "accreditation".
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

aburt@isis.UUCP (Andrew Burt) (03/11/88)

In article <1693@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>...I stumbled across a non-word in my /usr/dict/words file.

Brings to mind a word that showed up on a list of five letter palindromes
(no, I wasn't bored, I was making a handout about regular expressions
with ^\(.\)\(.\).\2\1$ as an example):

	rever

Now, I admit I did find it in the OED.  But that was the only dictionary
of mine that listed it (of about a half dozen).
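
For anyone who wants to repeat the search, here is a minimal sketch of the same pattern in Python.  The expression in the post is in basic-regex syntax; Python's re module writes groups without backslashes.  The sample word list is made up for illustration; on a real system you would read /usr/dict/words instead.

```python
import re

# The pattern from the post, ^\(.\)\(.\).\2\1$, in Python's regex syntax:
# two captured characters, any middle character, then the captures reversed.
FIVE_LETTER_PALINDROME = re.compile(r"^(.)(.).\2\1$")

def five_letter_palindromes(words):
    """Return the words that match the five-letter palindrome pattern."""
    return [w for w in words if FIVE_LETTER_PALINDROME.match(w)]

# Hypothetical sample input, not taken from any real word list.
sample = ["rever", "level", "radar", "revert", "revere", "civic", "noon"]
print(five_letter_palindromes(sample))
```

The anchors matter: without ^ and $ the pattern would also match any longer word that merely contains a five-letter palindrome.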

If I saw this in a document I'd assume it was a misspelling of "revert" or
"revere", etc.; and spell allows "revers" as a plural, which probably
should be "reverse".

This brings up an interesting question:  Should /usr/dict/words list
words that are technically allowable (listed in some notable dictionary)
but are (a) very uncommon and (b) very close to likely misspellings of
far more common words -- at the expense of not catching what are
probably typos?  To my mind, a spelling checker should err toward flagging
correct words rather than letting incorrect ones slip through.

I can't see the "but it makes the dictionary complete" argument being used
since many common words (in a Unix environment) are missing, such as
"filename", "pathname", "stdin",...  (Maybe a -u(nix) option to spell is
in order... :-)

-- 

Andrew Burt 				   			isis!aburt

              Fight Denver's pollution:  Don't Breathe and Drive.

twb@hoqax.UUCP (BEATTIE) (03/13/88)

This machine accepts "sincerly" as correctly spelled when it should only
accept "sincerely".

Tombo.

geoff@desint.UUCP (Geoff Kuenning) (03/18/88)

In article <1338@hoqax.UUCP> twb@hoqax.UUCP (BEATTIE) writes:

> This machine accepts "sincerly" as correctly spelled when it should only
> accept "sincerely".

This is because of the optimistic design of spell(1).  Spell has a list of
suffix rules, which it applies to all words indiscriminately.  A suffix
that only makes sense on a verb (e.g., -ment) will be applied to nouns,
adverbs, and adjectives as well.  Thus, for example, spell accepts
"sincerement" as well as "sincerly" (I just checked).

Ispell, by contrast, explicitly associates its suffixes with particular
roots, so it will reject both of these errors (again, I checked).  The
latest version of ispell is 2.0.02;  it's available (without a dictionary)
from the comp.sources.misc archives.  Don't hold your breath waiting for the
next posting;  it's coming but you'll match the sky before I get everything
together.
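
As a rough illustration of tying suffixes to particular roots, here is a toy sketch.  The flag letters, the suffix table, and the two dictionary entries are all hypothetical and chosen only for the example; they are not copied from ispell's actual dictionary or affix rules.

```python
# Hypothetical flag letters mapped to the suffix each one permits.
SUFFIX_FOR_FLAG = {"Y": "ly", "P": "ness", "S": "s"}

# Each root carries only the flags that are legal for that root.
# These entries are invented for illustration.
DICTIONARY = {
    "sincere": {"Y"},
    "accredit": {"S"},
}

def expand(root, flags):
    """Return the root plus every form its flags explicitly allow."""
    words = {root}
    for flag in flags:
        words.add(root + SUFFIX_FOR_FLAG[flag])
    return words

def accepts(word):
    """Accept a word only if some root's own flags generate it."""
    return any(word in expand(root, flags) for root, flags in DICTIONARY.items())

for w in ["sincerely", "sincerly", "sincerement", "accredits"]:
    print(w, accepts(w))
```

Because "sincere" carries no flag that produces "sincerly" or "sincerement", both are rejected without anyone having to list them as errors.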

BTW, for anyone who cares, the original version of ispell was written
in 1971 (thanks to Ole Brinch Hansen for providing this tidbit).  17
years old!  Hell, some of its fans are younger than that!
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

gwyn@brl-smoke.ARPA (Doug Gwyn) (03/20/88)

In article <1697@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
-In article <1338@hoqax.UUCP> twb@hoqax.UUCP (BEATTIE) writes:
-> This machine accepts "sincerly" as correctly spelled when it should only
-> accept "sincerely".
-This is because of the optimistic design of spell(1).  Spell has a list of
-suffix rules, which it applies to all words indiscriminately.  A suffix
-that only makes sense on a verb (e.g., -ment) will be applied to nouns,
-adverbs, and adjectives as well.  Thus, for example, spell accepts
-"sincerement" as well as "sincerly" (I just checked).

Spell (at least the System V version) has a "stop list" that can be
tweaked to catch common errors such as "sincerly" that slip through
the net.  Not great, but it works.
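
To make the interaction concrete, here is a toy model of indiscriminate suffix stripping with a stop list layered on top.  It is a sketch under stated assumptions, not spell(1)'s real rule table or algorithm: the suffix list, the "restore a final e" step, and the function names are all invented for the example.

```python
# Assumed, illustrative suffix list; spell's real rules are more elaborate.
SUFFIXES = ["ment", "ness", "less", "ly", "d", "s"]

def derivable(word, roots):
    """True if the word is a root, or if stripping any suffix leaves a root.

    The stripped stem is also tried with a final "e" restored, so that
    "sincerly" -> "sincer" -> "sincere" counts as a derivation.
    """
    if word in roots:
        return True
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if stem in roots or stem + "e" in roots:
                return True
    return False

def accepts(word, roots, stop_list=frozenset()):
    """Reject anything on the stop list, then fall back to the suffix rules."""
    if word in stop_list:
        return False
    return derivable(word, roots)

roots = {"sincere"}
stop = {"sincerly"}
for w in ["sincerly", "sincerement", "sincereness"]:
    print(w, accepts(w, roots), accepts(w, roots, stop))
```

In this model the stop list catches exactly the entries someone thought to add; every other derivable non-word still gets through.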

geoff@desint.UUCP (Geoff Kuenning) (03/25/88)

In article <7481@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:

> In article <1697@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
> -This is because of the optimistic design of spell(1).  Spell has a list of
> -suffix rules, which it applies to all words indiscriminately.  A suffix
> -that only makes sense on a verb (e.g., -ment) will be applied to nouns,
> -adverbs, and adjectives as well.  Thus, for example, spell accepts
> -"sincerement" as well as "sincerly" (I just checked).
> 
> Spell (at least the System V version) has a "stop list" that can be
> tweaked to catch common errors such as "sincerly" that slip through
> the net.  Not great, but it works.

The stop list dates back at least to V7 spell.  Unfortunately, it's not
a solution to this problem.  The difficulty is that there are far more
wrong words than right ones.  /usr/dict/words lists only "sincere", but
spell will accept "sincerly", "sincerement", "sincereness", "sincered",
"sinceres", and "sincereless" (again, I checked).  I admit that some of
these (notably -ment and -less) are not likely typos.  However, the "-d"
and "-s" forms are one-keystroke errors, and the "-ness" form can easily
be generated by a person who has momentarily forgotten the word "sincerity".
(Which worries me, BTW:  there aren't many words ending in "e" that can legally
have -ity added to them, but spell takes "sincerity"...)

The end result is that, if every possible typo were placed in the stop
list (a nontrivial task), the hashing scheme used would probably begin to
break down (though I haven't calculated the probability of this).
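
For a rough feel for that probability, here is a back-of-the-envelope sketch.  It assumes the list is stored as uniformly distributed fixed-width hash codes rather than as the words themselves; the 27-bit width and the entry counts are assumptions picked for illustration, not figures from this thread.

```python
def collision_probability(entries, hash_bits):
    """Chance that a word absent from the list hashes to a stored code.

    Assumes each stored entry occupies one uniformly random code out of
    2**hash_bits possibilities.
    """
    return 1.0 - (1.0 - 2.0 ** -hash_bits) ** entries

HASH_BITS = 27  # assumed code width, for illustration only

for entries in (1_000, 30_000, 1_000_000, 10_000_000):
    p = collision_probability(entries, HASH_BITS)
    print(f"{entries:>10,} entries: {p:.4%} chance of a false match")
```

For a stop list, a false match means a correctly spelled word is wrongly rejected, so the cost of growing the list rises with every typo added to it.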
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

andrey@arizona.edu (Andrey K. Yeatts) (03/26/88)

In article <1693@desint.UUCP>, geoff@desint.UUCP (Geoff Kuenning) writes:
| While working on ispell (yes, it's coming, but not real soon) I stumbled
| across a non-word in my /usr/dict/words file.  The verb form of
| "accreditation" is "accredit," despite the best efforts of some people
| to change it.  If you're a stickler for accuracy, like me, edit
| /usr/dict/words to change that entry to "accreditation".

| 	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

or else add in one of my favorites: "orientate," if only for the
completifaction of verbiation of your lexilogical database.
(Excusify me for being in a non-linear mode of epistolization :=)

-- 
Andrey Yeatts					Dept. of Computer Science
andrey@arizona.edu				Univ. of Arizona
{allegra,cmcl2,ihnp4,noao}!arizona!andrey	Tucson, AZ 85710
						(602) 621-2858