[net.sources.bugs] ispell incorrectly handles plurals

geoff@desint.UUCP (02/28/87)

Well, that didn't take as long as I thought it was going to.  As I said
before, disregard Bill Randle's latest bug fix (the one for "activityes").
Instead, apply this one.  It also incorporates the mask typo Bill found.

Problem:  ispell.c has a typo in a mask (0x7 instead of 0x7f).

Problem:  ispell.c issues the egrep command with "-i", instead of leaving it
	  to EGREPCMD in config.h, causing problems if your egrep lacks -i.

Problem:  good.c incorrectly thinks "es" is an acceptable pluralization for
	  just about any word, instead of limiting it to those ending
	  in S, X, Z, or H.

Fix:	  Run this article through patch.  From rn, type "|patch -d dir",
	  where dir is the directory where you have ispell.  From other
	  newsreaders, save it in a file and type "patch -d dir < file".
	  If your egrep has the -i switch, you will also have to
	  modify EGREPCMD in config.h to include the -i switch there.
	  I don't include a patch for that.
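
For readers who want to see the third problem in isolation, here is a
minimal sketch of the suffix test only.  The name es_plural_allowed is
hypothetical and not part of ispell, and the check ignores case as an
assumption.  The real fix in good.c additionally requires the stripped
word to be in the dictionary with its S flag set.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only; es_plural_allowed is a hypothetical helper.
 * Returns 1 if `word` ends in "ES" and the letter just before that
 * ending is S, X, Z, or H (compared without regard to case).
 */
int es_plural_allowed(const char *word)
{
    size_t len = strlen(word);
    int last;

    if (len < 3)
        return 0;
    if (toupper((unsigned char)word[len - 2]) != 'E' ||
        toupper((unsigned char)word[len - 1]) != 'S')
        return 0;

    /* The letter that would end the root once "ES" is dropped. */
    last = toupper((unsigned char)word[len - 3]);
    return strchr("SXZH", last) != NULL;
}
```

With this test, "boxes" and "churches" pass the suffix check, while
"activityes" and "hawkes" are rejected before any dictionary lookup.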

Index: good.c

*** good.c.old	Sat Feb 28 01:06:31 1987
--- good.c	Sat Feb 28 01:06:34 1987
***************
*** 392,398
  		return;
  	case 'E': /* S (except simple adding of an S) */
  		p[-2] = 0;	/* drop the ES */
! 		if ((dent = lookup (w, strlen (w))) != NULL) {
  			if (dent->s_flag)
  				wordok = 1;;
  			return;

--- 395,402 -----
  		return;
  	case 'E': /* S (except simple adding of an S) */
  		p[-2] = 0;	/* drop the ES */
! 		if (index ("SXZH", p[-3]) != NULL
! 		    && (dent = lookup (w, strlen (w))) != NULL) {
  			if (dent->s_flag)
  				wordok = 1;;
  			return;


Index: ispell.c

*** ispell.c.old	Sat Feb 28 01:06:52 1987
--- ispell.c	Sat Feb 28 01:06:56 1987
***************
*** 206,212
  		case 'w':
  			num[3] = '\0';
  #ifdef NO8BIT
! 			mask = 0x7;
  #else
  			mask = 0xff;
  #endif

--- 210,216 -----
  		case 'w':
  			num[3] = '\0';
  #ifdef NO8BIT
! 			mask = 0x7f;
  #else
  			mask = 0xff;
  #endif
***************
*** 885,891
  #ifdef LOOK
  		if (wild)
  			/* string has wild card characters */
! 			sprintf (cmd, "%s -i '^%s$' %s", EGREPCMD, grepstr, WORDS);
  		else
  			/* no wild, use look(1) */
  			sprintf (cmd, "/usr/bin/look -df %s %s", grepstr, WORDS);

--- 889,895 -----
  #ifdef LOOK
  		if (wild)
  			/* string has wild card characters */
! 			sprintf (cmd, "%s '^%s$' %s", EGREPCMD, grepstr, WORDS);
  		else
  			/* no wild, use look(1) */
  			sprintf (cmd, "/usr/bin/look -df %s %s", grepstr, WORDS);
***************
*** 890,896
  			/* no wild, use look(1) */
  			sprintf (cmd, "/usr/bin/look -df %s %s", grepstr, WORDS);
  #else
! 		sprintf (cmd, "%s -i '^%s$' %s", EGREPCMD, grepstr, WORDS);
  #endif
  		shellescape (cmd);
  	}

--- 894,900 -----
  			/* no wild, use look(1) */
  			sprintf (cmd, "/usr/bin/look -df %s %s", grepstr, WORDS);
  #else
! 		sprintf (cmd, "%s '^%s$' %s", EGREPCMD, grepstr, WORDS);
  #endif
  		shellescape (cmd);
  	}
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

billr@tekred.TEK.COM (Bill Randle) (03/10/87)

Thanks to Geoff for finding the correct qualification to use in the test
in s_ending() in good.c.  This will properly report bad -es endings,
but unfortunately will not suggest a correct alternative.  Also,
some words that are correctly spelled are still reported as bad.
Here is a test file that I ran through ispell:
	***************
This is a ispell test for plurals.  Many activitys have misspelled
words.  It should also catch activityes and hawkes and do something.
This one is correct: activities.
	***************
Ispell as modified with Geoff's changes and without the additional
changes I submitted at the same time shows:

activitys -->
  0. activity

activityes -->

hawkes  -->
  0. ---
  1. ---
  2. hawks
  3. ---

activities -->

Note that the alternative lookup finds "hawks" because it is a
single-letter change (drop the 'e') away from "hawkes".  Although
"activities" is also a single-letter change (switch 'y' to 'i') from
"activityes", it is not suggested, because ispell does not recognize
the "-ies" plural ending as a good word.

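The behavior described above can be imitated with a rough sketch.
Everything here is an assumption made for illustration: the tiny word
list `listed`, the helper names, and the idea that candidates are checked
against a flat list.  It tries every single-letter deletion and then
every single-letter substitution.  Because "activities" is not in the
list (standing in for the unrecognized "-ies" ending), "activityes"
gets no suggestion, while "hawkes" yields "hawks".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the words the lookup accepts as good. */
static const char *const listed[] = { "activity", "hawk", "hawks" };

static int is_listed(const char *w)
{
    size_t i;

    for (i = 0; i < sizeof listed / sizeof listed[0]; i++)
        if (strcmp(listed[i], w) == 0)
            return 1;
    return 0;
}

/*
 * Writes the first listed word that is one deletion or one
 * substitution away from `word` into `out` and returns 1.
 * Returns 0 (and an empty `out`) if there is no such word.
 */
int near_miss(const char *word, char *out, size_t outsz)
{
    size_t len = strlen(word), i;
    char c;

    if (len == 0 || len >= outsz)
        return 0;

    /* Drop one letter at a time. */
    for (i = 0; i < len; i++) {
        memcpy(out, word, i);
        strcpy(out + i, word + i + 1);
        if (is_listed(out))
            return 1;
    }

    /* Replace one letter at a time with each of a..z. */
    for (i = 0; i < len; i++) {
        strcpy(out, word);
        for (c = 'a'; c <= 'z'; c++) {
            if (c == word[i])
                continue;
            out[i] = c;
            if (is_listed(out))
                return 1;
        }
    }

    out[0] = '\0';
    return 0;
}
```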
If you merge the changes I suggested with the proper qualification as
suggested by Geoff [the other changes were a few more lines in
s_ending() and a new routine wrongplural() in ispell.c], ispell shows:

activitys -->
  0. activity
  1. activities

activityes -->
  0. activities

hawkes  -->
  0. ---
  1. ---
  2. hawks
  3. ---

[activities does not show as bad]

Now I will be the first to agree that this may be kludegy (sp?) but
it is helpful for people who have problems with -yxx and -ixx endings.
-- 

	-Bill Randle
	Tektronix, Inc.
	billr@tekred.TEK.COM

geoff@desint.UUCP (03/13/87)

In article <1038@tekred.TEK.COM> billr@tekred.UUCP (Bill Randle) writes:

> Thanks to Geoff for finding the correct qualification to use in the test
> in s_ending() in good.c.  This will properly report bad -es endings,
> but unfortunately will not suggest a correct alternative.  Also,
> some words that are correctly spelled are still reported as bad.
...
> Now I will be the first to agree that this may be kludegy (sp?) but
> it is helpful for people who have problems with -yxx and -ixx endings.

Well, Bill is doubly right.  In the first place, my fix wasn't complete
as it stood.  In the second, and much worse, ispell was permeated with
logic errors.  The basic problem is this:  the code was written from
the same specs found in the README file, which describe how an ending
is added to a root.  Unfortunately, the code must work in reverse:
given "activities", it must strip off the 'ies', add the 'y', and look
that up.  This can become tricky with some of the more complex endings.
A common problem in much of the code was with E-endings:  after
stripping off an E and failing the lookup, the code would give up
rather than trying other alternatives.
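
To make the "work in reverse" point concrete, here is a minimal sketch,
not the ispell code.  The root list, the helper names, and the
simplified plural rules (plain -s is refused after s, x, z, h, or y)
are all assumptions for illustration.  Each candidate root is tried in
turn, and a failed lookup falls through to the next rule instead of
returning early.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the dictionary of roots. */
static const char *const roots[] = { "activity", "box", "hawk" };

static int in_roots(const char *w)
{
    size_t i;

    for (i = 0; i < sizeof roots / sizeof roots[0]; i++)
        if (strcmp(roots[i], w) == 0)
            return 1;
    return 0;
}

/* True if `w` (of length len) is longer than `suffix` and ends with it. */
static int ends_with(const char *w, size_t len, const char *suffix)
{
    size_t n = strlen(suffix);

    return len > n && strcmp(w + len - n, suffix) == 0;
}

/* Returns 1 if `word` is a root or a plural traceable to a root. */
int plural_ok(const char *word)
{
    char stem[64];
    size_t len = strlen(word);

    if (len + 1 >= sizeof stem)
        return 0;
    if (in_roots(word))
        return 1;

    /* "-ies": strip "ies", add "y", look that up. */
    if (ends_with(word, len, "ies")) {
        memcpy(stem, word, len - 3);
        strcpy(stem + len - 3, "y");
        if (in_roots(stem))
            return 1;
        /* On failure, keep going rather than giving up. */
    }

    /* "-es": only when the root would end in s, x, z, or h. */
    if (ends_with(word, len, "es") &&
        strchr("sxzh", word[len - 3]) != NULL) {
        memcpy(stem, word, len - 2);
        stem[len - 2] = '\0';
        if (in_roots(stem))
            return 1;
    }

    /* Plain "-s": refused after s, x, z, h, or y (a simplification). */
    if (ends_with(word, len, "s") &&
        strchr("sxzhy", word[len - 2]) == NULL) {
        memcpy(stem, word, len - 1);
        stem[len - 1] = '\0';
        if (in_roots(stem))
            return 1;
    }
    return 0;
}
```

Under these assumptions "activities", "boxes", and "hawks" are accepted,
while "activitys", "activityes", and "hawkes" are not.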

I am just putting the finishing touches on a new version of ispell with
(I hope) all of these bugs corrected.  I have systematically gone through
all of the ending code and verified that it did what the README said, or
fixed it so it does.  I have also hashed the personal dictionary and
added suffixes, integrated everybody else's fixes and improvements, and
(fanfare) written a spelling-list muncher that can take "road" and "roads"
and turn them into "road/s".
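
As a rough idea of what such a muncher does for the simplest case, here
is a sketch that handles only a plain added "s".  The function name
munch_pair and its interface are hypothetical; the real muncher is not
shown in this thread and presumably covers the other endings as well.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only; munch_pair is a hypothetical helper.
 * If `plural` is exactly `root` followed by "s", writes "root/s"
 * into `out` and returns 1.  Otherwise returns 0 and leaves `out`
 * untouched.
 */
int munch_pair(const char *root, const char *plural,
               char *out, size_t outsz)
{
    size_t n = strlen(root);

    if (n == 0 || strlen(plural) != n + 1)
        return 0;
    if (strncmp(root, plural, n) != 0 || plural[n] != 's')
        return 0;
    if (n + 3 > outsz)          /* root + "/s" + terminating NUL */
        return 0;

    snprintf(out, outsz, "%s/s", root);
    return 1;
}
```

Calling munch_pair("road", "roads", buf, sizeof buf) fills buf with
"road/s", while a pair such as "road" and "roadie" is left alone.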

I will be posting soon, along with some dictionary diffs to correct some
misspellings that have crept in over the years.  Watch net.sources
for the code, net.sources.bugs for the dictionary diffs.

P.S.  Just to be double-sure, I ran Bill's whole article, including his
test text, through my latest ispell.  It worked like a charm.
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

u3369429@murdu.UUCP (03/15/87)

FYI: It's pluralia