[net.sources.bugs] ispell bug w/ fix + enhancement

billr@tekred.UUCP (02/25/87)

I recently found an area where ispell has a problem.  This occurs
when a word ending in 'y' is incorrectly pluralized.  For example,
the word "activitys" is caught by ispell as incorrect, but the two
alternate spellings listed are "activity" and "activityes", the latter
being just plain wrong.  What I did was add a check in the routine
that finds possibilities: it tries replacing "ys" with "ies" and offers
the result if it is a correctly spelled word.  In addition, I added some code
in good.c that checks for "yes" endings and reports them as misspelled.
Finally, when looking at the code, I found what appears to be a typo
in the handling of the '-w' option, where (under NO8BIT) mask is set
to 0x7 instead of 0x7f.  Context diffs for ispell.c and good.c follow.
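
For readers who want to try the two ideas outside of ispell, here is a
small standalone sketch.  It is not the patch itself: ys_to_ies() and
mask_char() are hypothetical names that do not exist in ispell, the
dictionary check is left out, and mask_char() assumes that the mask is
simply ANDed with each character (and that the character set is ASCII).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of ispell): if `word` ends in "ys",
 * write the "ies" form into `out` and return 1.  Return 0 if the word
 * does not end in "ys" or if `out` is too small.
 */
int ys_to_ies(const char *word, char *out, size_t outsize)
{
    size_t len = strlen(word);

    /* Need at least "ys", plus room for one extra letter and the NUL. */
    if (len < 2 || len + 2 > outsize)
        return 0;
    if ((word[len - 1] != 's' && word[len - 1] != 'S') ||
        (word[len - 2] != 'y' && word[len - 2] != 'Y'))
        return 0;

    int upper = (word[len - 2] == 'Y');

    memcpy(out, word, len - 2);         /* everything before the 'y' */
    out[len - 2] = upper ? 'I' : 'i';
    out[len - 1] = upper ? 'E' : 'e';
    out[len]     = word[len - 1];       /* keep the original s or S */
    out[len + 1] = '\0';
    return 1;
}

/*
 * Assumption: the mask is ANDed with each character.  With 0x7f a
 * 7-bit character passes through unchanged; with 0x7 only the low
 * three bits survive.
 */
int mask_char(int c, int mask)
{
    return c & mask;
}
```

With these, ys_to_ies("activitys", buf, sizeof buf) leaves "activities"
in buf, and mask_char('a', 0x7) yields 1 where mask_char('a', 0x7f)
yields 'a'.  The real routine would still have to look the candidate up
in the dictionary before offering it.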

	-Bill Randle
	Tektronix, Inc.
	billr@tekred.TEK.COM

	--------------------------------------
*** ispell.c.old	Thu Jan 29 17:20:50 1987
--- ispell.c	Tue Feb 24 10:54:04 1987
***************
*** 12,17 ****
--- 12,21 ----
   *	-p option & WORDLIST variable for alternate personal dictionary
   *	-x option to suppress .bak files.
   *	8 bit text & config.h parameters
+  *
+  * 2/24/87, Bill Randle added:
+  *	routine to check for bad pluralization (i.e. "...ys" when it
+  *	   should be "...ies" and vice versa).
   */
  
  #include <stdio.h>
***************
*** 180,186 ****
  		case 'w':
  			num[3] = '\0';
  #ifdef NO8BIT
! 			mask = 0x7;
  #else
  			mask = 0xff;
  #endif
--- 184,190 ----
  		case 'w':
  			num[3] = '\0';
  #ifdef NO8BIT
! 			mask = 0x7f;
  #else
  			mask = 0xff;
  #endif
***************
*** 577,582 ****
--- 581,587 ----
  		possibilities[i][0] = 0;
  	pcount = 0;
  
+ 	if (pcount < 10) wrongplural (word);
  	if (pcount < 10) wrongletter (word);
  	if (pcount < 10) extraletter (word);
  	if (pcount < 10) missingletter (word);
***************
*** 620,625 ****
--- 625,658 ----
  			}
  		}
  		newword[i] = word[i];
+ 	}
+ }
+ 
+ wrongplural (word)
+ char word[];
+ {
+ 	int n;
+ 	char newword[BUFSIZ], *p;
+ 
+ 	n = strlen (word) - 1;
+ 	if (word[n] != 'S' && word[n] != 's')
+ 		/* no trailing 's' */
+ 		return;
+ 
+ 	strcpy (newword, word);
+ 	p = newword + n;
+ 	p--;	/* next to last letter */
+ 
+ 	if (*p == 'Y' || *p == 'y') {
+ 		/* try replacing 'Y' with 'IE' */
+ 		*p++ = 'I';
+ 		*p++ = 'E';
+ 		*p++ = 'S';
+ 		*p = 0;
+ 		if (good (newword)) {
+ 			if (insert (cap (newword, word)) < 0)
+ 				return;
+ 		}
  	}
  }
  
	-----------------------------------
*** good.c.old	Thu Jan 29 17:20:49 1987
--- good.c	Tue Feb 24 11:21:21 1987
***************
*** 377,382 ****
--- 377,391 ----
  		return;
  	case 'E': /* S (except simple adding of an S) */
  		p[-2] = 0;	/* drop the ES */
+ 		if (p[-3] == 'Y')
+ 			/*
+ 			 * There are just a few good words that end
+ 			 * in YES, so it's better to declare it illegal
+ 			 * and make the user double check using the
+ 			 * 'L' command than to call it legal and be
+ 			 * wrong.   billr@tekred.tek.com  2/24/87
+ 			 */
+ 			return;
  		if ((dent = lookup (w, strlen (w))) != NULL) {
  			if (dent->s_flag)
  				wordok = 1;;

geoff@desint.UUCP (02/28/87)

In article <1008@tekred.TEK.COM> billr@tekred.TEK.COM (Bill Randle) writes:

> I recently found an area where ispell has a problem.  This occurs
> when a word ending in 'y' is incorrectly pluralized...
> ...In addition, I added some code
> in good.c that checks for "yes" endings and reports them as misspelled.

Unfortunately, this fix is insufficient, kludgey, and wrong.  For example,
the s_ending routine (which is where the problem lies) will also accept
"hawkes" for "hawks".

I am working on a correct fix, which I will post when I have tested it.
In the meantime, I recommend that people save the original before
applying Bill's fixes, since I'm not going to put in a routine that
arbitrarily rejects a certain letter sequence regardless of what's in
the dictionary.
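
To make the "hawkes" point concrete, here is a toy model of the 'E'
case shown in the good.c diff: drop the "ES", look up the root, and
accept the word if the root carries the S flag.  This is only a sketch
under assumptions, not the real s_ending code.  The toy_* names and the
two-word dictionary are made up for illustration, and words are taken
to be upper case, as the comparisons in the diff suggest.

```c
#include <stddef.h>
#include <string.h>

/* Toy dictionary entry; only the S flag is modeled. */
struct toy_dent {
    const char *word;
    int s_flag;         /* root may take a plural suffix */
};

/* Hypothetical two-word dictionary, for illustration only. */
static const struct toy_dent toy_dict[] = {
    { "HAWK", 1 },
    { "ACTIVITY", 1 },
};

/* Look up the first `len` characters of `w` in the toy dictionary. */
static const struct toy_dent *toy_lookup(const char *w, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof toy_dict / sizeof toy_dict[0]; i++)
        if (strlen(toy_dict[i].word) == len &&
            strncmp(toy_dict[i].word, w, len) == 0)
            return &toy_dict[i];
    return NULL;
}

/*
 * Simplified 'E' case: for a word ending in "ES", drop the "ES" and
 * accept the word if the remaining root has the S flag.  Nothing here
 * checks whether the root really takes "ES" rather than a plain "S".
 */
int toy_es_ending_ok(const char *w)
{
    size_t len = strlen(w);
    const struct toy_dent *d;

    if (len < 3 || strcmp(w + len - 2, "ES") != 0)
        return 0;
    d = toy_lookup(w, len - 2);
    return d != NULL && d->s_flag;
}
```

In this model toy_es_ending_ok("HAWKES") and
toy_es_ending_ok("ACTIVITYES") both return 1, which shows that the
"yes" case is one instance of a more general problem.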
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff