[comp.text] Ispell bug report and suggested solution

wstomv@wsinpa01.win.tue.nl (Tom Verhoeff) (06/12/91)

This is a bug report for Ispell Version 2.0.02, May 1987 Beta posting
(an interactive spelling checker/corrector for Unix).  NOTE:

	I have not been able to find a more recent version.
	From the doc files that come with ispell it is not clear
	who is/feels responsible for ispell's maintenance.

In the man page for ispell (file ispell.1) we read:

	The
	.B \-t
	option selects TeX/LaTeX input mode.
	TeX/LaTeX mode is also automatically selected if an input file has
	the extension ".tex".
	In this mode, whenever a backslash ("\e") is found,
	.I ispell
	will skip to the next whitespace.
	Thus, for example, given
	.RS
	\echapter {This is a Ckapter}
	\ecite{SCH86}
	.RE
	will find "Ckapter" but will not look for SCH.

`Selective ignoring' is a useful feature to have.  Including special
dictionary entries for such nonsense words as SCH or eqnarray is not
a good solution.

In ispell.c we find, however, the following implementation:

	#define ISTEXTERM(c)   (((c) == '{') || \
				((c) == '}') || \
				((c) == '[') || \
				((c) == ']'))

	/* ... stuff deleted ... */

	    if (tflag)		/* TeX or LaTeX stuff */
	    {
		if (*currentchar == '\\') {
		    /* skip till whitespace */
		    while (*currentchar && 
			(!isspace(*currentchar) &&
			 !ISTEXTERM(*currentchar))) {
			    if (!lflag)
				putc(*currentchar, outfile);
			    currentchar++;
			}
		    continue;
		}
	    }

Apparently, after a backslash, characters are skipped until either
end-of-file is reached, or a whitespace is encountered, OR
one of {}[] is encountered.  Terminating on this last category of
characters does not agree with the man page and defeats its
intention.  For example, in \cite{SCH86}, SCH is flagged as a
spelling error.  There may have been a reason for this additional
condition to terminate the loop, but I don't see one.  Maybe it was
a concern for such words as `r\^{o}le'?
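
To make the difference concrete, here is a minimal sketch of the skip
loop lifted out of its context.  The function name skip_as_implemented
is hypothetical and not part of ispell; only the macro and the loop
condition are taken from the excerpt above.  It returns the position
where normal word scanning would resume.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Same terminator set as the ISTEXTERM macro quoted from ispell.c. */
#define ISTEXTERM(c)   (((c) == '{') || ((c) == '}') || \
                        ((c) == '[') || ((c) == ']'))

/*
 * Hypothetical helper, not part of ispell.  p must point at a backslash.
 * Returns a pointer to the first character the quoted loop would NOT
 * skip, i.e. the place where spell checking starts again.
 */
const char *skip_as_implemented(const char *p)
{
    assert(p != NULL && *p == '\\');
    while (*p && !isspace((unsigned char)*p) && !ISTEXTERM(*p))
        p++;
    return p;
}
```

For the input \cite{SCH86} this stops at the `{', so SCH is scanned as
an ordinary word afterwards.  For \chapter {This is a Ckapter} it stops
at the blank, as the man page says.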

The point is that when spell-checking a (la)tex document, there
are TWO categories of character sequences to ignore:
	(1) ALL the names of commands (these START with a backslash (\));
	(2) SOME of the arguments to commands.
The current implementation only covers category (1).  Apparently, the END
of a command name is assumed to be either end-of-file, white space, or one
of {}[].  A backslash need not be treated as an end, because it would
simply start another skip to the end of the next command name.
Category (2) requires a
convention to indicate to ispell which arguments to check and which not.
In the man page, it is suggested that this convention is:
	arguments separated from the command by white space are checked,
	others are not.
This is a practically acceptable convention.
There is one minor problem: it does not allow you to indicate that an
argument consisting of more than one word should be ignored.  For example,
in \section{Thiz Ckapter}, the error in Thiz will be ignored, but
Ckapter will be flagged.  This is not a real problem, since multiple-word
arguments are always (well, almost always) normal text.

My suggestion is to remove the condition !ISTEXTERM from the guard of
that while statement, for instance by defining ISTEXTERM(c) as FALSE
(i.e., 0).
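
As a rough illustration of what the suggested change would mean for the
examples in this post, the sketch below lists the words that would
still be checked when a backslash skips to the next whitespace and
nothing else.  The function words_checked is hypothetical, not ispell
code, and it assumes for simplicity that a word is a maximal run of
letters; ispell's real notion of a word may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of ispell.  Copies into out (of size
 * outsz) the words of line that would still be spell-checked in TeX
 * mode if a backslash skipped to the next whitespace only.
 * Assumption: a word is a maximal run of letters.  Words are joined
 * by single blanks.  Returns the number of words found.
 */
int words_checked(const char *line, char *out, size_t outsz)
{
    size_t n = 0;
    int count = 0;

    assert(line != NULL && out != NULL && outsz > 0);
    while (*line) {
        if (*line == '\\') {
            /* suggested behaviour: skip till whitespace, no ISTEXTERM */
            while (*line && !isspace((unsigned char)*line))
                line++;
        } else if (isalpha((unsigned char)*line)) {
            if (count > 0 && n + 1 < outsz)
                out[n++] = ' ';
            while (isalpha((unsigned char)*line)) {
                if (n + 1 < outsz)
                    out[n++] = *line;
                line++;
            }
            count++;
        } else {
            line++;     /* blanks, braces, digits, punctuation */
        }
    }
    out[n] = '\0';
    return count;
}
```

Under these assumptions \cite{SCH86} yields no words at all,
\section{Thiz Ckapter} yields only Ckapter, and
\chapter {This is a Ckapter} yields all four words of the argument.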

Any comments?

	Tom
--
DOMAIN: wstomv@win.tue.nl    /    Eindhoven University of Technology
VOICE: +31 40 47 41 25      /    Dept of Mathematics & Computing Science
FAX: +31 40 43 66 85       /    PO Box 513, NL-5600 MB Eindhoven, Netherlands