[comp.text.tex] Grammar Checker for TeX files

prisant@duke.cs.duke.edu (Michael G. Prisant) (12/01/90)

Does a grammar checker exist for documents formatted in TeX or LaTeX?

I am currently looking for some sort of style checker to review my
documents.  Even a utility which offered primitive checking for
repeated words would be helpful.

Of course a number of grammar checkers exist for Macintosh 
and PC word processors:  these are commercial products.  However, I
also understand that a style checker exists for nroff/troff documents 
which analyzes surface characteristics of sentences such as length 
and use of the passive voice. Apparently a diction utility also exists
for nroff/troff files which identifies poorly worded or verbose
phrases in a document.  The utilities "deroff" the document before
analysis and contain features specific to nroff/troff formatting.
These utilities must be very close to what is needed for TeX files.
I suppose they could be modified or made part of a shell script kludge,
but it would of course be nicer to have something specifically designed
for documents formatted in TeX/LaTeX.

I have checked a number of the standard TeX public archives for such
a utility but without any luck.

Any help which readers of this Newsgroup might offer me in locating
even a primitive utility would be most appreciated.  May I ask
that people respond to me directly via e-mail as well as posting
to this group.  I will summarize responses that I receive.

Michael G. Prisant (919) 684 2830
prisant@romeo.cs.duke.edu

Department of Chemistry
Duke University
Durham NC 27706

prisant@duke.cs.duke.edu (Michael G. Prisant) (12/05/90)

In article <659999439@romeo.cs.duke.edu> prisant@duke.cs.duke.edu (Michael G. Prisant) writes:
>
>Does a grammar checker exist for documents formatted in TeX or LaTeX?
>
>I am currently looking for some sort of style checker to review my
>documents.  Even a utility which offered primitive checking for
>repeated words would be helpful. ...
>
>
>Michael G. Prisant (919) 684 2830
>prisant@romeo.cs.duke.edu
>
>Department of Chemistry
>Duke University
>Durham NC 27706

A number of readers responded to my posting. Unfortunately, all were seeking
a similar grammar checker utility.  Perhaps a bashful reader who has
solved this problem might yet be coaxed to respond.

Here, however, are some aspects of the problem which I now better
understand. Basically the problem may be broken down into two parts
corresponding to two required programs.

1) A program to remove TeX and LaTeX code is required as a first step:

Most programs which strip TeX/LaTeX code output a stream of words.
This is adequate for piping to spell but is insufficient for a
grammar checker.

Instead the grammar checker requires a program which preserves sentence
structure in its output.  The code which currently is available
includes:

  a)	TeXTools by Kamal Al-Yahya:  this is a set of subroutines
	with a detex utility which preserves sentence structure.
	On our system this was in the directory /tex/tex82/TeXcontrib/kamal.

	I tried this routine on my LaTeX documents with the following
	results. On the plus side, the routine understands \include
	commands and will branch out through files listed in a template.  
	On the negative side, the routine replaces in-line equations
	with blanks, sometimes resulting in repeated punctuation.
	It also didn't strip out all commands.

	Summary:  this "nearly" works.
	
   b)   DviDoc: this is apparently a DVI previewer for ASCII devices.
	Apparently this is the only one of the dvi-to-tty previewers
	which produces output in a sufficiently general form, appropriate 
	for piping to a grammar checker.

	I have not been able to locate this utility (perhaps someone
	could post an ftp archive where it is stored) and hence
	have not tried it.  My information comes from Sanjiv Bhatia
	at University of Nebraska who also tells me that he has had
	difficulty compiling it on his machine.

	Thanks to Sanjiv Bhatia (sanjiv@hoss.unl.edu) for this info
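
To illustrate the kind of stripping discussed in (a), here is a minimal
Python sketch of a filter that removes markup but keeps sentences intact.
It is not TeXTools or detex; the function name strip_tex and the MATH
placeholder are assumptions made for this example. In-line equations are
replaced by a placeholder word rather than a blank, so that "$x$, and"
becomes "MATH, and" instead of leaving stray punctuation behind.

```python
import re

def strip_tex(source):
    """Crude TeX/LaTeX stripper that tries to keep sentences intact.

    Hypothetical helper for illustration; it knows nothing about
    \\include, macros with several arguments, or verbatim text.
    """
    # Drop comments (an unescaped % up to the end of the line).
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Drop display math entirely.
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.S)
    # Replace in-line math with a placeholder noun, not a blank.
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", "MATH", text, flags=re.S)
    # Forced line breaks become spaces.
    text = text.replace("\\\\", " ")
    # Remove \begin{...} and \end{...} together with the environment name.
    text = re.sub(r"\\(?:begin|end)\{[^}]*\}", " ", text)
    # Remove command names and optional [..] arguments, keep {..} contents.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?", "", text)
    text = text.replace("{", "").replace("}", "")
    # Unescape characters such as \% and \&.
    text = re.sub(r"\\(.)", r"\1", text)
    # Collapse runs of blanks but keep line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

if __name__ == "__main__":
    print(strip_tex(r"The energy $E = mc^2$, as \emph{everyone} knows, is large."))
```

Keeping the contents of brace arguments is a deliberate choice: the text
of \emph{...} or \section{...} belongs to the prose, whereas the command
name does not. Arguments such as those of \cite or \label would also be
kept by this sketch and would need special handling.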

2) A program to take the filtered output and check it for grammar, 
   punctuation, and diction.

Ideally this program should check for repeated words, mispunctuation,
clumsy wording, choice of voice, mixture of tense, and improper 
pluralization.  Some of these things seem relatively easy (repeated words) 
and others are probably very difficult (tense and number agreement).  
Perhaps a net reader more knowledgeable than myself might post an article
reviewing the state of the art in these matters.
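
As a rough illustration of the easy end of this list, a repeated-word
check can be sketched in a few lines of Python. The function name
find_doubled_words is made up for this example, and the input is assumed
to be plain text from which the typesetting commands have already been
removed.

```python
import re

# A word (optionally with an internal apostrophe) or any other single
# non-blank character, so that punctuation shows up as its own token.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\S")

def find_doubled_words(text):
    """Return (line_number, word) for each immediately repeated word.

    Repeats are detected across line breaks, since "the" at the end of
    one line followed by "the" on the next is the case a reader is most
    likely to miss. Punctuation between two words resets the check.
    """
    hits = []
    previous = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN.findall(line):
            if not token[0].isalpha():
                previous = None          # punctuation separates words
                continue
            word = token.lower()
            if word == previous:
                hits.append((lineno, token))
            previous = word
    return hits

if __name__ == "__main__":
    sample = "text which has\nhas had all the the commands removed."
    for lineno, word in find_doubled_words(sample):
        print("line %d: repeated word '%s'" % (lineno, word))
```

Legitimate doubles such as "had had" will be reported too, so the output
is a list of places to look at rather than a list of errors.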

In any case I have found two program suites which perform some of these
functions.

    a)	Style and Diction: these seem to be part of the BSD4.3
	distribution.  Style produces statistics on a document
	such as the distribution of sentence lengths, number of
	verbs, etc.  Diction identifies poor phrasing in a document
	and has a second utility to make substitution suggestions.
	Both utilities function on text which has been properly
	"deroffed" and then piped through a filter.

	I found source for these utilities and compiled them on my
	machine (SunOS 4.xx/Sparc) without much difficulty.
	I then applied the utilities to the most plentiful
	supply of nroff documents available -- the man pages!

	This may be a matter of taste, but I do not think either
	utility would be a practical aid to my writing.  Style is
	very quantitative in its output: the whole document is
	reduced to a bunch of numbers. Knowing the number of verbs
	in my document or the average sentence length doesn't really
	help me that much.  Interaction with the Diction and Explain
	utilities reminds me a bit of Eliza.  How many times can
	you substitute "show" for "indicate"?  In any case this
	checker did not identify common typographical or punctuation
	errors but only produced a list of stock phrases.

     b) Proofr:  this is part of an AT&T package called the Writer's
	Workbench (WWB).  It is apparently not free.  However, it
	seems to do a large number of useful things.  Here is the
	first sentence of a help file from a system which has installed 
	the package:

     "Writer's Workbench (WWB) is a  collection  of  programs  that  do
     proofreading  and stylistic analysis of text files, together with
     checks for spelling, punctuation, diction, and doubled words."

	Sounds good! But I don't know how much this costs and haven't
	been able to try it. Of course, this utility will also require
	text which has had all TeX/LaTeX typesetting commands
	properly removed.
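
For readers who have not seen a diction-style checker, the idea described
in (a) can be sketched as a table lookup. The following Python is only an
illustration: the phrase table is invented for this example (apart from
the "indicate"/"show" pair mentioned above) and is not the list shipped
with the BSD utilities, and the name flag_phrases is hypothetical.

```python
import re

# Illustrative table only; NOT the phrase list distributed with diction.
SUGGESTIONS = {
    "indicate": "show",
    "in order to": "to",
    "due to the fact that": "because",
}

def flag_phrases(text, table=SUGGESTIONS):
    """Return (line_number, phrase, suggestion) for each stock phrase found.

    Matching is case-insensitive and on whole words. Phrases split
    across a line break are not found by this simple version.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase, better in table.items():
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, phrase, better))
    return hits

if __name__ == "__main__":
    sample = "We ran it in order to test.\nResults indicate success."
    for lineno, phrase, better in flag_phrases(sample):
        print("line %d: '%s' -> consider '%s'" % (lineno, phrase, better))
```

A checker of this kind can only ever report what is in its table, which
is why it produces a list of stock phrases and nothing about punctuation
or typographical errors.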

Conclusions and Summary:  

		Grammar checking requires first that TeX/LaTeX
typesetting commands be removed from the manuscript without
disturbing its sentence structure or creating additional punctuation
errors.  I have not located a public domain "black box" to perform
this task which is really trouble free. A stripping filter is
a prerequisite for piping to a grammar checker so this is a problem.

		The Unix utilities Style and Diction, widely available
as part of the BSD4.3 distribution, are really not very useful in
checking text.  They are no help at all with the most common
typographical errors of doubled words and misplaced punctuation.

		The AT&T Writer's Workbench appears to do many useful
things but it is not widely available because it is not free.

		Perhaps others will turn up more on this subject.

Michael Prisant
Department of Chemistry
Duke University
Durham, NC 27706