prisant@duke.cs.duke.edu (Michael G. Prisant) (12/01/90)
Does a grammar checker exist for documents formatted in TeX or LaTeX?

I am currently looking for some sort of style checker to review my
documents. Even a utility which offered primitive checking for
repeated words would be helpful.

Of course a number of grammar checkers exist for Macintosh and PC word
processors: these are commercial products. However, I also understand
that a style checker exists for nroff/troff documents which analyzes
surface characteristics of sentences such as length and use of the
passive voice. Apparently a diction utility also exists for nroff/troff
files which identifies poorly worded or verbose phrases in a document.
The utilities "deroff" the document before analysis and contain
features specific to nroff/troff formatting.

These utilities must be very close to what is needed for TeX files. I
suppose they could be modified or made part of a shell script kludge,
but it would of course be nicer to have something specifically designed
for documents formatted in TeX/LaTeX.

I have checked a number of the standard TeX public archives for such a
utility but without any luck. Any help which readers of this newsgroup
might offer me in locating even a primitive utility would be most
appreciated.

May I ask that people respond to me directly via e-mail as well as
posting to this group? I will summarize the responses that I receive.

Michael G. Prisant (919) 684 2830
prisant@romeo.cs.duke.edu

Department of Chemistry
Duke University
Durham NC 27706
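For reference, the "primitive checking for repeated words" asked about
above is simple enough to sketch. What follows is only a minimal
illustration in Python, assuming the input is plain text with any
typesetting commands already removed. The name find_repeated_words is
hypothetical and does not refer to any existing utility.

```python
import re

# A token is either a word (optionally with an internal apostrophe)
# or any other single non-space character such as punctuation.
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]")


def find_repeated_words(text):
    """Return (line_number, word) for each word that repeats its predecessor.

    The comparison ignores case and carries across line breaks, but any
    punctuation between two words resets it, so "it. It" is not reported.
    """
    hits = []
    previous = None
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN.findall(line):
            if token[0].isalpha():
                lowered = token.lower()
                if lowered == previous:
                    hits.append((line_number, token))
                previous = lowered
            else:
                previous = None  # punctuation or digit breaks the run
    return hits
```

Calling find_repeated_words on the text of a file returns the line
number at which each doubled word's second copy appears. Legitimate
doublings such as "had had" are reported too, so the output is a list
of candidates to inspect rather than a list of errors.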
prisant@duke.cs.duke.edu (Michael G. Prisant) (12/05/90)
In article <659999439@romeo.cs.duke.edu> prisant@duke.cs.duke.edu (Michael G. Prisant) writes:
>
>Does a grammar checker exist for documents formatted in TeX or LaTeX?
>
>I am currently looking for some sort of style checker to review my
>documents. Even a utility which offered primitive checking for
>repeated words would be helpful.
...
>
>
>Michael G. Prisant (919) 684 2830
>prisant@romeo.cs.duke.edu
>
>Department of Chemistry
>Duke University
>Durham NC 27706

A number of readers responded to my posting. Unfortunately, all were
seeking a similar grammar checker utility. Perhaps a bashful reader who
has solved this problem might yet be coaxed to respond. Here, however,
are some aspects of the problem which I now better understand.

Basically the problem may be broken down into two parts corresponding
to two required programs.

1) A program to remove TeX and LaTeX code is required as a first step:

Most programs which strip TeX/LaTeX code output a stream of words. This
is adequate for piping to spell but is insufficient for a grammar
checker. Instead the grammar checker requires a program which preserves
sentence structure in its output. The code which currently is available
includes:

a) TeXTools by Kamal Al-Yahya: this is a set of subroutines with a
detex utility which preserves sentence structure. On our system this
was in the directory /tex/tex82/TeXcontrib/kamal. I tried this routine
on my LaTeX documents with the following results. On the plus side, the
routine understands \include commands and will branch out through files
listed in a template. On the negative side, the routine replaces
in-line equations with blanks, sometimes resulting in repeated
punctuation. It also didn't strip out all commands. Summary: this
"nearly" works.

b) DviDoc: this is apparently a DVI previewer for ASCII devices. It is
reportedly the only one of the dvi-to-tty previewers which produces
output in a sufficiently general form appropriate for piping to a
grammar checker.
I have not been able to locate this utility (perhaps someone could post
an ftp archive where it is stored) and hence have not tried it. My
information comes from Sanjiv Bhatia at the University of Nebraska, who
also tells me that he has had difficulty compiling it on his machine.
Thanks to Sanjiv Bhatia (sanjiv@hoss.unl.edu) for this info.

2) A program to take the filtered output and check it for grammar,
punctuation, and diction:

Ideally this program should check for repeated words, mispunctuation,
clumsy wording, choice of voice, mixture of tense, and improper
pluralization. Some of these things seem relatively easy (repeated
words) and others are probably very difficult (tense and number
agreement). Perhaps a net reader more knowledgeable than myself might
post an article reviewing the state of the art in these matters. In any
case I have found two program suites which perform some of these
functions.

a) Style and Diction: these seem to be part of the BSD4.3 distribution.
Style produces statistics on a document such as the distribution of
sentence lengths, number of verbs, etc. Diction identifies poor
phrasing in a document and has a second utility, Explain, to make
substitution suggestions. Both utilities function on text which has
been properly "deroffed" and then piped through a filter.

I found source for these utilities and compiled them on my machine
(SunOS 4.xx/Sparc) without much difficulty. I then applied the
utilities to the most plentiful supply of nroff documents available --
the man pages!

This may be a matter of taste, but I do not think either utility would
be a practical aid to my writing. Style is very quantitative in its
output: the whole document is reduced to a bunch of numbers. Knowing
the number of verbs in my document or the average sentence length
doesn't really help me that much. Interaction with the Diction and
Explain utilities reminds me a bit of Eliza. How many times can you
substitute "show" for "indicate"?
In any case this checker did not identify common typographical or
punctuation errors but only produced a list of stock phrases.

b) Proofr: this is part of an AT&T package called the Writer's
Workbench (WWB). It is apparently not free. However, it seems to do a
large number of useful things. Here is the first sentence of a help
file from a system which has installed the package:

"Writer's Workbench (WWB) is a collection of programs that do
proofreading and stylistic analysis of text files, together with checks
for spelling, punctuation, diction, and doubled words."

Sounds good! But I don't know how much this costs and haven't been able
to try it. Of course, this utility will also require text which has had
all TeX/LaTeX typesetting commands properly removed.

Conclusions and Summary:

Grammar checking requires first that TeX/LaTeX typesetting commands be
removed from the manuscript without disturbing its sentence structure
or creating additional punctuation errors. I have not located a public
domain "black box" to perform this task which is really trouble free. A
stripping filter is a prerequisite for piping to a grammar checker, so
this is a problem.

The Unix utilities Style and Diction, widely available as part of the
BSD4.3 distribution, are really not very useful in checking text. They
are no help at all with the most common typographical errors of doubled
words and misplaced punctuation.

The AT&T Writer's Workbench appears to do many useful things but it is
not widely available because it is not free.

Perhaps others will turn up more on this subject.

Michael Prisant
Department of Chemistry
Duke University
Durham, NC 27706
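To make the stripping requirement concrete, here is a rough sketch of
a filter that keeps sentence structure intact. It is not TeXTools,
detex, or DviDoc, and it is far from a complete TeX parser: the
function name strip_tex, the placeholder words MATH and REF, and the
short command lists are all assumptions chosen for illustration. The
one design point it tries to show is that replacing in-line equations
and references with a placeholder word, rather than a blank, avoids
the repeated punctuation described above.

```python
import re

# Commands whose braced argument is markup rather than prose
# (an illustrative, deliberately incomplete list).
DROP_WITH_ARG = (r"\\(?:begin|end|label|documentstyle|documentclass|usepackage)"
                 r"\*?(?:\[[^\]]*\])?\{[^}]*\}")
# Commands that stand in for a word inside a sentence.
REFERENCE = r"\\(?:cite|ref|pageref)\*?(?:\[[^\]]*\])?\{[^}]*\}"


def strip_tex(source, math_word="MATH", ref_word="REF"):
    """Remove common TeX/LaTeX markup while leaving sentences readable."""
    # Drop comments, but not an escaped percent sign (\%).
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Replace $...$ and $$...$$ with a placeholder word, not a blank.
    text = re.sub(r"(?<!\\)\$\$?.+?(?<!\\)\$\$?", math_word, text, flags=re.S)
    text = re.sub(DROP_WITH_ARG, "", text)
    text = re.sub(REFERENCE, ref_word, text)
    # Forced line breaks (two backslashes) become a space.
    text = text.replace("\\\\", " ")
    # Remove remaining command names; their braced text is kept as prose.
    text = re.sub(r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?", "", text)
    # Unescape characters such as \% and \&.
    text = re.sub(r"\\(.)", r"\1", text)
    text = text.replace("{", "").replace("}", "").replace("~", " ")
    # Tidy spacing within each line but keep the line breaks.
    return "\n".join(" ".join(line.split()) for line in text.splitlines())
```

With this sketch, a line such as
"The energy $E = mc^2$, shown in Figure~\ref{fig:e}, is \emph{very} large."
comes out as "The energy MATH, shown in Figure REF, is very large."
Known gaps include \( \) and \[ \] math, macros that expand to words
(\LaTeX, for example, is simply deleted), and section titles, which are
kept as bare lines without a closing period.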