[comp.ai] Writing style analyzers

dmocsny@uceng.UC.EDU (daniel mocsny) (02/09/89)

In article <44587@linus.UUCP>, bwk@mbunix.mitre.org (Barry W. Kort) writes:
> Well, Dan, I've been using WWB (Writer's WorkBench) ever since the first
> version came out of Murray Hill, and at least some of your vision is
> already a reality.

I have used Rightwriter quite a bit in the PC environment. I don't know
how it compares to WWB (not available on any UN*X boxes that I can access
here), but it is useful. (It does tend to gag on LaTeX markup commands,
though, throwing off the readability index it computes at the end...)

However, Rightwriter doesn't go nearly so far as I would like. It does
spot a few things well, such as passive voice and possibly useless
phrases ("in order to," instead of "to"). It doesn't identify several
of the leading causes of useless sentence complexity. For example,
Rightwriter will accept without complaint either of the two equivalent
sentences:

1. The thermometer measures the temperature.

2. It is the thermometer which is that which serves to accomplish the
measurement of the temperature.

Both humans and computers take longer to "understand" the second
sentence.  Unfortunately, sentences like that are more like the rule
than the exception in the technical literature. Rightwriter does not
detect (1) pronouns that precede their referents, (2) noun phrases
that are equivalent to (simpler) action verbs, or (3) unnecessary
helping verbs. The second sentence displays all three, and yet escapes
with a clean bill of health. (For more examples and advice, see
John Brogan, "Clear Technical Writing," McGraw-Hill, 197(3?).)
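
As a rough illustration of the three patterns listed above, here is a
minimal Python sketch of a surface-level detector. The regular expressions,
the PATTERNS table, and the flag_complexity name are all assumptions made
up for this example; they are not taken from Rightwriter, WWB, or Brogan's
book, and simple pattern matching like this will miss many cases and
misfire on others.

```python
import re

# Hypothetical heuristics, one per pattern named in the post.
# None of these expressions come from Rightwriter or WWB.
PATTERNS = {
    # (1) a pronoun that precedes its referent: "It is the thermometer ..."
    "anticipatory pronoun": re.compile(
        r"\b(?:it|there) (?:is|was|are|were) (?:the|a|an)\b", re.I),
    # (2) a noun phrase standing in for an action verb: "the measurement of"
    "nominalization": re.compile(
        r"\bthe \w+(?:ment|tion|sion|ance|ence) of\b", re.I),
    # (3) an unnecessary helping verb: "serves to accomplish"
    "helping verb": re.compile(r"\b(?:serves?|acts?|functions?) to\b", re.I),
}

def flag_complexity(sentence):
    """Return the sorted names of the heuristics that match the sentence."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(sentence))

simple = "The thermometer measures the temperature."
padded = ("It is the thermometer which is that which serves to "
          "accomplish the measurement of the temperature.")
print(flag_complexity(simple))
print(flag_complexity(padded))
```

On the two thermometer sentences, the first matches none of the patterns
and the second matches all three.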

When Rightwriter does flag a "complex sentence," it does not attempt
to simplify it, or even give any hints, probably because it does not
do any semantic analysis. Modern grammar and style checkers
are useful tools (as useful as spelling checkers, I believe).
However, their full utility won't be evident until they (1) "know" more
about what makes writing unnecessarily complex, and (2) attempt to
"understand" the text they analyze. I suppose the second goal would
require "extracting" the underlying "facts" in a writing sample and
compiling them in some sort of a formal knowledge structure. This
might allow the program to render the stored facts into text with
simpler sentence structure. 

Since such a "logical" approach promises to be difficult, perhaps we
should explore alternatives. Do any subscribers to this newsgroup know
of any attempts to train a neural network to simplify sentences?  John
Brogan's book has many examples of sentence pairs similar to the one I
gave above. After I worked through all his examples, I became
uncannily aware of the sentence structures he advises against. Now my
colleagues give me papers to proofread, and I sail through them with a
red pen.

I have wondered whether a neural network could feasibly learn to
recognize and correct unnecessary sentence complexity after exposure
to such a training set. This should not be difficult to do over a
restricted grammar.

Cheers,

Dan Mocsny
dmocsny@uceng.uc.edu

bwk@mbunix.mitre.org (Barry W. Kort) (02/11/89)

In article <667@uceng.UC.EDU> dmocsny@uceng.UC.EDU (Daniel Mocsny) 
writes about a prose analyzer called Rightwriter.

Just for fun, Dan, I took the text of your posting and ran it through
WWB (Writer's WorkBench).  Here are the condensed results...

    WWB found no spelling errors, no punctuation errors, no split
    infinitives, and no double words. 
    
    WORD CHOICE
    
    Sentences with possibly wordy or misused phrases are listed next,
    followed by suggested revisions.
    
    beginning line 1 Mocsny
    I have used Rightwriter *[ quite]* a bit in the PC environment.
    
    beginning line 6 Mocsny
    It does spot a few things well, such as passive voice and possibly
    useless phrases ( *["in order to,]*" instead of "to").
    
    beginning line 15 Mocsny
    It is the thermometer *[ which ]* is that *[ which ]* serves to
    accomplish the measurement of the temperature.
    
    beginning line 33 Mocsny
    I suppose the second goal would require "extracting" the underlying
    "facts" in a writing sample and compiling them in some
    *[ sort of ]* a formal knowledge structure.
    
    file Mocsny: number of lines 57 number of phrases found 5
    
    -------------------   Table of Substitutions   --------------------
    
    PHRASE                     SUBSTITUTION
    
    in order to: use "to" for " in order to"
    quite: use "OMIT" for " quite"
    sort of: use "somewhat" for " sort of"
    which: use ""that" when clause is restrictive" for " which"
    which: use "of which" for " of that"
    which: use "when" for " at which time"
    
    READABILITY
    
         The Kincaid readability formula predicts that your text
    can be read by someone with 10 or more years of schooling,
    which is a low score for this type of document.
    
         You have an appropriate distribution of sentence types.
    
         You have appropriately limited your use of passives and
    nominalizations (nouns made from verbs, e.g. "description").
    
Although WWB did flag a few things differently than Rightwriter,
they seem to be interchangeable.
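
The WORD CHOICE report above amounts to a phrase-table lookup, which can
be sketched in a few lines of Python. This is only a guess at the general
technique, not WWB's actual implementation: the SUBSTITUTIONS dictionary
copies four entries from the Table of Substitutions, while the
flag_phrases name and the matching rules (case-insensitive, whole words,
longest phrase first) are assumptions for illustration.

```python
import re

# Phrase -> suggestion, copied from the Table of Substitutions above.
SUBSTITUTIONS = {
    "in order to": 'use "to"',
    "quite": "omit",
    "sort of": 'use "somewhat"',
    "which": 'use "that" when clause is restrictive',
}

def flag_phrases(text):
    """Wrap each listed phrase in *[ ]* and return (marked_text, found)."""
    # Try longer phrases first so they win over shorter ones they contain.
    ordered = sorted(SUBSTITUTIONS, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    rx = re.compile(r"\b(?:%s)\b" % alternation, re.I)
    found = []

    def mark(match):
        # Record the phrase in lower case so it can be looked up in the table.
        found.append(match.group(0).lower())
        return "*[ " + match.group(0) + " ]*"

    return rx.sub(mark, text), found

marked, found = flag_phrases("I have used Rightwriter quite a bit.")
print(marked)
for phrase in found:
    print(phrase + ": " + SUBSTITUTIONS[phrase])
```

A table like this can only flag fixed strings, which is consistent with
the report catching "which" in the thermometer sentence but saying
nothing about its overall structure.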

--Barry Kort

dmocsny@uceng.UC.EDU (daniel mocsny) (02/14/89)

In article <44802@linus.UUCP>, bwk@mbunix.mitre.org (Barry W. Kort) writes:
> Just for fun, Dan, I took the text of your posting and ran it through
> WWB (Writer's WorkBench).  Here are the condensed results...

I was tempted to run your post through Rightwriter. But first I
checked with my homunculus, and he said he's had enough of infinite
regress for one lifetime. Besides, it would be something like setting
up a dialog between Racter and Eliza.

>     WWB found no spelling errors, no punctuation errors, no split
>     infinitives, and no double words. 

So do I get the job, or what?

I'll admit that before I looked into this grammar-checking stuff, I
was not aware that the split infinitive was something "to really
avoid." :-)

>     READABILITY
>     
>          The Kincaid readability formula predicts that your text
>     can be read by someone with 10 or more years of schooling,
>     which is a low score for this type of document.

Rightwriter reports a similar index. When I first started using
Rightwriter, I was consistently scoring 17--20. Now I get the same
ideas across with 9--11's. Unfortunately, many people may misunderstand
the scoring, and mistakenly think that "writing on a higher grade level"
implies something other than useless semantic overhead. I wish these
tools could also report an "essential complexity index" to show how
complex the underlying factual content really is. The difference
between the Kincaid index and the essential index (when nonzero, and
thus positive) would measure accidental complexity.
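
For reference, the Kincaid index discussed here is usually computed as
0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59. The
Python sketch below implements that formula with a crude vowel-run
syllable counter, which is an assumption of this example; how WWB and
Rightwriter count syllables and split sentences is not stated in this
thread, so their scores may differ. The accidental_complexity function
is hypothetical: it only subtracts an "essential" index that would have
to be supplied by hand, since no tool mentioned here computes one.

```python
import re

def count_syllables(word):
    """Rough estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def kincaid_grade(text):
    """Flesch-Kincaid grade level; text needs at least one word and sentence."""
    # Naive splitting: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def accidental_complexity(kincaid, essential):
    """Kincaid index minus a hand-supplied 'essential complexity index'."""
    return kincaid - essential

print(round(kincaid_grade("The cat sat. The dog ran."), 2))
print(accidental_complexity(18, 10))
```

Because the formula sees only word, sentence, and syllable counts, it
cannot tell essential complexity from padding, which is the gap the
proposed essential index would have to fill.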

Certain documents, such as insurance policies or legal briefs, would
typically have large accidental indices.

>          You have appropriately limited your use of passives and
>     nominalizations (nouns made from verbs, e.g. "description").

Or, e.g. "nominalization," or "use of." (Do as I say...) :-) I make a
game now of trying to avoid passives as much as possible while
simultaneously using "we" only when necessary. Sometimes this forces
me to think very hard about who or what is really doing what to whom
or what. This is the writer's job, not the reader's. Nominalizations
are now a pet peeve of mine. I only use them when I can't think of an
alternative.

> Although WWB did flag a few things differently than Rightwriter,
> they seem to be interchangeable.

WWB seems to do a bit more.

> --Barry Kort

Dan Mocsny
dmocsny@uceng.uc.edu

bwk@mbunix.mitre.org (Barry W. Kort) (02/15/89)

In article <44802@linus.UUCP>, I tendered a Writer's WorkBench
analysis of Dan Mocsny's analysis of Rightwriter.  In article
<688@uceng.UC.EDU> Dan rejoins:

 > I was tempted to run your post through Rightwriter. But first I
 > checked with my homunculus, and he said he's had enough of infinite
 > regress for one lifetime. Besides, it would be something like setting
 > up a dialog between Racter and Eliza.

Wouldn't it likely converge to a fixed point?

 > So do I get the job, or what?

Have you sent in your resume?  (Our bureaucracy is pretty stiff
about the paperwork.)

 > I'll admit that before I looked into this grammar-checking stuff, I
 > was not aware that the split infinitive was something "to really
 > avoid." :-)

I, too, learned that one should not use a preposition to end a sentence
with.  (I wish I had the defiance of Churchill, who fumed, "That is
one bit of grammatical nonsense up with which I will not put!")

As to reading-grade level, I now favor the model of Saturday
morning cartoons to reach the greatest audience.

--Barry Kort

"The sad part about it is that he's not joking."