[net.internat] Hyphenation, Re: Why Hyphenate

ray@othervax.UUCP (Raymond D. Dunn) (11/29/85)

(Double posted to net.text as this is where the discussion probably
belongs - follow-ups to there)

Having worked for seven years for a developer/manufacturer of
typesetting and other equipment for both the newspaper and general
graphic arts industries, I would like to add my two cents' worth.

It is interesting to note that the graphic *arts* industry is one
which has retained the concepts of style and attention to detail, and
has laudably forgone the all too commonly seen solution of making do
with what automation can provide "easily".

Instead, it has continuously forced the typography equipment
manufacturers to meet its stringent subjective standards of what is
"right" and what is "wrong" in typeset material.  This includes some
exceedingly hard to implement requirements which give (to
non-insiders) very marginal improvements in "quality".

(An interesting aside: even these standards were not enough for
Knuth, who set off on his (excellent) TeX and Metafont tangent
because of his dissatisfaction with the typesetting of his Life's
Work.  This contains much "scientific" content, a particularly
difficult typography task.  It's only a pity that he chose a
traditional embedded command approach to the typesetting problem,
rather than something more interactive and immediate).

Newspaper production should be disassociated from any serious
discussion about hyphenation, style etc.  Newspapers work to
different rules - the papers must hit the streets.  If several
consecutive lines contain hyphenations, paragraphs contain massive
rivers, or there is more white-space in a line than text - WHO CARES
(they don't)!  However, what an opportunity: if we provide adequate
tools, newspapers may become readable (:-)!

Hyphenation is generally (correctly) regarded as a "Bad Thing".
Unfortunately, it is necessary when meeting the other (subjectively
more important) objectives of layout and style.  These in general
conform to the rule that, when glancing at a typeset page or
paragraph, one's eyes should not be drawn automatically to any place
not specifically intended by the typographer.  In general, although
specific parts of the text may be harder to read, a "noisy" page is
regarded as being more difficult to read overall, than a "quiet" one.

Any arguments in this context for and against hyphenation in
general, and concerning justification/ragged-right, are specious.
They fall into the category of "I like/hate Picasso".  Certainly
there is room for other styles, and we must provide technological
solutions for *all* of them.

Traditionally, hyphenation has been implemented by algorithm, with an
associated exception-word-dictionary. This was the case *only*
because it was impractical to store and access a full dictionary.

It *IS NOT* possible to implement acceptable hyphenation solely by
algorithm (in English certainly).  There are many classic examples;
the one that immediately comes to mind is "therapist", broken as
"the-rapist" (I hope this is not Freudian).  If your pet algorithm
can handle this one, then there will be other examples on which it
too will fail.
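
To make the failure concrete, here is a minimal sketch of the
traditional scheme: a rule plus an exception-word dictionary.  The
rule (break between a vowel and a single consonant that is followed
by another vowel) and the single dictionary entry are assumptions
chosen for illustration only; no real typesetter's algorithm is this
crude, and the names are hypothetical.

```python
# Toy illustration only: a naive rule plus an exception-word dictionary.
# The rule and the dictionary contents are assumptions made for this
# sketch, not any real typesetting system's algorithm.

VOWELS = set("aeiouy")

# Hypothetical exception dictionary: words the rule is known to get wrong.
EXCEPTIONS = {
    "therapist": ["ther", "a", "pist"],
}


def rule_hyphenate(word):
    """Naive V-CV rule: break before a single consonant that sits
    between two vowels."""
    pieces, start = [], 0
    for i in range(1, len(word) - 1):
        if (word[i - 1] in VOWELS
                and word[i] not in VOWELS
                and word[i + 1] in VOWELS):
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


def hyphenate(word):
    """Consult the exception dictionary first, then fall back on the rule."""
    key = word.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    return rule_hyphenate(key)


if __name__ == "__main__":
    print("-".join(rule_hyphenate("therapist")))  # rule alone
    print("-".join(hyphenate("therapist")))       # rule plus exceptions
```

The rule alone yields "the-ra-pist"; only the exception entry rescues
it as "ther-a-pist".  Every such rescue is a dictionary entry in
disguise, which is the point of the argument.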

It *IS* by definition possible to implement hyphenation solely by
dictionary.  If the dictionary is large enough, the assumption that a
word is non-hyphenable if it does not appear there is perfectly
acceptable.  As has already been pointed out in previous articles, a
dictionary can easily be structured to handle all the "peculiars",
like hyphenation also causing a word to change its spelling (this was
news to me).
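
A dictionary-only scheme might be sketched as follows.  Each entry
stores its permitted breaks as explicit (head, tail) pairs, which is
one way (an assumption of this sketch, not a description of any
existing product) to let a break change the spelling.  The German
entry relies on the traditional rule that "ck" is set as "k-k" when
broken; the table contents and function names are illustrative.

```python
# Sketch of dictionary-only hyphenation.  A word absent from the table
# is treated as non-hyphenable, as the paragraph above proposes.
# The table contents are illustrative, not a real hyphenation dictionary.

# word -> list of (head before the hyphen, tail on the next line).
# Storing both halves lets an entry alter the spelling at the break.
HYPHENATION = {
    "therapist": [("ther", "apist"), ("thera", "pist")],
    # Traditional German orthography: "ck" becomes "k-k" when broken.
    "zucker": [("zuk", "ker")],
}


def break_points(word):
    """Return the permitted breaks, or an empty list for unknown words."""
    return HYPHENATION.get(word.lower(), [])


def split_to_fit(word, room):
    """Return (head_with_hyphen, tail) using the longest head that fits
    in `room` characters (hyphen included), or None if no break fits."""
    best = None
    for head, tail in break_points(word):
        fits = len(head) + 1 <= room
        if fits and (best is None or len(head) > len(best[0])):
            best = (head, tail)
    if best is None:
        return None
    return best[0] + "-", best[1]


if __name__ == "__main__":
    print(split_to_fit("therapist", 6))
    print(split_to_fit("zucker", 4))
    print(split_to_fit("xyzzy", 10))
```

Note that the lookup folds case for simplicity; a production table
would also have to restore capitalisation and cope with inflected
forms, which this sketch ignores.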

Now to get the arguments rolling (:-) :

It is almost certain that as the use of What-You-See-Is-What-You-Get
systems increases, as storage costs go down, and as *SPELLING
CORRECTION DICTIONARIES* become the norm on text manipulation systems,
hyphenation *WILL* be done automatically solely by (that) dictionary.

TeX, and the current UNIX tools for typeset text preparation, are
rapidly becoming dinosaurs - they probably have already become so.
Visible typography commands embedded in text, and separate H & J/page
makeup runs are passe (see - we need an extended character set even
for English (:-)), even if we have a "soft typesetter" screen to see
the results before we commit the text to the typesetter/printer.

You cannot expect the "average" user to struggle with an embedded
typesetting language in which (s)he has to go through a mental
mapping process from ad-hoc command to spatial effect, and this user
will increasingly demand full typographic features as (s)he fully
realises the capabilities of laser printers.

WYSIWYG systems (with the associated demise of much of the graphic
arts industry) are becoming increasingly practical and popular, from
Interleaf to the good old "Mac".  The drop in price of both quality
laser printers and RAM, and the obvious need to manipulate text and
graphics together (both pictures and line drawings), can only speed
up this trend.

For the doubters, even within the traditional graphic arts industry
WYSIWYG systems were always regarded as the favoured solution.  They
have been around for at least 10 years in specific applications like
display-ad make-up, and were limited only by the lack of appropriate
cost-effective technology (both hardware and software).


Ray Dunn.   ..philabs!micomvax!othervax!ray

Disclaimer: The above opinions are my own, for what they are worth,
            and I have no direct connection with the current graphic
            arts industry.