[net.text] Hyphenation, Re: Why Hyphenate

ray@othervax.UUCP (Raymond D. Dunn) (11/29/85)

(Double posted to net.text as this is where the discussion probably
belongs - follow-ups to there)

Having worked for seven years for a developer/manufacturer of
typesetting and other equipment for both the newspaper and general
graphics arts industries, I would like to add my two cents worth.

It is interesting to note that the graphic *arts* industry is one
which has retained the concepts of style and attention to detail, and
has laudably forgone the all too commonly seen solution of making do
with what automation can provide "easily".

Instead, it has continuously forced the typography equipment
manufacturers to meet their stringent subjective standards of what is
"right" and what is "wrong" in typeset material.  This includes some
exceedingly hard to implement requirements which gave (to
non-insiders) very marginal improvements in "quality".

(An interesting aside, even these standards were not enough for
Knuth, who set off on his (excellent) TeX and Metafont tangent
because of his dissatisfaction with the typesetting of his Life's
Work.  This contains much "scientific" content, a particularly
difficult typography task.  It's only a pity that he chose a
traditional embedded command approach to the typesetting problem,
rather than something more interactive and immediate).

Newspaper production should be disassociated from any serious
discussion about hyphenation, style etc.  Newspapers work to
different rules - the papers must hit the streets.  If several
consecutive lines contain hyphenations, paragraphs contain massive
rivers, or there is more white-space in a line than text - WHO CARES
(they don't)!  However, what an opportunity, if we provide adequate
tools, newspapers may become readable (:-)!

Hyphenation is generally (correctly) regarded as a "Bad Thing".
Unfortunately, it is necessary when meeting the other (subjectively
more important) objectives of layout and style.  These in general
conform to the rule that, when glancing at a typeset page or
paragraph, one's eyes should not be drawn automatically to any place
not specifically intended by the typographer.  In general, although
specific parts of the text may be harder to read, a "noisy" page is
regarded as being more difficult to read overall, than a "quiet" one.

Any arguments in this context, for and against hyphenation in
general, and concerning justification/ragged-right, are specious.
They fall into the category of "I like/hate Picasso".  Certainly
there is room for other styles, and we must provide technological
solutions for *all* of them.

Traditionally, hyphenation has been implemented by algorithm, with an
associated exception-word-dictionary. This was the case *only*
because it was impractical to store and access a full dictionary.

It *IS NOT* possible to implement acceptable hyphenation solely by
algorithm (in English certainly).  There are many classical examples,
the one that immediately comes to mind is "therapist", "the-
rapist" (I hope this is not Freudian).  If your pet algorithm can
handle this one, then there will be other examples on which it too
will fail.
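
A minimal sketch, in Python, of the traditional arrangement described
above: a rule plus an exception-word list.  The rule (break after a
vowel that is followed by consonant-vowel) and the names `rule_based`,
`hyphenate` and `EXCEPTIONS` are illustrative assumptions only, not
any real typesetter's algorithm.

```python
import re

# Hypothetical exception list: words the toy rule below gets wrong.
EXCEPTIONS = {
    "therapist": ["ther", "a", "pist"],
}

def rule_based(word):
    """Toy rule: break after any vowel followed by consonant + vowel."""
    marked = re.sub(r"([aeiou])(?=[^aeiou][aeiou])", r"\1-", word.lower())
    return marked.split("-")

def hyphenate(word):
    """Consult the exception list first, then fall back on the rule."""
    return EXCEPTIONS.get(word.lower()) or rule_based(word)

print(rule_based("therapist"))   # the rule alone produces the bad break
print(hyphenate("therapist"))    # the exception entry overrides it
```

Every word the rule mishandles needs its own exception entry, which
is why the exception list tends to grow toward a full dictionary.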

It *IS* by definition possible to implement hyphenation solely by
dictionary.  If the dictionary is large enough, the assumption that a
word is non-hyphenable if it does not appear there is perfectly
acceptable.  As has already been pointed out in previous articles, a
dictionary can easily be structured to handle all the "peculiars",
like hyphenation also causing a word to change its spelling (this was
news to me).
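
As a rough sketch of such a structure (Python; the table contents and
the name `breaks` are assumptions for illustration), each entry can
list its permitted breaks as explicit (end-of-line, start-of-line)
pairs.  That leaves room for a break that changes the spelling, and a
word that is absent, or present with no pairs, is simply not broken.
The spelling-change entry uses the older German convention of "ck"
becoming "k-k" at a break.

```python
# Hypothetical hyphenation dictionary.  Each permitted break is stored
# as (text left on the first line, text starting the next line).
HYPHENATION = {
    "typesetting": [("type", "setting"), ("typeset", "ting")],
    "zucker": [("zuk", "ker")],   # older German usage: "ck" -> "k-k"
    "through": [],                # listed, but has no break points
}

def breaks(word):
    """Return the permitted breaks; unknown words get none."""
    return HYPHENATION.get(word.lower(), [])

for word in ("typesetting", "Zucker", "through", "xyzzy"):
    print(word, breaks(word))
```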

Now to get the arguments rolling (:-) :

It is almost certain that as the use of What-You-See-Is-What-You-Get
systems increases, as storage costs go down, and *SPELLING CORRECTION
DICTIONARIES* become the norm on text manipulation systems,
hyphenation *WILL* be done automatically solely by (that) dictionary.

TeX, and the current UNIX tools for typeset text preparation, are
rapidly becoming dinosaurs - they probably have already become so.
Visible typography commands embedded in text, and separate H & J/page
makeup runs are passe (see - we need an extended character set even
for English (:-)), even if we have a "soft typesetter" screen to see
the results before we commit the text to the typesetter/printer.

You cannot expect the "average" user to struggle with an embedded
typesetting language in which (s)he has to go through a mental
mapping process from ad-hoc command to spatial effect, and this user
will increasingly demand full typographic features as (s)he fully
realises the capabilities of laser printers.

WYSIWYG systems (with the associated demise of much of the graphic
arts industry) are becoming increasingly practical and popular, from
Interleaf to the good old "Mac".  The drop in price of both quality
laser printers and RAM, and the obvious need to manipulate text and
graphics together (both pictures and line drawings), can only speed
up this trend.

For the doubters, even within the traditional graphics arts industry
WYSIWYG systems were always regarded as the favoured solution.  They
have been around for at least 10 years in specific applications like
display-ad make-up, and were only limited by their lack of
appropriate cost effective technology (both hardware and software).


Ray Dunn.   ..philabs!micomvax!othervax!ray

Disclaimer: The above opinions are my own, for what they are worth,
            and I have no direct connection with the current graphics
            arts industry.

zben@umd5.UUCP (12/01/85)

In article <731@othervax.UUCP> ray@othervax.UUCP (Raymond D. Dunn) writes:

>It *IS* by definition possible to implement hyphenation solely by
>dictionary.  If the dictionary is large enough, the assumption that a
>word is non-hyphenable if it does not appear there is perfectly
>acceptable.  As has already been pointed out in previous articles, a
>dictionary can easily be structured to handle all the "peculiars",
>like hyphenation also causing a word to change its spelling (this was
>news to me).

Oh really.  What then, pray tell, would your dictionary entry for the word
"record" contain?  When used as a verb ("to record the data") it should be
"re-cord", but when used as a noun ("give me the record") it should be
"rec-ord" (assuming one hyphenates at syllables, anyway)...

Oh, I guess you'd leave that word out...  :-)

>WYSIWYG systems (with the associated demise of much of the graphic
>arts industry) are becoming increasingly practical and popular, from
>Interleaf to the good old "Mac".  The drop in price of both quality
>laser printers and RAM, and the obvious need to manipulate text and
>graphics together (both pictures and line drawings), can only speed
>up this trend.

WYSIWYG systems have their proponents and their uses.  They are VERY good
for novice users, and given the way this field is growing I should think
that "novice users" are going to be the MAJORITY of users until the entire
society is computer literate.  (This is much like "automobile literate" was
the thing to be when I was a teenager - something 19 year old males can be
macho about...)

However, there are times when the WYSIWYG paradigm breaks down badly.  As
a somewhat strained analogy, a strict WYSIWYG system might have you use a
mouse to pick out letters from a menu, rather than using a conventional
keyboard.  This would be easier for the "novice user" than learning to type,
but would ultimately limit data-entry rates to values far below those
attainable by a practiced keyboard operator...

Admittedly a strained example, but take a look around for such pathological
cases the next time you study a WYSIWYG system...

-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.umd.edu

dennis@utecfc.UUCP (Dennis Ferguson) (12/02/85)

In article <731@othervax.UUCP> ray@othervax.UUCP (Raymond D. Dunn) writes:
>Having worked for seven years for a developer/manufacturer of
>typesetting and other equipment for both the newspaper and general
>graphics arts industries, I would like to add my two cents worth.
>
>It is interesting to note that the graphic *arts* industry is one
>which has retained the concepts of style and attention to detail, and
>has laudably forgone the all too commonly seen solution of making do
>with what automation can provide "easily".
...
>Hyphenation is generally (correctly) regarded as a "Bad Thing".
>Unfortunately, it is necessary when meeting the other (subjectively
>more important) objectives of layout and style.  These in general
>conform to the rule that, when glancing at a typeset page or
>paragraph, one's eyes should not be drawn automatically to any place
>not specifically intended by the typographer.  In general, although
>specific parts of the text may be harder to read, a "noisy" page is
>regarded as being more difficult to read overall, than a "quiet" one.
>
>Any arguments in this context, for and against hyphenation in
>general, and concerning justification/ragged-right, are specious.
>They fall into the category of "I like/hate Picasso".  Certainly
>there is room for other styles, and we must provide technological
>solutions for *all* of them.

If this is true, I find the divergence of the `subjective' opinion of
the graphics arts industry concerning what looks prettier on the page
with the objectively-established opinion of the scientific community
concerning what is easier to read quite interesting.

I spent several years working in a psychology lab for a professor whose
research interests included the acquisition of written language.  Our
own work, which involved the evaluation of readability of text by the
analysis of eye movement data, concurred with the great body of existing
experimental measurements of such things as understanding, retention
and speed of reading of written language in showing that text was most
easily and efficiently read when it was unhyphenated and unleaded, with
a ragged right.  In fact, during the period I worked there, the professor
was involved with the organization of a conference devoted to the topic.
The proceedings, which he edited, were typeset entirely in this form, with
the right ragged.

While my memory is dim, I recall that the original reason for right
justification was technical.  Early printing presses, the kind with actual
lead type, required that the text be set in a square block to keep even
pressure over the paper to prevent slippage of the paper and consequent
smearing of the right-hand ends of long lines.  While the technical reasons
for right justification have long since disappeared, I guess old habits die
hard.
---
				    Dennis Ferguson
				    ...!{decvax,ihnp4}!utcsri!utecfc!dennis

henry@utzoo.UUCP (Henry Spencer) (12/04/85)

> ...  It's only a pity that [Knuth] chose a
> traditional embedded command approach to the typesetting problem,
> rather than something more interactive and immediate...

He really didn't have any choice, since he probably didn't feel like spending
$50k or so (remember this was some years ago) for the sort of equipment he'd
need to build something more interactive and immediate.  He probably also
felt that it would be nice if what he did were usable from an ordinary ASCII
terminal, so that it could be used by the masses instead of just the lucky
few.  (Even today, most of us still work on ASCII terminals.)

A contributing consideration may have been the desire to produce documents
that could be compiled for different output devices without needing manual
reworking.  This implies that the document must be specified in fairly
abstract ways, not in terms of exactly how it looks.  It is possible to
combine this kind of high-level document specification with interactive
immediacy, but it is harder.  Note that Knuth works hard to do things like
"hyphenating" equations well automatically, to avoid manual tuning even in
that fairly-extreme case.  (And you thought hyphenating English was bad...)
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

ray@othervax.UUCP (Raymond D. Dunn) (12/04/85)

In article <803@umd5.UUCP> zben@umd5.UUCP (Ben Cranston) responds to my
earlier posting:

>> It *IS* by definition possible to implement hyphenation solely by
>> dictionary.  If the dictionary is large enough, the assumption that a
>> word is non-hyphenable if it does not appear there is perfectly
>> acceptable...                ^^^^^^^^^^^^^^^
            [I should have said "does not contain any hyphenation points"]

> Oh really.  What then, pray tell, would your dictionary entry for the word
> "record" contain?  When used as a verb ("to record the data") it should be
> "re-cord", but when used as a noun ("give me the record") it should be
> "rec-ord" (assuming one hyphenates at syllables, anyway)...
>
> Oh, I guess you'd leave that word out...  :-)

To be fair, I missed examples of this type (even if I don't
necessarily agree with your hyphenation of "rec-ord").

However this does *not* contradict the dictionary argument, in fact
it enhances it.

Even assuming a parser was used to determine the part of speech of a word,
no practical hyphenation algorithm could be devised to hyphenate
words accordingly.  The dictionary of course *could* easily be
constructed to contain different hyphenation points for different
uses of a word when necessary.
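
A minimal sketch of such an entry (Python; the table layout, the "*"
convention and the name `hyphenate_for` are assumptions for
illustration, and the part of speech is taken to come from some
separate parser):

```python
# Hypothetical entries keyed by part of speech; "*" means "any use".
HYPHENATION = {
    "record": {"verb": ["re", "cord"], "noun": ["rec", "ord"]},
    "typesetter": {"*": ["type", "set", "ter"]},
}

def hyphenate_for(word, part_of_speech=None):
    """Return the pieces of the (lower-cased) word for this use.

    If the word is unknown, or its break points depend on a part of
    speech the caller could not supply, it is returned unbroken.
    """
    entry = HYPHENATION.get(word.lower(), {})
    pieces = entry.get(part_of_speech) or entry.get("*")
    return pieces if pieces else [word.lower()]

print(hyphenate_for("record", "verb"))
print(hyphenate_for("record", "noun"))
print(hyphenate_for("record"))   # use unknown: left unbroken
```

Leaving the word unbroken when its use is unknown is the conservative
choice; a system could equally fall back on break points common to
all uses, where any exist.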

>> WYSIWYG systems (with the associated demise of much of the graphic
>> arts industry) are becoming increasingly practical and popular, from
>> Interleaf to the good old "Mac"...

> WYSIWYG systems have their proponents and their uses.  They are VERY good
> for novice users, and given the way this field is growing I should think
> that "novice users" are going to be the MAJORITY of users until the entire
> society is computer literate.  (This is much like "automobile literate" was
> the thing to be when I was a teenager - something 19 year old males can be
> macho about...)

By your definition then, the majority of automobile users are
novices, and always will be.  Their literacy does not extend further
than the use of five or six controls.  There is no desire in the
majority, nor *need*, to become "automobile literate" in your sense,
the user interface has been designed that way.  The same argument
applies to computer systems.

> However, there are times when the WYSIWYG paradigm breaks down badly.  As
> a somewhat strained analogy, a strict WYSIWYG system might have you use a
> mouse to pick out letters from a menu, rather than using a conventional
> keyboard.....

Not just a strained analogy, totally irrelevant.  It's like saying "a
keyboard *might* have just one key which you hit repeatedly until the
character of choice appears, thus any computer system which uses a
keyboard is ...".  

We are discussing WYSIWYG systems, and the ability of the general
user to do typesetting, not the pros and cons "of mice over
keyboards" (gosh there's a title for a paper (:-)).  WYSIWYG implies
an *approach*, not necessarily a specific user interface.

>Admittedly a strained example, but take a look around for such pathological
>cases the next time you study a WYSIWYG system...

It is difficult enough dealing with the real pathological cases
without trying to handle imaginary ones!

OK, so you're trying to make a point on "efficiency".  Good, that's
what I'm doing as well.  With complex tasks like typesetting, to
reduce this to a measure of keystroke counts is absurd.

The use of a traditional typesetting system requires much dedication
and training, and the ability to visualise the mapping from embedded
commands to the resulting typeset page.  (Even with an expert,
several trial runs on the hardcopy typesetter, or to a "soft" screen,
are often required before the desired effect is achieved).

Many people do not, and can never, have this ability, nor should they
be *required* to train themselves for tasks ancillary to their
mainstream interest.  They didn't in the past, they turned to an
"expert" (and paid him big bucks).  They shouldn't have to now, they
turn to a computer.  Their literacy need only be how to drive the
thing in a natural way to them, not to be able to manipulate the
nuts and bolts.

*That* is efficiency!

A last point.  Compare this area of expertise with what has happened
in the computerisation of other disciplines (spreadsheets, data
managers, report generators, and the birth of the prime example,
expert systems).

Ray Dunn.  ..philabs!micomvax!othervax!ray

chris@umcp-cs.UUCP (Chris Torek) (12/05/85)

In article <46@utecfc.UUCP> dennis@utecfc.UUCP (Dennis Ferguson) writes:

>In article <731@othervax.UUCP> ray@othervax.UUCP (Raymond D. Dunn) writes:
>>... Any arguments in this context, for and against hyphenation in
>>general, and concerning justification/ragged-right, are specious.
>>They fall into the category of "I like/hate Picasso".  Certainly
>>there is room for other styles, and we must provide technological
>>solutions for *all* of them.

This is important!  Back to dennis@utecfc.UUCP:

>If this is true, I find the divergence of the `subjective' opinion of
>the graphics arts industry concerning what looks prettier on the page
>with the objectively-established opinion of the scientific community
>concerning what is easier to read quite interesting. ...

>[our work] concurred with the great body of existing experimental
>measurements of such things as understanding, retention and speed
>of reading of written language in showing that text was most easily
>and efficiently read when it was unhyphenated and unleaded, with
>a ragged right. ...

I will assume these measurements have been made with existing
typographics; or if not, that you were careful to bring in the
graphics arts folks first.  Done wrong, right justification seems
to me much worse than ragged right.  Even if you did your own
typesetting, this is still a lesser point:

>While the technical reasons for right justification have long since
>disappeared, I guess old habits die hard.

*This* is important.  Old habits do die hard; yet they are not only
on the part of the typesetters, but also on that of the readers.
As an anecdotal example, I recently bought a collection of Twain's
writings.  It is set ragged-right, unleaded, and unhyphenated.  I
find that the right margin keeps bothering me.  But of course I
have been `conditioned' to expect a flush right margin in typeset
text.

But that I have been `conditioned' does not mean that I am in the
wrong, and that all text should forevermore be printed ragged-right!

There is room for many styles, and we must provide technological
solutions for *all* of them.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

zben@umd5.UUCP (12/08/85)

In article <733@othervax.UUCP> ray@othervax.UUCP (Raymond D. Dunn) responds
to my perhaps too-hastily posted flame:

>In article <803@umd5.UUCP> zben@umd5.UUCP (Ben Cranston) responds to my
>earlier posting:

>>> It *IS* by definition possible to implement hyphenation solely by
>>> dictionary.  If the dictionary is large enough,  ...

>> "record"...  verb "re-cord" noun "rec-ord" ...

>To be fair, I missed examples of this type (even if I don't
>necessarily agree with your hyphenation of "rec-ord").
>However this does *not* contradict the dictionary argument, in fact
>it enhances it.
>Even assuming a parser was used to determine the part of speech of a word,
>no practical hyphenation algorithm could be devised to hyphenate
>words accordingly.  The dictionary of course *could* easily be
>constructed to contain different hyphenation points for different
>uses of a word when necessary.

All true, and my posting was probably out of line.  I just thought the
claim "it can be done solely by dictionary" a bit too general to
pass without at least a token challenge...

>>> WYSIWYG systems (with the associated demise of much of the graphic
>>> arts industry) are becoming increasingly practical and popular, from
>>> Interleaf to the good old "Mac"...

>> WYSIWYG systems have their proponents and their uses.  They are VERY good
>> for novice users, and given the way this field is growing I should think
>> that "novice users" are going to be the MAJORITY of users until the entire
>> society is computer literate.  (This is much like "automobile literate" was
>> the thing to be when I was a teenager - something 19 year old males can be
>> macho about...)

>By your definition then, the majority of automobile users are
>novices, and always will be.  Their literacy does not extend further
>than the use of five or six controls.  There is no desire in the
>majority, nor *need*, to become "automobile literate" in your sense,
>the user interface has been designed that way.  The same argument
>applies to computer systems.

I seem to remember a posting from Brian some time ago making an analogy
between some task (which escapes me) and the design of carburetors.  He
claimed there were <small integer> number of people in the country who can
actually design such a beast, that it is fraught with black magic, etc.
I believe him - I can barely manage a rebuild :-)

Now, does this qualify or disqualify me as "automobile literate"?  After
all, I don't smelt my own silicon either...

>> However, there are times when the WYSIWYG paradigm breaks down badly.  As
>> a somewhat strained analogy, a strict WYSIWYG system might have you use a
>> mouse to pick out letters from a menu, rather than using a conventional
>> keyboard.....

>Not just a strained analogy, totally irrelevant.  It's like saying "a
>keyboard *might* have just one key which you hit repeatedly until the
>character of choice appears, thus any computer system which uses a
>keyboard is ...".  

>We are discussing WYSIWYG systems, and the ability of the general
>user to do typesetting, not the pros and cons "of mice over
>keyboards" (gosh there's a title for a paper (:-)).  WYSIWYG implies
>an *approach*, not necessarily a specific user interface.

I don't see the two approaches as mutually exclusive, either.  A screen with
one window on the source script, another window on output document, and
real-time updating (:-) would be just dandy.  And yes, I am quite aware of
the resources such a beast would consume.  The 4k by 4k bitmapped terminal
wouldn't come cheap either.  But, the availability of such a beast could
really help with the training of users (more on this later).

>OK, so you're trying to make a point on "efficiency".  Good, that's
>what I'm doing as well.  With complex tasks like typesetting, to
>reduce this to a measure of keystroke counts is absurd.

>The use of a traditional typesetting system requires much dedication
>and training, and the ability to visualise the mapping from embedded
>commands to the resulting typeset page.  (Even with an expert,
>several trial runs on the hardcopy typesetter, or to a "soft" screen,
>are often required before the desired effect is achieved).

>Many people do not, and can never, have this ability, nor should they
>be *required* to train themselves for tasks ancillary to their
>mainstream interest.  They didn't in the past, they turned to an
>"expert" (and paid him big bucks).  They shouldn't have to now, they
>turn to a computer.  Their literacy need only be how to drive the
>thing in a natural way to them, not to be able to manipulate the
>nuts and bolts.

>*That* is efficiency!

If people could CHEAPLY answer questions like "what would happen if we 
decided to use Basketball Oversize instead of Bimbo Stencil for that table 
on page three" (experimental approach with system described above) it
could help a great deal in helping people *develop* such abilities.

Of course, your argument is that they should not be *forced* to develop
those abilities.  I can only claim that *someone* will, because the very
high-level ideas of how *I* want to "drive the thing" will have to be
somehow translated into the low-level commands to the output device.  If
that takes an expert and big bucks, OK.  If it takes a computer, you will
be spending some bucks for that solution too.

Isn't your "in a way natural to them" a bit ambitious?  It seems to me 
that here you subsume a lot of the functionality of that "expert" who
you are cutting out of the circuit because his "bucks" are too "big".
You run the risk of turning people loose with too much freedom and too
little guidance.  SOMEBODY with SOME amount of graphics art knowledge and
experience is going to have to be around.

>A last point.  Compare this area of expertise with what has happened
>in the computerisation of other disciplines (spreadsheets, data
>managers, report generators, and the birth of the prime example,
>expert systems).

Ya know, I'd feel a whole lot better about these expert systems if we
knew more about how bad rules would affect system performance.  It's just
like us dumb old Humans to make conflicting rules and then refuse to
acknowledge the conflicts, and then some poor innocent gets really screwed.

I think one of the things really wrong with the present scheme of things is
that those people who really have the clout to make decisions and change
things are hidden away from the world and kept apart from the public by
massive bureaucracies.  If you don't know what I mean, try complaining to the
lady behind the desk at the airline counter at an airport.  Sure, she's
hired to be there and talk to you, but you can yell at her until you're blue
in the face and it still won't get back to that incompetent manager three
levels up the totem pole.

And now, not only are they hiding behind people, but you're going to have
them hide behind computers too.  Not to mention the possibility of some
brass hat general promulgating a rule that "airplanes from Cuba are to be
nuked without warning" into SDI, and ending up French-frying the last of
the Cuban capitalists out...

Other disciplines?  OK, companies are processing more bits with fewer workers
than ever before, and that may well be your idea of success.  But when 
things DO mess up, it's a doozy.  That recent SNAFU over the wire-transfer
switch in New York would have been hilarious except that it had a measurable
effect on the national economy...
-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.umd.edu

jbs@mit-eddie.UUCP (Jeff Siegal) (12/09/85)

In article <811@umd5.UUCP> zben@umd5.UUCP (Ben Cranston) writes:

>
>Other disciplines?  OK, companies are processing more bits with fewer workers
>than ever before, and that may well be your idea of success.  But when 
>things DO mess up, it's a doozy.  That recent SNAFU over the wire-transfer
                                   ----------------------------------------
>switch in New York would have been hilarious except that it had a measurable
 ------------------
>effect on the national economy...
>-- 
>Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.umd.edu

Please explain or provide reference.  Thanks.

Jeff Siegal - MIT EECS (jbs@mit-eddie, ...mit-eddie!jbs)