[comp.lang.postscript] PostScript Level II, contextual forms

rcd@ico.isc.com (Dick Dunn) (08/22/90)

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> What sort of support will Level II PostScript have for contextual forms?
> As I've tried to point out before, this is important not only for non-Roman
> writing systems, but for some Roman fonts as well.

> * A string of text shouldn't need to contain different codes to represent
>   different contextual forms of the same character. Instead, this should
>   be resolved at the time the text is drawn. This keeps things simple for
>   the application.

While the nature of the problem is correctly stated, PostScript is the
wrong level to make such a decision.  At the level of PostScript, the
output of text should be regarded as the output of a sequence of glyphs
which have no inherent semantics.  Characters are simply objects being
placed on the page or screen...it doesn't make sense to ascribe language-
dependent context to them.  If there is any "semantic" content to a
character at the PostScript level, it has to do with things like width or
bounding box.

Decisions have to be made upstream of PostScript anyway.  For one example,
consider that text justification depends on character widths.  The
substitution of one glyph for another will alter the justification and 
potentially thereby alter the layout of an entire page.  You don't do the
layout in PostScript (unless you're (a) a masochist, (b) doing simplistic
layout, and (c) patient:-)
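
A minimal sketch of the justification point, assuming made-up glyph
widths in 1/1000 em units (the numbers and the names line_width and
extra_space_per_gap are hypothetical, not from any real font or system).
Substituting an "fi" ligature for "f" + "i" changes the line's natural
width, and with it the space that must be spread across the word gaps:

```python
# Hypothetical advance widths in 1/1000 em; not taken from any real font.
WIDTHS = {"f": 333, "i": 278, "n": 556, "d": 556, "t": 278, " ": 278, "fi": 556}


def line_width(glyphs):
    """Sum the advance widths of a sequence of glyph names."""
    return sum(WIDTHS[g] for g in glyphs)


def extra_space_per_gap(glyphs, measure):
    """Extra width to add to each inter-word space so the line fills the measure."""
    gaps = glyphs.count(" ")
    return (measure - line_width(glyphs)) / gaps


# The same text, "find it fit", before and after ligature substitution.
plain = ["f", "i", "n", "d", " ", "i", "t", " ", "f", "i", "t"]
ligated = ["fi", "n", "d", " ", "i", "t", " ", "fi", "t"]

for name, glyphs in (("plain", plain), ("ligated", ligated)):
    print(name, line_width(glyphs), extra_space_per_gap(glyphs, 4000))
```

Whichever layer picks the glyphs therefore also has to own the
justification arithmetic.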
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...Are you making this up as you go along?

uad1077@dircon.uucp (08/24/90)

rcd@ico.isc.com (Dick Dunn ) writes:
>..... You don't do the
> layout in PostScript (unless you're (a) a masochist, (b) doing simplistic
> layout, and (c) patient:-)

You might want to do it if you're using a PostScript-based window manager.
(Could be NeWS or DPS, I guess).  Certainly for a really beautiful
document, you would perhaps do the layout in the client application,
but if you are using what the NeWS people call a desk-accessory (i.e.
a small program that lives entirely inside the server), you would
want a cheap-and-cheerful way of doing layout that still didn't make
bloopers in your mother tongue....  I *seem* to remember that the
people at Xerox working on the Multi-lingual word-processor (Scientific
American article?????) devoted some effort to this question.  Not sure
though.

-- 
Ian D. Kemmish                    Tel. +44 767 601 361
18 Durham Close                   uad1077@dircon.UUCP
Biggleswade                       ukc!dircon!uad1077
Beds SG18 8HZ United Kingd    uad1077%dircon@ukc.ac.uk

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/24/90)

In <1990Aug22.051728.16659@ico.isc.com>, rcd@ico.isc.com (Dick Dunn)
responds to my questions about support for contextual forms and interactive
WYSIWYG editing as follows:

"PostScript is the wrong level to make such a decision.  At the level of
PostScript, the output of text should be regarded as the output of a sequence
of glyphs which have no inherent semantics.  Characters are simply objects being
placed on the page or screen...it doesn't make sense to ascribe language-
dependent context to them."

So, you need another layer on top of PostScript to implement these functions.
Performance considerations aside, do you have a standard for this layer?
Or does this mean that programs that want to be writing-system-independent
cannot be operating-system-independent?

Remember, virtually *all* text drawing calls would have to go through
this extra layer; it no longer becomes possible to access PostScript directly,
unless you either a) add special cases to your code to handle all the
peculiarities of different writing systems, or b) you want your software
to be useful only in places where they use the Roman alphabet.

"Decisions have to be made upstream of PostScript anyway."

Dick mentioned text justification as an example. To be more specific, you
have the problem of deciding what's the optimum distribution of width
adjustments--how much to alter the spacing between words (for writing
systems that have spaces between words), and (if you really must) how much
you can alter character widths. Also the rules for varying character widths
depend on the writing system--for example, Arabic writing widens characters
by drawing extension bars between them. Then there's the problem of word
breaks at the ends of lines (the rules for finding word breaks are
writing-system-dependent), and hyphenation rules (if relevant).
Oh, and I forgot to mention automatic kerning...

If you have a separate text-rendition layer, you'd then have to maintain
*two* sets of information about each font: the standard PostScript font
dictionary, and all this extra information, in order to draw sensible-looking,
readable text with that font.

You can look at it another way: PostScript becomes just a low-level tool
for implementing a text-rendition and page-description system, rather
than being such a system in itself.

I'm not suggesting that you write PostScript programs to do page layout
(though I agree it would become possible). But *somewhere* there needs to
be the information necessary to allow the application to do it--and it
needs to be available in a standard form, or portability goes out the window.

What do you think? What's the right way to do things?

Lawrence D'Oliveiro                       fone: +64-71-562-889
Computer Services Dept                     fax: +64-71-384-066
University of Waikato            electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand    37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00
To someone with a hammer and a screwdriver, every problem looks
like a nail with threads.

r91400@memqa.uucp (Michael C. Grant) (08/24/90)

In article <1330.26d576c4@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> In <1990Aug22.051728.16659@ico.isc.com>, rcd@ico.isc.com (Dick Dunn)
> responds to my questions about support for contextual forms and interactive
> WYSIWYG editing as follows:
> 
> "PostScript is the wrong level to make such a decision.  At the level of
> PostScript, the output of text should be regarded as the output of a sequence
> of glyphs which have no inherent semantics.  Characters are simply objects being
> placed on the page or screen...it doesn't make sense to ascribe language-
> dependent context to them."
> 
> So, you need another layer on top of PostScript to implement these functions.
> Performance considerations aside, do you have a standard for this layer?
> Or does this mean that programs that want to be writing-system-independent
> cannot be operating-system-independent?

It seems to me that there are contextual forms in any language.  For example,
in English we capitalize certain words.  So, 'E' and 'e' are the same letter,
but just two different forms.  Now, if someone told me that I should just
type in all lower case, and let the Postscript printer convert the proper
letters to uppercase for me, then I would laugh in his face :-)

Other languages, of course, are much more complicated in this respect,
but that doesn't change my point.  In my opinion, Dick Dunn is
right when he says that Postscript is not the place for contextual forms.
No, it is in the character set itself!  In other words, just as we have
two different ASCII codes for the letters 'E' and 'e', so would the Japanese
have two different codes for hiragana 'e' and katakana 'e', and all of the
kanji that sound like an 'e'.  It is the typist's responsibility (usually)
to choose which character is appropriate!  After all, you don't expect
your pen and paper to automatically perform capitalization for you...

Now, in the case of Chinese and Japanese, I can understand the need for
another layer in which to ease the burden of the typist.  Some interesting
word processors in these languages allow them to type on a reduced keyboard,
while it chooses the proper characters to use based not only on the syntactic
context but the SEMANTIC context as well!  But, when it comes time to 
save the file to the disk, or SEND THE FILE TO THE PRINTER, each character
has ALREADY been given its unique code.

A Postscript printer is simply a computerized pen and paper.  You have to
tell it WHAT to write, EXPLICITLY.  It makes no contextual judgements, just
as a normal pen and paper do not--that is left to the driving program.

Michael C. Grant

rcd@ico.isc.com (Dick Dunn) (08/25/90)

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:

[about some issues in handling text; topic started from context-dependent
glyph selection]
> So, you need another layer on top of PostScript to implement these functions.
> Performance considerations aside, do you have a standard for this layer?

These decisions can be made in the layer that handles other natural-language
dependencies, such as hyphenation.  I'd call it a "text formatter" for want
of a better word.

> Or does this mean that programs that want to be writing-system-independent
> cannot be operating-system-independent?

Not at all.  The operating system shouldn't have to enter into the issue.
The layer which handles the language issues is a program which takes input
generated by the user and produces PostScript as output.  The generated
output is a program to place glyphs (I'm consciously avoiding the word
"character" because it has too many meanings and it's not right here) on
the page.  PostScript is used to describe the appearance of the page, but
the natural-language considerations have all been taken care of before
then.

> If you have a separate text-rendition layer, you'd then have to maintain
> *two* sets of information about each font: the standard PostScript font
> dictionary, and all this extra information, in order to draw sensible-looking,
> readable text with that font.
...
> I'm not suggesting that you write PostScript programs to do page layout
> (though I agree it would become possible). But *somewhere* there needs to
> be the information necessary to allow the application to do it--and it
> needs to be available in a standard form, or portability goes out the window.

Yes, you get the information about the fonts in two places--on either side
of the PostScript interface.

The information is available for the application to do it.  It's in the
.afm (Adobe Font Metric) files.  There's one such file for each font.
These files give the necessary layout information--widths and bounding
boxes of characters.  They also contain information describing ligatures,
composite characters (e.g., characters built from a base plus a dia-
critical), kerning pairs, and such--things that the PostScript interpre-
ter doesn't know about in its text-rendition operators.  (For example, the
interpreter can't do ligature substitution; it can't know whether it's
needed/desired.)
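
As a rough sketch of how an application might predigest this
information, assuming the usual AFM line shapes ("C ... ; WX ... ; N ... ;"
for character metrics, "L successor ligature" for ligatures, and
"KPX left right amount" for kerning pairs).  The sample data and the
function names parse_afm and set_width are illustrative only, and the
numbers are not quoted from any particular font's .afm file:

```python
SAMPLE_AFM = """\
StartCharMetrics 4
C 65 ; WX 722 ; N A ; B 15 0 706 674 ;
C 86 ; WX 722 ; N V ; B 16 -11 697 662 ;
C 102 ; WX 333 ; N f ; B 20 0 383 683 ; L i fi ; L l fl ;
C 105 ; WX 278 ; N i ; B 16 0 253 683 ;
C 174 ; WX 556 ; N fi ; B 31 0 521 683 ;
EndCharMetrics
StartKernPairs 1
KPX A V -135
EndKernPairs
"""


def parse_afm(text):
    """Collect widths, ligatures and kerning pairs from AFM-style text."""
    widths, ligatures, kerns = {}, {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("C "):
            # Each metric line is a run of ';'-separated "KEY value..." fields.
            fields = [f.split() for f in line.split(";") if f.strip()]
            name = next(f[1] for f in fields if f[0] == "N")
            for f in fields:
                if f[0] == "WX":
                    widths[name] = float(f[1])
                elif f[0] == "L":
                    # (this glyph, successor) -> ligature glyph name
                    ligatures[(name, f[1])] = f[2]
        elif line.startswith("KPX "):
            _, left, right, amount = line.split()
            kerns[(left, right)] = float(amount)
    return widths, ligatures, kerns


def set_width(names, widths, ligatures, kerns):
    """Apply ligatures, then return the glyph list and its kerned width."""
    glyphs = []
    for n in names:
        if glyphs and (glyphs[-1], n) in ligatures:
            glyphs[-1] = ligatures[(glyphs[-1], n)]
        else:
            glyphs.append(n)
    total = sum(widths[g] for g in glyphs)
    total += sum(kerns.get(pair, 0.0) for pair in zip(glyphs, glyphs[1:]))
    return glyphs, total


widths, ligatures, kerns = parse_afm(SAMPLE_AFM)
print(set_width(["A", "V"], widths, ligatures, kerns))
print(set_width(["f", "i"], widths, ligatures, kerns))
```

Whether to apply a given ligature or kern is still the application's
decision; the file only says what the font offers.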

Applications which do "serious" text processing and produce PostScript
output either use the .afm files directly or (more commonly) have some
associated utility which predigests the .afm's into an internal format
containing the information needed by the application.  But either way,
you've got a standard form for the information the application needs.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...I'm not cynical - just experienced.

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/29/90)

In <1990Aug24.182905.24152@ico.isc.com>, rcd@ico.isc.com (Dick Dunn)
makes the following comments on my posting <1330.26d576c4@waikato.ac.nz>
about PostScript's problems in supporting applications which will work
across multiple writing systems:

"[decisions about writing-system-dependent aspects of text layout] can be
made in the layer that handles other natural-language dependencies, such
as hyphenation.  I'd call it a 'text formatter' for want of a better word."

"The operating system shouldn't have to enter into the issue. The layer
which handles the language issues is a program which takes input generated
by the user and produces PostScript as output."

"The information is available for the application ... in the .afm
(Adobe Font Metric) files."

I get the feeling that you believe that there will only ever be one
text-formatting application in the world. This suggests to me that
you're not a PC user, as you have no appreciation of the sheer variety
of word processors and page-layout programs available for PCs, quite apart
from command-driven text formatters like TEX. Not only that, but other
applications--such as drawing programs--need the ability to handle a
certain amount of text as well.

Are you suggesting that all these applications reinvent the writing-system-
dependent aspects of text handling? Isn't PostScript important precisely
because of the fact that it provides a common solution to several common
problems of text handling? Wouldn't it be nice if it were extended to solve
more of them?

Now that I've made my point clearer, you might like to reread my previous
posting, and reconsider some of the features I asked about, and see if
they make a bit more sense.

By the way, AFM files don't go half of the way towards addressing the
points I raised.

Lawrence D'Oliveiro                       fone: +64-71-562-889
Computer Services Dept                     fax: +64-71-384-066
University of Waikato            electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand    37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00

glenn@heaven.woodside.ca.us (Glenn Reid) (08/30/90)

In article <1376.26dc15f6@waikato.ac.nz> ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
>Are you suggesting that all these applications reinvent the writing-system-
>dependent aspects of text handling? Isn't PostScript important precisely
>because of the fact that it provides a common solution to several common
>problems of text handling? Wouldn't it be nice if it were extended to solve
>more of them?

I would have to say, yes, word-processing applications must be rewritten to
adapt to writing-system-dependent text handling.  Would you expect your
English-Language word processor to automatically adapt to, say, Japanese,
just because you installed a Kanji font in your system?  The answer is
pretty clearly "no", I think, unless you have the benefit of something like
Apple's ScriptManager, which is exactly the layer between the application
and the printer that Dick mentioned, and is probably the right way to go.
On the NeXT computer there is a Text object that can be made to perform
writing-system-dependent operations without modification to the application.
On the PC, I'm not aware of this level of abstraction in any of the window
environments, but it may be there.

I fully agree with Dick that your printer (i.e. PostScript) should not be
formatting your document, breaking lines, or otherwise making layout
decisions.

>By the way, AFM files don't go half of the way towards addressing the
>points I raised.

I believe that the AFM format has been extended to cover other writing systems
like Japanese.  Try out the Adobe file server and/or contact Adobe for more
information.

(Glenn) cvn

-- 
 Glenn Reid				RightBrain Software
 glenn@heaven.woodside.ca.us		PostScript/NeXT developers
 ..{adobe,next}!heaven!glenn		415-851-1785

r91400@memqa.uucp (Michael C. Grant) (08/30/90)

In article <1376.26dc15f6@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> Are you suggesting that all these applications reinvent the writing-system-
> dependent aspects of text handling? Isn't PostScript important precisely
> because of the fact that it provides a common solution to several common
> problems of text handling? Wouldn't it be nice if it were extended to solve
> more of them?

Okay, tell me the flaw in this logic:
Assume that Postscript handles contextual forms for me.  So, if I send
it the 'gloop' character, there might be four or five different ways
that it would print out, depending upon its 'surroundings'.

Great.  Now, assume that we DON'T have Display Postscript.  How in the
world are we going to display these forms on the screen before we
send them to the laser printer?  Easy, just DUPLICATE THE WORK
PERFORMED BY POSTSCRIPT IN THE APPLICATION.

Sorry, but I don't like doing things twice!

Now, let us assume that there is more than one code for the 'gloop'
character--a unique code for each of its forms (just as there is a separate
code for capital and lowercase 'a', for example).  Now, either the user
or the application chooses the proper form, and sends that UNIQUE code to
the PostScript printer.  Voila--the printer does not have to interpret
the code contextually, it just runs through the lookup table as always.
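
A toy sketch of that scheme, with the application resolving the form
once and emitting one unique code per glyph.  The letters, the code
values, and the name choose_forms are all hypothetical, and the rule is
deliberately simplistic: it looks only at position within the word,
whereas a real writing system such as Arabic has letters that join on
one side only.

```python
# Hypothetical code assignments: one unique code per contextual form.
FORMS = {
    "gloop": {"isolated": 0x80, "initial": 0x81, "medial": 0x82, "final": 0x83},
    "blarg": {"isolated": 0x84, "initial": 0x85, "medial": 0x86, "final": 0x87},
}


def choose_forms(word):
    """Map a word (a list of abstract letter names) to one code per letter."""
    codes = []
    last = len(word) - 1
    for i, letter in enumerate(word):
        if last == 0:
            form = "isolated"
        elif i == 0:
            form = "initial"
        elif i == last:
            form = "final"
        else:
            form = "medial"
        codes.append(FORMS[letter][form])
    return codes


# The resulting codes are what would be sent to the printer as-is.
print([hex(c) for c in choose_forms(["gloop", "blarg", "gloop"])])
```

The same list of codes can drive both the screen and the printer, so
the contextual decision is made in exactly one place.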

Sure, contextual forms are usually more complex than 'A' vs. 'a', but
the idea is similar: we press the SHIFT key to get 'A'.  Why not press
a special key, for example, to get the end-of-the-word representation
of an Arabic letter, or the katakana versus hiragana representations
of a Japanese character?  When we write by hand, we must make that
adjustment, and so it is quite natural for us to think this way.

I really don't see why the latter scenario, in which the contextual
interpretation is performed ONCE (and rather quickly, I might add),
isn't preferable to the former scenario, in which it is performed at
least TWICE, if not EVERY time the character is displayed on the screen.

Michael C. Grant

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/31/90)

I received the following e-mail message from Dick Dunn. I publish it
without further comment.

From:   IN%"rcd@ico.isc.com"  "Dick Dunn"
To:     ccc_ldo@waikato.ac.nz
CC:
Subj:   Re: PostScript Level II, contextual forms

Received: from ico.isc.com by waikato.ac.nz; Thu, 30 Aug 90 23:40 +1200
Received: from keystone.ico.isc.com by ico.isc.com (5.61/1.35) id AA15027; Wed,
 29 Aug 90 12:46:28 -0600
Received: by keystone.ico.isc.com (5.61/ISC-CTO/06-27-90/ico.isc.com-leaf) id
 AA12296; Wed, 29 Aug 90 12:43:24 -0600
Date: Wed, 29 Aug 90 12:43:24 -0600
From: Dick Dunn <rcd@ico.isc.com>
Subject: Re: PostScript Level II, contextual forms
To: ccc_ldo@waikato.ac.nz
Message-Id: <9008291843.AA12296@keystone.ico.isc.com>
In-Reply-To: your article <1376.26dc15f6@waikato.ac.nz>
News-Path:
 ico!ncar!asuvax!cs.utexas.edu!samsung!munnari.oz.au!uhccux!virtue!ccc_ldo

Your latest followup is extremely rude and condescending.  I am quite aware
of the many text-formatting systems which exist; we use several here on a
daily basis.  I have worked with many different formatters of many
different types over more than two decades, and I have worked with Post-
Script since shortly after its initial public release.  You need not try
to defend your position by attacking my experience.

The problem is not in my lack of understanding of the text-formatting
problem but in your lack of understanding of the purpose of PostScript.
PostScript is not in any way designed, intended, or capable of handling
language-specific text processing issues.  The issues you describe are
valid concerns, but they are utterly inappropriate for the level of ab-
straction at which PostScript operates.  If you do not wish to use Post-
Script for its intended purposes, you hardly have a valid complaint that it
doesn't meet your goals.

I do not think that further discussion on the net is going to get us any-
where until you have a better understanding of the purpose of PostScript.
---
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870

rcd@ico.isc.com (Dick Dunn) (08/31/90)

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:

> I received the following e-mail message from Dick Dunn. I publish it
> without further comment.

[my mail message, posted without my permission, in violation of any
semblance of {n}etiquette]

I am sorry that D'Oliveiro has forced this back into a public forum.  I had
tried to take it to email because the discussion had digressed, with his
last posting, from what could have been a useful consideration of language-
processing issues into an unfounded, personal attack by D'Oliveiro on my
abilities, background, and understanding of the issues.  It has been clear
that D'Oliveiro not only fails to understand the purposes and goals of
PostScript (which, by itself, would be no big fault), but refuses to admit
the possibility that he's approaching the problem in the wrong way.  He
posits an approach which is wildly at variance with all existing practice
and requires radical changes to PostScript, then flames people who try to
guide him back to a useful answer.

I stand by the statements I made in the email.  I considerably understated
my relevant background--e.g., I omitted five years or so working for a
company that made word-processing systems in an international market and a
separate project "internationalizing" another text formatter for both
European and Oriental languages--but that's no matter.  We can't help
D'Oliveiro.  He doesn't want to listen and he doesn't want to learn.

Sorry for the waste of bandwidth.  I tried to take it offline, but
apparently D'Oliveiro is more interested in a dispute than in solving a
problem, and seems to have some considerable ego invested in being
egregiously wrong.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...I'm not cynical - just experienced.

woody@chinacat.Unicom.COM (Woody Baker @ Eagle Signal) (09/04/90)

In article <5805@memqa.uucp>, r91400@memqa.uucp (Michael C. Grant) writes:
> In article <1376.26dc15f6@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> 
> Okay, tell me the flaw in this logic:
> Assume that Postscript handles contextual forms for me.  So, if I send
> it the 'gloop' character, there might be four or five different ways
> that it would print out, depending upon its 'surroundings'.

I think that contextual forms are fairly limited.  If you send it the
gloop character as you call it, about the only thing that really
could happen would be automatic capitalization, and there is a fairly
specific set of rules to do that.  A character is a character.


> 
> Great.  Now, assume that we DON'T have Display Postscript.  How in the
> world are we going to display these forms on the screen before we
> send them to the laser printer?  Easy, just DUPLICATE THE WORK

Why is this even a concern?  Why should you even need to worry about
displaying these forms to the screen?  If they exist in the machine's
character set, they will automatically be displayed.  Take the
ntilde for example. It is directly accessible from the character set
on the PC.  If you are working on a machine that supports some other
language, it will certainly have its own set of built-in characters.


> Now, let us assume that there is more than one code for the 'gloop'
> character--a unique code for each of its forms (just as there is a separate
> code for capital and lowercase 'a', for example).  Now, either the user
> or the application chooses the proper form, and sends that UNIQUE code to
> the PostScript printer.  Voila--the printer does not have to interpret
> the code contextually, it just runs through the lookup table as always.
>

Or the printer, which more than likely has more computational power than
the machine that is driving it, can choose the proper form.  A PC/AT
class machine running DOS (there are more of them than all other
machines put together) does one thing at a time.  I prefer to be able
to let some other CPU do as much work as possible, so my machine does
not stay tied up.  After all, what impacts me the most is the tool
that I use.

> Sure, contextual forms are usually more complex than 'A' vs. 'a', but
> the idea is similar: we press the SHIFT key to get 'A'.  Why not press
> a special key, for example, to get the end-of-the-word representation
> of a Japanese character?  When we write by hand, we must make that
> adjustment, and so it is quite natural for us to think this way.

Certainly, but why should my machine have to worry about placement of
characters, or substitution?  The program running in the printer is
perfectly capable of that.  Now, this does require a shift in the
perspective from which one views PostScript: rather than a
page-description laser driver, it is a complex programming language
that happens to (by design) do a very fine job of laying graphics
and text down.  If all you want to do is send minimal command
sequences, then why even have Postscript at all?



> 
> than the former scenario, in which it is performed at least TWICE,
Generally the hardware can and does handle this, and if it does not,
well, you don't *have* to have WYSIWYG.


Cheers
Woody