rcd@ico.isc.com (Dick Dunn) (08/22/90)
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> What sort of support will Level II PostScript have for contextual forms?
> As I've tried to point out before, this is important not only for non-Roman
> writing systems, but for some Roman fonts as well.

> * A string of text shouldn't need to contain different codes to represent
> different contextual forms of the same character. Instead, this should
> be resolved at the time the text is drawn. This keeps things simple for
> the application.

While the nature of the problem is correctly stated, PostScript is the wrong level to make such a decision. At the level of PostScript, the output of text should be regarded as the output of a sequence of glyphs which have no inherent semantics. Characters are simply objects being placed on the page or screen...it doesn't make sense to ascribe language-dependent context to them. If there is any "semantic" content to a character at the PostScript level, it has to do with things like width or bounding box.

Decisions have to be made upstream of PostScript anyway. For one example, consider that text justification depends on character widths. The substitution of one glyph for another will alter the justification and potentially thereby alter the layout of an entire page. You don't do the layout in PostScript (unless you're (a) a masochist, (b) doing simplistic layout, and (c) patient:-).

--
Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870
...Are you making this up as you go along?
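The point about justification depending on which glyphs are chosen can be made concrete with a minimal Python sketch. Everything in it is an assumption for illustration: the advance widths are invented (in 1/1000 em), the glyph name "fi" stands for a ligature, and `space_stretch` is a hypothetical helper, not anything PostScript provides. It only shows that swapping "f" + "i" for a single "fi" glyph changes how much each word space must stretch to fill the same measure.

```python
# Hypothetical advance widths in 1/1000 em; not taken from any real font.
WIDTHS = {"f": 333, "i": 278, "fi": 556, "n": 556, "e": 444, " ": 250}


def line_width(glyphs, widths=WIDTHS):
    """Natural width of a line given as a list of glyph names."""
    return sum(widths[g] for g in glyphs)


def space_stretch(glyphs, target, widths=WIDTHS):
    """Extra width to add to each space so the line fills `target` units."""
    spaces = glyphs.count(" ")
    if spaces == 0:
        raise ValueError("no spaces to stretch")
    return (target - line_width(glyphs, widths)) / spaces


# The same words, with and without the ligature substituted.
plain = ["f", "i", "n", "e", " ", "f", "i", "n", "e"]
ligated = ["fi", "n", "e", " ", "fi", "n", "e"]

for label, glyphs in (("plain", plain), ("ligated", ligated)):
    print(label, line_width(glyphs), space_stretch(glyphs, 4000))
```

Because the two glyph sequences have different natural widths, whoever decides on the substitution also has to be the one computing the spacing, which is the sense in which the decision sits upstream of the glyph-placement step.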
uad1077@dircon.uucp (08/24/90)
rcd@ico.isc.com (Dick Dunn) writes:
>..... You don't do the
> layout in PostScript (unless you're (a) a masochist, (b) doing simplistic
> layout, and (c) patient:-)

You might want to do it if you're using a PostScript-based window manager. (Could be NeWS or DPS, I guess.) Certainly for a really beautiful document, you would perhaps do the layout in the client application, but if you are using what the NeWS people call a desk accessory (i.e. a small program that lives entirely inside the server), you would want a cheap-and-cheerful way of doing layout that still didn't make bloopers in your mother tongue....

I *seem* to remember that the people at Xerox working on the multi-lingual word processor (Scientific American article?) devoted some effort to this question. Not sure though.

--
Ian D. Kemmish      Tel. +44 767 601 361
18 Durham Close     uad1077@dircon.UUCP
Biggleswade         ukc!dircon!uad1077
Beds SG18 8HZ       United Kingd uad1077%dircon@ukc.ac.uk
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/24/90)
In <1990Aug22.051728.16659@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) responds to my questions about support for contextual forms and interactive WYSIWYG editing as follows:

"PostScript is the wrong level to make such a decision. At the level of PostScript, the output of text should be regarded as the output of a sequence of glyphs which have no inherent semantics. Characters are simply objects being placed on the page or screen...it doesn't make sense to ascribe language-dependent context to them."

So, you need another layer on top of PostScript to implement these functions. Performance considerations aside, do you have a standard for this layer? Or does this mean that programs that want to be writing-system-independent cannot be operating-system-independent?

Remember, virtually *all* text drawing calls would have to go through this extra layer; it is no longer possible to access PostScript directly, unless you either a) add special cases to your code to handle all the peculiarities of different writing systems, or b) want your software to be useful only in places where they use the Roman alphabet.

"Decisions have to be made upstream of PostScript anyway."

Dick mentioned text justification as an example. To be more specific, you have the problem of deciding what's the optimum distribution of width adjustments--how much to alter the spacing between words (for writing systems that have spaces between words), and (if you really must) how much you can alter character widths. Also the rules for varying character widths depend on the writing system--for example, Arabic writing widens characters by drawing extension bars between them. Then there's the problem of word breaks at the ends of lines (the rules for finding word breaks are writing-system-dependent), and hyphenation rules (if relevant). Oh, and I forgot to mention automatic kerning...

If you have a separate text-rendition layer, you'd then have to maintain *two* sets of information about each font: the standard PostScript font dictionary, and all this extra information, in order to draw sensible-looking, readable text with that font.

You can look at it another way: PostScript becomes just a low-level tool for implementing a text-rendition and page-description system, rather than being such a system in itself.

I'm not suggesting that you write PostScript programs to do page layout (though I agree it would become possible). But *somewhere* there needs to be the information necessary to allow the application to do it--and it needs to be available in a standard form, or portability goes out the window.

What do you think? What's the right way to do things?

Lawrence D'Oliveiro         fone: +64-71-562-889
Computer Services Dept      fax: +64-71-384-066
University of Waikato       electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand       37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00
To someone with a hammer and a screwdriver, every problem looks like a nail with threads.
r91400@memqa.uucp (Michael C. Grant) (08/24/90)
In article <1330.26d576c4@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> In <1990Aug22.051728.16659@ico.isc.com>, rcd@ico.isc.com (Dick Dunn)
> responds to my questions about support for contextual forms and interactive
> WYSIWYG editing as follows:
>
> "PostScript is the wrong level to make such a decision. At the level of
> PostScript, the output of text should be regarded as the output of a sequence
> of glyphs which have no inherent semantics. Characters are simply objects being
> placed on the page or screen...it doesn't make sense to ascribe language-
> dependent context to them."
>
> So, you need another layer on top of PostScript to implement these functions.
> Performance considerations aside, do you have a standard for this layer?
> Or does this mean that programs that want to be writing-system-independent
> cannot be operating-system-independent?

It seems to me that there are contextual forms in any language. For example, in English we capitalize certain words. So, 'E' and 'e' are the same letter, but just two different forms. Now, if someone told me that I should just type in all lower case, and let the Postscript printer convert the proper letters to uppercase for me, then I would laugh in his face :-) Other languages, of course, are much more complicated in this respect, but that doesn't change my point.

In my opinion, Dick Dunn is right when he says that Postscript is not the place for contextual forms. No, it is in the character set itself! In other words, just as we have two different ASCII codes for the letters 'E' and 'e', so would the Japanese have two different codes for hiragana 'e' and katakana 'e', and all of the kanji that sound like an 'e'. It is the typist's responsibility (usually) to choose which character is appropriate! After all, you don't expect your pen and paper to automatically perform capitalization for you...

Now, in the case of Chinese and Japanese, I can understand the need for another layer to ease the burden on the typist. Some interesting word processors in these languages allow them to type on a reduced keyboard, while the program chooses the proper characters to use based not only on the syntactic context but the SEMANTIC context as well! But, when it comes time to save the file to the disk, or SEND THE FILE TO THE PRINTER, each character has ALREADY been given its unique code.

A Postscript printer is simply a computerized pen and paper. You have to tell it WHAT to write, EXPLICITLY. It makes no contextual judgements, just as a normal pen and paper do not--that is left to the driving program.

Michael C. Grant
rcd@ico.isc.com (Dick Dunn) (08/25/90)
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
[about some issues in handling text; topic started from context-dependent glyph selection]
> So, you need another layer on top of PostScript to implement these functions.
> Performance considerations aside, do you have a standard for this layer?

These decisions can be made in the layer that handles other natural-language dependencies, such as hyphenation. I'd call it a "text formatter" for want of a better word.

> Or does this mean that programs that want to be writing-system-independent
> cannot be operating-system-independent?

Not at all. The operating system shouldn't have to enter into the issue. The layer which handles the language issues is a program which takes input generated by the user and produces PostScript as output. The generated output is a program to place glyphs (I'm consciously avoiding the word "character" because it has too many meanings and it's not right here) on the page. PostScript is used to describe the appearance of the page, but the natural-language considerations have all been taken care of before then.

> If you have a separate text-rendition layer, you'd then have to maintain
> *two* sets of information about each font: the standard PostScript font
> dictionary, and all this extra information, in order to draw sensible-looking,
> readable text with that font.
...
> I'm not suggesting that you write PostScript programs to do page layout
> (though I agree it would become possible). But *somewhere* there needs to
> be the information necessary to allow the application to do it--and it
> needs to be available in a standard form, or portability goes out the window.

Yes, you get the information about the fonts in two places--on either side of the PostScript interface. The information is available for the application to do it. It's in the .afm (Adobe Font Metric) files. There's one such file for each font.

These files give the necessary layout information--widths and bounding boxes of characters. They also contain information describing ligatures, composite characters (e.g., characters built from a base plus a diacritical), kerning pairs, and such--things that the PostScript interpreter doesn't know about in its text-rendition operators. (For example, the interpreter can't do ligature substitution; it can't know whether it's needed/desired.)

Applications which do "serious" text processing and produce PostScript output either use the .afm files directly or (more commonly) have some associated utility which predigests the .afm's into an internal format containing the information needed by the application. But either way, you've got a standard form for the information the application needs.

--
Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870
...I'm not cynical - just experienced.
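For readers who have not looked inside an .afm file, here is a minimal Python sketch of the kind of "predigesting" utility described above. It is an illustration under assumptions, not any real tool: the embedded sample uses the AFM keywords for character metrics (C, WX, N, B, L) and kerning pairs (KPX), but the font name and all numbers are invented, and `parse_afm` and `string_width` are hypothetical helpers that ignore most of the format (composites, track kerning, per-direction metrics).

```python
# A tiny AFM-style sample. The keywords follow the AFM layout, but the font
# name and every number here are made up for illustration.
SAMPLE_AFM = """\
StartFontMetrics 2.0
FontName Hypothetical-Roman
StartCharMetrics 4
C 65 ; WX 722 ; N A ; B 15 0 706 674 ;
C 86 ; WX 722 ; N V ; B 16 -11 697 662 ;
C 102 ; WX 333 ; N f ; B 20 0 383 683 ; L i fi ;
C 105 ; WX 278 ; N i ; B 16 0 253 683 ;
EndCharMetrics
StartKernData
StartKernPairs 1
KPX A V -135
EndKernPairs
EndKernData
EndFontMetrics
"""


def parse_afm(text):
    """Return (widths, ligatures, kerns) keyed by glyph name."""
    widths, ligatures, kerns = {}, {}, {}
    in_chars = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("StartCharMetrics"):
            in_chars = True
        elif line.startswith("EndCharMetrics"):
            in_chars = False
        elif in_chars and line:
            # Each record is a series of "KEY value..." fields split by ';'.
            fields = [f.split() for f in line.split(";") if f.strip()]
            name = next(f[1] for f in fields if f[0] == "N")
            for f in fields:
                if f[0] == "WX":
                    widths[name] = float(f[1])
                elif f[0] == "L":
                    # "L successor ligature": name + successor -> ligature
                    ligatures[(name, f[1])] = f[2]
        elif line.startswith("KPX"):
            _, left, right, dx = line.split()
            kerns[(left, right)] = float(dx)
    return widths, ligatures, kerns


def string_width(names, widths, kerns):
    """Width of a glyph-name sequence including pair kerning."""
    total = sum(widths[n] for n in names)
    total += sum(kerns.get(pair, 0.0) for pair in zip(names, names[1:]))
    return total


widths, ligatures, kerns = parse_afm(SAMPLE_AFM)
print(string_width(["A", "V"], widths, kerns))
print(ligatures)
```

The tables it builds (widths, ligature substitutions, kern pairs) are exactly the information a formatter needs before it emits any PostScript, which is why this step lives in the application rather than in the interpreter.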
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/29/90)
In <1990Aug24.182905.24152@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) makes the following comments on my posting <1330.26d576c4@waikato.ac.nz> about PostScript's problems in supporting applications which will work across multiple writing systems:

"[decisions about writing-system-dependent aspects of text layout] can be made in the layer that handles other natural-language dependencies, such as hyphenation. I'd call it a 'text formatter' for want of a better word."

"The operating system shouldn't have to enter into the issue. The layer which handles the language issues is a program which takes input generated by the user and produces PostScript as output."

"The information is available for the application ... in the .afm (Adobe Font Metric) files."

I get the feeling that you believe that there will only ever be one text-formatting application in the world. This suggests to me that you're not a PC user, as you have no appreciation of the sheer variety of word processors and page-layout programs available for PCs, quite apart from command-driven text formatters like TeX. Not only that, but other applications--such as drawing programs--need the ability to handle a certain amount of text as well.

Are you suggesting that all these applications reinvent the writing-system-dependent aspects of text handling? Isn't PostScript important precisely because of the fact that it provides a common solution to several common problems of text handling? Wouldn't it be nice if it were extended to solve more of them?

Now that I've made my point clearer, you might like to reread my previous posting, and reconsider some of the features I asked about, and see if they make a bit more sense.

By the way, AFM files don't go half of the way towards addressing the points I raised.

Lawrence D'Oliveiro         fone: +64-71-562-889
Computer Services Dept      fax: +64-71-384-066
University of Waikato       electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand       37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00
glenn@heaven.woodside.ca.us (Glenn Reid) (08/30/90)
In article <1376.26dc15f6@waikato.ac.nz> ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
>Are you suggesting that all these applications reinvent the writing-system-
>dependent aspects of text handling? Isn't PostScript important precisely
>because of the fact that it provides a common solution to several common
>problems of text handling? Wouldn't it be nice if it were extended to solve
>more of them?

I would have to say, yes, word-processing applications must be rewritten to adapt to writing-system-dependent text handling. Would you expect your English-language word processor to automatically adapt to, say, Japanese, just because you installed a Kanji font in your system?

The answer is pretty clearly "no", I think, unless you have the benefit of something like Apple's ScriptManager, which is exactly the layer between the application and the printer that Dick mentioned, and is probably the right way to go. On the NeXT computer there is a Text object that can be made to perform writing-system-dependent operations without modification to the application. On the PC, I'm not aware of this level of abstraction in any of the window environments, but it may be there.

I fully agree with Dick that your printer (i.e. PostScript) should not be formatting your document, breaking lines, or otherwise making layout decisions.

>By the way, AFM files don't go half of the way towards addressing the
>points I raised.

I believe that the AFM format has been extended to cover other writing systems like Japanese. Try out the Adobe file server and/or contact Adobe for more information.

(Glenn) cvn

--
Glenn Reid                     RightBrain Software
glenn@heaven.woodside.ca.us    PostScript/NeXT developers
..{adobe,next}!heaven!glenn    415-851-1785
r91400@memqa.uucp (Michael C. Grant) (08/30/90)
In article <1376.26dc15f6@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> Are you suggesting that all these applications reinvent the writing-system-
> dependent aspects of text handling? Isn't PostScript important precisely
> because of the fact that it provides a common solution to several common
> problems of text handling? Wouldn't it be nice if it were extended to solve
> more of them?

Okay, tell me the flaw in this logic:

Assume that Postscript handles contextual forms for me. So, if I send it the 'gloop' character, there might be four or five different ways that it would print out, depending upon its 'surroundings'.

Great. Now, assume that we DON'T have Display Postscript. How in the world are we going to display these forms on the screen before we send them to the laser printer? Easy, just DUPLICATE THE WORK PERFORMED BY POSTSCRIPT IN THE APPLICATION. Sorry, but I don't like doing things twice!

Now, let us assume that there is more than one code for the 'gloop' character--a unique code for each of its forms (just as there is a separate code for capital and lowercase 'a', for example). Now, either the user or the application chooses the proper form, and sends that UNIQUE code to the PostScript printer. Voila--the printer does not have to interpret the code contextually, it just runs through the lookup table as always.

Sure, contextual forms are usually more complex than 'A' vs. 'a', but the idea is similar: we press the SHIFT key to get 'A'. Why not press a special key, for example, to get the end-of-the-word representation of an Arabic letter, or the katakana versus hiragana representations of a Japanese character? When we write by hand, we must make that adjustment, and so it is quite natural for us to think this way.

I really don't see why the latter scenario, in which the contextual interpretation is performed ONCE (and rather quickly, I might add), isn't preferable to the former scenario, in which it is performed at least TWICE, if not EVERY time the character is displayed on the screen.

Michael C. Grant
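The "resolve it once, upstream" scenario can be sketched in a few lines of Python. This is a toy under stated assumptions: the letter "g" stands in for the imaginary 'gloop' character, the glyph-name suffixes (.isol, .init, .medi, .fina) are just a naming convention chosen for the example, and the joining rule (a form depends only on whether the neighbours are letters) is far simpler than what a real script such as Arabic requires. `contextual_glyphs` is a hypothetical function, not part of any library.

```python
def contextual_glyphs(text, shaped=frozenset("g")):
    """Map each character to a glyph name, choosing a positional form once.

    Characters in `shaped` get one of four forms depending on whether the
    neighbouring characters are letters; everything else passes through.
    """
    out = []
    for i, ch in enumerate(text):
        if ch not in shaped:
            out.append(ch)
            continue
        joins_prev = i > 0 and text[i - 1].isalpha()
        joins_next = i + 1 < len(text) and text[i + 1].isalpha()
        if joins_prev and joins_next:
            form = "medi"   # letters on both sides
        elif joins_prev:
            form = "fina"   # end of a word
        elif joins_next:
            form = "init"   # start of a word
        else:
            form = "isol"   # standing alone
        out.append(f"{ch}.{form}")
    return out


# Resolve the forms a single time...
glyphs = contextual_glyphs("go ago bag g")
print(glyphs)

# ...then hand the SAME list to every consumer (screen preview, printer
# driver), so neither has to repeat the contextual analysis.
screen_output = list(glyphs)
printer_output = list(glyphs)
```

Whether this lookup belongs in each application, in a shared system layer, or in the interpreter is precisely what the thread is arguing about; the sketch only shows that once the glyph list exists, every downstream consumer can use it without further context.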
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/31/90)
I received the following e-mail message from Dick Dunn. I publish it without further comment.

From: IN%"rcd@ico.isc.com" "Dick Dunn"
To: ccc_ldo@waikato.ac.nz
CC:
Subj: Re: PostScript Level II, contextual forms

Received: from ico.isc.com by waikato.ac.nz; Thu, 30 Aug 90 23:40 +1200
Received: from keystone.ico.isc.com by ico.isc.com (5.61/1.35) id AA15027; Wed, 29 Aug 90 12:46:28 -0600
Received: by keystone.ico.isc.com (5.61/ISC-CTO/06-27-90/ico.isc.com-leaf) id AA12296; Wed, 29 Aug 90 12:43:24 -0600
Date: Wed, 29 Aug 90 12:43:24 -0600
From: Dick Dunn <rcd@ico.isc.com>
Subject: Re: PostScript Level II, contextual forms
To: ccc_ldo@waikato.ac.nz
Message-Id: <9008291843.AA12296@keystone.ico.isc.com>
In-Reply-To: your article <1376.26dc15f6@waikato.ac.nz>
News-Path: ico!ncar!asuvax!cs.utexas.edu!samsung!munnari.oz.au!uhccux!virtue!ccc_ldo

Your latest followup is extremely rude and condescending. I am quite aware of the many text-formatting systems which exist; we use several here on a daily basis. I have worked with many different formatters of many different types over more than two decades, and I have worked with PostScript since shortly after its initial public release. You need not try to defend your position by attacking my experience.

The problem is not in my lack of understanding of the text-formatting problem but in your lack of understanding of the purpose of PostScript. PostScript is not in any way designed, intended, or capable of handling language-specific text processing issues. The issues you describe are valid concerns, but they are utterly inappropriate for the level of abstraction at which PostScript operates. If you do not wish to use PostScript for its intended purposes, you hardly have a valid complaint that it doesn't meet your goals.

I do not think that further discussion on the net is going to get us anywhere until you have a better understanding of the purpose of PostScript.
--- Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870
rcd@ico.isc.com (Dick Dunn) (08/31/90)
ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
> I received the following e-mail message from Dick Dunn. I publish it
> without further comment.
[my mail message, posted without my permission, in violation of any semblance of {n}etiquette]

I am sorry that D'Oliveiro has forced this back into a public forum. I had tried to take it to email because the discussion had digressed, with his last posting, from what could have been a useful consideration of language-processing issues into an unfounded, personal attack by D'Oliveiro on my abilities, background, and understanding of the issues.

It has been clear that D'Oliveiro not only fails to understand the purposes and goals of PostScript (which, by itself, would be no big fault), but refuses to admit the possibility that he's approaching the problem in the wrong way. He posits an approach which is wildly at variance with all existing practice and requires radical changes to PostScript, then flames people who try to guide him back to a useful answer.

I stand by the statements I made in the email. I considerably understated my relevant background--e.g., I omitted five years or so working for a company that made word-processing systems in an international market and a separate project "internationalizing" another text formatter for both European and Oriental languages--but that's no matter. We can't help D'Oliveiro. He doesn't want to listen and he doesn't want to learn.

Sorry for the waste of bandwidth. I tried to take it offline, but apparently D'Oliveiro is more interested in a dispute than in solving a problem, and seems to have some considerable ego invested in being egregiously wrong.

--
Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870
...I'm not cynical - just experienced.
woody@chinacat.Unicom.COM (Woody Baker @ Eagle Signal) (09/04/90)
In article <5805@memqa.uucp>, r91400@memqa.uucp (Michael C. Grant) writes:
> In article <1376.26dc15f6@waikato.ac.nz>, ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
>
> Okay, tell me the flaw in this logic:
> Assume that Postscript handles contextual forms for me. So, if I send
> it the 'gloop' character, there might be four or five different ways
> that it would print out, depending upon its 'surroundings'.

I think that contextual forms are fairly limited. If you send it the gloop character as you call it, about the only thing that really could happen would be automatic capitalization, and there is a fairly specific set of rules to do that. A character is a character.

>
> Great. Now, assume that we DON'T have Display Postscript. How in the
> world are we going to display these forms on the screen before we
> send them to the laser printer? Easy, just DUPLICATE THE WORK

Why is this even a concern? Why should you even need to worry about displaying these forms on the screen? If they exist in the machine's character set, they will automatically be displayed. Take the ntilde for example. It is directly accessible from the character set on the PC. If you are working on a machine that supports some other language, it will certainly have its own set of built-in characters.

> Now, let us assume that there is more than one code for the 'gloop'
> character--a unique code for each of its forms (just as there is a separate
> code for capital and lowercase 'a', for example). Now, either the user
> or the application chooses the proper form, and sends that UNIQUE code to
> the PostScript printer. Voila--the printer does not have to interpret
> the code contextually, it just runs through the lookup table as always.
>

Or the printer, which more than likely has more computational power than the machine that is driving it, can choose the proper form. A PC/AT class machine running DOS (there are more of them than all other machines put together) does one thing at a time. I prefer to be able to let some other CPU do as much work as possible, so my machine does not stay tied up. After all, what impacts me the most is the tool that I use.

> Sure, contextual forms are usually more complex than 'A' vs. 'a', but
> the idea is similar: we press the SHIFT key to get 'A'. Why not press
> a special key, for example, to get the end-of-the-word representation
> of a Japanese character? When we write by hand, we must make that
> adjustment, and so it is quite natural for us to think this way.

Certainly, but why should my machine have to worry about placement of characters, or substitution? The program running in the printer is perfectly capable of that. Now, this does require a shift in the perspective from which one views PostScript: rather than a page-description laser driver, it is a complex programming language that happens to (by design) do a very fine job of laying graphics and text down. If all you want to do is send minimal command sequences, then why even have Postscript at all?

>
> than the former scenario, in which it is performed at least TWICE,

Generally the hardware can and does handle this, and if it does not, well, you don't *have* to have WYSIWYG.

Cheers
Woody