rokicki@neon.Stanford.EDU (Tomas G. Rokicki) (05/05/91)
\chapter{12}{Pitfalls of PostScript}

PostScript has been a major boon to the computer and printing industries. Many claim the Macintosh revolutionized publishing---actually, it was John Warnock and his PostScript language that revolutionized publishing. The Apple LaserWriter just happened to be the first very successful printer to incorporate PostScript. And there is no question about it: the PostScript language is a very elegant page description language. It has an extremely small set of primitives, be they language primitives or rendering primitives, and a consistent interpretation of those primitives. For example, rather than add a special `arc' primitive to the renderer, its designers realized that an arc can be represented to any accuracy by a series of splines---and for any reasonable accuracy, with a very small number of splines---so there was no need to clutter up the renderer with arcs. On the other hand, adding an arc primitive to the language was easy---it just draws the requisite number of splines.

But all is not bliss. PostScript has a number of flaws. Some are small; some are large enough to hurl a planet through. And many implementations add their own set of bugs to the pot. In this chapter, I indulge myself by bringing up some of my pet peeves about PostScript and the various implementations I have worked with. I shall not even consider dealing with the half-way implementations of PostScript that are available; PostScript as a language and an imaging model is simple enough that any respectable programmer should be able to implement it completely. So I won't even mention things like the lack of save and restore. Many of the problems are unequivocally Adobe's fault; many are applications programmers'. Some are addressed by PostScript Level 2; most are not. In any case, only by identifying them can we have any hope of remedying them.

\section{1}{Language Problems}

The largest problem with the language itself is the large number of static limits on all sorts of things, from path length to stack depth. In addition, normally dynamic things like dictionary sizes have to be statically declared. This is a pain, especially since there is often no simple way to test whether these limits will be exceeded. These static limits look doubly bad when you consider that they must often be split between the program importing a graphic and the graphic itself. Or consider that a path well within the limit at laser-printer resolution may need many more line segments to represent the same curve at typesetter resolution, overflowing a static limit.

But perhaps the single worst idea in the PostScript language itself is the \.{bind} operator. On the surface this operator may look good; it provides a way to short-circuit name lookup and directly bind each use of a name to the appropriate dictionary entry. Adobe pushed this little feature very hard; it is mentioned time and time again in various PostScript references and tutorials. Its main purpose is to increase the speed of the PostScript interpreter. But when you actually sit down and benchmark, you'll find that very little of the time is spent in name lookup, despite claims to the contrary. Try redefining \.{bind} as the null procedure at the top of some PostScript graphics that use it, and time them before and after the change---the speed difference will be minimal. (The one-line version of this experiment appears below.) That wouldn't really hurt, though, except for the fact that \.{bind} causes no end of grief due to improper use.
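Here is that experiment as a minimal sketch. It works because the graphic's top-level code looks \.{bind} up by name when it executes, and a null procedure simply leaves the operand procedure on the stack untouched, which is all an inert \.{bind} needs to do:

\begintt
% Disarm bind for benchmarking: the procedure to be
% bound is left on the stack exactly as it arrived.
userdict /bind { } put
\endtt

Prepend this line to the graphic, time the job before and after, and judge the savings for yourself.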
And it is used improperly in many, many places; even Adobe Illustrator uses \.{bind} inappropriately in at least one place. The problem with using \.{bind} where it shouldn't be used hinges on included graphics---nesting PostScript files inside other PostScript files. If \.{bind} is executed while some name in the procedure has no current definition, no binding is done for that name, and the name stays in the procedure for later lookup. So if you expect something to be bound and it isn't, the file still works. But if the PostScript program that includes your file as a graphic has a definition for the name that is being bound, it is {\it that\/} definition that is `caught' by the bind---and no amount of redefinitions by the included graphic will `unbind' the name. So the included graphic---and the entire document---fails. And this happens all the time, even in code from some of the most highly respected software companies. It is extremely difficult to test a PostScript file at this level---if it prints, it must be `correct', right? Wrong. Arghh.

\section{2}{Spooling and Interchange Problems}

All PostScript printers are not created equal---some have more memory than others, some have more fonts, some even have a hard disk. Thus, writing a PostScript file that will print on any of these printers is not trivial---either you go to the lowest common denominator (which allows you to use \.{Times-Roman} and \.{Courier}, and not much else), or you run the risk of not printing correctly on some printers. In addition, some printers stack output pages differently than others; it would be nice if a PostScript file could be `reversed'---printed in the reverse order---or if individual pages could be extracted for printing.

Adobe has attempted to alleviate these problems by defining a set of comments that indicate what resources are needed by a particular PostScript file and that identify certain sections of the file as `prolog', `page', and `trailer' data. These conventions have been revised several times, sometimes incompatibly, and each revision has ignored certain aspects of the problem. So now various software packages are compatible with various revisions of the encapsulated PostScript conventions. Various printer spoolers handle various versions of the conventions---and improperly handle other revisions. Many, many spoolers and PostScript manipulation programs do not handle nested PostScript files correctly---counting the pages incorrectly, terminating prematurely, or failing in any of a number of other ways. Few applications that import PostScript are smart enough to pass the structured comments from the imported PostScript up to the top level. The main reason for this mess? The lack of a test suite. There is currently no easy way to check whether a particular file satisfies a particular set of structuring conventions, and there are few programs that properly parse the structuring conventions, so most application programmers handle a couple of the more important ones and ignore the rest.

Another interchange problem is the control-D character. Many PostScript printers connected over a serial or parallel port require this character at the end of each job, so the printer knows to effectively reset its memory contents and start fresh. Since most applications on microcomputers drive the printer directly, they have taken it upon themselves to append, and often also prefix, a control-D to the output---even when the output is going to a file on disk.
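On disk, such a file typically looks something like this (a sketch; the \.{\^{}D} at either end stands for the literal control-D byte, and the comments are whatever the application emits):

\begintt
^D%!PS-Adobe-2.0
%%Pages: 1
... the document proper ...
showpage
%%EOF
^D
\endtt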
This would normally be okay, even though it `violates' the requirement that PostScript be strictly \.{ASCII}. But a program that imports such a file as a graphic must strip these non-\.{ASCII} characters to do the job properly, because otherwise the printer would be `reset' in the middle of a page---right where the graphic was included! And simply stripping away all non-\.{ASCII} characters may not be the correct thing to do either, as some applications embed binary data in their PostScript in such a way that the interpreter reads it correctly, and removing those bytes will prevent the graphic from being drawn. It's a sad state of affairs, and one that shows no signs of improving anytime soon.

\section{3}{Rendering Problems}

There are also tradeoffs in the design decisions of the imaging model. Mostly these concern quantization and aliasing, problems that in general have no easy solution. Adobe made the correct decision, in my opinion, in `ignoring' these problems and putting the onus on the applications programmer to deal with them. But few applications actually address the issues.

One typical problem is quantizing line widths. If I choose a line width that happens to be 1.5 pixels wide at a particular resolution, lines at certain locations on the page will be two pixels wide, and lines at other locations will be three pixels wide. Look through any scientific conference proceedings where PostScript figures are used, and examine the horizontal and vertical lines carefully---their widths typically vary due to exactly this problem. And such lines on a shallow diagonal are even worse---they alternate between two and three pixels wide, giving somewhat of a twisted-thread appearance to what should be a perfectly straight line! Both of these problems can be solved with care on the part of the applications programmer (often at the cost of subtle problems in other respects). For instance, \TeX\ deals with this specific problem for horizontal and vertical rules through careful guidelines for how rules are to be drawn.

A similar problem concerns the placement of characters. Because character widths in PostScript are usually fractions of a pixel, the distance between two `adjacent' characters will often vary depending on where on the page they occur. For instance, the word \.{pop} might have one pixel between the extreme points of the first and second letters at one horizontal location, but two pixels at another horizontal location. Device drivers for \TeX\ also deal with this quantization problem, both for bitmapped fonts and, in the case of Amiga\TeX\ PostScript font support, for the PostScript fonts. Granted, these quantization and aliasing problems tend to go away at very high typesetter resolutions, but the amount of published copy printed on 300 dpi laser printers is high enough that these problems warrant a bit more attention from applications programmers.

\section{4}{Implementation Problems}

Implementations of PostScript also vary in quality from machine to machine. Some bugs in the early LaserWriter printers are still haunting us now, and every time a new PostScript printer comes onto the market, a new set of PostScript bugs is apparently introduced. I'll illustrate with only two common problems.

The first is a font access problem in the early LaserWriters. The PostScript language definition declares that once a font is `defined' with \.{definefont}, the font dictionary becomes read-only.
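In code, that rule says a fragment like this (the entry name is made up) should fail with an \.{invalidaccess} error:

\begintt
% definefont makes the font dictionary read-only,
% so storing into it should raise invalidaccess:
/Helvetica findfont
/MyPrivateEntry 42 put
\endtt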
The LaserWriters did not exhibit this behavior, happily accepting dictionary definitions for defined fonts. Thus, some applications accidentally took advantage of this `feature'. The first PostScript implementation to catch this particular exception then generated lots and lots of bug reports---files that printed on the `standard' LaserWriter wouldn't print on this new printer! The blame fell unfairly on the new machine, with its clone interpreter---until someone figured out what was actually happening.

Another problem, still present in many Adobe interpreters today, is a rounding problem in the handling of the matrix parameter to the \.{imagemask} operator that causes certain bitmapped characters to land a pixel too low, or a pixel too far to the left, with respect to the other characters. This single-pixel deviation results in a remarkably uneven text baseline that is evident in many publications even today. And it happens only on certain printers.

And we must not forget the typesetters---typesetters, mind you, machines that are extremely expensive not only in capital but in maintenance---that have less PostScript memory than the lowly Apple LaserWriter. Who comes up with these ideas?

\section{5}{Application Problems}

But it is the applications programmers who commit some of the worst sins. They are the ones who cost users untold thousands of dollars in wasted typesetter film every year, because things that printed one way on the laser printer printed another way on the typesetter. For instance, some programs draw `hairlines' by setting the line width to zero. On a PostScript printer, a line with a width of zero prints as the thinnest line possible---which is still quite thick on your standard 300 dot-per-inch device, especially on the common Canon engines. But a one-pixel line on a typesetter is invisible; line graphs literally disappear on the typesetter when a zero line width is used. Perhaps the user can be blamed---maybe zero wasn't the default, and the user set the line width to zero himself because he wanted a nice thin line---but the program, or at least the manual, should have warned him about what would happen on the typesetter. Some fonts also fail at high resolution. A nice intricate font might print beautifully on a LaserWriter---but when the user brings it in to the typesetting shop, the machine can't handle it. You say you had a press deadline?

Enough about typesetters, though. Some very basic things are done wrong by many applications. Things like understanding that \.{showpage} (and a couple of other operators) must be redefined before a graphic can be included---because that graphic should and does contain a \.{showpage} of its own (a sketch of such a wrapper closes this section). Things like generating incorrect bounding box comments, or invalid syntax in the bounding box comment. Or programs that generate a 60,000-byte file for a simple graphic with a few lines in it. Simple things should be simple, complex things should be possible---both extremes are equally important.

Then there is the issue of paper size. Many PostScript printers provide a set of operators to adjust the interpreter for a particular paper size---these operators are typically called \.{letter}, \.{a4}, and the like. But executing these operators clears the page and resets the transformation matrix---so if they are used in any file, that file cannot then be used as an included graphic. Yet a substantial fraction of applications programs include calls to these operators.

I've ranted, I've raved. Heck, I've even broken some of the above rules myself. I'll be quiet now.
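Well, one last thing before I go: the promised sketch of an inclusion wrapper. The names are made up, and a real wrapper must also protect the stacks and reset more of the graphics state:

\begintt
% A minimal inclusion wrapper (hypothetical names).
/BeginInclude {
  /IncludeSave save def
  /showpage { } def     % the graphic's showpage is now a no-op
  /letter { } def       % likewise the paper-size operators
  /a4 { } def
  0 setgray 0 setlinecap 1 setlinewidth
  0 setlinejoin 10 setmiterlimit [] 0 setdash
  newpath
} def
/EndInclude {
  IncludeSave restore   % undoes the redefinitions, too
} def
\endtt

The importing application emits \.{BeginInclude}, then the translation and scaling that position the graphic, then the graphic itself, and finally \.{EndInclude}.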
But don't get me started on this subject again, or this chapter will double in length.
glenn@heaven.woodside.ca.us (Glenn Reid) (05/06/91)
Tomas G. Rokicki writes [diatribe omitted to save bandwidth]

Tom, you raise some good, undeniable points. I won't argue with you. However, I will point out that your description of what is wrong with "bind" contains a slight oversimplification:

> But if the PostScript program that includes your file as a graphic has a
> definition for the name that is being bound, it is {\it that\/} definition
> that is `caught' by the bind---and no amount of redefinitions by the
> included graphic will `unbind' the name. So the included graphic---and
> the entire document---fails.

The nit I wanted to pick is that it's not "any" definition for the name that gets caught. The definition must refer directly to an operator object for it to be `caught' by bind; names defined to procedures, numbers, or anything else are left alone. Other than that, you're right. See my other post for another perspective on this problem, though.

As for the rest of the diatribe, although I agree with you, I actually think the problems are no worse than in most other areas of the computer industry (compare versions and implementations of UNIX, for example, or look at the "32-bit clean" nightmare on the Mac, soon to be the "System 7.0" nightmare). Any time a large number of people participate in a portable solution, there are going to be misinterpretations of the standard, differences in implementation, and all the other little things that can go wrong when you can't test your stuff against everybody else's stuff.

Imagine two carpenters building a door. One builds the frame, the other builds the door. Neither one is on-site, and neither has met the other. They are both working from the "spec", or the blueprint. How likely do you think it is that the door fits in the frame, that the latch lands squarely in the hole in the door frame, and that the hinges are set properly? I wouldn't give it one chance in 100,000, even with experienced carpenters. Most of us have been forced to work from the "spec", whether it be the red book, the structuring conventions, or whatever. It's actually surprising to me that so much of it *does* work.

That does not, of course, permit us to rest or to ignore the problems. It does suggest, to me at least, that a massive testing paradigm is the only thing that can save us. Imagine if Adobe set up a lab with every known PostScript interpreter and a program for running your code through the mill: testing it against all the printers, against other people's software, including it in TeX output, and so on. Even if they charged a lot of money for the service, it would have an enormous benefit for the finished products and the way they all dovetailed together, and it would only serve to cement PostScript's position in multi-platform computing.

(Glenn) cvn
--
Glenn Reid                   RightBrain Software
glenn@heaven.woodside.ca.us  NeXT/PostScript developers
..{adobe,next}!heaven!glenn  415-326-2974 (NeXTfax 326-2977)