[comp.lang.postscript] Diatribe

rokicki@neon.Stanford.EDU (Tomas G. Rokicki) (05/05/91)

\chapter{12}{Pitfalls of PostScript}

PostScript has been a major boon to the computer and printing industries.
Many claim the Macintosh revolutionized publishing---actually, it was
John Warnock and his PostScript language that revolutionized publishing.
The Apple LaserWriter just happened to be the first very successful
printer to incorporate PostScript.

And there is no question about it: the PostScript language is a very
elegant page description language.  It has an extremely small set of
primitives, be they language primitives or rendering primitives, and
a consistent interpretation of those primitives.  For example, rather than
add a special `arc' primitive, they simply realized that an arc can be
represented to any accuracy by a series of splines---and for any reasonable
accuracy, with a very small number of splines---so there was no need to
clutter up the renderer with arcs.  On the other hand, adding an arc
primitive to the language was easy---it just draws the requisite number
of splines.
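
To make this concrete, here is a small sketch of that decomposition: a
quarter circle of radius 100, drawn once with the \.{arc} operator and
once as the single spline it reduces to (the constant 0.5523 is
approximately $4/3$ of the tangent of $22.5$ degrees; the coordinates are
otherwise arbitrary).

    % quarter circle via the arc operator, center (200,400), radius 100
    newpath 200 400 100 0 90 arc stroke
    % the same quarter circle as one Bezier spline; 55.23 = 0.5523 * 100
    newpath
    300 400 moveto                        % arc start point (cx+r, cy)
    300 455.23 255.23 500 200 500 curveto
    stroke
    showpage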

But all is not bliss.  PostScript has a number of flaws.  Some are small,
some are large enough to hurl a planet through.  And many implementations
add their own set of bugs to the pot.  In this chapter, I indulge myself
by bringing up some of my pet peeves about PostScript and the various
implementations I have worked with.

I shall not even consider dealing with the half-way implementations of
PostScript that are available; PostScript as a language and an imaging
model is simple enough that any respectable programmer should be able to
implement it completely.  So I won't even mention things like lack of
save and restore.

Many of the problems are unequivocally Adobe's fault; many are applications
programmers'.  Some are addressed by PostScript Level 2; most are not.
In any case, only by identifying them can we have any hope of remedying them.

\section{1}{Language Problems}

The largest problem with the language itself is the large number of static
limits on all sorts of things, from path length to stack depth.  In
addition, normally dynamic things like dictionary sizes have to be
statically declared.  This is a pain, especially since there is often no
simple way to test if these limits will be exceeded.  These static limits
look doubly bad when you consider that they must often be split between the
program importing a graphic and the graphic itself.  Or when you consider
that at laser printer resolution, the path length might be short, but
at typesetter resolution, a lot more line segments are required to represent
that curve, overflowing a static limit.

But perhaps the single worst idea in the PostScript language itself is the
\.{bind} operator.  On the surface this operator may look good; it provides
a way to short-circuit the name lookup, and directly bind the use of a name
to the appropriate dictionary entry.  Adobe pushed this little feature very
hard; it is mentioned time and time again in various PostScript references
and tutorials.  Its main purpose is to increase the speed of the PostScript
interpreter.

But when you really sit down and benchmark, you'll find that very little of
the time is spent in name lookup, despite claims to the contrary.  Try
redefining \.{bind} to the null procedure at the top of some PostScript
graphics that use it, and time them before and after you make this
change---the speed difference will be minimal.
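
For the curious, the redefinition is a one-liner---a sketch that assumes
the graphic looks \.{bind} up by name in the usual way:

    % shadow the real bind with a no-op; an executed empty procedure
    % leaves the operand procedure on the stack, just as def expects
    userdict /bind { } put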

That wouldn't really hurt, though, except for the fact that \.{bind}
causes no end of grief due to improper use.  And it is used improperly
in many, many places; even Adobe Illustrator uses \.{bind} inappropriately
in at least one place.

The problem with using \.{bind} where it shouldn't be used hinges around
included graphics---nesting PostScript files inside other PostScript
files.  If you use \.{bind} in a context where there is no current
definition of the name on the stack, no binding is done and the name
stays in the procedure for later lookup.  So if you expect something to
be bound and it isn't, the file still works.

But if the PostScript program that includes your file as a graphic has a
definition for the name that is being bound, it is {\it that\/} definition
that is `caught' by the bind---and no amount of redefinitions by the
included graphic will `unbind' the name.  So the included graphic---and
the entire document---fails.
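
Here is a distilled sketch of the failure; the names are hypothetical,
but the mechanism is exactly the one just described:

    % --- the enclosing document happens to define /box ---
    /box /pop load def                  % its value is an operator object

    % --- the included graphic ---
    /diagram { 100 100 box } bind def   % bind silently captures pop here
    /box {                              % the graphic's own box: too late
      newpath 0 0 moveto
      exch dup 0 rlineto
      exch 0 exch rlineto
      neg 0 rlineto
      closepath stroke
    } def
    diagram                      % executes 100 100 pop: nothing is drawn,
                                 % and a stray 100 is left on the stack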

And this happens all the time, even in code by some of the most highly
respected software companies.  It is extremely difficult to test a
PostScript file at this level---if it prints, it must be `correct',
right?  Wrong.  Arghh.

\section{2}{Spooling and Interchange Problems}

All PostScript printers are not created equal---some have more memory
than others, some have more fonts, some even have a hard disk.  Thus,
writing a PostScript file that will print on any of these printers is
not trivial---either you go to the lowest common denominator (which
allows you to use \.{Times-Roman} and \.{Courier}---not much else),
or you run the risk of not printing correctly on some printers.
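
One defensive sketch---note that it only sees fonts already loaded into
\.{FontDirectory}, so printer-resident disk fonts may require a
\.{stopped} guard around \.{findfont} instead:

    /tryfont {                          % /name size  tryfont  -
      exch dup FontDirectory exch known
      not { pop /Courier } if
      findfont exch scalefont setfont
    } def
    /Palatino-Roman 12 tryfont
    72 720 moveto (falls back to Courier if need be) show
    showpage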

In addition, some printers stack output pages differently from others; it would
be nice if a PostScript file could be `reversed', or printed in the
reverse order, or if individual pages could be extracted for printing.

Adobe has attempted to alleviate these problems by defining a set of
comments that indicate what resources are needed by a particular
PostScript file and that identify certain sections of the file as `prolog',
`page', and `trailer' data.  These conventions have been revised
several times, sometimes incompatibly.  And each revision has ignored
certain aspects of the problem.
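
For reference, the skeleton the conventions call for looks something like
this (version 3.0 shown; earlier revisions differ in detail):

    %!PS-Adobe-3.0
    %%Pages: 2
    %%BoundingBox: 0 0 612 792
    %%EndComments
    %%BeginProlog
    /sayhello { 72 720 moveto show } def
    %%EndProlog
    %%Page: 1 1
    /Courier findfont 12 scalefont setfont
    (page one) sayhello showpage
    %%Page: 2 2
    /Courier findfont 12 scalefont setfont
    (page two) sayhello showpage
    %%Trailer
    %%EOF

Each page is deliberately self-contained---which is exactly what lets a
spooler reverse, reorder, or extract pages.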

So now various software packages are compatible with various revisions
of the encapsulated PostScript conventions.  Various printer spoolers
handle various versions of the conventions---and improperly handle other
revisions.  Many, many spoolers and PostScript manipulation programs
do not handle nested PostScript files correctly---counting the pages
incorrectly, or terminating prematurely, or any of a number of
failures.  Few applications that import PostScript are smart enough to
pass the structured comments from the imported PostScript up to the
top level.

The main reason for this mess?  A lack of a test suite.  There is
currently no easy way to see if a particular file satisfies a particular
set of structuring conventions, and there are few programs that properly
parse the structuring conventions, so most application programmers handle
a couple of the more important ones and ignore the rest.

Another problem with interchange is the control-D character.  Many PostScript
printers connected over a serial or parallel port require this character at
the end of each job to mark the end of the job, so the printer knows to
effectively reset the memory contents and start fresh.  Since most
applications on microcomputers drive the printer directly, they have taken
it upon themselves to append, and often also prefix, a control-D to the
output, even when it is going to a file on the disk.

This would normally be okay, even though it `violates' the requirement that
PostScript be strictly \.{ASCII}.  But when a program must import this
file as a graphic, then, in order to do it properly, it must strip these
non-\.{ASCII} characters, because otherwise the printer would be `reset'
in the middle of a page---where the graphic was included!

And simply stripping away non-\.{ASCII} characters may not be the correct
thing to do, as some applications include binary in their PostScript in
such a way that the PostScript interpreter reads it correctly, and
removing such characters will prevent the graphic from being drawn.

It's a sad state of affairs, and one which shows no signs of improving
anytime soon.

\section{3}{Rendering Problems}

There are also tradeoffs in the design decisions of the imaging models.
Mostly, these concern quantization and aliasing, problems that in general
have no easy solution.  Adobe made the correct decision, in my opinion,
in `ignoring' these problems, and putting the onus on the applications
programmer to deal with them.  But so few applications address the issues.

One typical problem is the difficulty of quantizing line widths.  If I
choose a line width that happens to be 1.5 pixels wide at a particular
resolution, lines at certain locations on the page will be two pixels
wide, and lines at other locations will be three pixels wide.  Look
through any scientific conference proceedings where PostScript figures
are used, and examine the horizontal and vertical lines carefully---they
typically vary due to exactly this problem.

And such lines on a shallow diagonal are even worse---they alternate
two and three pixels wide, giving somewhat of a twisted thread appearance
to what should be a perfectly straight line!

Both of these problems can be solved with care on the part of the applications
programmer (often yielding subtle problems in other respects).  For instance,
\TeX\ deals with this specific problem for horizontal and vertical rules
through careful guidelines for how rules are to be drawn.
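
The usual cure is to round the requested width to a whole number of
device pixels before stroking.  A sketch of the idea, assuming an
unrotated coordinate system:

    /setpixelwidth {                % width  setpixelwidth  -
      0 dtransform                  % width vector in device space
      pop abs round                 % pixels (the y component is zero)
      dup 1 lt { pop 1 } if         % never round down to zero
      0 idtransform pop             % back to user space
      setlinewidth
    } def
    1.5 setpixelwidth
    newpath 72 400 moveto 300 400 lineto stroke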

A similar problem concerns placement of characters.  Because character widths
in PostScript are usually fractions of a pixel, the distance between two
`adjacent' characters will often vary depending on where on the page
they occur.  For instance, the word \.{pop} might have one pixel between the
extreme points of the first and second letters at one horizontal location,
but two pixels at another horizontal location.  Device drivers for \TeX\
also deal with this quantization problem, both for bitmapped fonts and, in
the case of Amiga\TeX\ PostScript font support, for the PostScript fonts.
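
The corresponding cure for text is to snap each string's origin to the
pixel grid before showing it---a sketch of the technique, not \TeX's
actual driver code:

    /snapmoveto {                   % x y  snapmoveto  -
      transform round exch round exch itransform moveto
    } def
    /Times-Roman findfont 10 scalefont setfont
    72.3 701.7 snapmoveto (pop) show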

Granted, these quantization and aliasing problems tend to go away at very
high typesetter resolutions, but the amount of published copy that is printed
on a 300 dpi laser printer is high enough that these problems warrant
a bit more attention by the applications programmers.

\section{4}{Implementation Problems}

Implementations of PostScript also vary in quality from machine to machine.
Some bugs in the early LaserWriter printers are still haunting us now, and
every time a new PostScript printer comes onto the market, a new set of
PostScript bugs is apparently introduced.

I'll illustrate with only two common problems.  The first one is a font
access problem in the early LaserWriters.  The PostScript language definition
declares that once a font is `defined' with \.{definefont}, the font
dictionary becomes read-only.  The LaserWriters did not exhibit this behavior,
happily accepting dictionary definitions for defined fonts.  Thus, some
applications accidentally took advantage of this `feature'.
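
The conforming pattern is to modify a {\it copy\/} of the font dictionary
before \.{definefont}, never the defined font afterwards; the standard
re-encoding idiom is sketched below (\.{ISOLatin1Encoding} assumes a
Level 2 interpreter):

    /Helvetica findfont
    dup length dict begin
      { 1 index /FID ne { def } { pop pop } ifelse } forall
      /Encoding ISOLatin1Encoding def      % the one change we wanted
    currentdict end
    /Helvetica-ISO exch definefont pop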

The first PostScript implementation to catch this particular exception
then generated lots and lots of bug reports---files that printed on the
`standard' LaserWriter wouldn't print on this new printer!  The blame fell
unfairly on the new machine, with its clone interpreter---until someone
figured out what was happening.

Another problem that is still in many Adobe interpreters today is a rounding
problem with the matrix parameter to the \.{imagemask} operator that causes
certain bitmapped characters to be a pixel too low, or a pixel too far to
the left, with respect to the other characters.  This single pixel deviation
results in a remarkably uneven text baseline that is evident in many
publications even today.

And it only happens on certain printers.

We must not forget the typesetters---typesetters, mind you, machines that
are extremely expensive not only in capital but in maintenance---that have
less PostScript memory than the lowly Apple LaserWriter.  Who comes up with
these ideas?

\section{5}{Application Problems}

But it is the applications programmers who commit some of the worst sins.
They are the ones who cost untold thousands of dollars' worth of wasted typesetter
film, every year, because things that printed one way on the laser printer
printed another on the typesetter.

For instance, some programs draw `hairlines' by setting the line width to
zero.  On a PostScript printer, a line with a width of zero prints the
thinnest line possible---which is still quite thick on your standard 300
dot per inch device, especially on the common Canon engines.  But a one
pixel line on a typesetter is invisible.  Line graphs literally disappear
on the typesetter if a zero line width is used.
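
If a true hairline is wanted, it is safer to ask for a small but real
width:

    % device-dependent: the thinnest line possible, invisible on a
    % high-resolution typesetter
    0 setlinewidth
    % resolution-independent: a quarter-point hairline everywhere
    0.25 setlinewidth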

Perhaps the user can be blamed; maybe zero wasn't the default, and he set
the line width to zero himself because he wanted a nice thin line.  But
the program, or the manual, probably should have warned him about what
would happen on the typesetter.

Some fonts also fail at high resolution.  A nice intricate font might print
beautifully on a LaserWriter---but when the user brings it in to the
typesetting shop, the machine can't handle it.  You say you had a press
deadline?

Enough about typesetters, though.  Some very basic things are done wrong
by many applications.  Things like understanding that \.{showpage} (and
a couple other operators) must be redefined before a graphic can be
included---because that graphic should and does contain a \.{showpage}
of its own.  Things like generating incorrect bounding box comments.
Or invalid syntax in the bounding box comment.  Or programs that generate
a 60,000 byte file---for a simple graphic with a few lines in it.
Simple things should be simple, complex things should be possible---both
extremes are equally important.
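
A minimal sketch of the guard an importing application should emit around
an included graphic:

    /inclsave save def
    /showpage { } def       % the graphic's own showpage becomes a no-op
    % ... body of the included PostScript graphic here ...
    inclsave restore        % undo redefinitions, reclaim memory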

Then there is the issue of paper size.  Many PostScript printers provide a
set of operators to adjust the interpreter for a particular paper---these
operators are typically called \.{letter}, \.{a4}, and the like.  But
executing these operators clears the page and resets the transformation
matrix---so if they are used in any file, that file cannot then be used
as an included graphic.  Yet, a substantial fraction of applications programs
include calls to these operators.
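
An importing application can defend itself the same way it defends
against \.{showpage}, by stubbing these operators out for the duration of
the graphic (a sketch; the set of names varies from printer to printer):

    /letter { } def  /legal { } def  /a4 { } def
    % ... include the graphic, then restore as before ...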

I've ranted, I've raved.  Heck, I've even broken some of the above rules
myself.  I'll be quiet now.  But don't get me started on this subject
again, or this chapter will double in length.

glenn@heaven.woodside.ca.us (Glenn Reid) (05/06/91)

Tomas G. Rokicki writes

[diatribe omitted to save bandwidth]

Tom, you raise some good, undeniable points.  I won't argue with you.

However, I will point out that your description of what is wrong with
"bind" contains a slight oversimplification:

> But if the PostScript program that includes your file as a graphic has a
> definition for the name that is being bound, it is {\it that\/} definition
> that is `caught' by the bind---and no amount of redefinitions by the
> included graphic will `unbind' the name.  So the included graphic---and
> the entire document---fails.

The nit that I wanted to pick is that it's not "any" definition for the
name that's being bound.  The definition must refer directly to an
operator object for it to be `caught' by bind.  Other than that, you're
right.  See my other post for another perspective on this problem,
though.
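
In other words (my example, not Tom's):

    /a /add load def            % value is an operator object
    /b { 2 mul } def            % value is a procedure
    /p { 3 4 a b } bind def     % a becomes the add operator;
                                % b remains a name, looked up at run time
    p                           % pushes 14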

As for the rest of the diatribe, although I agree with you, I actually
think that the problems are no worse than in most other areas of the
computer industry (compare versions and implementations of UNIX, for
example, or look at the "32-bit clean" nightmare on the Mac, soon to
be the "System 7.0" nightmare).  Any time a large number of people
participate in a portable solution, there are going to be misinterpretations
of the standard, differences in implementation, and all the other
little things that can go wrong when you can't test your stuff against
everybody else's stuff.

Imagine two carpenters building a door.  One builds the frame, the other
builds the door.  Neither one is on-site, and neither has met the other.
They are both working from the "spec", or the blueprint.  How likely
do you think it is that the door fits in the frame, that the latch lands
squarely in the hole in the door frame, and that the hinges are set
properly?  I wouldn't give it 1 chance in 100,000, even with experienced
carpenters.

Most of us have been forced to work from the "spec", whether it be the
red book, the structuring conventions, or whatever.  It's actually
surprising to me that so much of it *does* work.

That does not, of course, permit us to rest or to ignore the problems.
It does, to me, at least, suggest that a massive testing paradigm is
the only thing that can save us.  Imagine if Adobe set up a lab with
every known PostScript interpreter, and a program for running your
code through the mill to test it against all printers and other people's
software, try including it in TeX output, and so on.  Even if they charged a lot
of money for the service, it would have an enormous benefit for the
finished products and the way they all dovetailed together, and
would only serve to cement PostScript's position in multi-platform
computing.

(Glenn) cvn

--
 Glenn Reid				RightBrain Software
 glenn@heaven.woodside.ca.us		NeXT/PostScript developers
 ..{adobe,next}!heaven!glenn		415-326-2974 (NeXTfax 326-2977)