[comp.lang.postscript] SUMMARY: Is HP PostScript known to hang mysteriously?

jorgnsn@qucis.queensu.ca (John Jorgensen) (01/21/91)

On December 4, I asked what to do about two HP LaserJet IIIs with HP's
PostScript cartridge, which were hanging intermittently on perfectly
good PostScript jobs.  The printers showed "PROCESSING DATA"
indefinitely, without any pages coming out.  Occasionally I could take
the printers offline and reset them from the front panel, but usually
I couldn't get past "OFFLINE PENDING", and had to switch the printer
off to reset it.  Most, but not all, of the hanging jobs were produced
by a DVI-to-PostScript translator called dvitps.  The same jobs would
print fine when resubmitted.

Here is a summary of the suggestions I received (sorry for the
delay--I was away on a long Christmas holiday, and then I waited to
see if some of the suggestions helped).

First, I should acknowledge the responses of a number of
Hewlett-Packard employees on the net.  Bob Jewett
<jewett@hplabs.hpl.hp.com> posted my message to an HP-internal
notesgroup.  Stefan Stolz <stolz%hpber002.hp.com@hplb.hpl.hp.com> told
me about similar problems he had observed at one of his sites.  Kevin
Brown <brown@hpbsm15.boi.hp.com> gave me the number of the LaserJet
Support Line and forwarded my initial problem report to Judy Lolley at
the support line.  She was friendly and helpful, but the support line
staff was unable to isolate my problem.  They left the case open and
suggested I reduce the baud rate.

A number of people mentioned the possibility of flaky memory on a
third-party memory board causing this problem (Allen Michielsen
<amichiel@rodan.acs.syr.edu>, Tom Lane <Tom.Lane@G.GP.CS.CMU.EDU>,
Stan Chesnutt <chesnutt@adobe.com>).  At the time I posted my original
query, our printers each had 1 megabyte of expansion memory on an EXP
Computer Inc. "Laser Ramboard".  This is probably too little
memory--HP recommends at least two megabytes of optional memory for
PostScript, though only one is absolutely required.  But a shortage
of memory should result in a VMerror, not a hung printer.

In my original posting I expressed doubt that we were experiencing a
communications problem, since I knew from our spooler logs that often,
when the printer hung up after printing page N, it had actually started
receiving the data for page N + 2, and usually had just executed a
showpage for page N + 1.  Since usually, but not absolutely always,
the hung printers responded to the offline button with "OFFLINE
PENDING", it seemed to me that the problem was associated with the
attempt to actually print a page.

Many people told me I was too quick to dismiss communications
problems.

Edwin Kremer <edwin@cs.ruu.nl> sent me information about a "flow
control hysteresis" problem that was originally posted on comp.sys.hp
by Michael Ashley <mcba@newt.phys.unsw.oz.au> and Bob Niland
<rjn@hpfcso.hp.com>.  Apparently some workstations (the DEC 5000 in
this case) don't stop sending characters soon enough after receiving
an XOFF.  Mr. Ashley's LaserJet III responded to the extra input with
an "IO CONFIG ERR".  I haven't seen any such errors, and I suspect
that if extra characters were simply lost as a result of such a
problem, the result would be either a PostScript syntax error (which
our spooler would log), or visibly botched output--not a hung printer.
Still, this remains a possibility I haven't eliminated.

Jeff Wieland <wieland@ecn.purdue.edu> said that his LaserJet II with
PostScript had occasionally lost the end of print jobs until he turned
off Robust XON-XOFF handshaking.  Desh D. Sharma
<desh@watserv1.uwaterloo.ca> mentioned problems with robust XON-XOFF
as well.  Mr. Niland's note about flow control also explained how
Robust XON-XOFF works (essentially, when the printer is idle it keeps
sending XON every few seconds), and how it could cause problems if the
host doesn't discard extra XONs received from the printer.  We have been
running with robust XON-XOFF set (so that when I have to power-cycle the
hung printers they will eventually prod the host into flushing the
rest of its queued output).

John Polstra <polstra!jdp@uunet.uu.net> sent me information about
known flow-control bugs with MTI boards in SunOS 4.0.3 and 4.1, along
with the Sun ID of a patch that had fixed a similar problem with his
LaserWriter II NTX (Patch-ID# 100137-01).  He advised me to either get
the patches, or try moving our printer lines to the CPU board serial
ports from the Systech MTI board where we had them.

Allen Michielsen and Bob Jewett both suggested trying a lower baud
rate (we have been using 9600) to ease any handshaking problems
between the printer and host.  Mr. Michielsen also recommended the
Transcript package of lpd filters.
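
For anyone who wants to try the lower-speed suggestion, here is a
minimal sketch of setting the host side of the line.  It is written in
Python purely for illustration; the function name set_line and the
choice of 4800 baud are assumptions, not anything the respondents
specified, and the printer's own front-panel speed setting would have
to be changed to match.

```python
import termios

def set_line(fd, speed=termios.B4800):
    """Set both line speeds on fd and make the host obey XON/XOFF.

    Hypothetical helper: the name and the default speed are
    illustrative assumptions only.
    """
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    iflag |= termios.IXON            # stop sending when the printer sends XOFF
    termios.tcsetattr(fd, termios.TCSADRAIN,
                      [iflag, oflag, cflag, lflag, speed, speed, cc])
    return termios.tcgetattr(fd)     # report what the driver now holds
```

A filter would call this once on the open printer device before
sending the job.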

Mike Schuster <schuster@cup.portal.com> and Woody Baker
<woody@chinacat.Unicom.COM> told me that Don Lancaster has isolated
bugs in the HP cartridge and Mr. Schuster relayed my problem report to
Mr. Lancaster.  The known bugs involve repetitions of the "arc" and
"setscreen" operators.  I have not observed these operators in any of
the files which have hung here.

Patricio Ortiz <ortiz@lynx.astro.utoronto.ca>, Kathy Pearson
<kpearson@cattell.psych.upenn.edu>, Philip Murphy <mhi@btr.com>, and
Thomas Tonino <ttonino@bio.vu.nl> have experienced similar problems on
various printer/host configurations, but have been unable to isolate
the difficulty.

Now for the suggestions I have tried so far.

The first thing we did was to move the busier printer from an MTI
board port to /dev/ttyb on another Sun 3/160.  The printer continued
to hang.

Next we added two more megabytes to each of the printers (onto the
original third-party boards).  For a while I thought that this had
done the trick, but it turns out the lack of problems was just due to
a Christmas lull in printer use.  In the first week and a half after
the holidays, our busier printer hung up about six times.

In the process of investigating one of these hangs, I discovered a
blunder on my part which has certainly been aggravating the problem,
though it probably is not the sole cause.  We run a simple home-grown
lpd filter for our PostScript printers.  It logs their error output
and queries them for pagecounts between jobs.  The main loop of this
filter alternately writes a bufferful of data to the printer, and then
checks if it needs to read any messages sent back from the printer.
Some time ago, I increased the size of the filter's writes, and I
suspect that as a result, communication between the printer and host
sometimes deadlocks (the host blocks while writing one of the large
blocks, and the printer blocks waiting for the host to read standard
output).  So the people who advised me to check communications were
at least partly right.

As an interim measure, I reduced the size of the writes to something I
thought (from pstat -S) was unlikely to fill the serial device's write
buffer in one gulp (what I should really do is replace the
write-then-read sequence with a select-then-read-or-write sequence).
This certainly seems to have helped matters--our busier printer has
only hung once with the new filter, after a period where it was
hanging once or twice a day.  I fear I haven't eliminated the problem
though, both because we have had the one hang since the change, and
because when the problem first cropped up, the printer wasn't trying
to write anything back to the host at all (most of the messages from
the printer are the result of diagnostic output I have since added to
dvitps's PostScript prologue, in an attempt to find out what is going
on--I guess this is the sysadmin's version of the Heisenberg
principle).
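
The select-then-read-or-write loop mentioned above might look roughly
like the following.  This is a sketch in Python rather than the actual
filter; the name pump, the 512-byte chunk size, and the use of a list
as the log are all assumptions made for illustration.  The point is
only that the host never commits to a blocking write while the printer
has something to say.

```python
import os
import select

def pump(job_fd, printer_fd, log, chunk=512):
    """Send job_fd to printer_fd, collecting printer replies in `log`.

    Hypothetical helper; names and sizes are illustrative assumptions.
    """
    os.set_blocking(printer_fd, False)   # a write may accept only part of a chunk
    pending = b""
    while True:
        if not pending:
            pending = os.read(job_fd, chunk)
            if not pending:              # end of job
                return
        # Wait until the printer has sent something or can accept more.
        readable, writable, _ = select.select([printer_fd], [printer_fd], [])
        if readable:
            reply = os.read(printer_fd, 4096)
            if not reply:                # printer side closed
                return
            log.append(reply)
        if writable:
            try:
                sent = os.write(printer_fd, pending)
            except BlockingIOError:      # buffer filled after select returned
                sent = 0
            pending = pending[sent:]
```

A real filter would also keep reading after the last write, so that
the end-of-job pagecount query and any late error messages are
collected; that step is omitted here.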


If the difficulties do persist, my next step will be to try turning
off Robust XON-XOFF.  Since our printers are not idle when the hangs
occur, I have assumed robust XON-XOFF is unlikely to be our problem.
I've also hesitated to switch it off because I fear that if the
printers continue to hang so that I must continue to power-cycle them,
I will have a deadlock when the printer comes up (the host still
waiting for an XON before it sends more data, the reset printer not
even knowing that there is data waiting for it).  I suppose I could
turn Robust XON-XOFF back on for a while after bringing the printer
back up, in order to flush the old job.

Thanks again for your help, and I apologize if I forgot to credit any
of the people who replied.

John Jorgensen		jorgnsn@qucis.queensu.ca	(613) 545 6784
Systems Programmer, Dept. of Computing Science, Queen's University

les@chinet.chi.il.us (Leslie Mikesell) (01/21/91)

In article <1060@quiddity.queensu.CA> jorgnsn@qucis.queensu.ca (John Jorgensen) writes:
>On December 4, I asked what to do about two HP LaserJet IIIs with HP's
>PostScript cartridge, which were hanging intermittently on perfectly
>good PostScript jobs.  The printers showed "PROCESSING DATA"
>indefinitely, without any pages coming out.  Occasionally I could take
>the printers offline and reset them from the front panel, but usually
>I couldn't get past "OFFLINE PENDING", and had to switch the printer
>off to reset it.  Most, but not all, of the hanging jobs were produced
>by a DVI-to-PostScript translator called dvitps.  The same jobs would
>print fine when resubmitted.

I don't think your summary covered the simple and likely possibility
that the printer is returning data to the host and becomes deadlocked
due to the host sending an XOFF to the printer.  This will happen
in the case where you have enabled XON/XOFF flow control from the
host to the printer and do not read from the port often enough to
collect the data returned.  The printer will not continue until it
has been able to output any data that may have been requested by
commands embedded in the PostScript file.  To test for this, when the
printer is apparently hung, just do a "cat </dev/ttyxx" using the
appropriate port name.  If that clears the condition, then the problem was
the reverse flow control.  To fix it, either eliminate the XON/XOFF from
the host and allow the data to be discarded or fork a process to
read continuously to avoid deadlock.
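
A rough sketch of the second option, in Python for illustration only:
the filter forks a child whose sole job is to read whatever the
printer sends back, so the parent can block on writes without
stalling the return channel.  The name start_drain and the idea of
copying replies to a log descriptor are assumptions, not part of any
existing filter.

```python
import os

def start_drain(printer_fd, log_fd):
    """Fork a child that copies printer output to log_fd until EOF.

    Hypothetical helper.  Returns the child's pid to the parent.
    """
    pid = os.fork()
    if pid:
        return pid                   # parent: carry on writing the job
    try:
        while True:
            data = os.read(printer_fd, 4096)
            if not data:             # EOF: the line was closed
                break
            os.write(log_fd, data)
    finally:
        os._exit(0)                  # never fall back into the parent's code
```

The parent would call start_drain() once after opening the port, send
the job with ordinary blocking writes, and reap the child with
os.waitpid() when the job is done.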

Les Mikesell
  les@chinet.chi.il.us