jorgnsn@qucis.queensu.ca (John Jorgensen) (01/21/91)
On December 4, I asked what to do about two HP LaserJet IIIs with HP's PostScript cartridge, which were hanging intermittently on perfectly good PostScript jobs. The printers showed "PROCESSING DATA" indefinitely, without any pages coming out. Occasionally I could take the printers offline and reset them from the front panel, but usually I couldn't get past "OFFLINE PENDING", and had to switch the printer off to reset it. Most, but not all, of the hanging jobs were produced by a DVI-to-PostScript translator called dvitps. The same jobs would print fine when resubmitted.

Here is a summary of the suggestions I received (sorry for the delay--I was away on a long Christmas holiday, and then I waited to see if some of the suggestions helped).

First, I should acknowledge the responses of a number of Hewlett-Packard employees on the net. Bob Jewett <jewett@hplabs.hpl.hp.com> posted my message to an HP-internal notesgroup. Stefan Stolz <stolz%hpber002.hp.com@hplb.hpl.hp.com> told me about similar problems he had observed at one of his sites. Kevin Brown <brown@hpbsm15.boi.hp.com> gave me the number of the LaserJet Support Line and forwarded my initial problem report to Judy Lolley at the support line. She was friendly and helpful, but the support line staff was unable to isolate my problem. They left the case open and suggested I reduce the baud rate.

A number of people mentioned the possibility of flaky memory on a third-party memory board causing this problem (Allen Michielsen <amichiel@rodan.acs.syr.edu>, Tom Lane <Tom.Lane@G.GP.CS.CMU.EDU>, Stan Chesnutt <chesnutt@adobe.com>). At the time I posted my original query, our printers each had 1 megabyte of expansion memory on an EXP Computer Inc. "Laser Ramboard". This is probably too little memory--HP recommends at least two megabytes of optional memory for PostScript, though only one is absolutely required. But a shortage of memory should result in a VMerror, not a hanging printer.

In my original posting I expressed doubt that we were experiencing a communications problem, since I knew from our spooler logs that often, when the printer hung up after printing page N, it had actually started receiving the data for page N + 2, and usually had just executed a showpage for page N + 1. Since usually, but not absolutely always, the hung printers responded to the offline button with "OFFLINE PENDING", it seemed to me that the problem was associated with the attempt to actually print a page.

Many people told me I was too quick to dismiss communications problems. Edwin Kremer <edwin@cs.ruu.nl> sent me information about a "flow control hysteresis" problem that was originally posted on comp.sys.hp by Michael Ashley <mcba@newt.phys.unsw.oz.au> and Bob Niland <rjn@hpfcso.hp.com>. Apparently some workstations (the DEC 5000 in this case) don't stop sending characters soon enough after receiving an XOFF. Mr. Ashley's LaserJet III responded to the extra input with an "IO CONFIG ERR". I haven't seen any such errors, and I suspect that if extra characters were simply lost as a result of such a problem, the result would be either a PostScript syntax error (which our spooler would log), or visibly botched output--not a hung printer. Still, this remains a possibility I haven't eliminated.

Jeff Wieland <wieland@ecn.purdue.edu> said that his LaserJet II with PostScript had occasionally lost the end of print jobs until he turned off Robust XON-XOFF handshaking. Desh D. Sharma <desh@watserv1.uwaterloo.ca> mentioned problems with Robust XON-XOFF as well. Mr. Niland's note about flow control also explained how Robust XON-XOFF works (essentially, when the printer is idle it keeps sending XON every few seconds), and how it could cause problems if the host doesn't discard extra XONs received from the printer.
We have been running with Robust XON-XOFF set (so that when I have to power-cycle the hung printers they will eventually prod the host into flushing the rest of its queued output).

John Polstra <polstra!jdp@uunet.uu.net> sent me information about known flow-control bugs with MTI boards in SunOS 4.0.3 and 4.1, along with the Sun ID of a patch that had fixed a similar problem with his LaserWriter II NTX (Patch-ID# 100137-01). He advised me to either get the patches, or try moving our printer lines to the CPU board serial ports from the Systech MTI board where we had them.

Allen Michielsen and Bob Jewett both suggested trying a lower baud rate (we have been using 9600) to ease any handshaking problems between the printer and host. Mr. Michielsen also recommended the Transcript package of lpd filters.

Mike Schuster <schuster@cup.portal.com> and Woody Baker <woody@chinacat.Unicom.COM> told me that Don Lancaster has isolated bugs in the HP cartridge and Mr. Schuster relayed my problem report to Mr. Lancaster. The known bugs involve repetitions of the "arc" and "setscreen" operators. I have not observed these operators in any of the files which have hung here.

Patricio Ortiz <ortiz@lynx.astro.utoronto.ca>, Kathy Pearson <kpearson@cattell.psych.upenn.edu>, Philip Murphy <mhi@btr.com>, and Thomas Tonino <ttonino@bio.vu.nl> have experienced similar problems on various printer/host configurations, but have been unable to isolate the difficulty.

Now for the suggestions I have tried so far. The first thing we did was to move the busier printer from an MTI board port to /dev/ttyb on another Sun 3/160. The printer continued to hang.

Next we added two more megabytes to each of the printers (onto the original third-party boards). For a while I thought that this had done the trick, but it turns out the lack of problems was just due to a Christmas lull in printer use. In the first week and a half after the holidays, our busier printer hung up about six times.

In the process of investigating one of these hangs, I discovered a blunder on my part which has certainly been aggravating the problem, though it probably is not the sole cause. We run a simple home-grown lpd filter for our PostScript printers. It logs their error output and queries them for pagecounts between jobs.

The main loop of this filter alternately writes a bufferful of data to the printer, and then checks if it needs to read any messages sent back from the printer. Some time ago, I increased the size of the filter's writes, and I suspect that as a result, communication between the printer and host sometimes deadlocks (the host blocks while writing one of the large blocks, and the printer blocks waiting for the host to read its standard output). So the people who advised me to check communications were at least partly right.

As an interim measure, I reduced the size of the writes to something I thought (from pstat -S) was unlikely to fill the serial device's write buffer in one gulp (what I should really do is replace the write-then-read sequence with a select-then-read-or-write sequence). This certainly seems to have helped matters--our busier printer has only hung once with the new filter, after a period where it was hanging once or twice a day. I fear I haven't eliminated the problem though, both because we have had the one hang since the change, and because when the problem first cropped up, the printer wasn't trying to write anything back to the host at all (most of the messages from the printer are the result of diagnostic output I have since added to dvitps's PostScript prologue, in an attempt to find out what is going on--I guess this is the sysadmin's version of the Heisenberg principle).

If the difficulties do persist, my next step will be to try turning off Robust XON-XOFF. Since our printers are not idle when the hangs occur, I have assumed Robust XON-XOFF is unlikely to be our problem.
I've also hesitated to switch it off because I fear that if the printers continue to hang so that I must continue to power-cycle them, I will have a deadlock when the printer comes up (the host still waiting for an XON before it sends more data, the reset printer not even knowing that there is data waiting for it). I suppose I could turn Robust XON-XOFF back on for a while after bringing the printer back up, in order to flush the old job.

Thanks again for your help, and I apologize if I forgot to credit any of the people who replied.

John Jorgensen    jorgnsn@qucis.queensu.ca    (613) 545 6784
Systems Programmer, Dept. of Computing Science, Queen's University

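
The select-then-read-or-write loop mentioned in the post above can be sketched roughly as follows. This is only an illustration in Python, not the filter described in the post, whose source is not shown. The names pump, fd and chunk are hypothetical, fd is assumed to be an already opened descriptor for the printer line, and the tty mode setup a real serial port would need is left out.

```python
import os
import select


def pump(fd: int, job: bytes, chunk: int = 256) -> bytes:
    """Write `job` to `fd`, reading back whatever the device sends meanwhile.

    Returns the bytes received from the device while the job was being sent.
    """
    os.set_blocking(fd, False)      # a write must never stall the loop
    pending = memoryview(job)
    replies = bytearray()
    while pending:
        # Sleep until the device has something to say OR can take more data.
        readable, writable, _ = select.select([fd], [fd], [])
        if readable:
            data = os.read(fd, 4096)
            if not data:            # the other end closed the line
                break
            replies += data
        if writable:
            try:
                sent = os.write(fd, pending[:chunk])
            except BlockingIOError:  # buffer filled between select and write
                sent = 0
            pending = pending[sent:]
    return bytes(replies)
```

Because the host never sits inside a blocking write, it is always able to pick up messages from the printer, so the write-blocks-while-printer-waits-to-be-read deadlock described above cannot arise in this loop, whatever the size of chunk.
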
les@chinet.chi.il.us (Leslie Mikesell) (01/21/91)
In article <1060@quiddity.queensu.CA> jorgnsn@qucis.queensu.ca (John Jorgensen) writes:
>On December 4, I asked what to do about two HP LaserJet IIIs with HP's
>PostScript cartridge, which were hanging intermittently on perfectly
>good PostScript jobs. The printers showed "PROCESSING DATA"
>indefinitely, without any pages coming out. Occasionally I could take
>the printers offline and reset them from the front panel, but usually
>I couldn't get past "OFFLINE PENDING", and had to switch the printer
>off to reset it. Most, but not all, of the hanging jobs were produced
>by a DVI-to-PostScript translator called dvitps. The same jobs would
>print fine when resubmitted.

I don't think your summary covered the simple and likely possibility that the printer is returning data to the host and becomes deadlocked due to the host sending an XOFF to the printer. This will happen in the case where you have enabled XON/XOFF flow control from the host to the printer and do not read from the port often enough to collect the data returned. The printer will not continue until it has been able to output any data that may have been requested by commands embedded in the PostScript file.

To test for this, when the printer is apparently hung, just do a "cat </dev/ttyxx" using the appropriate port name. If that clears the condition, then the problem was the reverse flow control. To fix it, either eliminate the XON/XOFF from the host and allow the data to be discarded, or fork a process to read continuously to avoid deadlock.

Les Mikesell
les@chinet.chi.il.us
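
The second fix suggested above, a separate reader that drains the port continuously, might look roughly like this. The sketch is in Python and uses a thread in place of a forked process. send_with_reader and log are made-up names, and fd is assumed to be an already opened, blocking descriptor for the printer line.

```python
import os
import threading


def send_with_reader(fd: int, job: bytes, log: list) -> threading.Thread:
    """Blocking-write `job` to `fd` while a helper thread drains replies.

    Everything the device sends back is appended to `log`. The reader
    thread is returned so the caller can join it once the line is closed.
    """

    def drain() -> None:
        while True:
            data = os.read(fd, 4096)
            if not data:            # end of file: the device closed the line
                break
            log.append(data)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()

    view = memoryview(job)
    while view:
        # This write may block, but the reader keeps the return path empty,
        # so the device is never stuck waiting to be heard.
        sent = os.write(fd, view[:4096])
        view = view[sent:]
    return reader
```

On a real tty the read would normally never see end of file, so the reader is marked as a daemon thread and simply goes away when the filter exits.
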