matt@marge.math.binghamton.edu (matt brin) (08/26/89)
We are running cap50 on a Sun 3/60 under SunOS 4.0, with KIP 6/88 also running on the Sun. We have a FastPath 2 box upgraded to a 4, with PROM version 4.2, that was downloaded with K-Star 5.03. We put an Apple LaserWriter (not a Plus) on the AppleTalk side of the box and turned it all on.

It handles short jobs just fine, but behaves oddly on printouts of more than ten pages. After about ten pages of a long printout appear (say pages 17 through 7, since the pages come out in reverse order), there is a longish pause, and the printout starts all over again with the "first" page (page 17 in this example) and promises to loop indefinitely (long-term behavior not checked).

Is this the famous idle bug? How do we deal with it? (We compiled CAP with the option to slow down transmission, in case this is relevant.)

Replies can be emailed if you think that your answer is not of universal interest.

matt brin
math. dept / SUNY / Binghamton, NY 13901
matt@marge.math.binghamton.edu INTERNET
FAC119 at BINGVAXB BITNET
MBRIN at BINGVAXB BITNET
tjh+@ANDREW.CMU.EDU (Tom Holodnik) (09/01/89)
Matt-

I saw the same symptoms you're seeing. What we did to correct the situation was to upgrade the FastPath-2 to a FastPath-4 (the FastPath-4 has better performance), and to break up the network into two segments served by two Kinetics gateways.

It's been some time since we dealt with this, and I hesitate to give you anything less than the correct answer. The best way for you to determine what is happening is to turn on the various levels of protocol debugging (esp. ATP) within CAP when you call the papif filter.

I saw that ATP transaction release packets weren't being received at the spooler, and that the socket on the LaserWriter had been closed. My guess was that the PAP tickle timer had expired, and the LaserWriter closed the connection. I saw those symptoms during times when the network was heavily stressed (typically between 10am and 4pm), while printing was fully reliable during off-peak hours.

Making the PAP flow quantum smaller made the problem worse, since more ATP control (TrReq, TrResp, and TrRel) packets were required for the same amount of data. Since the probability of being dropped is the same for every packet, the fewer packets sent the better. The real trouble was that we were dropping packets of every kind, not just control packets.

I thought the trouble wasn't that there was anything wrong with the hardware or the software, but that something was misconfigured and outside of the proper specification. If there are other services on your network that are suffering, you should consider them, also. That was why we scaled back the size of our network.

Hope this helps,
Tom
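
Tom's point about the flow quantum can be put into rough numbers. The sketch below is only an illustration under stated assumptions, not a model of CAP or of the LaserWriter: it assumes 512 data bytes per PAP data packet, one TrReq and one TrRel per ATP transaction, a hypothetical uniform loss rate, and it ignores retransmissions and tickle packets. The function names are made up for this example.

```python
import math

# Assumption: each PAP data packet carries at most 512 bytes of data.
PAP_DATA_BYTES = 512


def packets_for_job(job_bytes, flow_quantum):
    """Count packets needed to move job_bytes with a given flow quantum.

    Simplified: each ATP transaction carries up to flow_quantum data
    (response) packets and costs one request plus one release packet.
    """
    data_packets = math.ceil(job_bytes / PAP_DATA_BYTES)
    transactions = math.ceil(data_packets / flow_quantum)
    control_packets = 2 * transactions  # one TrReq and one TrRel each
    return data_packets + control_packets


def clean_run_probability(job_bytes, flow_quantum, loss_rate):
    """Chance that no packet at all is dropped, with equal loss per packet."""
    return (1.0 - loss_rate) ** packets_for_job(job_bytes, flow_quantum)


if __name__ == "__main__":
    job = 40 * 1024  # hypothetical 40 KB print job
    for quantum in (8, 4, 1):
        total = packets_for_job(job, quantum)
        chance = clean_run_probability(job, quantum, loss_rate=0.01)
        print(f"quantum={quantum}: {total} packets, "
              f"P(no drops)={chance:.3f}")
```

ATP does retransmit lost packets, so a single drop is not fatal in practice. The sketch only shows why a smaller quantum means more packets on the wire, and so more chances for loss, for the same amount of data.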