[comp.protocols.appletalk] idle bug?

matt@marge.math.binghamton.edu (matt brin) (08/26/89)

We are running cap50 on a sun 3/60 that is running sunos 4.0.  Also running
is KIP 6/88 on the sun.  We have a Fastpath 2 box upgraded to a 4 with PROM
version 4.2 that was downloaded with K-Star 5.03.  

We put an Apple LaserWriter (not a Plus) on the appletalk side of the box
and turned it all on.  It handles short jobs just fine, but behaves oddly on
printouts of more than ten pages.  After about ten pages of a long printout
appear (say pages 17 through 7), there is a longish pause, and the printout
starts all over again with the "first" page (page 17 of this page-reversed
example) and promises to loop infinitely (long-term behavior not checked
out).  

Is this the famous idle bug?  How do we deal with it?  (We compiled cap with
the option to slow down transmission, in case this is relevant.)

Replies can be emailed if you think that your answer is not of universal
interest.

matt brin / math. dept / SUNY / Binghamton, NY 13901
matt@marge.math.binghamton.edu      INTERNET
FAC119 at BINGVAXB                  BITNET
MBRIN  at BINGVAXB                  BITNET

tjh+@ANDREW.CMU.EDU (Tom Holodnik) (09/01/89)

Matt-
	I saw the same symptoms you're seeing. What we did to correct the
situation was to upgrade the FastPath-2 to a FastPath-4 (the FastPath-4
has better performance), and to break up the network into two segments
served by two Kinetics gateways. 
	It's been some time since we dealt with this, and I hesitate to give
you anything less than the correct answer. The best way for you to
determine what is happening is to turn on the various levels of
protocol debugging (esp. ATP) within CAP, when you call the papif
filter.  I saw that ATP transaction release packets weren't being
received at the spooler, and that the socket on the laserwriter had been
closed. My guess was that the PAP tickle timer had expired, and the
laserwriter closed the connection. 
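
The timer guess above can be illustrated with a small simulation. This is
only a sketch: the 120-second connection timeout is an assumed value, and
the class and function names are hypothetical. None of it is taken from CAP
or from the LaserWriter firmware. It shows how a run of lost packets lets
the receiving side's timer expire and the connection close.

```python
class ConnectionTimer:
    """Tracks when the peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s=120.0):
        # Assumed timeout; the real value depends on the implementation.
        self.timeout_s = timeout_s
        self.last_heard = 0.0

    def packet_received(self, now):
        # Any packet from the peer (data or tickle) resets the timer.
        self.last_heard = now

    def expired(self, now):
        return now - self.last_heard >= self.timeout_s


def close_time(arrival_times, timeout_s=120.0, end_time=None):
    """Return the time at which the connection would be closed, or None.

    arrival_times lists the moments (in seconds) at which packets from the
    peer actually arrived; dropped packets are simply absent from the list.
    """
    timer = ConnectionTimer(timeout_s)
    for t in sorted(arrival_times):
        if timer.expired(t):
            return timer.last_heard + timeout_s
        timer.packet_received(t)
    if end_time is not None and timer.expired(end_time):
        return timer.last_heard + timeout_s
    return None


# Packets every 60 s keep the connection open.
print(close_time([60, 120, 180], end_time=200))
# Losing the packets between t=60 and t=200 lets the timer run out.
print(close_time([60, 200]))
```

With these assumed numbers, a gap of two minutes or more without any
packet from the peer is enough to close the connection, which would match
a spooler that then starts the job over.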
	I saw those symptoms during times when the network was heavily stressed
(typically between 10am and 4pm), while printing was fully reliable
during off peak hours. Making the PAP flow quantum smaller made the
problem worse, since more ATP control (TrReq, TrResp, and TrRel) packets
were required for the same amount of data. Since every packet is equally
likely to be dropped, the fewer packets sent the better.
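
A rough packet count makes the flow quantum argument concrete. The sketch
below is a simplified model, not a description of CAP: it assumes 512 bytes
of data per ATP response packet, one TrReq and one TrRel per transaction,
and no retransmissions or tickle packets. The function names and the 1%
drop probability are made up for illustration.

```python
import math

DATA_BYTES_PER_PACKET = 512  # assumed payload of one ATP response packet


def packets_for_job(job_bytes, flow_quantum):
    """Total packets needed to move job_bytes under the simplified model."""
    data_packets = math.ceil(job_bytes / DATA_BYTES_PER_PACKET)
    # Each transaction carries up to flow_quantum response packets.
    transactions = math.ceil(data_packets / flow_quantum)
    # One TrReq and one TrRel per transaction.
    control_packets = 2 * transactions
    return data_packets + control_packets


def expected_drops(job_bytes, flow_quantum, drop_prob):
    """Expected number of dropped packets if each drops independently."""
    return packets_for_job(job_bytes, flow_quantum) * drop_prob


job = 100_000  # bytes, an arbitrary example job size
for quantum in (1, 2, 4, 8):
    total = packets_for_job(job, quantum)
    print(quantum, total, expected_drops(job, quantum, 0.01))
```

In this model a 4096-byte transfer takes 10 packets with a flow quantum of
8 but 24 packets with a flow quantum of 1, so the smaller quantum exposes
more than twice as many packets to loss for the same data.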
	The real trouble was that we were dropping packets in general, not just
control packets. I didn't think anything was wrong with the hardware or
the software; rather, something was misconfigured and operating outside
of the proper specification. If there are other services on your
network that are suffering, you should consider them, also. That was why
we scaled back the size of our network. 

Hope this helps,
Tom