jsloan@wright.EDU (John Sloan) (01/06/88)
We're having some problems with our Encore Annex terminal servers talking to our SUN-3/280S timesharing system. We're hoping someone can give us some advice on where to start troubleshooting. We'll call Encore and/or Sun, but first we'd like to know where the problem is likely to be so we know what questions to ask. Any help would be appreciated. If it's a case of RTFM, feel free to flame a bit, but please provide a reference.

BTW, other than this small glitch, we're very happy with our Annex servers. I have paperwork on my desk to purchase four more.

SYMPTOMS

While using vi on the SUN through an Annex, the terminal freezes up. No amount of control keys of any flavor will free it. The status line on the terminal indicates that the NO SCROLL key is NOT active. We can break back to the Annex, open another terminal session, and kill our previous incarnation. Meanwhile, users continue to access the SUN both through the same Annex and through another Annex without any problems.

The problem is fairly rare, showing up on average less than once a day. It has occurred on both Annexes. It has never been observed while connecting to either our VAX 750 or 785 or our SUN-3/180S, although the 280S is our main system and so may simply be more likely to exhibit the problem. It has never been reported to occur outside of vi. It has never been reported while using the same model terminal on a direct connect line. It happened (twice :-( ) while editing this posting, which is unusual.

When you kill the previous session, you can recover the file as usual with the -r switch. Killing vi is not enough, though, as the csh is also hung, so you must kill -9 that as well.
SPECIFICS

Terminal     WYSE 60 (although we believe that it has occurred on other
             makes and models, we don't have the evidence to back this up)
Annex        Annex-UX Software Rel 2.1
             Hardware Rev 1.4
             ROM Rev 0305
Sun-3/280S   Vanilla SunOS 3.4
Command      rlogin

TROUBLESHOOTING

The only thing that looks significant (to me) is that when the problem occurs, a "netstat" on the Annex shows

    Active Connections
    Proto Recv-Q Send-Q  Local Address        Foreign Address  (state)
    tcp        0     71  (192.26.92.245).750  odin.login       ESTABLISHED

where the Send-Q (here, 71) grows by one each time you hit a key. Hence, it seems that the input is being queued but is not being sent. Still, the problem could be at either end. On "odin", a netstat shows that the Recv-Q and Send-Q of the same socket are empty (which is expected).

On the Annex, a "netstat -i" shows 1 alignment error, 1 interface reset, and 29 TX DMA Underruns, but these numbers do not grow when the problem reoccurs, so it seems unlikely that they have anything to do with it. Likewise with some error counts under "netstat -s". If there is a way to force a crash dump on the Annex, we don't know what it is, and besides, we would be reluctant to do so while other users were on it being productive.

If anyone has had similar difficulties, or even better, has some notion of how to further troubleshoot this, we'd appreciate hearing from you. We don't have ethernet diagnostic equipment, but this has the smell of a software problem, almost like an XON/XOFF problem where the XON back to the Annex is getting lost (nope, we have yet to find any evidence of this). Again, the problem could be at either end.

Thanks for any hints at all.

-- john

John Sloan                  Wright State University Research Center
jsloan@SPOTS.Wright.Edu     3171 Research Blvd., Kettering, OH 45420
...!cbosgd!wright!jsloan    (513) 259-1384 (513) 873-2491
Logic Disclaimer: belong(opinions,jsloan). belong(opinions,_):-!,fail.
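The Send-Q observation in the post could be checked mechanically by comparing two netstat snapshots taken a few keystrokes apart. What follows is only a minimal sketch in Python: it assumes the six-column layout quoted in the post (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state), the function names are made up for illustration, and the second snapshot (Send-Q of 74) is a hypothetical value, not something reported in the thread.

```python
def parse_netstat(text):
    """Map (local, foreign) -> (recv_q, send_q) for each tcp line.

    Assumes the column order shown in the post:
    Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
    """
    conns = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the banner and the column-header line
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # not a connection line after all
        conns[(fields[3], fields[4])] = (recv_q, send_q)
    return conns


def stalled(before, after):
    """Connections whose Send-Q was non-empty and did not shrink."""
    found = []
    for key, (_, send_after) in after.items():
        if key not in before:
            continue
        send_before = before[key][1]
        if send_before > 0 and send_after >= send_before:
            found.append((key, send_before, send_after))
    return found


if __name__ == "__main__":
    # First snapshot is the line quoted in the post.
    first = """Active Connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 71 (192.26.92.245).750 odin.login ESTABLISHED"""
    # Second snapshot is hypothetical: three more keys pressed.
    second = first.replace(" 71 ", " 74 ")
    for key, was, now in stalled(parse_netstat(first), parse_netstat(second)):
        print(key, "Send-Q", was, "->", now)
```

A healthy interactive session would normally drain its Send-Q between snapshots, so a connection reported here is one where queued input is not leaving the Annex. It says nothing about which end is at fault.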
budd@bu-cs.BU.EDU (Philip Budne) (01/07/88)
Sun 3.4 has known TCP window problems. I don't remember ever seeing problems of this nature between our 280 and Annexen, but we did between Sun 3.4 and WISCnet. We now run 4.3 TCP code in our Suns.

Try using tcpdump to view the traffic between the two systems.

Phil Budne, Boston University
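The reply does not say which window problem is meant. One thing that might be worth looking for in a saved tcpdump text capture is the Sun advertising a zero receive window, which would be consistent with input piling up in the Annex's Send-Q. The Python sketch below is a guess at how to scan for that, not a diagnosis: it assumes each line contains "src > dst:" and a "win N" field (tcpdump's text format varies by version), the helper name is invented, and the sample lines and the host name "annex" are made up for illustration.

```python
import re

# Assumed line shape: "... src > dst: ... win N ..."
PACKET_RE = re.compile(r"(\S+) > (\S+): .*\bwin (\d+)")


def zero_window_lines(lines, sender):
    """Return lines where a host starting with `sender` advertises win 0."""
    hits = []
    for line in lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # no recognizable src/dst/window on this line
        src, _dst, win = match.groups()
        if src.startswith(sender) and int(win) == 0:
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Hypothetical capture lines, invented to show the expected shape.
    sample = [
        "odin.login > annex.750: . ack 72 win 4096",
        "odin.login > annex.750: . ack 72 win 0",
        "annex.750 > odin.login: P 72:73(1) ack 1 win 2048",
    ]
    for line in zero_window_lines(sample, "odin"):
        print(line)
```

If the Sun side shows a zero window that is never followed by a larger one while its own Recv-Q is empty, that would point toward the Sun's TCP; if no zero window ever appears, the Annex becomes the more likely suspect.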