[comp.sys.encore] Problems with Encore Annex server

jsloan@wright.EDU (John Sloan) (01/06/88)

We're having some problems with our Encore Annex terminal servers
talking to our SUN-3/280S timesharing system. We're hoping someone
can give us some advice on where to start troubleshooting. We'll
call Encore and/or Sun, but first we'd like to know where the problem
is likely to be so we know what questions to ask. Any help would be
appreciated. If it's a case of RTFM, feel free to flame a bit, but
please provide a reference. BTW, other than this small glitch,
we're very happy with our Annex servers. I have paperwork on
my desk to purchase four more.

SYMPTOMS

While using vi on the SUN through an Annex, the terminal freezes up.
No amount of control keys of any flavor will free it. The status line
on the terminal indicates that the NO SCROLL key is NOT active.  We can
break back to the annex and open another terminal session, and kill our
previous incarnation. Meanwhile, users continue to access the SUN both
through the same Annex and another Annex without any problems.

The problem is fairly rare, showing up on the average less than once a
day.  It has occurred on both Annexes.  It has never been observed
while connecting to either our VAX 750 or 785 or our SUN-3/180S,
although the 280S is our main system and so may simply be more likely to
exhibit the problem. It has never been reported to occur outside of vi.
It has never been reported while using the same model terminal on a
direct connect line.

It happened (twice :-( ) while editing this posting, which is unusual.

When you kill the previous session, you can recover the file as usual
with the -r switch. Killing vi is not enough, though, as the csh is
also hung, so you must kill -9 that as well.

SPECIFICS

Terminal	WYSE 60 (although we believe that it has occurred
			on other makes and models, we don't have
			the evidence to back this up)
Annex		Annex-UX
		Software Rel 2.1
		Hardware Rev 1.4
		ROM Rev 0305
Sun-3/280S	Vanilla SunOS 3.4
Command		rlogin

TROUBLESHOOTING

The only thing that looks significant (to me) is that when the problem
occurs, a "netstat" on the annex shows

Active Connections
Proto	Recv-Q	Send-Q	Local Address		Foreign Address	(state)
tcp	0	71	(192.26.92.245).750	odin.login	ESTABLISHED

where the Send-Q (here, 71) grows by one each time you hit a key. Hence,
it seems that the input is being queued but is not being sent. Still,
the problem could be at either end.
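
The Send-Q check described above can be automated by comparing successive
netstat snapshots. What follows is only a minimal sketch in Python, assuming
the six-column layout shown in the output above. The function names
(parse_netstat, stalled) and the second snapshot (Send-Q of 72, standing in
for "one keystroke later") are made up for illustration and are not part of
any Annex or SunOS tool.

```python
# First snapshot: the netstat output quoted in the post.
SAMPLE_1 = (
    "Active Connections\n"
    "Proto\tRecv-Q\tSend-Q\tLocal Address\t\tForeign Address\t(state)\n"
    "tcp\t0\t71\t(192.26.92.245).750\todin.login\tESTABLISHED\n"
)
# Second snapshot: hypothetical, as if one more key had been pressed.
SAMPLE_2 = SAMPLE_1.replace("\t71\t", "\t72\t")


def parse_netstat(text):
    """Return {(local, foreign): send_q} for every tcp line in the output."""
    queues = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) >= 6 and fields[0] == "tcp":
            queues[(fields[3], fields[4])] = int(fields[2])
    return queues


def stalled(samples):
    """Return connections whose Send-Q starts non-zero, never shrinks, and grows."""
    parsed = [parse_netstat(s) for s in samples]
    result = []
    for conn in parsed[0]:
        series = [p.get(conn) for p in parsed]
        if None in series:
            continue  # connection vanished in a later snapshot
        never_drains = all(b >= a for a, b in zip(series, series[1:]))
        if never_drains and series[-1] > series[0] > 0:
            result.append(conn)
    return result
```

Calling stalled([SAMPLE_1, SAMPLE_2]) flags the rlogin connection to odin. A
queue that only ever grows says the data is not leaving (or not being
acknowledged); it does not by itself say which end is at fault.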

On "odin", a netstat shows that the Recv-Q and Send-Q of the same socket
are empty (which is expected).

On the annex, a "netstat -i" shows 1 alignment error, 1 interface
reset, and 29 TX DMA Underruns, but these numbers do not grow when
the problem reoccurs, so it seems unlikely that they have anything
to do with it. Likewise with some error counts under "netstat -s".

If there is a way to force a crash dump on the Annex we don't
know what it is, and besides would be reluctant to do so while other
users were on it being productive.

If anyone has had similar difficulties, or even better, has some notion
of how to further troubleshoot this, we'd appreciate hearing from you.
We don't have ethernet diagnostic equipment, but this has the smell of
a software problem, almost like an XON/XOFF problem where the XON back
to the annex is getting lost (nope, we have yet to find any evidence of
this).
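
To make the XON/XOFF analogy concrete, here is a toy model in Python of a
sender that honours software flow control. It is purely illustrative: the
Sender class and run function are hypothetical names, and nothing here is
claimed to reflect how the Annex or SunOS actually behaves. It only shows
that if an XOFF arrives and the matching XON never does, every later
keystroke piles up in the queue, one per key.

```python
XOFF = "\x13"  # Ctrl-S: receiver asks the sender to pause
XON = "\x11"   # Ctrl-Q: receiver asks the sender to resume


class Sender:
    """Toy sender obeying XON/XOFF; not the Annex implementation."""

    def __init__(self):
        self.paused = False
        self.queue = []      # characters waiting to go out
        self.delivered = []  # characters actually sent

    def control(self, ch):
        """Handle a flow-control character from the receiver."""
        if ch == XOFF:
            self.paused = True
        elif ch == XON:
            self.paused = False
            self.flush()

    def send(self, ch):
        """Queue one character and send it if not paused."""
        self.queue.append(ch)
        self.flush()

    def flush(self):
        if not self.paused:
            self.delivered.extend(self.queue)
            self.queue.clear()


def run(keys, xon_lost):
    """Send the first key, receive XOFF, optionally receive XON, send the rest.

    Returns (characters still queued, characters delivered).
    """
    s = Sender()
    s.send(keys[0])
    s.control(XOFF)
    if not xon_lost:
        s.control(XON)
    for k in keys[1:]:
        s.send(k)
    return len(s.queue), len(s.delivered)
```

With the XON lost, run("abcd", xon_lost=True) leaves three characters queued
and one delivered; with the XON received, nothing stays queued.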

Again, the problem could be at either end.

Thanks for any hints at all.

-- john

John Sloan                     Wright State University Research Center
jsloan@SPOTS.Wright.Edu       3171 Research Blvd., Kettering, OH 45420
...!cbosgd!wright!jsloan               (513) 259-1384   (513) 873-2491
Logic Disclaimer: belong(opinions,jsloan). belong(opinions,_):-!,fail.

budd@bu-cs.BU.EDU (Philip Budne) (01/07/88)

Sun 3.4 has known TCP window problems.  I don't remember ever seeing
problems of this nature between our 280 and Annexen, but we did
between Sun 3.4 and WISCnet.  We now run 4.3 TCP code in our Suns.

Try using tcpdump to view the traffic between the two systems.

	Phil Budne, Boston University

grunwald@uiucdcsm.cs.uiuc.edu (01/08/88)

You can get SunOS 3.4.2 from Sun, which supposedly solves these problems.