[comp.protocols.tcp-ip] problems with IMP connection hanging

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (10/25/87)

For the last several weeks (since the upgrade in the software
in the Arpanet PSN?) we have been having trouble with our IMP
connection hanging.  This problem has also been observed by a
site in California.  They have been told by NOC that they are
the only people with such problems, so they should assume it is
bad hardware.  In order to help prevent us and NOC from going
off on a wild goose-chase of this nature, I'd like to hear 
whether any other people are having similar problems.  The
symptom we see is that no traffic is flowing.  I confess that
in the past I have not been watching carefully enough to observe
the states of all of the lights, but I believe everything is
normal except that we are simply not seeing Ready For Next Bit
(RFNB) from the IMP.  Our configuration is
  ECU with local connection to IMP (Arpanet IMP 89)
  ECU at our end connected to Cisco router, using ACC Multibus 1822 card
The configuration had been trouble-free for quite some time before
this set of problems began.  We have been able to reset it at times
by pushing the reset button on the ECU at our end, or by issuing
a software reset to the ACC card (simulating more or less what happens
when the machine powers up).  Once we had to have NOC intervene.
They ran a loopback test on the interface, and when they went back
into normal mode, things were fine.  I believe the other site that
is seeing these problems is also using a pair of ECU's, and that
the general symptoms are similar.  Is anyone else seeing a related
problem?

rob@PRESTO.IG.COM (Rob Liebschutz) (10/25/87)

We are also having a problem with our PSN connection that began
immediately after the PSN upgrade.

Configuration: LSI 11/23 core gateway with 1822 interface and ACC
	Robustness card (for booting via Arpanet from the NOC)
	Connected to PSN 32 with ECUs.

The symptoms are that the ECU RFNB light goes out on the local ECU
(gateway end) and the gateway says that the interface is down.  If the
gateway crashes (which it has done several times since the upgrade),
it can't be loaded.  The last time this happened, we were able to get
things working again when the NOC looped back the interface,
downloaded the gateway, and then looped back the interface again.  The
second loopback is necessary because the interface goes down right
after the gateway comes up.  Resetting the local ECU does not help.

swb@DEVVAX.TN.CORNELL.EDU (Scott Brim) (10/26/87)

lkj

AI.CLIVE@MCC.COM (Clive Dawson) (10/26/87)

Our DEC-2065 host (MCC.COM) also uses ECU's to connect to the UTexas
PSN.  We've had a trouble-free connection for almost 3 years now.
We've had to deal with a hand-shaking problem, because apparently
the ECU will not raise the IMP ready signal until it sees host ready,
and the TOPS-20 operating system will not raise host ready until it
sees IMP ready.  So we run a small program called AN20-HACK at system
startup time, which uses a DATAO to force host ready on, and all
goes well from then on.  If the PSN happens to go down, however,
then TOPS-20 drops host ready, and AN20-HACK must be run again
when the PSN comes back up.  Otherwise the connection will stay
down for hours until somebody notices.
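
The mutual wait described above can be sketched as a toy model.  This
is only an illustration, not the actual ECU or TOPS-20 logic: the
function name, the boolean "lines", and the assumption that a forced
host ready stays up long enough for the ECU to respond are all
hypothetical.

```python
def settle(host_ready, psn_up=True, rounds=3):
    """Toy model of the ready handshake; returns (host_ready, imp_ready).

    Assumed rules, taken from the description in the message:
      - the ECU raises IMP ready only once it sees host ready
        (and, by assumption, only while the PSN is up);
      - TOPS-20 keeps host ready up only while it sees IMP ready.
    Passing host_ready=True models forcing the line on, as AN20-HACK does.
    """
    imp_ready = False
    for _ in range(rounds):
        imp_ready = psn_up and host_ready  # ECU reacts to the host line
        host_ready = imp_ready             # TOPS-20 reacts to the IMP line
    return host_ready, imp_ready


print(settle(host_ready=False))               # nobody goes first
print(settle(host_ready=True))                # host ready forced on
print(settle(host_ready=True, psn_up=False))  # PSN down
```

In this model neither line ever comes up unless host ready is forced,
and a PSN outage drops both lines again, which is why the force has to
be repeated once the PSN is back.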

To solve this problem, I created a small batch job that runs
every 30 minutes to check the status of the net.  If "INFO ARPA"
shows that the net is down, then AN20-HACK is run to try to
get things started again.
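
The shape of such a periodic check, as a minimal sketch.  It is not
the actual batch job: net_is_up and run_hack are hypothetical
stand-ins for parsing "INFO ARPA" and for running AN20-HACK, and are
passed in as plain callables.

```python
import time


def check_once(net_is_up, run_hack):
    """One pass: run the recovery step only if the net looks down.

    Returns True if the recovery step was run.
    """
    if net_is_up():
        return False
    run_hack()
    return True


def watchdog(net_is_up, run_hack, interval_s=30 * 60, passes=None,
             sleep=time.sleep):
    """Repeat check_once every interval_s seconds (30 minutes by default).

    passes=None loops forever; a number bounds the loop, which is handy
    for trying the logic out.  Returns how often the recovery step ran.
    """
    ran = 0
    done = 0
    while passes is None or done < passes:
        if check_once(net_is_up, run_hack):
            ran += 1
        done += 1
        if passes is None or done < passes:
            sleep(interval_s)
    return ran
```

A check like this is only as good as the status it reads: if the
status source says "up" while the link is in fact dead, the recovery
step is never run.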

All of the above is background info to get around to the real
point of this message.  Last Friday I noticed that the H/I RFNB
light on our ECU was out.  I ran AN20-HACK, which is the normal
way to cure this, but noticed that the light stayed out.  Then
I noticed that "INFO ARPA" reported that the network was UP!
(Perhaps it simply checks "host ready"?)  This is the first time
I had ever noticed that the system reported that the net was
up when in fact it was down.  I pushed RESET on the ECU, ran
AN20-HACK once more, and all was back to normal.

The problem has not recurred since.  I don't know if this is
in any way related to the problem you reported, but I suspect
it might be.  In any case, I hope this is of some help.

Clive
-------

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (10/26/87)

Yes, these are precisely the symptoms we are seeing with our
cisco gateways, down to which lights are on and what is
done to fix it.  If our experience is any indication,
you'll be seeing more of it.

chris@TRANTOR.UMD.EDU (Chris Torek) (10/27/87)

As of 2355 EST, our ECU link to MILNET PSN (nee IMP) #57 stopped
working.  When I went to check on it, the STOP light on our local
ECU was on; I reset it, but we have yet to reestablish communications.
I have no idea whether this is related to the PSN upgrades.

Chris

Mills@UDEL.EDU (10/29/87)

Charles,

For completeness, I offer the observation that the fuzzball gateways attached
to ARPANET apparently do not have this problem. But then, they are pretty
crude gizmos and don't even count RFNMs. Interesting that they don't count
RFNMs...

Dave

steve@BRILLIG.UMD.EDU (Steve D. Miller) (10/29/87)

   More on our PSN problems:  it seems that our ECU link to MILNET PSN 57
was loose at the PSN end; tightening the cable fixed the problem, at least
temporarily.  A short while later, our link was again down.  I fiddled
various things under the direction of the NOC, but nothing seemed to help.
They told me it was a software problem on our end (and I argued loudly,
'cause the software hasn't changed in an eternity), and they finally
suggested going to our backup ECU.  We plugged in and powered up, and all
seems well now.

   Our PSN problems were likely caused by hardware problems with the first
ECU, but things behaved so strangely that I won't be convinced of that until
some hardware guy verifies the ECU failure.  Strange...  but probably
unrelated to the problems others are having.

	-Steve