[comp.protocols.tcp-ip] Recent AMES Internet outage

medin@ORION.ARPA.UUCP (06/17/87)

Something happened here at Ames recently that I thought was worth bringing
up here for comment.  Last week we at Ames had a
MILNET PSN upgraded and moved to a new location.  We are now a standard 
configuration PSN site.  In the process of the move, the gateway on port 0
(ames-gw.arpa - also known as ames-viking.arpa) was removed from port 0.
That machine is a Vax, and the new facility is out of easy 1822DH 
range of that machine (we were replacing it with a dedicated gateway
anyways).  We had 2 other gateways up, however, running EGP and 
advertising the local net to the Core.  We normally have 3 gateways
running (including viking), but 2 seemed to be quite adequate for
an interim configuration.  

After we brought up the 2 other gateways, we noticed that we couldn't
talk to the NIC.  At first we thought it was down, but later checks
proved it to be up.  We also noticed that we couldn't talk to certain
other sites.  All sites running EGP normally worked fine though (except
for certain UMD machines).  After some checking, we tracked the problem
down to those machines using a default route to some core gateway that
had issued a redirect to viking.  Even though viking was dead, they
continued to use the redirected route until the system was rebooted.
Thus, even though we had 2 gateways up and talking to the core, several
sites still couldn't talk to us, most notably the NIC.  It seems to
me that if you get a host dead message from the PSN, that
route should be deleted, and probably all redirected routes
should be timed out after a while.  I believe 4.3 BSD will delete
a route if TCP sees a connection starting to time out.  
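
To make the suggestion concrete, here is a minimal sketch in Python of a
host routing table that behaves this way.  It is an illustration only and
not how any existing host implements it: the RouteTable class, its method
names, and the 300-second REDIRECT_TTL are all made up for the example.
Redirect-learned routes are dropped when the gateway is reported dead, and
they expire on their own after the timeout, so the host falls back to its
default gateway without needing a reboot.

```python
import time

# Assumed lifetime for a redirect-learned route, in seconds.
# The value is arbitrary and chosen only for this illustration.
REDIRECT_TTL = 300.0


class RouteTable:
    """Hypothetical host route table with expiring redirect routes."""

    def __init__(self, default_gw, clock=time.monotonic):
        self.default_gw = default_gw
        self.clock = clock      # injectable so the timeout can be tested
        self.redirects = {}     # destination -> (gateway, expiry time)

    def on_redirect(self, dest, gateway):
        # A redirect installs a route that is only valid until it expires.
        self.redirects[dest] = (gateway, self.clock() + REDIRECT_TTL)

    def on_gateway_dead(self, gateway):
        # A "host dead" style notification removes every redirect route
        # that goes through the dead gateway.
        dead = [d for d, (gw, _) in self.redirects.items() if gw == gateway]
        for d in dead:
            del self.redirects[d]

    def lookup(self, dest):
        entry = self.redirects.get(dest)
        if entry is not None:
            gw, expiry = entry
            if self.clock() < expiry:
                return gw
            # The redirect has timed out: forget it and use the default.
            del self.redirects[dest]
        return self.default_gw
```

With something like this, a host that had been redirected to viking would
go back to asking its default core gateway either as soon as it heard that
viking was dead or, failing that, once the redirect timed out.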

Anyways, the fix was to accelerate the deployment of a dedicated gateway
on port 0, and after bringing up a Proteon p4200 there, we could
suddenly talk to everyone again (the NIC, brilling.umd.edu (where
laser-lovers mail comes from), etc...).  This certainly doesn't strike me
as a very robust system.  Granted EGP has its problems, but at least
it doesn't suffer from this one...  Now all we need to do is to file
a TSR to get rid of ames-viking.arpa's affiliation with PSN port 0 and
we're done...  Sigh.


						Thanks,
						  Milo

PS The old C30/E PSN was bought and paid for by NASA, and has a
NASA property sticker on it.  Since we have no further need for
it, we plan to surplus it eventually.  NASA paid over $70,000
for it some years ago, and if someone who wanted one would be willing
to trade us some Suns (or something similar) for it we could probably
arrange something.  I normally wouldn't post a want-ad type of message
to the net, but since this is official U.S. Gov't business, it seemed
appropriate.