[mod.protocols.tcp-ip] Arpanet outage

malis@CCS.BBN.COM (Andrew Malis) (12/15/86)

Charles,

What happened is as follows:

At 1:11 AM EST on Friday, AT&T suffered a fiber optics cable
break between Newark NJ and White Plains NY.  They happen to have
routed seven ARPANET trunks through that one fiber optics cable.
When the cable was cut, all seven trunks were lost, and the PSNs
in the northeast were cut off from the rest of the network.
Service was restored by AT&T at 12:12.

The MILNET also suffered some trunk outages, but has more
redundancy, so it was not partitioned.
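To make the failure mode concrete, here is a minimal sketch (in
Python, with an invented five-node topology, not the actual ARPANET
map) of how logical redundancy evaporates when several trunks share
one physical conduit:

    # Hypothetical topology: each trunk is tagged with the physical
    # conduit that carries it.  Three "redundant" northeast trunks
    # all ride the same fiber.
    from collections import defaultdict, deque

    trunks = [
        (("BOSTON",  "NEWYORK"), "fiber-NJ-NY"),
        (("NEWYORK", "WASHDC"),  "fiber-NJ-NY"),
        (("BOSTON",  "WASHDC"),  "fiber-NJ-NY"),
        (("WASHDC",  "CHICAGO"), "microwave-1"),
        (("CHICAGO", "SANFRAN"), "coax-2"),
    ]

    def components(trunks, cut_conduit=None):
        """Connected components left once every trunk riding
        cut_conduit is gone."""
        adj, nodes = defaultdict(set), set()
        for (a, b), conduit in trunks:
            nodes.update((a, b))
            if conduit != cut_conduit:
                adj[a].add(b)
                adj[b].add(a)
        comps, seen = [], set()
        for start in sorted(nodes):
            if start in seen:
                continue
            comp, queue = {start}, deque([start])
            while queue:
                for m in adj[queue.popleft()] - comp:
                    comp.add(m)
                    queue.append(m)
            seen |= comp
            comps.append(sorted(comp))
        return comps

    print(components(trunks))                   # one connected network
    print(components(trunks, "fiber-NJ-NY"))    # three isolated fragments

The logical map shows three independent paths in the northeast; one
cable cut reduces them to zero.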

Regards,
Andy Malis

LYNCH@A.ISI.EDU.UUCP (12/16/86)

Andy,  How on earth does it come to happen that 7 "trunks" are
"routed" through one fiber optics cable?  My idea of a cable 
encompasses both some dedicated bandwidth and some physical
isolation.  How can a network planner ever be sure of "redundancy"
if the providers of moving bits do these kinds of things to their
customers?

Dan
-------

haverty@CCV.BBN.COM.UUCP (12/16/86)

Dan,

It's misleading to think that you are ordering a "trunk" from a
communications supplier.  What you are buying is a plug at one
site through which you can pass bits, which appear by some magic
at the plug you have bought at the other site.  Assuming that
there is a physical wire between the two with any particular
characteristics other than what is specified in the service
offering (e.g., BER, speed, conditioning) is a dangerous
practice.

The nice network maps we all draw are topological, not physical.
We've often deduced physical characteristics from observed
behavior, and seen this kind of thing in many networks.  I
remember one in particular that had a microwave "sweeper" on a
tower, which swept a beam in a circle to hit N other microwave
stations around the horizon; the observed effect of this was a
propagation delay of about 100 msec., which is far too short for
any normal satellite trunk, and far too long for any normal
terrestrial circuit.  I also remember a backhoe in a farmer's
field in Illinois which dug up N of our carefully redundantized
trunks with a single flip of the scoop.
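Jack's 100 msec clue is easy to bound with back-of-the-envelope
numbers.  A sketch in Python (the geostationary altitude is the
standard figure; the terrestrial distance and cable velocity factor
are illustrative assumptions):

    C_KM_S = 299_792            # speed of light, km/s
    CABLE  = 2 / 3              # typical velocity factor in cable

    def one_way_ms(km, vf=1.0):
        return km / (C_KM_S * vf) * 1000.0

    sat  = one_way_ms(2 * 35_786)      # up and back down: ~239 ms
    land = one_way_ms(4_000, CABLE)    # ~4,000 km coast-to-coast: ~20 ms

    print(f"geostationary hop: {sat:5.1f} ms")
    print(f"terrestrial path : {land:5.1f} ms")

An observed 100 msec sits squarely between the two bounds, which is
what pointed to something exotic (a beam that only sweeps past each
station once per rotation) rather than an ordinary circuit.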

I think in most cases even if you figure out something about the
physical implementation, there is no guarantee that it will be
the same next week.   Vendors do offer some options that you can
specify, usually at extra cost, like a guaranteed terrestrial
routing to control delay; I think you can also specify separate
physical routes for different circuits in some cases.

Jack

pogran@CCQ.BBN.COM (Ken Pogran) (12/16/86)

Dan,

Your question to Andy, "How on earth does it come to happen that
7 'trunks' are 'routed' through one fiber optics cable" is more
properly addressed to the common carriers whose circuits the
ARPANET uses, rather than to the packet switching folks.

Here we are in the world of circuits leased from common carriers,
where economies of scale (for the carriers!) imply very high
degrees of multiplexing.  As the customer of a common carrier,
you specify the end points that you'd like for the circuit, and
the carrier routes it as he sees fit.  This is a personal
opinion, and not a BBNCC official position, but I think it's safe
to say that without spending a lot of extra money and citing
critical national defense needs, it's going to be hard to get a
carrier to promise -- and achieve!  -- diverse physical routings
for a given set of leased circuits.  I would also venture the
opinion that there are lots of places in the U. S. where there's
only one physical transmission system coming into the area that
can provide the 56 Kb/s Digital Data Service that the ARPANET (and
MILNET, and ...) uses.
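The arithmetic behind that multiplexing is worth seeing once.  The
digital-hierarchy figures below are standard; the number of DS3s on
one fiber system is an illustrative assumption:

    DS0_KBPS      = 64    # one digital voice channel; a 56 kb/s DDS
                          # circuit occupies one DS0
    DS0_PER_T1    = 24    # DS0s per T1
    T1_PER_DS3    = 28    # T1s per DS3
    DS3_PER_FIBER = 12    # assumed for one long-haul fiber system

    circuits = DS3_PER_FIBER * T1_PER_DS3 * DS0_PER_T1
    print(f"one cable ~ {circuits} DS0 circuits")    # -> 8064

Against eight thousand circuits in one sheath, seven ARPANET trunks
landing together is less carelessness than statistics.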

An implication of this is that almost any wide-area network
(doesn't matter whose, or what the technology is) is going to be
somewhat more vulnerable to having nodes isolated than its
logical map would suggest.

In fairness to the common carriers (are there any in the
audience?), the higher the degree of multiplexing, the better
protected the carrier's facilities are, and the more attention is
paid to issues of automatic backup (carriers call this
"protection") and longer-term rerouting of circuits when there's
an outage (carriers call this "restoration").  So an outage of the
type that's been discussed ought to be a very low-probability
event.  Kind of like widespread power failures ...

Hope this discussion helps.

Ken Pogran

malis@CCS.BBN.COM (Andrew Malis) (12/16/86)

Dan,

One additional point - I believe most (if not all) of the
affected trunks have been in service since before AT&T started
using fiber optics.  AT&T has obviously been rerouting their
existing circuits as cheaper transmission paths become available.
For some of the older ARPANET/MILNET trunks, I'm sure they've
seen the complete transition from wires to microwave to fiber
(and who knows what else).

Andy

LYNCH@A.ISI.EDU (Dan Lynch) (12/17/86)

Ken (and the others who have jumped into this),
Wow.  I guess this surfaced an issue that many of us had
taken for granted -- that those who are responsible for 
deploying the Arpanet and Milnet (and who knows what else) have
been keeping the "diversity of routing" high enough to ensure
"reliability/survivability" of data links during even normal
times.  (There will always be a farmer in Illinois who digs before
asking.)  Anyway,  here's hoping we can benefit from this recent
minor debacle.  
One additional query of those in the know:  when the service was
restored did things just start to work again or did some manual
intervention get packets routing on their merry way?

Dan

PS.  I really like to have these "system level" discussions whenever
we are "lucky" enough to have serious disruptions of the underlying
technology.  Thye are rare events and we think we have designed
our methods to deal with them.  And we rarely have the guts to
blast ourselves out of the water to "test" them.
-------

malis@CCS.BBN.COM (Andrew Malis) (12/17/86)

Dan,

To answer your question: when service was restored, the PSNs
automatically brought the trunks back up and reconnected the
network together.  No manual intervention required.
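For the curious, the behavior described here is the usual keepalive
pattern: each end of a trunk probes the line, declares it down after
enough missed replies, and re-advertises it to routing as soon as
probes succeed again.  A generic sketch in Python (not BBN's actual
PSN line protocol; the names and thresholds are invented):

    import time

    DOWN_AFTER = 3    # missed keepalives before declaring the trunk down
    UP_AFTER   = 2    # good keepalives before declaring it up again

    def monitor(line_ok, on_change, interval=1.0):
        """line_ok() probes the trunk; on_change(state) tells routing."""
        state, good, bad = "up", 0, 0
        while True:
            if line_ok():
                good, bad = good + 1, 0
                if state == "down" and good >= UP_AFTER:
                    state = "up"
                    on_change("up")    # trunk rejoins routing, no operator
            else:
                bad, good = bad + 1, 0
                if state == "up" and bad >= DOWN_AFTER:
                    state = "down"
                    on_change("down")  # routing steers around the trunk
            time.sleep(interval)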

Andy

P.S. Here's another good topic to rant and rave about:

Don't you hate it when hosts keep messages sitting in queues
for days, and conversations get out of sync?  Take, for example,
this message we all just received this morning:

Received: from SRI-NIC.ARPA by CCS.BBN.COM ; 17 Dec 86 08:48:38 EST
Received: from vax.darpa.mil by SRI-NIC.ARPA with TCP; Wed 17 Dec 86 00:03:07-PST
Received: by vax.darpa.mil (4.12/4.7)
	id AA19852; Mon, 15 Dec 86 05:42:39 est
Date: Mon 15 Dec 86 05:42:30-EST
From: Dennis G. Perry <PERRY@VAX.DARPA.MIL>
Subject: Re: Arpanet outage

By those headers (note the PST/EST conversion) it took about 45
hours for vax.darpa.mil to send it to SRI-NIC.ARPA, and almost
another 6 hours to make it to me.
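The arithmetic, from the quoted headers (timestamps normalized by
hand, since 1986 header syntax confuses modern parsers):

    from datetime import datetime, timedelta, timezone

    EST = timezone(timedelta(hours=-5))
    PST = timezone(timedelta(hours=-8))

    hops = [
        ("left vax.darpa.mil",   datetime(1986, 12, 15, 5, 42, 30, tzinfo=EST)),
        ("reached SRI-NIC.ARPA", datetime(1986, 12, 17, 0, 3, 7, tzinfo=PST)),
        ("reached CCS.BBN.COM",  datetime(1986, 12, 17, 8, 48, 38, tzinfo=EST)),
    ]

    for (_, earlier), (label, later) in zip(hops, hops[1:]):
        hours = (later - earlier).total_seconds() / 3600
        print(f"{label}: {hours:.1f} hours")
    # -> reached SRI-NIC.ARPA: 45.3 hours
    #    reached CCS.BBN.COM: 5.8 hours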

mike@BRL.ARPA.UUCP (12/19/86)

Since nobody from DCA has spoken up yet, I'll add a few comments.
As the MILNET is being rebuilt using the "new" IMP packaging
(with link encryption capability), some (most?) of the data
circuits are being moved to DCTN (?Defense Computer Telecommunications
Network?), which is an ISDN-oriented base of somewhat switchable
circuit capabilities.  I believe DCTN has a phased implementation
plan, probably with automatic switching happening much later.

My general impression is that DCA and Army Signal Corps (now the
"Information Systems Command") both tend to do a good to excellent
job implementing systems designed around traditional concepts
such as point-to-point circuits, so DCTN is likely to be
a big win.  In addition, I suspect that routing of DCTN circuits
is likely to be carefully controlled to prevent excessive
bundling onto single transmission links, precisely for survivability.
(Blind faith here).

What we have seen of DCTN so far is a T1 line terminating in our
Post's Central Office at a D4 channel-bank, with a bunch (7?)
of 56k DDS links from there to the location of the MILNET IMP.
This gives much better signal quality than previous arrangements
where the DDS lines traveled over 5 miles of wire to the town CO.
It does not provide any additional reliability, as everything still
travels over the big black cable from our CO to the town CO.
This cable is especially attractive to heavy earthmoving equipment,
and is neutralized several times each year.  Presumably when the
T1 gets to the town CO, it terminates in something resembling
a circuit switch or patch panel or something (behind another D4
channel bank, of course), so that some alternate routing capability
exists at that point.  Of course, it might be that the T1 gets
zipped through a bunch of repeaters to some regional circuit
switch, extending our line of vulnerability a good long way.
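As a toy check on that channelization (the DS0/T1 figures are
standard; the seven-circuit count is the "(7?)" guess above):

    T1_SLOTS = 24     # DS0 timeslots in a D4-framed T1
    DS0_KBPS = 64     # per-timeslot rate
    DDS_KBPS = 56     # DDS payload: 7 of the DS0's 8 bits per frame
    dds_links = 7     # assumed count of 56k DDS circuits

    used, spare = dds_links, T1_SLOTS - dds_links
    print(f"{used} slots carry DDS ({dds_links * DDS_KBPS} kb/s payload), "
          f"{spare} slots spare")
    # -> 7 slots carry DDS (392 kb/s payload), 17 slots spare

Seventeen spare slots for growth, and every one of them still rides
the same big black cable: one conduit, one cut.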

Personally, I find the concept of layering a packet switching network
on top of a switchable circuit network rather amusing, but
quite realistic and practical.

More grist for the Rumor Mill, may it grind long and fine...
	Best,
	 -MIKE