malis@CCS.BBN.COM (Andrew Malis) (12/15/86)
Charles,

What happened is as follows: At 1:11 AM EST on Friday, AT&T suffered a fiber optics cable break between Newark NJ and White Plains NY. They happen to have routed seven ARPANET trunks through that one fiber optics cable. When the cable was cut, all seven trunks were lost, and the PSNs in the northeast were cut off from the rest of the network. Service was restored by AT&T at 12:12. The MILNET also suffered some trunk outages, but has more redundancy, so it was not partitioned.

Regards,
Andy Malis
LYNCH@A.ISI.EDU.UUCP (12/16/86)
Andy, How on earth does it come to happen that 7 "trunks" are "routed" through one fiber optics cable? My idea of a cable encompasses both some dedicated bandwidth and some physical isolation. How can a network planner ever be sure of "redundancy" if the providers of moving bits do these kinds of things to their customers? Dan -------
haverty@CCV.BBN.COM.UUCP (12/16/86)
Dan,

It's misleading to think that you are ordering a "trunk" from a communications supplier. What you are buying is a plug at one site through which you can pass bits, which appear by some magic at the plug you have bought at the other site. Assuming that there is a physical wire between the two with any particular characteristics other than what is specified in the service offering (e.g., BER, speed, conditioning) is a dangerous practice. The nice network maps we all draw are topological, not physical.

We've often deduced physical characteristics from observed behavior, and seen this kind of thing in many networks. I remember one in particular that had a microwave "sweeper" on a tower, which swept a beam in a circle to hit N other microwave stations around the horizon; the observed effect of this was a propagation delay of about 100 msec., which is far too short for any normal satellite trunk, and far too long for any normal terrestrial circuit. I also remember a backhoe in a farmer's field in Illinois which dug up N of our carefully redundantized trunks with a single flip of the scoop.

I think in most cases even if you figure out something about the physical implementation, there is no guarantee that it will be the same next week. Vendors do offer some options that you can specify, usually at extra cost, like a guaranteed terrestrial routing to control delay; I think you can also specify separate physical routes for different circuits in some cases.

Jack
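[Editor's note: Jack's ~100 msec. observation can be checked with back-of-the-envelope speed-of-light arithmetic. The sketch below (Python) uses assumed round numbers: roughly 3,000 km for a long terrestrial path and the standard 35,786 km geostationary altitude; real circuits add equipment and routing delay on top of pure propagation.]

```python
# Back-of-the-envelope propagation delays, showing why ~100 ms fits
# neither a normal terrestrial circuit nor a normal satellite trunk.
C = 299_792.0  # speed of light in vacuum, km/s

def one_way_delay_ms(distance_km):
    """One-way free-space propagation delay in milliseconds."""
    return distance_km / C * 1000.0

# Long terrestrial path, assumed ~3000 km (microwave chain or cable):
terrestrial = one_way_delay_ms(3000)       # about 10 ms

# Geostationary satellite hop: ~35,786 km up and the same back down:
satellite = one_way_delay_ms(2 * 35_786)   # about 239 ms

print(f"terrestrial ~{terrestrial:.0f} ms, satellite ~{satellite:.0f} ms")
```

On those assumptions, 100 msec. sits in the gap between roughly 10 msec. (terrestrial) and roughly 240 msec. (geostationary satellite), which is exactly the anomaly Jack describes.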
pogran@CCQ.BBN.COM (Ken Pogran) (12/16/86)
Dan,

Your question to Andy, "How on earth does it come to happen that 7 'trunks' are 'routed' through one fiber optics cable" is more properly addressed to the common carriers whose circuits the ARPANET uses, rather than to the packet switching folks. Here we are in the world of circuits leased from common carriers, where economies of scale (for the carriers!) imply very high degrees of multiplexing. As the customer of a common carrier, you specify the end points that you'd like for the circuit, and the carrier routes it as he sees fit.

This is a personal opinion, and not a BBNCC official position, but I think it's safe to say that without spending a lot of extra money, and citing critical national defense needs, it's going to be hard to get a carrier to promise -- and achieve! -- diverse physical routings for a given set of leased circuits. I would also venture the opinion that there are lots of places in the U. S. where there's only one physical transmission system coming into the area that can provide the 56 Kb/s Digital Data Service that the ARPANET (and MILNET, and ...) uses. An implication of this is that almost any wide-area network (doesn't matter whose, or what the technology is) is going to be somewhat more vulnerable to having nodes isolated than its logical map would suggest.

In fairness to the common carriers (are there any in the audience?), the higher the degree of multiplexing, the more well-protected the carrier's facilities are, and the more attention is paid to issues of automatic backup (carriers call this "protection") and longer-term rerouting of circuits when there's an outage (carriers call this "restoration"). So an outage of the type that's been discussed ought to be a very low-probability event. Kind of like wide-spread power failures ...

Hope this discussion helps.

Ken Pogran
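[Editor's note: the survivability point Ken and Dan are circling can be made quantitative. A toy model (Python, with a purely hypothetical cable-outage probability p) contrasts seven trunks that share one physical cable against seven that fail independently on diverse routes.]

```python
# Toy model: a node with n trunks is partitioned only if every trunk
# is down at once.  If all n trunks ride the same physical cable, one
# cut takes them all, so the partition probability is just p.  If the
# n routes are physically diverse and fail independently, it is p**n.
def partition_probability(p, n, diverse):
    """p = chance a given cable is down; n = number of trunks."""
    return p**n if diverse else p

p_cut = 0.01  # hypothetical, for illustration only
print(partition_probability(p_cut, 7, diverse=False))  # 0.01
print(partition_probability(p_cut, 7, diverse=True))   # 1e-14
```

The logical map suggests the second number; the Newark-White Plains cut delivered the first. The model ignores correlated failures (one backhoe, two "diverse" trunks in the same trench), which is precisely the deduced-physical-topology problem Jack described.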
malis@CCS.BBN.COM (Andrew Malis) (12/16/86)
Dan, One additional point - I believe most (if not all) of the affected trunks have been in service since before AT&T started using fiber optics. AT&T has obviously been rerouting their existing circuits as cheaper transmission paths become available. For some of the older ARPANET/MILNET trunks, I'm sure they've seen the complete transition from wires to microwave to fiber (and who knows what else). Andy
LYNCH@A.ISI.EDU (Dan Lynch) (12/17/86)
Ken (and the others who have jumped into this),

Wow. I guess this surfaced an issue that many of us had taken for granted -- that those who are responsible for deploying the Arpanet and Milnet (and who knows what else) have been keeping the "diversity of routing" high enough to ensure "reliability/survivability" of data links during even normal times. (There will always be a farmer in Illinois who digs before asking.) Anyway, here's hoping we can benefit from this recent minor debacle.

One additional query of those in the know: when the service was restored, did things just start to work again, or did some manual intervention get packets routing on their merry way?

Dan

PS. I really like to have these "system level" discussions whenever we are "lucky" enough to have serious disruptions of the underlying technology. They are rare events and we think we have designed our methods to deal with them. And we rarely have the guts to blast ourselves out of the water to "test" them.
-------
malis@CCS.BBN.COM (Andrew Malis) (12/17/86)
Dan,

To answer your question: when service was restored, the PSNs automatically brought the trunks back up and reconnected the network together. No manual intervention required.

Andy

P.S. Here's another good topic to rant and rave about: Don't you hate it when hosts keep messages sitting in queues for days, and conversations get out of sync? Take, for example, this message we all just received this morning:

Received: from SRI-NIC.ARPA by CCS.BBN.COM ; 17 Dec 86 08:48:38 EST
Received: from vax.darpa.mil by SRI-NIC.ARPA with TCP; Wed 17 Dec 86 00:03:07-PST
Received: by vax.darpa.mil (4.12/4.7) id AA19852; Mon, 15 Dec 86 05:42:39 est
Date: Mon 15 Dec 86 05:42:30-EST
From: Dennis G. Perry <PERRY@VAX.DARPA.MIL>
Subject: Re: Arpanet outage

It took about 42 hours for vax.darpa.mil to send it to SRI-NIC.ARPA, and another 9 hours to make it to me.
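[Editor's note: the hop-by-hop delays in that Received: trail can be recomputed directly. The sketch below (Python) has the three timestamps re-keyed by hand with their UTC offsets (EST = -5, PST = -8). Note that Andy's quoted "42 hours" and "9 hours" appear to compare the raw clock times without converting between zones; with the offsets applied, the split comes out closer to 45 and 6 hours, the same ~51-hour total.]

```python
# Hop-by-hop mail latency from the Received: headers quoted above.
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))
PST = timezone(timedelta(hours=-8))

sent       = datetime(1986, 12, 15, 5, 42, 39, tzinfo=EST)   # vax.darpa.mil
at_sri_nic = datetime(1986, 12, 17, 0, 3, 7, tzinfo=PST)     # SRI-NIC.ARPA
delivered  = datetime(1986, 12, 17, 8, 48, 38, tzinfo=EST)   # CCS.BBN.COM

hop1 = (at_sri_nic - sent).total_seconds() / 3600
hop2 = (delivered - at_sri_nic).total_seconds() / 3600
print(f"vax.darpa.mil -> SRI-NIC: {hop1:.1f} h, SRI-NIC -> BBN: {hop2:.1f} h")
```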
mike@BRL.ARPA.UUCP (12/19/86)
Since nobody from DCA has spoken up yet, I'll add a few comments.

As the MILNET is being rebuilt using the "new" IMP packaging (with link encryption capability), some (most?) of the data circuits are being moved to DCTN (?Defense Computer Telecommunications Network?), which is an ISDN-oriented base of somewhat switchable circuit capabilities. I believe DCTN has a phased implementation plan, probably with automatic switching happening much later.

My general impression is that DCA and Army Signal Corps (now the "Information Systems Command") both tend to do a good to excellent job implementing systems designed around traditional concepts such as point-to-point circuits, so DCTN is likely to be a big win. In addition, I suspect that routing of DCTN circuits is likely to be carefully controlled to prevent excessive bundling onto single transmission links, precisely for survivability. (Blind faith here.)

What we have seen of DCTN so far is a T1 line terminating in our Post's Central Office at a D4 channel-bank, with a bunch (7?) of 56k DDS links from there to the location of the MILNET IMP. This gives much better signal quality than previous arrangements where the DDS lines traveled over 5 miles of wire to the town CO. It does not provide any additional reliability, as everything still travels over the big black cable from our CO to the town CO. This cable is especially attractive to heavy earthmoving equipment, and is neutralized several times each year.

Presumably when the T1 gets to the town CO, it terminates in something resembling a circuit switch or patch panel or something (behind another D4 channel bank, of course), so that some alternate routing capability exists at that point. Of course, it might be that the T1 gets zipped through a bunch of repeaters to some regional circuit switch, extending our line of vulnerability a good long way.
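[Editor's note: a quick capacity check of the arrangement Mike describes, using the standard T-carrier figures: a D4 channel bank divides a 1.544 Mb/s T1 into 24 DS0 channels of 64 kb/s, and each 56 kb/s DDS link occupies one DS0 (56 kb/s of payload plus signaling/overhead). The "7" is Mike's own uncertain count.]

```python
# Standard T1/D4 channelization, applied to the described DCTN tail.
T1_RATE = 1_544_000   # b/s, DS1 line rate
DS0_RATE = 64_000     # b/s per channel
CHANNELS = 24         # DS0 channels in a D4 channel bank

dds_links = 7         # the "bunch (7?)" of 56k DDS links mentioned
assert dds_links <= CHANNELS  # 7 of 24 channels used; plenty of headroom

payload = dds_links * 56_000
print(f"{payload} b/s of DDS payload in {dds_links} of {CHANNELS} DS0s "
      f"on a {T1_RATE} b/s T1")
```

So the single T1 easily carries all the IMP's trunk tails, which is exactly why, as Mike notes, it concentrates rather than diversifies the vulnerability on the big black cable to the town CO.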
Personally, I find the concept of layering a packet switching network on top of a switchable circuit network rather amusing, but quite realistic and practical. More grist for the Rumor Mill, may it grind long and fine... Best, -MIKE