[comp.dcom.telecom] Documentation Wanted on January '90 AT&T Outage

Marc Riese <riese@litsun.epfl.ch> (04/11/91)

I read in {IEEE Institute} (the newspaper that comes with Spectrum)
March '91:

 "On Jan. 15, 1990, a flaw in a new version of software interrupted 
  long-distance, international, and toll-free 1-800 calls for nine 
  hours - AT&T's most extensive service disruption in its history."

(This is probably old news for most readers - apologies.)

Can anyone tell me more details about this? Is there a public report
about it?

Thanks,

Marc Riese

Carl Wright <wright@ais.org> (04/14/91)

The best article I've seen on the AT&T outage was in {Science News}. Try
your library and the index for the magazine.


Carl Wright                     | Lynn-Arthur Associates, Inc.
Internet: wright@ais.org        | 2350 Green Rd., #160
Voice: 1 313 995 5590 EST       | Ann Arbor, MI 48105

Ron Crocker <motcid!crocker@uunet.uu.net> (04/16/91)

WARNING: Personal comments follow and do not indicate the feelings of
my employer with reference to this incident.

There were some publications around the time of the incident indicating
that the problem was a missing break statement in some C code in the
4ESS software.  It was indicated that the generic was installed in the
offending office in December and was up and running with "no" problems
for three weeks. I know more about this, but am bound by agreements to
not disclose it.

The immediate (kneejerk?) reaction by AT&T management was to insist on
everyone at Bell Labs taking a course in C programming, and to find a
tool that would highlight missing break statements. Nothing like
shooting the message carrier :->.


Ron Crocker
Motorola Radio-Telephone Systems Group, Cellular Infrastructure Group
(708) 632-4752 [FAX: (708) 632-4430]  crocker@mot.com or uunet!motcid!crocker

Bryan Richardson <richarbm@mentor.cc.purdue.edu> (04/23/91)

In article <telecom11.296.2@eecs.nwu.edu> motcid!crocker@uunet.uu.net
(Ronald T. Crocker) writes:

> There were some publications around the time of the incident indicating
> that the problem was a missing break statement in some C code in the
> 4ESS software.  It was indicated that the generic was installed in the
> offending office in December and was up and running with "no" problems
> for three weeks. I know more about this, but am bound by agreements to
> not disclose it.

This is basically correct at the most detailed level.  There were a
number of conditions which occurred in the network that day prior to
the exposure of the missing break statement, including hardware
failures.
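
For readers who have not run into this class of bug, a missing break
in a C switch statement behaves roughly as sketched below. This is an
illustration only and is not the 4ESS code: the message types, the
counters structure, and both handler functions are hypothetical names
made up for the example.

```c
/* Hypothetical message types; not taken from any real switching system. */
enum msg_type { MSG_STATUS, MSG_CALL_SETUP, MSG_OTHER };

/* Hypothetical bookkeeping so the effect of each handler is visible. */
struct counters {
    int status_seen;   /* status messages processed */
    int calls_routed;  /* call-setup actions performed */
};

/* Buggy version: there is no break after the MSG_STATUS case, so
 * control falls through and a status message is ALSO handled as if
 * it were a call setup. */
void handle_buggy(enum msg_type t, struct counters *c)
{
    switch (t) {
    case MSG_STATUS:
        c->status_seen++;
        /* break is missing here */
    case MSG_CALL_SETUP:
        c->calls_routed++;
        break;
    default:
        break;
    }
}

/* Fixed version: each case ends with its own break. */
void handle_fixed(enum msg_type t, struct counters *c)
{
    switch (t) {
    case MSG_STATUS:
        c->status_seen++;
        break;
    case MSG_CALL_SETUP:
        c->calls_routed++;
        break;
    default:
        break;
    }
}
```

Both versions compile cleanly and behave identically for every input
except MSG_STATUS, which is one reason code like this can run for
weeks before the wrong path is ever taken.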

> The immediate (kneejerk?) reaction by AT&T management was to insist on
> everyone at Bell Labs taking a course in C programming, and find a
> tool that would highlight missing break statements. Nothing like
> shooting the message carrier :->.

As a member of the 4ESS development team, I can concretely say that
this is an Urban Legend in the making.  There are always efforts to
improve product quality, and these naturally are intensified after
incidents such as these.  However, there was no mass mandatory
enrollment in C programming courses, at least as of this writing :).


Bryan Richardson     richarbm@mentor.cc.purdue.edu
AT&T Bell Laboratories and, for 1991, Purdue University
Disclaimer:  Neither AT&T nor Purdue is responsible for my opinions.