[comp.dcom.telecom] SPECIAL REPORT: St. Louis Phone Outage

telecom@eecs.nwu.edu (TELECOM Moderator) (06/10/91)

[Moderator's Note: Here is a special report sent to the Digest by Brad
Hicks which discusses the major service outage in St. Louis last week.
It was too large for a regular issue of the Digest. PAT]

  Date: 09 Jun 91 15:57:19 EDT
  From: "76012,300 Brad Hicks" <76012.300@compuserve.com>
  Subject: SWBT Drops 2,800 Data Lines


{St. Louis Post-Dispatch}, 6/4/91, pp. 1A, 7A:

COMPUTER FAILURE SHUTS DOWN AUTOMATIC TELLERS ACROSS AREA 

by Jim Gallagher Of the Post-Dispatch Staff

Scores of automatic teller machines across St. Louis have been shut
down for four days because of a computer failure at Southwestern Bell.
The glitch also slowed the transfer of billions of dollars on
Friday through the Federal Reserve Bank of St. Louis and affected dozens
of other businesses that send information by computer.  A Southwestern
Bell spokesman said Monday that crews were working around the clock to
put 2,800 data transmission lines back in service.

The failure hit Friday morning.  It immediately froze 55 automatic
teller machines belonging to 18 St. Louis credit unions with 135,000
customers.

An additional 26 automatic tellers owned by Mercantile Bank and
several owned by United Postal Savings Association also went blank.
Tellers in some branches lost their links to the main computers at the
credit unions, Mercantile and United Postal.  The branches remained
open using backup systems, although the glitch is slowing business,
officials say.  "It's been a major pain," said Bill Humpfer, president
of St. Louis Teachers Credit Union, where automatic tellers are used
50,000 times a month.

Southwestern Bell said the problem was in a "high capacity digital
cross-connection system" in the 2600 block of Olive Street.  The
machine shuttles computer calls between stations.  Phone company
officials don't know what caused the problem but sabotage is not
suspected, said Bell spokesman David Martin.  Engineers got the
computer working again on Saturday and began the slow process of
reprogramming it.  By Monday night, Bell had restored 1,500 lines --
mostly high-speed lines used by businesses -- but the automatic
tellers remained down.

The Federal Reserve Bank of St.  Louis -- which wires $15 billion a
day between banks around the country -- switched to a backup system
Friday to keep the money moving.  "We got all our work done, but it
delayed things," said a Fed spokesman. The Fed was still using the
backup system Monday afternoon.  Besides financial institutions, the
system serves scores of businesses that send information by computer.

AT&T leases about 280 of the Bell lines for its own long-distance
customers.  Most were back in service Monday, an AT&T spokesman said.

The problem hit on a very busy day for banks.  It was a "double
payday" -- a Friday on the last day of the month.  Many customers who
could not use automatic tellers lined up in front of human tellers
instead.  Lines were longer than normal, although officials said the
extra crowd caused few problems.  But credit union officials said the
breakdown was costing them money. Customers who can't use their own
credit union's automatic tellers may use a competitor's instead.  When
that happens, the credit union pays a fee to the competitor.

"We're starting to talk megabucks," Humpfer said.  Credit unions
affected are part of a single automatic teller machine processing
system.  The credit unions include Anheuser-Busch, Aerospace,
Educational, Emerson, Gateway Telco, Telephone, First Community,
West Community, RAC, First American, Sunset, St. Louis Federal Center,
St. Andrew, Arsenal, Wetterau, Victory, South Community and Electro.

                         ----------

{St. Louis Post-Dispatch}, 6/5/91, pp. 1C, 3C: 

CRASH! SW Bell Computer Expected Back on Line This Morning 

by Jim Gallagher Of the Post-Dispatch Staff 

The computer breakdown that paralyzed dozens of automatic teller
machines and apparently frustrated crowds of horse-racing fans should
be fixed by this morning, Southwestern Bell Telephone Co. says.  The
phone company was methodically reactivating 2,800 high-speed lines
that carry electronic data around St. Louis and beyond.  "We're
finishing it up," phone company spokesman David Martin said
Tuesday afternoon.

Crews hoped to have the work done by Tuesday evening, he said.  ATMs
were slowly coming back on line through the day.  Mercantile Bank said
nearly all its machines were working.  On Monday, 26 of the bank's
ATMs were down.  But 55 ATMs operated by 18 credit unions were still
down early Tuesday evening.

Relief couldn't come too soon for Brian Zander, general manager of
Fairmount Park.  The park lost $500,000 in weekend wagers when its
data phone lines went dead, stalling the computerized bet-taking
system.  "It is a major, major problem," Zander said.  About 1,000
would-be race goers were turned away at the track gates Saturday when
the glitch slowed betting to a crawl.  The track decided to send half
its customers home, rather than create huge lines at the betting
windows.  The turned-away patrons weren't in a kindly state of mind,
Zander said. "People don't understand it's the phone lines.  They
blame it on us."  Zander said AT&T officials had told him the problem
was at Southwestern Bell, which rents data phone lines to the
long-distance company.  AT&T and Southwestern Bell spokesmen said they
couldn't immediately confirm that Fairmount's problem stemmed from the
computer crash.

"It's entirely possible," the AT&T spokesman said.  The problem
struck Friday morning, when a Southwestern Bell computer in downtown
St. Louis suddenly broke down, cutting off 2,800 data lines.  Martin
said Southwestern Bell still doesn't know why the system failed.
Workers are trying so hard to fix it that they haven't had time to
look for the cause, he said.  Besides frustrating bettors and ATM
users, the failure also cut computer links between main computers and
some branches at Mercantile, many credit unions and United Postal
Savings Association.  The branches remained open, but business was
slowed.

At St. Louis Teachers Credit Union, for instance, workers have to
bring records in from the branches each day and punch them into the
main computer by hand.  The lines are used by a variety of businesses
to link their computers to others around the nation.  Fairmount Park
wasn't using the computer betting system Tuesday, and Zander said he
didn't know if it was working yet.  The computer system requiring
phone lines is used only for bets on races run at other tracks.
Fairmount has its own computer system for its races.  The computer
failure forced Fairmount to hook its computers to regular phone lines.
But data lines move information 20 times faster, he said.

                          ------------

{St. Louis Post-Dispatch}, 6/4/91, pp. 1C, 3C: 

COMPUTER FAILURE SHUTS DOWN AUTOMATIC TELLERS ACROSS AREA 

by Jim Gallagher Of the Post-Dispatch Staff

A king-sized computer snafu that silenced 2,800 data phone lines has
been fixed "for the most part," a Southwestern Bell spokesman said
Wednesday.  All but eight of the 55 credit union automatic teller
machines were working Wednesday, after being shut down since Friday.
The computerized betting system at Fairmount Park race track is
working again.  But United Postal Savings Association says its data
lines are still down -- and its executives are getting angry.  "It's
been a nightmare," said Michael Gorman, executive vice president at
St.  Louis' second-largest savings and loan.  The high-speed lines
connect the S&L's main computer to terminals at half of United
Postal's branches.

Gorman said Southwestern Bell can't tell him when the problem will
be fixed.  "They give us stories and every couple of hours it's
different. I can't believe them any more," he said.  Dave Martin,
spokesman for Southwestern Bell, said only a few lines were still out
Wednesday evening.  "When you do a major repair job like this, there
are going to be glitches," he said.  "We are taxing the patience of the
few remaining customers with problems."

At its height, the five-day breakdown affected scores of businesses
and froze more than 80 automatic tellers at 18 credit unions,
Mercantile Bank and United Postal.  A Fairmount Park spokeswoman said
the track lost "tens of thousands of dollars" in profit when betting
slowed to a crawl Friday and Saturday. About 1,000 race fans had to be
turned away at the gate.  Bankers complained of lost automatic teller
fees and large costs for worker overtime caused when people had to do
the work of computers.  But the companies stand little chance of
recovering their losses from Southwestern Bell.

Under the law, victims normally can't collect damages for a failure in
phone or electric services, said Rob Hack, assistant general counsel
for the Missouri Public Service Commission.  To collect, they would
have to prove that the utility deliberately cut service or was
willfully negligent.  The law was designed to protect utilities from
massive losses after breakdowns and power failures, Hack said.
Perhaps the most the victims can hope for is not to be charged for
phone services for the period the lines were down.

The breakdown hit Friday morning in a computer switching machine in
downtown St. Louis, and Southwestern Bell engineers have been
struggling since to get it fixed.  They were joined by engineers from
American Telephone & Telegraph Corp., which sold the equipment that
failed.  Officials at United Postal were struggling, too.  It took
them until Wednesday to switch their computers to normal phone lines.

Branches remained open and customers weren't affected, Gorman said.
But many employees were placed on overtime to work around the glitch.
United Postal hasn't yet counted the cost, he said.  At Fairmount Park,
officials said their losses go beyond lost betting profits.  Many of
their customers left the track angry, said spokeswoman Mary Ozanic.
"The hardest thing to measure is what the ill will will cost us," she
said. 
                     --------------
  
Three-quarter page advertisement in the 6/9/91 {St. Louis Post-Dispatch}, 
p. 13D:

AN OPEN LETTER TO OUR BUSINESS CUSTOMERS:

At Southwestern Bell Telephone, we've built a high standard of
customer service and we take pride in that. Unfortunately, we recently
experienced a rare failure in a computer system that transmits data.

As a result, about 750 St. Louis-area business customers lost access
to important day-to-day services. For those of you whose service was
impaired, that failure translates to a disruption in your operations
and, at best, an inconvenience to your customers. We apologize for
letting you down in this instance. Though the problem lingered longer
than any of us would have liked, we made every effort to see that it
was fixed as quickly as possible.  Our technicians worked around the
clock, logging more than 2,500 hours, to correct the problem.  We
enlisted the help of experts from across the country.

Still, I know that even though we pulled out all stops to restore
service, you would rather it not have happened at all.  So would we.
Now that service has been restored, our focus has shifted to further
upgrading the system's reliability.  While some of the solutions may
take time to complete, we will persist until the service we provide
meets your high standards and ours. In the next few days, we will
individually contact customers whose service was interrupted.  We want
to share with you our plans for improving the system, and we want to
hear your comments on how we can continually improve our service to
you. We are committed to earning your confidence once again.

Sincerely, 

(s) Randy Barroy
President-Missouri Division
Southwestern Bell Telephone
 
                             --------------


{St. Louis Post-Dispatch}, 6/4/91, pp. 1E, 8E: 

VITAL LINES Phone Net Vulnerable, Crash Shows 

by Jim Gallagher Of the Post-Dispatch Staff

Brian Zander, manager of Fairmount Park, faced a lot of unhappy
customers in the past week.  But one fuming customer took the cake.
"He was pretty red-faced," Zander recalled.  The man said he had
waited a year for his favorite horse to race at Arlington Park
racetrack near Chicago.  But when he came to Fairmount to place an
intertrack bet, he was turned away at the gate.

The high-speed data phone lines linking Arlington to Fairmount had
gone down -- part of a Southwestern Bell Telephone Co. computer
failure that rattled companies around the region last week.  Fairmount
turned away 1,000 racing fans last weekend when phone problems slowed
its computerized betting operation to a crawl.  The man was livid
because his horse won, Zander said.  "He said he was going to bet
$500.  He was not a happy camper," Zander said.

The six-day failure, which ended Thursday, affected 2,800 data lines.
At its height, it shut down more than 80 automatic teller machines,
cut off bank branches from their main computers and slowed the
transfer of billions of dollars through the Federal Reserve System.
It also illustrated the region's growing reliance on phone lines to
keep business running.  The installation of high technology over the
past decade has left the nation's phone system with an odd
contradiction.  Experts say the new technology is making the system
more efficient and usually more reliable.  But by squeezing more phone
lines into fewer computers and cables, the system may have also become
more vulnerable.  A routine foul-up or accident that would have
silenced a few thousand lines a decade ago now can affect millions.


Take these examples:   

Jan. 4, 1991.  An AT&T construction crew in New Jersey mistakenly
cracks a single fiber-optic cable.  The cable is the width of a
person's thumb.  The crack halts 60 percent of AT&T's long-distance
calls to New York City. Aviation control centers are paralyzed,
causing gridlock in the skies throughout the Northeast.

Jan. 23, 1990.  More than 42,000 people in north St. Louis county lose
some or all of their phone service when a fire under a bridge burns a
main fiber-optic cable and other phone lines.  Hospitals, police and
fire departments can't get calls.  Some customers wait three days for
service to be restored.

Jan. 15, 1990.  AT&T's national switching system suffers a collapse,
and half of its long-distance calls go uncompleted.  The company takes
nine hours to restore full service.  AT&T blames a bug in its software.

May 8, 1988.  Mother's Day is soured for thousands as a fire breaks
out in an unmanned switching center in Hinsdale, Ill., near Chicago.
The blaze knocks out service for 35,000 customers for up to a month.

Communications experts say spectacular advances in computer systems
and fiber optics are improving service in general.  In Missouri, for
instance, telephone trouble reports fell 18 percent during the past
six years, according to Southwestern Bell.  But some wonder whether
technology has left the system too vulnerable to problems at critical
choke points, where simple accidents can mushroom into communications
disasters.  In the 1970s, the biggest telephone cable could carry
50,000 calls.  Now fiber-optic cables carry 300,000 to 500,000.

That means that a backhoe operator these days can knock out 10 times
as many telephones in a flick of a lever.  "Technology has
concentrated things to the point where, when you do have a major
failure, you can have a much more widespread and devastating failure,"
said Don Mitchell, division manager of planning and engineering at
Southwestern Bell.  Phone calls are funnelled through computerized
switching centers.  The number of such centers has dropped by about
half since the mid-1980s, as technology allowed more phone connections
to be squeezed into fewer computer boxes.

This means a crash at a single switching center affects many more
people, experts say.  And with fewer centers, phone companies have a
harder time routing calls around a failed switch.  The Hinsdale
switching center wasn't even manned.  It was monitored by phone lines
from Springfield, Ill.  As a result, the fire burned for 90 minutes
before the Hinsdale Fire Department heard of it.  "It was designed
never to fail, and as a result, they couldn't put the fire out," said
David Farrell, spokesman for the Illinois Commerce Commission.

The complexity of today's computers can compound the headaches.  Take
the most recent failure in St. Louis.  The problem centered on complex
software in a "digital access and cross-connect system" computer.

The $330,000 machine links thousands of customers' computers by
high-speed telephone lines.  When the software failed May 31, Bell
technicians replaced it with a standby copy on a floppy disk.  But the
spare software was "contaminated" with errors, Mitchell said.  It
destroyed the computer's main memory, forcing technicians to spend
five days painstakingly reprogramming the machine.

"It was just a freaky thing," Mitchell said.  Bell has two such
computers in St. Louis.  They are four years old and neither had
failed before, Mitchell said, although a similar machine in Kansas
City shut down in 1988.

The phone company plans to install a new computer with safety features
to prevent a repeat of the snafu.  "We don't anticipate this happening
again," he said.  Despite the spectacular collapses of recent years,
the American phone system remains remarkably reliable, experts say.
Modern telephone computers are designed to work with only two minutes
of downtime a year.  That compares with hours for other mainframe
computers.  "There is absolutely no comparison," said Jonathan Turner,
a computer science professor at Washington University.
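
To put the "two minutes of downtime a year" design target in
perspective, the arithmetic can be sketched as below.  This is only an
illustration: the article gives no figure for "other mainframe
computers" beyond "hours," so the five-hour value is an assumption, and
the function name is made up for this example.

```python
# Rough availability arithmetic for the downtime figures quoted above.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(downtime_minutes_per_year: float) -> float:
    """Fraction of the year a system is up, given its yearly downtime."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR


# Design target quoted in the article: two minutes of downtime a year.
telephone_switch = availability(2)

# ASSUMPTION: "hours" is not quantified in the article; 5 hours is a
# hypothetical stand-in used only to show the scale of the difference.
other_mainframe = availability(5 * 60)

print(f"telephone switch: {telephone_switch:.6%}")
print(f"other mainframe:  {other_mainframe:.6%}")
```

Under these assumptions the telephone switch is up for all but a few
millionths of the year, while the hypothetical mainframe is down
roughly 150 times as long.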

Designers pile backup systems upon backup systems to keep the machines
clicking.  Telephone engineers compare the phone system to the highway
system.  In both, there are points of vulnerability where a lot of
routes intersect, Mitchell said.  "If the Poplar Street Bridge
collapsed, you'd have a real mess," he said.  To bypass the choke
points, Southwestern Bell is building interconnected loops of
fiber-optic cable, so that one cable break can't silence whole towns.

Last year's North County outage couldn't happen today, Mitchell said.
At the time, the area was served by a single line of main cables.  Now
it's part of a broader loop.  If part of the loop breaks, calls simply
travel the other direction to reach their destination.  The loops are
steadily expanding, although some outlying areas -- Eureka, for
instance -- are still served by single cables.
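
The loop idea described above can be sketched in a few lines.  This is
a toy model, not a description of Southwestern Bell's actual network:
the node names and the function ring_route are hypothetical, and real
fiber rings switch traffic in hardware rather than by searching a list.

```python
# Toy model of a fiber loop: if one link is cut, traffic goes the other way.
def ring_route(nodes, src, dst, cut=None):
    """Return a path from src to dst around a ring, avoiding one cut link.

    nodes: offices in ring order (the last connects back to the first).
    cut:   optional pair of adjacent offices whose link is broken.
    Returns the list of offices visited, or None if both directions fail.
    """
    n = len(nodes)
    start = nodes.index(src)
    for step in (1, -1):  # try one direction, then the other
        path = [src]
        i = start
        blocked = False
        while nodes[i] != dst:
            j = (i + step) % n
            if cut is not None and {nodes[i], nodes[j]} == set(cut):
                blocked = True  # this direction crosses the broken link
                break
            path.append(nodes[j])
            i = j
        if not blocked:
            return path
    return None


# Hypothetical four-office loop.
ring = ["A", "B", "C", "D"]
print(ring_route(ring, "A", "C"))                  # ['A', 'B', 'C']
print(ring_route(ring, "A", "C", cut=("A", "B")))  # ['A', 'D', 'C']
```

With a single line of cables instead of a loop there is no second
direction to try, which is the situation the article describes for
outlying areas such as Eureka.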

AT&T, which carries nearly 70 percent of the nation's long distance
calls, has four cables linking St. Louis with the world.  If one
connection is cut, long-distance calls could be sent through the
others, AT&T says.

That's what was supposed to happen in New York this January, but
didn't.  Dale McHenry, AT&T divisional manager, said a system designed
to prevent such a spectacular failure wasn't fully operational in
January, although it will be by the end of the year.  "We will respond
better and better to this type of incident," he said.

AT&T, meanwhile, says it is busy adding to its own system of loops and
backup switches designed to head off major failures.  But efforts to
make the system less vulnerable run smack into two obstacles -- price
and competition.  "We could design a fail-safe telephone system.  But
who could afford it?" said Sam Goldman, head of utility operations for
the Missouri Public Service Commission.  Many customers, meanwhile,
buy long-distance and computer-line service the same way they buy nuts
and bolts.  The company with the lowest price gets the business.

Washington University's Turner suspects that such price-consciousness
may put phone companies in a bind.  To make the system less vulnerable
means increasing the price -- and possibly losing the business.  "You
try to protect things.  You get backups to backups to backups.  But at
some point it's no longer cost-effective," Mitchell added.

                         -------------


[Moderator's Note: My sincere thanks to Brad Hicks for typing all this
in and submitting it to the Digest. Which direction would you go?
Backups at any cost ... or if not, up to what point?  Are occasional
outages like St. Louis this past week or Chicago in 1988 worth the
difference in cost?   PAT]