telecom@eecs.nwu.edu (TELECOM Moderator) (06/10/91)
[Moderator's Note: Here is a special report sent to the Digest by Brad Hicks which discusses the major service outage in St. Louis last week. It was too large for a regular issue of the Digest. PAT]

Date: 09 Jun 91 15:57:19 EDT
From: "76012,300 Brad Hicks" <76012.300@compuserve.com>
Subject: SWBT Drops 2,800 Data Lines

{St. Louis Post-Dispatch}, 6/4/91, pp. 1A, 7A:

COMPUTER FAILURE SHUTS DOWN AUTOMATIC TELLERS ACROSS AREA

by Jim Gallagher
Of the Post-Dispatch Staff

Scores of automatic teller machines across St. Louis have been shut down for four days because of a computer failure at Southwestern Bell.

The glitch also slowed the transfer of billions of dollars on Friday through the Federal Reserve Bank of St. Louis and affected dozens of other businesses that send information by computer.

A Southwestern Bell spokesman said Monday that crews were working around the clock to put 2,800 data transmission lines back in service.

The failure hit Friday morning. It immediately froze 55 automatic teller machines belonging to 18 St. Louis credit unions with 135,000 customers. An additional 26 automatic tellers owned by Mercantile Bank and several owned by United Postal Savings Association also went blank.

Tellers in some branches lost their links to the main computers at the credit unions, Mercantile and United Postal. The branches remained open using backup systems, although the glitch is slowing business, officials say.

"It's been a major pain," said Bill Humpfer, president of St. Louis Teachers Credit Union, where automatic tellers are used 50,000 times a month.

Southwestern Bell said the problem was in a "high capacity digital cross-connection system" in the 2600 block of Olive Street. The machine shuttles computer calls between stations.

Phone company officials don't know what caused the problem but sabotage is not suspected, said Bell spokesman David Martin.

Engineers got the computer working again on Saturday and began the slow process of reprogramming it.
By Monday night, Bell had restored 1,500 lines -- mostly high-speed lines used by businesses -- but the automatic tellers remained down.

The Federal Reserve Bank of St. Louis -- which wires $15 billion a day between banks around the country -- switched to a backup system Friday to keep the money moving.

"We got all our work done, but it delayed things," said a Fed spokesman. The Fed was still using the backup system Monday afternoon.

Besides financial institutions, the system serves scores of businesses that send information by computer. AT&T leases about 280 of the Bell lines for its own long-distance customers. Most were back in service Monday, an AT&T spokesman said.

The problem hit on a very busy day for banks. It was a "double payday" -- a Friday on the last day of the month.

Many customers who could not use automatic tellers lined up in front of human tellers instead. Lines were longer than normal, although officials said the extra crowd caused few problems.

But credit union officials said the breakdown was costing them money. Customers who can't use their own credit union's automatic tellers may use a competitor's instead. When that happens, the credit union pays a fee to the competitor.

"We're starting to talk megabucks," Humpfer said.

Credit unions affected are part of a single automatic teller machine processing system. The credit unions include Anheuser-Busch, Aerospace, Educational Emerson, Gateway Telco, Telephone, First Community, West Community, RAC, First American, Sunset, St. Louis Federal Center, St. Andrew, Arsenal, Wetterau, Victory, South Community and Electro.

----------

{St. Louis Post-Dispatch}, 6/5/91, pp. 1C, 3C:

CRASH!
SW Bell Computer Expected Back on Line This Morning

by Jim Gallagher
Of the Post-Dispatch Staff

The computer breakdown that paralyzed dozens of automatic teller machines and apparently frustrated crowds of horse-racing fans should be fixed by this morning, Southwestern Bell Telephone Co. says.
The phone company was methodically reactivating 2,800 high-speed lines that carry electronic data around St. Louis and beyond.

"We're finishing it up," phone company spokesman David Martin said Tuesday afternoon. Crews hoped to have the work done by Tuesday evening, he said.

ATMs were slowly coming back on line through the day. Mercantile Bank said nearly all its machines were working. On Monday, 26 of the bank's ATMs were down. But 55 ATMs operated by 18 credit unions were still down early Tuesday evening.

Relief couldn't come too soon for Brian Zander, general manager of Fairmount Park. The park lost $500,000 in weekend wagers when its data phone lines went dead, stalling the computerized bet-taking system.

"It is a major, major problem," Zander said.

About 1,000 would-be race goers were turned away at the track gates Saturday when the glitch slowed betting to a crawl. The track decided to send half its customers home, rather than create huge lines at the betting windows.

The turned-away patrons weren't in a kindly state of mind, Zander said. "People don't understand it's the phone lines. They blame it on us."

Zander said AT&T officials had told him the problem was at Southwestern Bell, which rents data phone lines to the long-distance company.

AT&T and Southwestern Bell spokesmen said they couldn't immediately confirm that Fairmount's problem stemmed from the computer crash. "It's entirely possible," the AT&T spokesman said.

The problem struck Friday morning, when a Southwestern Bell computer in downtown St. Louis suddenly broke down, cutting off 2,800 data lines.

Martin said Southwestern Bell still doesn't know why the system failed. Workers are trying so hard to fix it that they haven't had time to look for the cause, he said.

Besides frustrating bettors and ATM users, the failure also cut computer links between main computers and some branches at Mercantile, many credit unions and United Postal Savings Association.
The branches remained open, but business was slowed. At St. Louis Teacher's Credit Union, for instance, workers have to bring records in from the branches each day and punch them into the main computer by hand.

The lines are used by a variety of businesses to link their computers to others around the nation.

Fairmount Park wasn't using the computer betting system Tuesday, and Zander said he didn't know if it was working yet.

The computer system requiring phone lines is used only for bets on races run at other tracks. Fairmount has its own computer system for its races.

The computer failure forced Fairmount to hook its computers to regular phone lines. But data lines move information 20 times faster, he said.

------------

{St. Louis Post-Dispatch}, 6/4/91, pp. 1C, 3C:

COMPUTER FAILURE SHUTS DOWN AUTOMATIC TELLERS ACROSS AREA

by Jim Gallagher
Of the Post-Dispatch Staff

A king-sized computer snafu that silenced 2,800 data phone lines has been fixed "for the most part," a Southwestern Bell spokesman said Wednesday.

All but eight of the 55 credit union automatic teller machines were working Wednesday, after being shut down since Friday. The computerized betting system at Fairmount Park race track is working again.

But United Postal Savings Association says its data lines are still down -- and its executives are getting angry.

"It's been a nightmare," said Michael Gorman, executive vice president at St. Louis' second-largest savings and loan. The high-speed lines connect the S&L's main computer to terminals at half of United Postal's branches.

Gorman said Southwestern Bell can't tell him when the problem will be fixed. "They give us stories and every couple of hours it's different. I can't believe them any more," he said.

Dave Martin, spokesman for Southwestern Bell, said only a few lines were still out Wednesday evening.

"When you do a major repair job like this, there are going to be glitches," he said.
"We are taxing the patience of the few remaining customers with problems."

At its height, the five-day breakdown affected scores of businesses and froze more than 80 automatic tellers at 18 credit unions, Mercantile Bank and United Postal.

A Fairmount Park spokeswoman said the track lost "tens of thousands of dollars" in profit when betting slowed to a crawl Friday and Saturday. About 1,000 race fans had to be turned away at the gate.

Bankers complained of lost automatic teller fees and large costs for worker overtime caused when people had to do the work of computers.

But the companies stand little chance of recovering their losses from Southwestern Bell.

Under the law, victims normally can't collect damages for a failure in phone or electric services, said Rob Hack, assistant general counsel for the Missouri Public Service Commission. To collect, they would have to prove that the utility deliberately cut service or was willfully negligent.

The law was designed to protect utilities from massive losses after breakdowns and power failures, Hack said. Perhaps the most the victims can hope for is not to be charged for phone services for the period the lines were down.

The breakdown hit Friday morning in a computer switching machine in downtown St. Louis, and Southwestern Bell engineers have been struggling since to get it fixed. They were joined by engineers from American Telephone & Telegraph Corp., which sold the equipment that failed.

Officials at United Postal were struggling, too. It took them until Wednesday to switch their computers to normal phone lines.

Branches remained open and customers weren't affected, Gorman said. But many employees were placed on overtime to work around the glitch. United Postal hasn't yet counted the cost, he said.

At Fairmount Park, officials said their losses go beyond lost betting profits. Many of their customers left the track angry, said spokeswoman Mary Ozanic.
"The hardest thing to measure is what the ill will will cost us," she said.

--------------

Three-quarter page advertisement in the 6/9/91 {St. Louis Post-Dispatch}, p. 13D:

AN OPEN LETTER TO OUR BUSINESS CUSTOMERS:

At Southwestern Bell Telephone, we've built a high standard of customer service and we take pride in that.

Unfortunately, we recently experienced a rare failure in a computer system that transmits data. As a result, about 750 St. Louis-area business customers lost access to important day-to-day services.

For those of you whose service was impaired, that failure translates to a disruption in your operations and, at best, an inconvenience to your customers. We apologize for letting you down in this instance.

Though the problem lingered longer than any of us would have liked, we made every effort to see that it was fixed as quickly as possible. Our technicians worked around the clock, logging more than 2,500 hours, to correct the problem. We enlisted the help of experts from across the country.

Still, I know that even though we pulled out all stops to restore service, you would rather it not have happened at all. So would we.

Now that service has been restored, our focus has shifted to further upgrading the system's reliability. While some of the solutions may take time to complete, we will persist until the service we provide meets your high standards and ours.

In the next few days, we will individually contact customers whose service was interrupted. We want to share with you our plans for improving the system, and we want to hear your comments on how we can continually improve our service to you.

We are committed to earning your confidence once again.

Sincerely,

(s) Randy Barroy
President-Missouri Division
Southwestern Bell Telephone

--------------

{St. Louis Post-Dispatch}, 6/4/91, pp.
1E, 8E:

VITAL LINES
Phone Net Vulnerable, Crash Shows

by Jim Gallagher
Of the Post-Dispatch Staff

Brian Zander, manager of Fairmount Park, faced a lot of unhappy customers in the past week. But one fuming customer took the cake.

"He was pretty red-faced," Zander recalled.

The man said he had waited a year for his favorite horse to race at Arlington Park racetrack near Chicago. But when he came to Fairmount to place an intertrack bet, he was turned away at the gate.

The high-speed data phone lines linking Arlington to Fairmount had gone down -- part of a Southwestern Bell Telephone Co. computer failure that rattled companies around the region last week.

Fairmount turned away 1,000 racing fans last weekend when phone problems slowed its computerized betting operation to a crawl.

The man was livid because his horse won, Zander said. "He said he was going to bet $500. He was not a happy camper," Zander said.

The six-day failure, which ended Thursday, affected 2,800 data lines. At its height, it shut down more than 80 automatic teller machines, cut off bank branches from their main computers and slowed the transfer of billions of dollars through the Federal Reserve System.

It also illustrated the region's growing reliance on phone lines to keep business running.

The installation of high technology over the past decade has left the nation's phone system with an odd contradiction.

Experts say the new technology is making the system more efficient and usually more reliable. But by squeezing more phone lines into fewer computers and cables, the system may have also become more vulnerable.

A routine foul-up or accident that would have silenced a few thousand lines a decade ago now can affect millions. Take these examples:

Jan. 4, 1991. An AT&T construction crew in New Jersey mistakenly cracks a single fiber-optic cable. The cable is the width of a person's thumb. The crack halts 60 percent of AT&T's long-distance calls to New York City.
Aviation control centers are paralyzed, causing gridlock in the skies throughout the Northeast.

Jan. 23, 1990. More than 42,000 people in north St. Louis County lose some or all of their phone service when a fire under a bridge burns a main fiber-optic cable and other phone lines. Hospitals, police and fire departments can't get calls. Some customers wait three days for service to be restored.

Jan. 15, 1990. AT&T's national switching system suffers a collapse, and half of its long-distance calls go uncompleted. The company takes nine hours to restore full service. AT&T blames a bug in its software.

May 8, 1988. Mother's Day is soured for thousands as a fire breaks out in an unmanned switching center in Hinsdale, Ill., near Chicago. The blaze knocks out service for 35,000 customers for up to a month.

Communications experts say spectacular advances in computer systems and fiber optics are improving service in general. In Missouri, for instance, telephone trouble reports fell 18 percent during the past six years, according to Southwestern Bell.

But some wonder whether technology has left the system too vulnerable to problems at critical choke points, where simple accidents can mushroom into communications disasters.

In the 1970s, the biggest telephone cable could carry 50,000 calls. Now fiber-optic cables carry 300,000 to 500,000. That means that a backhoe operator these days can knock out 10 times as many telephones in a flick of a lever.

"Technology has concentrated things to the point where, when you do have a major failure, you can have a much more widespread and devastating failure," said Don Mitchell, division manager of planning and engineering at Southwestern Bell.

Phone calls are funnelled through computerized switching centers. The number of such centers has dropped by about half since the mid-1980s, as technology allowed more phone connections to be squeezed into fewer computer boxes.
This means a crash at a single switching center affects many more people, experts say. And with fewer centers, phone companies have a harder time routing calls around a failed switch.

The Hinsdale switching center wasn't even manned. It was monitored by phone lines from Springfield, Ill. As a result, the fire burned for 90 minutes before the Hinsdale Fire Department heard of it.

"It was designed never to fail, and as a result, they couldn't put the fire out," said David Farrell, spokesman for the Illinois Commerce Commission.

The complexity of today's computers can compound the headaches. Take the most recent failure in St. Louis.

The problem centered on complex software in a "digital access and cross-connect system" computer. The $330,000 machine links thousands of customers' computers by high-speed telephone lines.

When the software failed May 31, Bell technicians replaced it with a standby copy on a floppy disk. But the spare software was "contaminated" with errors, Mitchell said. It destroyed the computer's main memory, forcing technicians to spend five days painstakingly reprogramming the machine.

"It was just a freaky thing," Mitchell said.

Bell has two such computers in St. Louis. They are four years old and neither had failed before, Mitchell said, although a similar machine in Kansas City shut down in 1988.

The phone company plans to install a new computer with safety features to prevent a repeat of the snafu. "We don't anticipate this happening again," he said.

Despite the spectacular collapses of recent years, the American phone system remains remarkably reliable, experts say.

Modern telephone computers are designed to work with only two minutes of downtime a year. That compares with hours for other mainframe computers.

"There is absolutely no comparison," said Jonathan Turner, a computer science professor at Washington University. Designers pile backup systems upon backup systems to keep the machines clicking.
Telephone engineers compare the phone system to the highway system. In both, there are points of vulnerability where a lot of routes intersect, Mitchell said.

"If the Poplar Street Bridge collapsed, you'd have a real mess," he said.

To bypass the choke points, Southwestern Bell is building interconnected loops of fiber-optic cable, so that one cable break can't silence whole towns.

Last year's North County outage couldn't happen today, Mitchell said. At the time, the area was served by a single line of main cables. Now it's part of a broader loop. If part of the loop breaks, calls simply travel the other direction to reach their destination.

The loops are steadily expanding, although some outlying areas -- Eureka, for instance -- are still served by single cables.

AT&T, which carries nearly 70 percent of the nation's long distance calls, has four cables linking St. Louis with the world. If one connection is cut, long-distance calls could be sent through the others, AT&T says.

That's what was supposed to happen in New York this January, but didn't. Dale McHenry, AT&T divisional manager, said a system designed to prevent such a spectacular failure wasn't fully operational in January, although it will be by the end of the year.

"We will respond better and better to this type of incident," he said.

AT&T, meanwhile, says it is busy adding to its own system of loops and backup switches designed to head off major failures.

But efforts to make the system less vulnerable run smack into two obstacles -- price and competition.

"We could design a fail-safe telephone system. But who could afford it?" said Sam Goldman, head of utility operations for the Missouri Public Service Commission.

Many customers, meanwhile, buy long-distance and computer-line service the same way they buy nuts and bolts. The company with the lowest price gets the business.

Washington University's Turner suspects that such price-consciousness may put phone companies in a bind.
To make the system less vulnerable means increasing the price -- and possibly losing the business.

"You try to protect things. You get backups to backups to backups. But at some point it's no longer cost-effective," Mitchell added.

-------------

[Moderator's Note: My sincere thanks to Brad Hicks for typing all this in and submitting it to the Digest. Which direction would you go? Backups at any cost ... or if not, up to what point? Are occasional outages like St. Louis this past week or Chicago in 1988 worth the difference in cost? PAT]