fair@APPLE.COM ("Erik E. Fair", Your Friendly Postmaster) (05/10/91)
This Is The Network: The Apple Engineering Network.

The Apple Engineering Network has about 100 IP subnets, 224 AppleTalk zones, and over 600 AppleTalk networks. It stretches from Tokyo, Japan, to Paris, France, with half a dozen locations in the U.S., and 40 buildings in the Silicon Valley. It is interconnected with the Internet in three places: two in the Silicon Valley, and one in Boston. It supports almost 10,000 users every day.

When things go wrong with E-mail on this network, it's my problem. My name is Fair. I carry a badge.

[insert theme from "Dragnet"]

The story you are about to read is true. The names have not been changed so as to finger the guilty.

It was early evening, on a Monday. I was working the swing shift out of Engineering Computer Operations under the command of Richard Herndon. I don't have a partner.

While I was reading my E-mail that evening, I noticed that the load average on apple.com, our VAX-8650, had climbed way out of its normal range to just over 72. Upon investigation, I found that thousands of Internet hosts were trying to send us an error message. I also found 2,000+ copies of this error message already in our queue. I immediately shut down the sendmail daemon which was offering SMTP service on our VAX.

I examined the error message, and reconstructed the following sequence of events:

We have a large community of users who use QuickMail, a popular Macintosh-based E-mail system from CE Software. In order to make it possible for these users to communicate with other users who have chosen to use other E-mail systems, ECO supports a QuickMail-to-Internet E-mail gateway. We use RFC822 Internet mail format, and RFC821 SMTP as our common intermediate E-mail standard, and we gateway everything that we can to that standard, to promote interoperability.

The gateway that we installed for this purpose is MAIL*LINK SMTP from Starnine Systems. This product is also known as GatorMail-Q from Cayman Systems.
It does gateway duty for all of the 3,500 QuickMail users on the Apple Engineering Network.

Many of our users subscribe, from QuickMail, to Internet mailing lists which are delivered to them through this gateway. One such user, Mark E. Davis, is on the unicode@sun.com mailing list, to discuss some alternatives to ASCII with the other members of that list.

Sometime on Monday, he replied to a message that he received from the mailing list. He composed a one-paragraph comment on the original message, and hit the "send" button.

Somewhere in the process of that reply, either QuickMail or MAIL*LINK SMTP mangled the "To:" field of the message. The important part is that the "To:" field contained exactly one "<" character, without a matching ">" character. This minor point caused the massive devastation, because it interacted with a bug in sendmail.

Note that this syntax error in the "To:" field has nothing whatsoever to do with the actual recipient list, which is handled separately, and which, in this case, was perfectly correct.

The message made it out of the Apple Engineering Network, and over to Sun Microsystems, where it was exploded out to all the recipients of the unicode@sun.com mailing list.

Sendmail, arguably the standard SMTP daemon and mailer for UNIX, doesn't like "To:" fields which are constructed as described. What it does about this is the real problem: it sends an error message back to the sender of the message, AND delivers the original message onward to whatever specified destinations are listed in the recipient list.

This is deadly.

The effect was that every sendmail daemon on every host which touched the bad message sent an error message back to us about it. I have often dreaded the possibility that one day, every host on the Internet (all 400,000 of them) would try to send us a message, all at once. On Monday, we got a taste of what that must be like.
I don't know how many people are on the unicode@sun.com mailing list, but I've heard from Postmasters in Sweden, Japan, Korea, Australia, Britain, France, and all over the U.S. I speculate that the list has at least 200 recipients, and about 25% of them are actually UUCP sites that are MX'd on the Internet.

I destroyed about 4,000 copies of the error message in our queues here at Apple Computer.

After I turned off our SMTP daemon, our secondary MX sites got whacked. We have a secondary MX site so that when we're down, someone else will collect our mail in one place, and deliver it to us in an orderly fashion, rather than have every host which has a message for us jump on us the very second that we come back up.

Our secondary MX is the CSNET Relay (relay.cs.net and relay2.cs.net). They eventually destroyed over 11,000 copies of the error message in the queues on the two relay machines. Their postmistress was at wit's end when I spoke to her. She wanted to know what had hit her machines.

It seems that for every one machine that had successfully contacted apple.com and delivered a copy of that error message, there were three hosts which couldn't get ahold of apple.com because we were overloaded from all the mail, and so they contacted the CSNET Relay instead.

I also heard from CSNET that UUNET, a major MX site for many other hosts, had destroyed 2,000 copies of the error message. I presume that their modems were very busy delivering copies of the error message from outlying UUCP sites back to us at Apple Computer.

This instantiation of this problem has abated for the moment, but I'm still spending a lot of time answering E-mail queries from postmasters all over the world.

The next day, I replaced the current release of MAIL*LINK SMTP with a beta test version of their next release. It has not shown the header mangling bug, yet.

The final chapter of this horror story has yet to be written.
The versions of sendmail with this behavior are still out there on hundreds of thousands of computers, waiting for another chance to bury some unlucky site in error messages.

Are you next?

[insert theme from "The Twilight Zone"]

	just the vax, ma'am,

	Erik E. Fair	apple!fair	fair@apple.com
steve@UMIACS.UMD.EDU (Steve D. Miller) (05/10/91)
I had this same sort of error happen to me in the early days (only 500 or so people on the list, thank goodness) of the Sun-Nets mailing list. The resulting errors trashed a VAX 8600 here for twelve hours or so. In self-defense, I added a hack to the software I use to run the Sun-Nets list: it checks several important header lines to be sure that they aren't too badly botched, and if it detects an error it bounces the mail to the list maintainer with a note that says, "the header is messed up, you'd better take a look at this." From what Erik said, it sounds like my software would have kept this problem from happening. (If someone can give me a copy of the headers off the original mail, I can check this out.)

The software also:

- sets the Sender: line and the from address in the envelope to say something reasonable
- optionally trims Received: lines (good for times when a message takes 9 hops to come in and another 9 to make it back out, and thus would otherwise trigger sendmail's fake-o loop detection)
- allows an optional header and/or footer to be added to the body of the message
- has some limited smarts (shamelessly borrowed from the mail2news program) that attempt to filter out ``please add/delete me'' mail mistakenly sent to the list readership rather than to the administrivia address.

I'm sure it's not perfect (the administrivia filter is too trusting, and in particular doesn't catch LISTSERV stuff), but if you want it, anonymous FTP out to ftp.umiacs.umd.edu and grab pub/distribute.tar.Z. The man entry should be enough to get you started. Distribute should be fairly portable (but it will require hacking to be used with something other than sendmail). If you make changes to this software, I'd be interested in seeing them.

	-Steve

Spoken: Steve Miller
Domain: steve@umiacs.umd.edu
UUCP: uunet!mimsy!steve
Phone: +1-301-405-6736
USPS: UMIACS, Univ. of Maryland, College Park, MD 20742
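
The kind of header sanity check described above (look at a few address headers, and divert the message to the list maintainer if any looks botched) might be sketched roughly as follows. This is not the code from distribute; it is a minimal illustration in Python. The set of headers checked, the function names, and the "maintainer"/"list" return values are all assumptions made for the example, and the only test applied is the one that matters in Erik's story: a "<" without a matching ">". It deliberately ignores quoted strings and comments, so it may flag some headers that are technically legal.

```python
from email import message_from_string

# Hypothetical choice: the post only says "several important header lines".
CHECKED_HEADERS = ("From", "To", "Cc", "Reply-To")


def angle_brackets_balanced(value):
    """Return True if every '<' is closed by a '>' and none are nested."""
    depth = 0
    for ch in value:
        if ch == "<":
            depth += 1
            if depth > 1:      # nested '<' is not valid in an address
                return False
        elif ch == ">":
            depth -= 1
            if depth < 0:      # '>' with no opening '<'
                return False
    return depth == 0


def find_botched_headers(raw_message):
    """Return (header name, raw value) pairs whose brackets do not balance."""
    msg = message_from_string(raw_message)
    problems = []
    for name in CHECKED_HEADERS:
        for value in msg.get_all(name, []):
            if not angle_brackets_balanced(value):
                problems.append((name, value))
    return problems


def route(raw_message):
    """Send suspect mail to the maintainer instead of exploding it to the list."""
    return "maintainer" if find_botched_headers(raw_message) else "list"


if __name__ == "__main__":
    # Made-up message with the same defect: one '<' and no matching '>'.
    bad = "From: someone@example.com\nTo: List <list@example.com\n\nA comment.\n"
    print(route(bad), find_botched_headers(bad))
```

The point of doing the check at the list exploder is that one botched message is stopped before it is copied to every recipient, so downstream mailers never get the chance to send their error messages back to the original sender.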