treese@crl.dec.com (02/19/91)
Archive-name: mail/archive-server/bart/1991-02-18
Original-posting-by: treese@crl.dec.com
Original-subject: Re: Bye Bye BART (on comp.archives)
Reposted-by: emv@ox.com (Edward Vielmetti)

Could you post this on comp.archives as a followup to the message with the subject "[atari-st...] Bye Bye BART", with message ID <1991Feb11.234221.6017@ox.com>? We would like to clear up the explanation of the situation.

Thanks for your cooperation.

Win Treese                Cambridge Research Lab
treese@crl.dec.com        Digital Equipment Corp.

[This is kind of long but illustrates some of the things that mail-based archive servers need to deal with. In short, not all mail systems use the Date: header, and different systems signal lost mail to the user in different ways. Put enough problems in the same spot, add a few days of gateway downtime, and presto: disaster. Mail-based archive servers entail some amount of risk to the service provider, the service user, and any number of unintended relays and gateways along the way. If you run one, be prepared to hear from your neighbors. --Ed]

------- Forwarded Message

Newsgroups: comp.sys.atari.st,comp.sys.atari.8bit
From: reid@decwrl.dec.com (Brian Reid)
Subject: Re: Bye Bye BART
Summary: Not Kaiser's fault

I am the manager of the USENET and electronic mail gateway between Digital Equipment Corporation and the rest of USENET. The unfortunate incident for which Mr. Kaiser has been so cruelly blamed was completely an accident, and is the result of a "culture clash" rather than any malice. It is perhaps best not to use harsh words until you have finished understanding an incident.

Hans Kaiser works in Digital's software support office in Stuttgart, Germany. Like most Digital field offices, it is equipped with VMS computers and connected to Digital's DECNET network. The conversion between internal DECNET and external protocols is performed by the DECWRL computer for which I am responsible.

VMS and DECNET do not have the concept of queueing mail.
When you send a message, either it is delivered instantly or it bounces. The idea is that you want the sender to know instantly if his message did not get through. As a result, VMS mail users have, through the years, grown accustomed to believing that if they do not get a "message sent" message, then their message did not get sent.

Whenever mail is relayed from one network to another, rather than just queued, the concept of "immediate delivery" is somewhat meaningless, because you haven't really delivered the mail, but rather have just handed it off to some intermediate postman. But user expectations are still very strong: if a user sends an internetwork message, and doesn't get back a "message sent" reply, his experience leads him to believe that the message was lost.

Last week we had a head crash on the primary disk on our DECWRL relay computer, and for various reasons it took almost 3 days to get the machine back up. We announced this failure on the appropriate internal Digital newsgroups (dec.mail.config), but did not send individual notification to the tens of thousands of users of the gateway, as we sometimes do when we are certain that it will be down for a long time.

During this interval Hans Kaiser was trying to retrieve files from the Atari archive server. He is not a reader of dec.mail.config and probably did not know that the gateway was down. He sent some retrieval requests, and got no reply.

Here comes the "culture clash" that I mentioned in the first paragraph. When a VMS user sends a mail message that does not get delivered, he is conditioned to believe that it has been lost or deleted, because that is what happens in the normal case. However, these messages that Kaiser sent were neither lost nor deleted. They were carefully queued, waiting for the DECWRL gateway to come back up again, so that they could be sent.

When he got no response, Kaiser sent more requests. This is the natural thing to do in the VMS world.
If it didn't work, and if you are following instructions, then try again. Maybe something will have been fixed. I don't know exactly how many times Kaiser repeated the request over the 3-day interval, but I am sure that if he had known that his messages were all being queued, instead of vanishing as he thought, he would not have repeated them.

Eventually (I think it was on Wednesday night, California time) the DECWRL gateway was brought back to life, and all of the queued messages were sent to the Atari archive server in one lump.

Archive servers are in general programmed to have per-user quotas, so that if something like this happens, it won't bring the archive server to its knees trying to handle so many requests at once. Alas, here the "culture clash" strikes again.

The DECNET mail protocol does not support a "time and date" mechanism. The only information that it records about a message, besides the message body, is what we Unix/IP people know as the "To" and "Cc" and "Subject" and "From" fields. In DECNET protocol it is up to the receiver of a message to timestamp it with the time that it was received. The reason for this is that since there is no queueing, the time that a message was received is guaranteed to be equal to the time that it was sent. As a result, the network mail protocol has no mechanism to record the time that a message was sent.

The documentation for the DECWRL mail gateway, which we distribute to all employees who ask for it, instructs them to use the gateway by sending mail with a certain mail program that is not part of the software that Digital ships to its customers. This program, called "nmail", is helpful in smoothing the peak load on the gateway by queueing at certain times. However, since the mail-sending software knows that the mail might be queued, it records the time that the message was actually originated.
This is because the "Date" field in the message will contain the time that it was delivered and not the time that it was actually sent. "nmail" does this by adding the date and time to the "From" field of the message. It really doesn't have much choice, because the DECNET mail protocol supports only a "To", "Subj", "From", and "Cc" field, and there is a fixed limit to the size of the "Subj" field.

Why does this matter? It matters because the Atari archive server at the University of Michigan looks at the "From" field of an incoming message to avoid processing too many simultaneous requests from the same person. There is a "per-user" quota for each day. The problem is that when you send the mail using a mail program that encodes the date and time of the message in the "From" field, then every message looks like it came from a different user.

As a result of this, when the DECWRL mail relay came back to life last Wednesday, it sent many dozens of retrieval requests to Michigan all at once, and Michigan's software failed to understand that they were all from the same person because the "From" field on each of them had a different date and time. As a result, the Michigan archive server tried to process all of them at once, and, evidently, melted into a pile of slag.

Since I work for a company that sells computers, I suppose the loyal thing for me to do at this point is to try to sell Michigan a bigger computer to use as the archive server, but I don't work in a sales office, I work in Corporate Research, and what I want is for everybody to be happy. I am very sorry that a combination of accidents inside Digital, in Germany and California, caused this unfortunate incident on a university computer at Michigan, and I will happily offer the services of the excellent network programmers at DEC Western Research to help ensure that the Michigan archive server does not meet this fate again.

Mostly I want people to know that this was in no way the fault of Hans Kaiser.
If it was anybody's fault, it was my fault, for accidentally failing to copy the serial number of a certain disk drive onto a service-contract renewal form for 1991, thereby leaving the disk unprotected by maintenance contract. Disks often fail on purpose when they learn that they are not covered by maintenance contract.

If you have sent Mr. Kaiser (or Herr Kaiser, as he probably prefers to be called) a nasty message, it might be civil to send him another one letting him know that, now that the facts are known, you aren't so angry any more. If you find the need to be angry at somebody, please be angry at me. As the manager of an electronic mail gateway, I'm used to it.

Brian Reid
DEC Western Research Laboratory

------- End of Forwarded Message
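
The quota failure described in the forwarded message can be sketched in a few lines of Python. This is an illustration only, not the Michigan server's code. The From-field layout, the "(sent ...)" suffix, the address user@vms.example, and every function name below are assumptions, since the message does not show what nmail actually wrote into the field. The sketch only shows that a quota keyed on the raw From string treats each timestamped copy as a new sender, while a key with the timestamp stripped does not.

```python
import re
from collections import Counter

# ASSUMPTION: a made-up From-field layout with the send time appended in
# parentheses. The message does not show the real format nmail used.
_TIMESTAMP_SUFFIX = re.compile(r"\s*\(sent [^)]*\)\s*$")


def raw_key(from_field):
    """Quota key that trusts the From field verbatim (the failing case)."""
    return from_field


def normalized_key(from_field):
    """Quota key with the hypothetical timestamp suffix stripped."""
    return _TIMESTAMP_SUFFIX.sub("", from_field).strip().lower()


def admit(from_fields, key_fn, daily_limit):
    """Return the requests accepted under a per-sender daily quota."""
    used = Counter()
    accepted = []
    for from_field in from_fields:
        key = key_fn(from_field)
        if used[key] < daily_limit:
            used[key] += 1
            accepted.append(from_field)
    return accepted


# A burst of queued requests from one sender, each stamped with a different
# (hypothetical) send time, arriving all at once.
burst = [f"user@vms.example (sent 00:{minute:02d})" for minute in range(24)]

print(len(admit(burst, raw_key, daily_limit=5)))
print(len(admit(burst, normalized_key, daily_limit=5)))
```

With the raw key, each of the 24 requests lands in its own quota bucket and all of them are admitted. With the normalized key, only the first five get through. A real server would also have to decide how much of an unfamiliar From field it can safely discard, which this sketch does not attempt.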