David Alpern <ALPERN%SJRLVM4.BITNET@WISCVM.ARPA> (02/09/85)
I'm bothered by the choice made in the message separator (EB) as described in this document. Most digests currently do hold quite well to a standard of using either 70 or 30 hyphens alone on a line, with a blank line on either side. This is specific enough to avoid confusing user text with separators. Defining a hyphen in the first column of a line, with no other restrictions, as a separator seems like asking for ambiguity. Most front end programs will not be changed to "stuff" mail being sent for the first time, nor will most mail reader software "unstuff".

As a proposal, may I suggest a line which starts with 5 hyphens in sequence followed by a space and an asterisk, ends with the reverse, and has a total length of 68. This is much harder for a user to hit randomly, would be easy for the user to avoid, and yet still allows for some text field within the separator.

I think it would be worthwhile to define the standard in such a way that mail reading programs could tell, almost without ambiguity, if a message contained embedded messages. Then, depending on the user interface and the user's desires, the system could either break such messages automatically or on user command. Using the single hyphen separator it will be too likely for a single message to get broken at a list bullet or just above a "signature" (notice my own, which I've used this way for ages - if "unstuffed", this hyphen will disappear - and if I forget the space, the file gets split).

- Dave

David Alpern
IBM San Jose Research Laboratory, K65/282
5600 Cottle Road, San Jose, CA 95193
Phone: (408) 284-6521
Bitnet: ALPERN@SJRLVM4
CSnet: ALPERN@IBM-SJ
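The separator proposed in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PREFIX`, `SUFFIX`, `WIDTH`, `make_separator` and `is_separator` are hypothetical, and padding the text field with spaces is a guess, since the message fixes only the two ends and the total length of 68.

```python
PREFIX = "----- *"  # 5 hyphens, a space, an asterisk
SUFFIX = "* -----"  # the reverse
WIDTH = 68          # total line length named in the proposal


def make_separator(text=""):
    """Build a separator line; space padding of the text field is an assumption."""
    room = WIDTH - len(PREFIX) - len(SUFFIX)
    if len(text) > room:
        raise ValueError("text field is limited to %d characters" % room)
    return PREFIX + text.center(room) + SUFFIX


def is_separator(line):
    """True only if the line has the proposed ends and the proposed length."""
    return (
        len(line) == WIDTH
        and line.startswith(PREFIX)
        and line.endswith(SUFFIX)
    )
```

Under this check a signature line such as "- Dave", or a plain run of hyphens, is never taken for a separator.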
Marshall Rose <mrose%udel-eecis2.delaware@UDEL-LOUIE.ARPA> (02/09/85)
Unfortunately, requiring more dashes just makes the problem harder and introduces additional ambiguity. Let me explain.

A primary motivation for writing that RFC was that I was getting tired of constantly modifying the bursting agent in MH. First, I tried looking for 5 dashes at the beginning of a line. Sure enough, some digests let that through. Then I upped it to 30. Sure enough, some digests let that through. Then I got clever and introduced special heuristics based on the number of blank lines both preceding and following the line that started with the dashes. Still not good enough. I then decided to look at least TWO LINES ahead and see if I could find what looked like a header. Still no good: one day someone just happened to have a couple of blank lines, a line of dashes, a couple of blank lines, all followed by a line that looked like a header. Brute force and extremely clever coding just can't compete.

So, I decided that we need to bit (well, byte) stuff. This is the ONLY way to unambiguously separate messages in a digest (or a forwarding of messages, in general). So, if you're going to change all that software in the net to be considerate and generate digests, etc., that other software can burst, then you might as well make the change as simple as possible. Hence, the choice of a single dash for the EB. The RFC contains very simple algorithms to byte-stuff on encapsulation and to burst on decapsulation. The latter algorithm, in fact, needs only one character look-ahead.

The point of all this is that I want to get painless bursting with the absolute LEAST amount of effort on the part of the mail hackers in the net. I really don't think it's too difficult to look at the beginning of each line and output a "- " if it starts with a "-".

/mtr
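The stuffing rule described in this message can be sketched in Python as follows. It is a simplified illustration, not the RFC's own text: the function names `stuff` and `burst` are hypothetical, messages are assumed to be lists of lines without newlines, and any line starting with "-" that is not followed by a space is treated as a boundary.

```python
def stuff(lines):
    """Encapsulation side: output "- " in front of any line starting with "-"."""
    return ["- " + line if line.startswith("-") else line for line in lines]


def burst(lines):
    """Decapsulation side: split at boundaries and undo the stuffing.

    Only the character after a leading "-" needs to be examined:
    a space means stuffed text, anything else means a boundary.
    """
    parts, current = [], []
    for line in lines:
        if line.startswith("- "):
            current.append(line[2:])   # stuffed line: drop the "- " prefix
        elif line.startswith("-"):
            parts.append(current)      # encapsulation boundary
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts
```

With this pair, a message containing "- Dave" or a line of 30 hyphens survives a round trip through a digest unchanged, provided the encapsulating side actually stuffed it.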
Barry Margolin <Margolin@MIT-MULTICS.ARPA> (02/10/85)
I have to agree with David Alpern. A common two-character sequence should not be pre-empted this way. A longer sequence is less likely to be used in messages.

Actually, the problem is not in the sequence chosen. The problem is that it is not possible for the Bursting Agent to tell whether the sending software includes a Forwarding Agent. If not, then the sender won't have translated lines beginning with a hyphen, but your reader will decide that the message contains encapsulated messages and burst it.

The right solution is to require Encapsulation Agents to add a header field, such as

    Encapsulation-Boundary: <string>

If this header field is not present, then the message does not contain an RFC934-conformant encapsulation, and the Bursting Agent should just pass it through unchanged. If it is present, then a line beginning with <string> is an encapsulation boundary. As in the original RFC, if the <string> is followed immediately by a space then it is an escape prefix and the Bursting Agent should merely remove it from the text.

I don't really think it is important that the EB be variable, as in my example. The only reason I included that capability was because I was inventing a header field and I couldn't think of anything else to put there. Another possibility is:

    Encapsulation-Mode: RFC-934

which allows for future enhancement of this protocol.

barmar
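A rough Python sketch of the header-driven bursting proposed in this message follows. The names `stuff_for` and `burst_with_header` are hypothetical, and the headers are assumed to arrive as an already parsed dict. Removing the boundary string together with the following space is one reading of "merely remove it from the text".

```python
HEADER = "Encapsulation-Boundary"  # field name as proposed in the message


def stuff_for(eb, lines):
    """Encapsulation side: escape any line that begins with the boundary."""
    return [eb + " " + line if line.startswith(eb) else line for line in lines]


def burst_with_header(headers, body_lines):
    """Burst only when the header announces a boundary string."""
    eb = headers.get(HEADER)
    if eb is None:
        return [body_lines]            # no header: pass through unchanged
    parts, current = [], []
    for line in body_lines:
        if line.startswith(eb + " "):
            current.append(line[len(eb) + 1:])  # escape prefix: strip it
        elif line.startswith(eb):
            parts.append(current)               # encapsulation boundary
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts
```

A message without the header field, such as one ending in a "- Dave" signature, comes back as a single unbroken part.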
"Frank J. Wancho" <WANCHO@SIMTEL20.ARPA> (02/10/85)
Marshall,

Notwithstanding the relative merits of certain portions of this RFC, I was rather surprised that you cited the lack of a standard format, particularly the Encapsulation Boundary, as the primary motivation to write this RFC in the first place.

There has been a de-facto standard in use since the first digest appeared several years ago, and a complementary UnDigestify command by Gail Zacharias (GZ@MC) for those of us who use BABYL. A couple of years ago, Mike Muuss (mike@BRL) developed an UnDigestify for Unix/MMDF msg. (There is even an option to permit automatic detection and UnDigestification along with an UnDo, just in case it guessed wrong.)

Whenever a new digest appears, the moderator soon finds out whether or not the digest message was formatted correctly. Thus, although the original digests came first, it has been those of us who have an UnDigestify command who tend to "enforce" conformance to this "standard" format.

I suspect that had you asked, you would have been able to develop a similar UnDigestify command for your mail handler instead of producing yet another proposed standard that seems to ignore the existing one.

All of the above is not meant to say that the entire RFC is without merit. There is something to be said in favor of the proposed method of handling Bcc:s, Forwarded, and ReMailed messages, and the implied extension of the UnDigestify command to handle the latter two cases. However, please don't overlook the fact that an extension to an existing command has a better chance of being acceptable to the community than a rewrite would...

--Frank
Doug Kingston <dpk@BRL-TGR.ARPA> (02/10/85)
I agree with many of the comments made so far on RFC934. First, there is a de facto standard for which there exists working software. Second, a single dash is a poor choice of separator since it is highly likely that a) it will be used in message text by the user in such a way as to be mistaken for a separator, and b) unbursting will not be universally installed. People who don't unburst will use "-" poorly and the bursters will burst badly. I like the stuff on handling bursted contents.

-Doug-
Marshall Rose <mrose%udel-eecis2.delaware@UDEL-LOUIE.ARPA> (02/11/85)
Frank,

I am aware of the software you mentioned (in particular Mike's code), and it is true that everyone who posts digests seems to be using the same rules for encapsulating messages into digests. In this case perhaps the use of the term "pseudo-standard" by the RFC is unfortunate. Stef and I did not (at least intentionally) ignore what others had done and decide to introduce "Yet Another Standard" that ignores what's working now. We went to great lengths to try and remain compatible with existing Internet software to minimize compliance problems.

However, there is a larger issue which I am apparently missing in your, Doug's, and Barry's comments: It really is unimportant as to the precise string that is used for an EB. The important thing is that when that string appears in a message being encapsulated into a digest (or forwarding in general), that the encapsulation agent use some escape mechanism to prevent the bursting agent from declaring an end-of-message prematurely.

Now, if the BABYL software (and any other software used to do encapsulation) does that correctly, then the primary motivation for the RFC is a non-problem. Hurrah. However, I have noticed in the past that every now and then a message containing a blank line, a line of dashes, another blank line, and a line with something that remotely looks like a header slips through digests like HUMAN-NETS and TELECOM. (This isn't an attack against either group, I just happen to recall a couple of instances in each list about six to eight months ago where this happened). So, there might exist "working software", but if EBs aren't byte-stuffed then you can't unambiguously decapsulate messages and "it doesn't work right".

To repeat (though re-phrase) what the RFC says and what my last message said: All I really want is encapsulation agents to byte-stuff their EBs. If you want to use 30 or 50 dashes on a line as an EB, fine. That isn't prohibited by the RFC. If everyone does the byte-stuffing, then the choice of an EB becomes moot.

As I clear my throat to change the subject, I'm interested if there are comments on the unification of forwardings, distributions, and blind-carbon-copies. The latest (and hopefully last) release of MH, MH.5, uses encapsulation in the generation of Forwardings and BCC:s. Although it sounds esoteric, being able to recursively forward forwarded messages is rather neat.

/mtr
Dan Hoey <hoey@NRL-AIC.ARPA> (02/12/85)
Marshall,

I am surprised that you say

    ... we need to bit (well, byte) stuff. This is the ONLY way to unambiguously separate messages in a digest (or a forwarding of messages, in general).

I have previously indicated to you that there is a way to entirely avoid modification of the text of messages.

Header People,

The method I proposed was for the Forwarding Agent to choose an encapsulation boundary that does not occur in the messages to be encapsulated. Barry Margolin's proposal of indicating the chosen EB in the header is an appropriate way of communicating the choice to the Bursting Agent. However, the use of this boundary followed by a space for stuffing is not necessary.

An algorithm for choosing the EB:

1. Form a list of trial EB's. I suggest the trial EB's be
   - twenty hyphens,
   - twenty hyphens followed by the current date and time, and
   - twenty hyphens followed by twenty randomly-chosen alphanumeric characters.

2. Scan the messages to be encapsulated for occurrences of the trial EB's.

3. If all of the trial EB's were found, fail. (If messages are being encapsulated at a rate of a megabyte per second, this step should be taken fewer than once every quintillion centuries, barring hardware failure.)

4. Return the first of the trial EB's not found in any message.

Dan
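The four steps above can be sketched in Python roughly as follows. The function name `choose_boundary`, the timestamp format, and the `now` and `rng` parameters are assumptions added so the example is self-contained and repeatable; the message itself specifies only the three kinds of trial EB and the scan. Messages are assumed to be plain strings, and a simple substring scan stands in for "occurrences".

```python
import random
import string
from datetime import datetime


def choose_boundary(messages, now=None, rng=None):
    """Return the first trial EB that occurs in none of the messages."""
    if now is None:
        now = datetime.now()
    if rng is None:
        rng = random.Random()
    base = "-" * 20
    alphabet = string.ascii_letters + string.digits
    # Step 1: form the list of trial EBs.
    trials = [
        base,
        base + now.strftime("%Y-%m-%d %H:%M:%S"),  # format is an assumption
        base + "".join(rng.choice(alphabet) for _ in range(20)),
    ]
    # Steps 2 and 4: scan the messages, return the first trial not found.
    for eb in trials:
        if not any(eb in message for message in messages):
            return eb
    # Step 3: every trial EB was found somewhere, so fail.
    raise ValueError("every trial boundary occurs in the messages")
```

Because the scan looks for the trial EB anywhere in a message, a message containing a longer run of hyphens also rules out the plain twenty-hyphen boundary, which errs on the safe side.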
David Alpern <ALPERN%SJRLVM4.BITNET@WISCVM.ARPA> (02/12/85)
Marshall,

I understand your point, in that no digests that I know of prevent a message sender from including a line of 30 hyphens that is not meant as a separator. On the other hand, I don't foresee us ever getting all mail sending and reading programs on the net to "stuff". To do this properly, it would have to be not only digestifiers/undigestifiers, but all sending/receiving programs since nothing other than the presence of a separator indicates a forwarded message.

I disagree that one could use the 30 hyphen field simply as one special case of the single hyphen definition. The problem here is that not all software will be changed at once - you will be lucky if most ever is. How will the first "unstuffer" tell if it has a separator or a text hyphen - i.e. if the sender "stuffed"?

Maybe the right thing is to ask all digest creators and mail forwarders to add a hyphen to any line containing exactly 30 hyphens alone in an entering message. This will help avoid the ambiguity, and will be useful even if accomplished in a very gradual manner. I doubt many will care if a line they send as 30 hyphens gets seen as 31, although there always will be somebody.

This does lead to a question however -- how do we want to handle a forwarded message being sent to a digest, or otherwise reforwarded? I, for one, would first want to see the complete message (both forwarded and surrounding) as a single entity within a burst digest, and yet would like the ability to burst it again when I desired. Any bright ideas? Maybe using Barry Margolin's header-line specification of separator, combined with some "lengthen the separator for each embedded message level" algorithm will permit this.

Regarding the other aspects of the RFC, i.e. treating forwarded messages in the same manner as digests -- I LIKE! I agree that currently forwarded messages are not too useful until one does separate the original text from the surrounding message. There are, however, cases where one would like to reply to both the original sender and the forwarder at once, which is not easy with your scheme as is. As an informal suggestion, I'll comment that both my undigestifier and the similar code I use for forwarded messages (less useful because of less of a pseudo-standard) add a LIST: field to the header of the embedded messages to specify either the discussion group or the forwarding sender, and my reply code notices this.

- Dave

David Alpern
IBM San Jose Research Laboratory, K65/282
5600 Cottle Road, San Jose, CA 95193
Phone: (408) 284-6521
Bitnet: ALPERN@SJRLVM4
CSnet: ALPERN@IBM-SJ
Tommy_Ericson__QZ%QZCOM.MAILNET@MIT-MULTICS.ARPA (02/21/85)
David,

I think that trying to enforce rules on Digestors like "Insert a line of 30 hyphens as message separator" would never work; for example, my keyboard cannot count. The right way of handling this would be to elaborate the Multi-Media and Script concepts, work that is currently going on (e.g. in IFIP). The X.400 way may also work. It may look cumbersome, but it should be remembered that the user interface is completely left out of the recommendation, leaving it up to each and everyone to tackle the User-Presentation problem as he wishes.

- Tommy