[net.mail.headers] RFC 934 - Message Encapsulation

David Alpern <ALPERN%SJRLVM4.BITNET@WISCVM.ARPA> (02/09/85)

I'm bothered by the choice made in the message separator (EB) as
described in this document.  Most digests currently do hold quite well
to a standard of using either 70 or 30 hyphens alone on a line,
with a blank line on either side.  This is specific enough to avoid
confusing user text with separators.  Defining a hyphen in the first
column of a line, with no other restrictions, as a separator seems
like asking for ambiguity.  Most front end programs will not be changed
to "stuff" mail being sent for the first time, nor will most mail
reader software "unstuff".

As a proposal, may I suggest a line which starts with 5 hyphens in
sequence followed by a space and an asterisk, ends with the reverse,
and has a total length of 68.  This is much harder for a user to hit
randomly, would be easy for the user to avoid, and yet still allows
for some text field within the separator.
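
To make the proposed shape concrete, here is a minimal Python sketch that
builds and recognizes such a line.  The names PREFIX, SUFFIX, WIDTH,
make_separator and is_separator are illustrative only, and centering the
text field is an assumption; the proposal fixes just the two ends and the
total length of 68.

```python
PREFIX = "----- *"   # five hyphens, a space, an asterisk
SUFFIX = "* -----"   # the reverse of PREFIX
WIDTH = 68           # total line length in the proposal


def make_separator(text=""):
    """Build a separator line with an optional text field in the middle."""
    inner = WIDTH - len(PREFIX) - len(SUFFIX)
    if len(text) > inner:
        raise ValueError("text field is too long for a 68-column separator")
    return PREFIX + text.center(inner) + SUFFIX


def is_separator(line):
    """True only for lines with both fixed ends and the exact total length."""
    return (len(line) == WIDTH
            and line.startswith(PREFIX)
            and line.endswith(SUFFIX))
```

A signature such as "- Dave" or a plain row of hyphens fails all three
checks, which is the point of the longer pattern.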

I think it would be worthwhile to define the standard in such a way
that mail reading programs could tell, almost without ambiguity, if
a message contained embedded messages.  Then, depending on the user
interface and the user's desires, the system could either break such
messages automatically or on user command.  Using the single hyphen
separator it will be too likely for a single message to get broken
at a list bullet or just above a "signature" (notice my own, which
I've used this way for ages - if "unstuffed", this hyphen will disappear
-and if I forget the space, the file gets split).

- Dave

     David Alpern
     IBM San Jose Research Laboratory, K65/282
     5600 Cottle Road, San Jose, CA 95193
     Phone: (408) 284-6521
     Bitnet: ALPERN@SJRLVM4
     CSnet:  ALPERN@IBM-SJ

Marshall Rose <mrose%udel-eecis2.delaware@UDEL-LOUIE.ARPA> (02/09/85)

Unfortunately, requiring more dashes just makes the problem
harder and introduces additional ambiguity.  Let me explain.
A primary motivation for writing that RFC was that I was
getting tired of constantly modifying the bursting agent in MH.

First, I tried looking for 5 dashes at the beginning of a line.
Sure enough, some digests let that through.  Then I upped it to 30.
Sure enough, some digests let that through.  Then I got clever and
introduced special heuristics based on the number of blank lines
both preceding and following the line that started with the dashes.
Still not good enough.  I then decided to look at least TWO LINES ahead
and see if I could find what looked like a header.  Still no good,
one day someone just happened to have a couple of blank lines, a line
of dashes, a couple of blank lines, all followed by a line that looked like
a header.  Brute force and extremely clever coding just can't compete.

So, I decided that we need to bit (well, byte) stuff.  This is the ONLY
way to unambiguously separate messages in a digest (or a forwarding of
messages, in general).  So, if you're going to change all that software in
the net to be considerate and generate digests, etc., that other software
can burst, then you might as well make the change as simple as possible.
Hence, the choice of a single dash for the EB.
The RFC contains very simple algorithms to byte-stuff on encapsulation
and to burst on decapsulation.  The latter algorithm, in fact, needs
only one character look-ahead.

The point of all this is that I want to get painless bursting with the
absolute LEAST amount of effort on the part of the mail hackers in the
net.  I really don't think it's too difficult to look at the beginning of
each line and output a "- " if it starts with a "-".
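
A rough Python sketch of both halves, assuming the rule as described in
this message: the encapsulating side puts "- " in front of any line that
starts with a hyphen, and the bursting side treats a hyphen followed by
anything other than a space as a boundary.  The function names stuff and
burst are made up for illustration; this is not the RFC's own text.

```python
def stuff(lines):
    """Encapsulation side: prefix "- " to every line that begins with "-"."""
    return ["- " + line if line.startswith("-") else line for line in lines]


def burst(lines):
    """Bursting side: split at boundary lines and undo the stuffing."""
    parts, current = [], []
    for line in lines:
        if line.startswith("- "):
            current.append(line[2:])   # stuffed text line: drop the "- "
        elif line.startswith("-"):
            parts.append(current)      # hyphen not followed by a space: boundary
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts
```

The burster decides from the first two characters of each line, which is
the one-character look-ahead mentioned above.  A row of 30 hyphens still
works as a boundary under this rule, since it is a hyphen followed by a
non-space.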

/mtr

Barry Margolin <Margolin@MIT-MULTICS.ARPA> (02/10/85)

I have to agree with David Alpern.  A common two-character sequence
should not be pre-empted this way.  A longer sequence is less likely to
be used in messages.

Actually, the problem is not in the sequence chosen.  The problem is
that it is not possible for the Bursting Agent to tell whether the
sending software includes a Forwarding Agent.  If not, then the sender
won't have translated lines beginning with a hyphen, but your reader
will decide that the message contains encapsulated messages and burst
it.  The right solution is to require Encapsulation Agents to add a
header field, such as

Encapsulation-Boundary:  <string>

If this header field is not present, then the message does not contain
an RFC934-conformant encapsulation, and the Bursting Agent should just
pass it through unchanged.  If it does, then a line beginning with
<string> is an encapsulation boundary.  As in the original RFC, if the
<string> is followed immediately by a space then it is an escape prefix
and the Bursting Agent should merely remove it from the text.
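
As a sketch of how a Bursting Agent might apply this rule, the Python
below bursts only when the header field is present.  The function name
burst_if_declared is hypothetical, the header parsing is deliberately
naive (no continuation lines), and removing the space along with the
<string> in an escaped line is an assumption carried over from the "- "
convention.

```python
def burst_if_declared(message):
    """Return the burst parts, or [message] if no boundary is declared."""
    head, _, body = message.partition("\n\n")
    boundary = None
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "encapsulation-boundary":
            boundary = value.strip()
    if not boundary:
        return [message]               # field absent: pass through unchanged

    parts, current = [], []
    for line in body.split("\n"):
        if line.startswith(boundary + " "):
            # <string> followed by a space is an escape prefix: remove it
            current.append(line[len(boundary) + 1:])
        elif line.startswith(boundary):
            parts.append("\n".join(current))   # a real boundary
            current = []
        else:
            current.append(line)
    parts.append("\n".join(current))
    return parts
```

A message without the field, even one full of lines starting with a
hyphen, comes back as a single unmodified item.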

I don't really think it is important that the EB be variable, as in my
example.  The only reason I included that capability was because I was
inventing a header field and I couldn't think of anything else to put
there.  Another possibility is:

Encapsulation-Mode:  RFC-934

which allows for future enhancement of this protocol.
                                        barmar

"Frank J. Wancho" <WANCHO@SIMTEL20.ARPA> (02/10/85)

Marshall,

Notwithstanding the relative merits of certain portions of this RFC, I
was rather surprised that you cited the lack of a standard format,
particularly the Encapsulation Boundary, as the primary motivation to
write this RFC in the first place.  There has been a de-facto standard
in use since the first digest appeared several years ago, and a
complementary UnDigestify command by Gail Zacharias (GZ@MC) for those
of us who use BABYL.  A couple of years ago, Mike Muuss (mike@BRL)
developed an UnDigestify for Unix/MMDF msg.  (There is even an option
to permit automatic detection and UnDigestification along with an
UnDo, just in case it guessed wrong.)  Whenever a new digest appears,
the moderator soon finds out whether or not the digest message was
formatted correctly.  Thus, although the original digests came first,
it has been those of us who have an UnDigestify command who tend to
"enforce" conformance to this "standard" format.

I suspect that had you asked, you would have been able to develop a
similar UnDigestify command for your mail handler instead of producing
yet another proposed standard that seems to ignore the existing one.

All of the above is not meant to say that the entire RFC is
without merit.  There is something to be said in favor of the proposed
method of handling Bcc:s, Forwarded, and ReMailed messages, and the
implied extension of the UnDigestify command to handle the latter two
cases.  However, please don't overlook the fact that an extension to
an existing command has a better chance of being acceptable to the
community than a rewrite would...

--Frank

Doug Kingston <dpk@BRL-TGR.ARPA> (02/10/85)

I agree with many of the comments made so far on RFC934.  First, there
is a de facto standard for which there exists working software.  Second,
a single dash is a poor choice of separator since it is highly likely
that a) it will be used in message text by the user in such a way as to
be mistaken for a separator, and b) unbursting will not be universally
installed.  People who don't unburst will use "-" poorly and the
bursters will burst badly.

I like the stuff on handling bursted contents.

					-Doug-

Marshall Rose <mrose%udel-eecis2.delaware@UDEL-LOUIE.ARPA> (02/11/85)

Frank,

    I am aware of the software you mentioned (in particular Mike's
    code), and it is true that everyone who posts digests seems to be
    using the same rules for encapsulating messages into digests. In
    this case perhaps the use of the term "pseudo-standard" by the RFC
    is unfortunate.  Stef and I did not (at least intentionally) ignore
    what others had done and decide to introduce "Yet Another Standard"
    that ignores what's working now.  We went to great lengths to try
    and remain compatible with existing Internet software to minimize
    compliance problems.

    However, there is a larger issue which I am apparently missing in
    your, Doug's, and Barry's comments:

	 It really is unimportant as to the precise string that is used
	 for an EB.  The important thing is that when that string
	 appears in a message being encapsulated into a digest (or
	 forwarding in general), that the encapsulation agent use some
	 escape mechanism to prevent the bursting agent from declaring
	 an end-of-message prematurely.

    Now, if the BABYL software (and any other software used to do
    encapsulation) does that correctly, then the primary motivation for
    the RFC is a non-problem.  Hurrah.  However, I have noticed in the
    past that every now and then a message containing a blank line, a
    line of dashes, another blank line, and a line with something that
    remotely looks like a header slips through digests like HUMAN-NETS
    and TELECOM.  (This isn't an attack against either group, I just
    happen to recall a couple of instances in each list about six to
    eight months ago where this happened). So, there might exist
    "working software", but if EBs aren't byte-stuffed then you can't
    unambiguously decapsulate messages and "it doesn't work right".

    To repeat (though re-phrase) what the RFC says and what my last
    message said:

	 All I really want is encapsulation agents to byte-stuff their
	 EBs.  If you want to use 30 or 50 dashes on a line as an EB
	 fine.  That isn't prohibited by the RFC.

    If everyone does the byte-stuffing, then the choice of an EB
    becomes moot.

    As I clear my throat to change the subject, I'm interested if there
    are comments on the unification of forwardings, distributions, and
    blind-carbon-copies.  The latest (and hopefully last) release of
    MH, MH.5, uses encapsulation in the generation of Forwardings and
    BCC:s.  Although it sounds esoteric, being able to recursively
    forward forwarded messages is rather neat.

/mtr

Dan Hoey <hoey@NRL-AIC.ARPA> (02/12/85)

Marshall,

I am surprised that you say

    ... we need to bit (well, byte) stuff.  This is the ONLY
    way to unambiguously separate messages in a digest (or a forwarding of
    messages, in general).

I have previously indicated to you that there is a way to entirely
avoid modification of the text of messages.

Header People,

The method I proposed was for the Forwarding Agent to choose an
encapsulation boundary that does not occur in the messages to be
encapsulated.  Barry Margolin's proposal of indicating the chosen EB in
the header is an appropriate way of communicating the choice to the
Bursting Agent.  However, the use of this boundary followed by a space
for stuffing is not necessary.

An algorithm for choosing the EB:

    1.  Form a list of trial EB's.  I suggest the trial EB's be
	- twenty hyphens,
	- twenty hyphens followed by the current date and time, and
	- twenty hyphens followed by twenty randomly-chosen
		alphanumeric characters.

    2.  Scan the messages to be encapsulated for occurrences of the trial
        EB's.

    3.  If all of the trial EB's were found, fail.  (If messages are
	being encapsulated at a rate of a megabyte per second, this step
	should be taken fewer than once every quintillion centuries,
	barring hardware failure.)

    4.  Return the first of the trial EB's not found in any message.
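
The four steps can be sketched in Python roughly as follows.  The name
choose_boundary, the date format, and the now and rng parameters are
assumptions made for illustration; the algorithm above specifies only the
three trial EB's and the scan.

```python
import random
import string
from datetime import datetime


def choose_boundary(messages, now=None, rng=random):
    """Return the first trial EB that occurs in none of the messages."""
    base = "-" * 20
    now = now or datetime.now()
    alphabet = string.ascii_letters + string.digits
    # Step 1: the three trial EB's, in order of preference.
    trials = [
        base,
        base + now.strftime("%Y-%m-%d %H:%M:%S"),
        base + "".join(rng.choice(alphabet) for _ in range(20)),
    ]
    # Steps 2 and 4: scan every message, return the first unused trial.
    for eb in trials:
        if not any(eb in message for message in messages):
            return eb
    # Step 3: all three trials were found somewhere.
    raise RuntimeError("every trial EB occurs in the messages")
```

The scan here looks for the trial EB anywhere in a message rather than
only at the start of a line, which is stricter than necessary but keeps
the sketch short.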

Dan

David Alpern <ALPERN%SJRLVM4.BITNET@WISCVM.ARPA> (02/12/85)

Marshall,

I understand your point, in that no digests that I know of prevent
a message sender from including a line of 30 hyphens that is not
meant as a separator.  On the other hand, I don't foresee us ever
getting all mail sending and reading programs on the net to "stuff".
To do this properly, it would have to be not only digestifiers/
undigestifiers, but all sending/receiving programs since nothing other
than the presence of a separator indicates a forwarded message.

I disagree that one could use the 30 hyphen field simply as one special
case of the single hyphen definition.  The problem here is that not
all software will be changed at once - you will be lucky if most ever
is.  How will the first "unstuffer" tell if it has a separator or a
text hyphen - i.e. if the sender "stuffed"?

Maybe the right thing is to ask all digest creators and mail forwarders
to add a hyphen to any line containing exactly 30 hyphens alone in an
entering message.  This will help avoid the ambiguity, and will be useful
even if accomplished in a very gradual manner.  I doubt many will care
if a line they send as 30 hyphens gets seen as 31, although there always
will be somebody.
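
In code the change asked of digest creators and forwarders is tiny.  A
minimal Python sketch, where the function name and the width parameter
are illustrative and "alone" is taken to mean the line is exactly 30
hyphens with nothing else on it:

```python
def protect_separator_lookalikes(lines, width=30):
    """Add one hyphen to any incoming line that is exactly `width` hyphens."""
    separator = "-" * width
    return [line + "-" if line == separator else line for line in lines]
```

Lines of any other length, or with other text on them, pass through
untouched.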

This does lead to a question however -- how do we want to handle a
forwarded message being sent to a digest, or otherwise reforwarded?
I, for one, would first want to see the complete message (both forwarded
and surrounding) as a single entity within a burst digest, and yet would
like the ability to burst it again when I desired.  Any bright ideas?
Maybe using Barry Margolin's header-line specification of separator,
combined with some "lengthen the separator for each embedded message
level" algorithm will permit this.

Regarding the other aspects of the RFC, i.e. treating forwarded messages
in the same manner as digests -- I LIKE!  I agree that currently
forwarded messages are not too useful until one does separate the
original text from the surrounding message.  There are, however, cases
where one would like to reply to both the original sender and the
forwarder at once, which is not easy with your scheme as is.  As an
informal suggestion, I'll comment that both my undigestifier and the
similar code I use for forwarded messages (less useful because of less
of a pseudo-standard) add a LIST: field to the header of the embedded
messages to specify either the discussion group or the forwarding sender,
and my reply code notices this.

- Dave

     David Alpern
     IBM San Jose Research Laboratory, K65/282
     5600 Cottle Road, San Jose, CA 95193
     Phone: (408) 284-6521
     Bitnet: ALPERN@SJRLVM4
     CSnet:  ALPERN@IBM-SJ

Tommy_Ericson__QZ%QZCOM.MAILNET@MIT-MULTICS.ARPA (02/21/85)

David,

I think that trying to enforce rules on Digestors like "Insert
a line of 30 hyphens as message separator" would never work;
for example, my keyboard cannot count.

The right way of handling this would be to elaborate the Multi-Media
and Script concepts, work that is currently going on (e.g. in IFIP).

The X.400 way may also work. It may look cumbersome but it should
be remembered that the user interface is completely left out
from the recommendation, leaving it up to all and everyone to
tackle the User-Presentation problem as he wishes.

- Tommy