[net.general] A summary of RFC733 for USENET folks

smb (06/11/82)

The following is a summary of RFC733 -- the ARPAnet standard for mail
messages -- for the benefit of the Usenet community.  Note that it *is*
a summary; the main purpose is to let people prepare messages that do
not conflict with the mailers out there in ARPAland.  There's much more
that I've omitted.

A message consists of a header, a null line, and a body.  The
body is arbitrary text (stick to printable characters -- some systems
have trouble with control characters).  A header consists of a series of
lines, each of which consists of a field-name, a colon, and some text.
The field-name must start in column 1; it should not contain blanks,
because many mailers don't handle that properly (in particular,
Berkeley mail doesn't recognize such lines as headers); also, the
proposed revised standard does not permit any embedded white-space.
The text portion may be continued by starting the next line with a tab
or blank.  Case is ignored when comparing field-names.  A number of
header types are defined by RFC733; however, *anything* that is
syntactically valid (like my "In-real-life" line) is permitted, subject
to pre-emption by later revisions to the standard.
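
To make the layout above concrete, here is a minimal sketch in Python of
splitting a message into its header lines and body.  The function names
(parse_message, get_field) and the sample field values are my own
inventions for illustration, not anything defined by RFC733, and the
sketch ignores most of the finer points of the standard.

```python
def parse_message(text):
    """Split a message into (headers, body).

    headers is a list of (field_name, text) pairs in their original order.
    The header ends at the first null (empty) line.
    """
    lines = text.split("\n")
    headers = []
    i = 0
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line[0] in " \t" and headers:
            # A line starting with a tab or blank continues the previous field.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            # The field-name starts in column 1 and runs up to the colon.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a header line: %r" % line)
            headers.append((name, value.strip()))
        i += 1
    # Everything after the null line is the body.
    body = "\n".join(lines[i + 1:])
    return headers, body


def get_field(headers, name):
    """Return the text of the first matching field, ignoring case, or None."""
    for field_name, value in headers:
        if field_name.lower() == name.lower():
            return value
    return None
```

For example, given a header containing "SUBJECT: a long" followed by a
continuation line "<tab>subject line", get_field(headers, "subject")
returns "a long subject line".  A message that begins with a null line
parses as an empty header followed by the body.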

The only mandatory lines are "From" and "Date"; these are properly
inserted by the Berkeley mailer code, and need not be included in any
messages sent to Berkeley for retransmission.  A consequence of this is
that the simplest way to send a legal RFC733-format message is to
include a null line at the beginning of your letter -- if you do that,
no one has the right to complain about violation of standards.  (If you
do insert "From" and "Date" yourself, you must be certain you know
*exactly* what you're doing, for several reasons.  Berkeley mail views
any "From:" line as taking precedence over the return address generated
by uucp along the way; thus, 'reply' commands may not work correctly.
"Date:" lines must adhere to a rigid format, though this is more
honored in the breach by many network mailers.)

A useful header line to include is "Subject"; any arbitrary text may be
included here to give a hint to the recipient about the message
content.  No semantics are attached to this line by the standard.
Other related standard fields are "Comments" and "Keywords".

Some messages include the "In-Reply-To" line; this in some way
specifies the message to which this one is a reply.  No meaning is
attached to the content of this field unless it contains a string that
looks like a machine-readable address:  a string enclosed in < and >.
If such a string does exist, it must match the contents of the
"Message-Id" field of the original message; that field must be of the
form <guaranteed-unique-string>.  (Theoretically, the string must
include "@domain" at the end, but many systems do not adhere to that, and
it is unwise to depend on it if writing any code to use it.)
"Message-Id" and "In-Reply-To" are intended for highly-automated
systems that eliminate duplicate messages, pull your original out to
compare with the reply, etc., and should not in general be created by
the user.  (There is one other field, "References", that is used in the
same way; it, too, may not contain <> strings unless they are in the
proper format.)
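
As a rough illustration of the matching rule above, the following Python
sketch pulls the <...> strings out of a field and checks whether an
"In-Reply-To" line refers to a given "Message-Id".  The function names
and the sample ids are hypothetical, and, per the caution above, the
code does not assume the id contains "@domain".

```python
import re

# Anything enclosed in < and > is treated as a machine-readable id.
_ANGLE = re.compile(r"<([^<>]*)>")


def referenced_ids(field_text):
    """Return the strings enclosed in angle brackets, in order."""
    return _ANGLE.findall(field_text)


def is_reply_to(in_reply_to, message_id):
    """True if in_reply_to cites the id found in a Message-Id field."""
    ids = referenced_ids(message_id)
    if not ids:
        # No <...> string: the Message-Id field is not in the proper form.
        return False
    return ids[0] in referenced_ids(in_reply_to)
```

A field with no <...> string, such as "your note about mailers", yields
an empty list and so carries no machine-readable meaning.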

The other header fields contain user addresses (more on which below),
and may be specified by uucp folks, albeit with caution.  "To"
specifies the primary recipients of the message, "cc" specifies the
secondary recipients, and "bcc" specifies the recipients of "blind
carbon copies" -- their names do not show up on everyone else's copy.
Note that these fields are *not* required, even if multiple copies are
sent; however, their presence aids automated reply systems.  "Sender"
identifies the user-id that actually sent the message, and is normally
the same as the "From" field, in which case it should be omitted --
RFC733 permits some mighty odd "From" lines, which (in the old version
of the standard) need not correspond to a machine-readable address.
The mailer is responsible for generating the "Sender" line if
necessary, and it should *not* be included in uucp mail.  Normally,
replies are sent to the originator of the message; if someone else
should receive the reply, a "Reply-to" line may be included.  Some
versions of Berkeley mail don't seem to handle this line correctly, and
it is not clear that it works properly through the gateway;
consequently, I recommend that it not be used in uucp/ARPA
communications.

RFC733 permits very general address specifications, and I won't even
try to summarize all the legal forms.  First, anything in matched
parentheses is a comment, and is ignored.  Anything enclosed in double
quotes is treated as a single word.  The basic form of an address is
one of

	word at domain-ref
	word@domain-ref
	word... <word@domain-ref>
	word... <word at domain-ref>

In the latter two forms, anything outside the angle-brackets is
ignored.  A "domain-ref" is typically a host-name (e.g., MIT-AI), but in
the new standard may be of the form "host.network", e.g.,
"MIT-AI.ARPA".  Some day, we'll have addresses like "smb@unc.uucp",
rather than all that crazy "!" nonsense.  Note that a legal address must
include an ARPA domain reference, and hence can't really be used as-is
by uucp folks.  Fortunately, Berkeley's software transforms "To" and
"cc" lines into this form, using their machine-id and prepending the
routing information deduced from the standard UNIX "from" lines.  Thus,
all addresses in your "To" and "cc" lines should be relative to your
machine; the message will be massaged appropriately when it passes into
ARPAland.  (I'm not sure if "Reply-To" lines are similarly treated,
which is why I recommend they not be used unless you really know what
you're doing.)  If you're sending a carbon-copy to another ARPAnet
recipient, just put the ARPA address, e.g., "user@ARPAhost", in the "cc"
field without the uucp routing stuff.
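
The four basic address forms listed above can be reduced to a
(word, domain-ref) pair along the following lines.  This is only a
sketch in Python under simplifying assumptions: the function names are
mine, it handles one address at a time, it recognizes only a lower-case
"at", and it makes no attempt at the many other forms RFC733 allows.

```python
import re


def strip_comments(text):
    """Remove comments in matched parentheses; leave quoted strings alone."""
    out = []
    depth = 0
    in_quotes = False
    for ch in text:
        if ch == '"' and depth == 0:
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def parse_address(text):
    """Return (word, domain_ref) for one address in a basic form."""
    text = strip_comments(text)
    # If angle brackets are present, everything outside them is ignored.
    start = text.find("<")
    end = text.find(">", start + 1)
    if start != -1 and end != -1:
        text = text[start + 1:end]
    text = text.strip()
    # "word at domain-ref" is equivalent to "word@domain-ref".
    m = re.match(r"^(.*\S)\s+at\s+(\S+)$", text)
    if m:
        return m.group(1), m.group(2)
    local, sep, domain = text.rpartition("@")
    if not sep:
        raise ValueError("no domain-ref in address: %r" % text)
    return local.strip(), domain.strip()
```

With this, "smb at unc", "smb@unc", and "Some User <smb@unc>" all come
out as ("smb", "unc").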

The old standard permitted multiple-word addresses; some sites, notably
Carnegie, actually use this.  Such usages tend to confuse UNIX mailers,
since they use blank as a delimiter (ARPA uses commas; these are
inserted at the gateway as well); besides, the new standard rules out
such nonsense.  I don't know about any other sites, but for CMU, you
can substitute periods for the blanks and everyone will be happy.
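
The substitution is simple enough to sketch in Python.  The function
name and the host name "CMU-HOST" are placeholders of mine, not real
names, and the trick is only claimed above to work for CMU.

```python
def dot_local_part(address):
    """Replace blanks in the user part of "user words@host" with periods."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" at all: treat the whole string as the user part.
        return ".".join(address.split())
    return ".".join(local.split()) + "@" + domain.strip()


print(dot_local_part("John Q Public@CMU-HOST"))  # John.Q.Public@CMU-HOST
```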