[comp.text.sgml] "richmail" internet draft

emv@msen.com (Ed Vielmetti) (06/22/91)
Content-type: text-plus/richtext

What follows is a draft of a proposal to create a format for
multimedia RFC-822 mail (and by extension multimedia Usenet News,
since they follow the same model).  It is notable for containing a
specification for a minimal <bold>SGML</bold>-compatible format for
"richtext", a text markup intended to be easy to parse and rich enough
to be useful for real applications.

Comments on the draft can go to the addresses below or to this group.
I've tried to type this message as best I can in the style of the draft,
so that if you have a conformant implementation you'll be able to click
on the box below and get a copy of the PostScript version.

<x-signature>
Edward Vielmetti, MSEN Inc. 	moderator, comp.archives 	emv@msen.com
<x-snappy-signature-quote>
On the Net, the Net-way is best.
	It's just that we are trying to figure out what the Net-way is.
						e. miya
</x-snappy-signature-quote>
</x-signature>

--richmail-internet-draft
Content-Type: application/external-reference;
	name=/pub/nsb/BodyFormats.ps;
	real-type=text-plus/postscript;
	site=thumper.bellcore.com;
	expiration="23 Sep 1991 12:00:00 -0400"

--richmail-internet-draft

            INTERNET DRAFT

                      Mechanisms for Specifying and Describing
                       the Format of Internet Message Bodies


                           Nathaniel Borenstein, Bellcore
                                Ned Freed, Innosoft


                                     June 1991

          Status of This Memo

            This draft document will be submitted to the RFC editor as a
            protocol   specification.   Distribution  of  this  memo  is
            unlimited.  Please send  comments  to  Nathaniel  Borenstein
            <nsb@thumper.bellcore.com>.

            Experimentation with the mechanisms described in  this  memo
            is  encouraged.  It is anticipated that such experimentation
            will take place during the summer of 1991, after which a new
            draft  will  be  submitted to the RFC editor.  Comments that
            are intended to affect that future draft should be  received
            no later than September 23, 1991.

          Abstract

            This document suggests extensions to  the  RFC  822  message
            representation  protocol  to  allow  multi-part  textual and
            non-textual messages to be represented and exchanged without
            loss  of  information.    This  is  based  on  earlier  work
            documented in RFC 934 and RFC 1049, but extends and  revises
            that  work.   In  particular,  it  is designed to permit and
            standardize Internet mail mechanisms for  representing  text
            in   character  sets  other  than  US-ASCII,  for  including
            formatted  multi-font  text  messages,  for  including  non-
            textual material such as images and audio fragments, and for
            generally extending Internet mail to include  new  types  of
            objects  that are tagged in such a way that cooperating mail
            agents can recognize their types.

            2               Internet Message Body Format  INTERNET DRAFT


          Contents

            1       Introduction
            2       The Content-Type Header Field
            3       The Content-Transfer-Encoding Header Field
            3.1     Quoted-Printable Content-Transfer-Encoding
            3.2     Base64 Content-Transfer-Encoding
            4       Additional Optional Content- Header Fields
            4.1     Optional Content-ID Header Field
            4.2     Optional Content-Description Header Field
            5       The Predefined Content-type Values
            5.1     The TEXT Content-type and the US-ASCII Character Set
            5.2     The "Multipart" Content-Type
            5.3     The "Text-Plus" Content-Type and "Richtext" subtype
            5.4     The Message Content-Type
            5.5     The Binary Content-Type
            5.6     The Application Content-Type Value
            5.7     The Audio, Image, and Video Content-Type Values
            5.8     Experimental ("X-") Content-Type Values
            6       Conformance With this Memo

            Appendix I -- Guidelines For Sending Data Via Email
            Appendix II -- A Complex Multipart Example
            Appendix III -- The US-ASCII Character Set
            Summary
            Contacts
            Acknowledgements
            References

            1       Introduction

            Since its publication in 1982, RFC 822 [RFC-822] has defined
            the   standard  format  of  textual  mail  messages  on  the
            Internet.  Its success has been such that the RFC 822 format
            has  been  adopted,  wholly  or  partially,  well beyond the
            confines of the Internet and of SMTP transport,  as  defined
            by  RFC  821 [RFC-821].  As the format has seen wider use, a
            number of limitations have become  increasingly  problematic
            for the user community.

            RFC 822 was intended to specify a format for text  messages.
            As such, non-text messages, such as multimedia messages that
            might include audio or images,  are  simply  not  mentioned.
            Even in the case of text, however, RFC 822 is inadequate for
            the needs of email users whose languages require the use  of
            character  sets  richer  than  US-ASCII [REF-ANSI]. For mail
            containing audio, video, Asian language text, or  even  text
            in  most European languages, RFC 822 does not specify enough
            to permit interoperability.

            One of the notable limitations of  RFC  821/822  based  mail
            systems  is  the  fact  that  they  limit  the  contents  of
            electronic  mail  messages  to  relatively  short  lines  of
            seven-bit  ASCII.   This  forces  a user to convert any non-
            textual data that she may wish to send into seven-bit  bytes
            representable  as printable ASCII characters before invoking
            her local mail UA (User Agent program).   Examples  of  such
            encodings  currently  used  in  the  Internet  include  pure
            hexadecimal, uuencode, the 3-in-4 base 64  scheme  specified
            in  RFC  1113,  the  Andrew Toolkit Representation, and many
            others.

            These limitations become even more apparent as gateways  are
            designed  to allow for the exchange of mail messages between
            RFC 822 hosts and X.400 hosts.  X.400  [REF-X400]  specifies
            mechanisms  for  the  inclusion  of  non-textual  body parts
            within electronic mail messages.  The current standards  for
            the  mapping  of  X.400 messages to RFC 822 messages specify
            that either X.400 non-textual body parts should be converted
            to  (not encoded in) an ASCII format, or that they should be
            discarded, notifying the RFC 822 user  that  discarding  has
            occurred.   This is clearly undesirable, as information that
            a user may wish to receive is lost.  Even though a user's UA
            may  not have the capability of dealing with the non-textual
            body part, the user might have some  mechanism  external  to
            the  UA  that  can  extract useful information from the body
            part.  Moreover, it does not allow for  the  fact  that  the
            message  may eventually be gatewayed back into an X.400 MHS,
            where the non-textual information  would  definitely  become
            useful again.

            This memo describes several mechanisms that combine to solve
            these problems.  In particular, it describes:

            1.  A Content-type header field, generalized from  RFC  1049
            [RFC-1049],  which  can  be  used  to  describe the type and
            subtype of data in the  body  of  a  message  and  to  fully
            specify the representation (encoding) of such data.

            2.  A Content-Transfer-Encoding header field, which  can  be
            used  to describe an auxiliary encoding that was applied to
            the data in order to allow  it  to  pass  through  the  mail
            transport layer.

            3.  A "text"  content-type  value,  which  can  be  used  to
            represent  text information in a number of character sets in
            a standardized manner.

            4.  A "multipart" content-type value, which can be  used  to
            combine  several  separate  body-parts, which may be made of
            different types of data, into a single message.

            5.  A "binary" content-type value,  which  can  be  used  to
            transmit uninterpreted or partially-interpreted binary data,
            and hence to implement an email file transfer service.

            6.  A "message" content-type value, for encapsulating a mail
            message.

            7.  Several additional  content-type  values  and  subtypes,
            which  can be used by consenting User Agents to interoperate
            with additional message types such  as  audio,  images,  and
            more.

            8.  Several optional header  fields  that  can  be  used  to
            further describe the data in a message body or body-part, in
            particular the Content-ID,  and  Content-Description  header
            fields.

            Finally,  to  specify  and  promote  a  minimal   level   of
            interoperability,  this memo describes a subset of the above
            mechanisms that defines "conformance" with this memo.   That
            is,   it  specifies  the  minimal  subset  required  for  an
            implementation to be called "XXXX-conformant."

            2       The Content-Type Header Field

            The Content-Type header field was first defined in RFC 1049.
            This  section  extends  and supersedes that definition.  RFC
            1049 content-types are all conformant  with  the  new,  more
            general  syntax.   (In  particular,  RFC  1049 content-types
            omitted the subtype/character-set specification, and  always
            had  at most two of the parts now called "parameters", which
            were distinguished by their position as indicating a version
            number and a resource reference.)

            The purpose of the content-type field  is  to  describe  the
            data  contained  in  the  message body fully enough that the
            receiving user  agent  can  pick  an  appropriate  agent  or
            mechanism  to  present the data to the user, or to otherwise
            deal with the data in an appropriate manner.

            The Content-Type  header field is used to specify  the  type
            of  data in a message, by giving a type name, and to provide
            auxiliary information  that  may  be  required  for  certain
            types.    In addition, a distinguished syntax is defined for
            specifying  subtype  information,  including  character  set
            information  in  the  case of text.  After the type name and
            the optional subtype, the remainder of the header  field  is
            simply  a  set  of  parameter specifications, as defined for
            each named type, and an optional comment.

            In the  Extended  BNF  notation  of  RFC-822,  we  define  a
            Content-type header field value as follows:

            Content-Type:= type ["/" subtype] *[";" parameter]
                            [comment]

            type := "TEXT" / "TEXT-PLUS" / "MESSAGE" / "AUDIO" / "IMAGE" /
                    "VIDEO" / "BINARY" / "APPLICATION" / "MULTIPART" /
                    "X-" token

            subtype := token

            parameter := token / quoted-string

            token := 1*<any CHAR except SPACE, CTLs, and tspecials>

            tspecials := "(" / ")" / "<" / ">" / "@" / "," / ":" / "/" /
                         "\" / <"> / "[" / "]" / ";"

            The type and subtype values are not case  sensitive.   TEXT,
            Text, and TeXt are all equivalent.
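            As an illustration only, the grammar above can be turned into
            a small parser.  This Python sketch assumes that the tspecials
            set is exactly the thirteen characters listed, that a quoted
            parameter may contain RFC 822 backslash quoted-pairs, and that
            the optional comment is a trailing parenthesized run which is
            simply discarded.  The function names are hypothetical.

```python
# Sketch of a parser for the Content-Type grammar above (names hypothetical).
TSPECIALS = set('()<>@,:/\\"[];')
TYPES = {"text", "text-plus", "message", "audio", "image",
         "video", "binary", "application", "multipart"}


def is_token(s: str) -> bool:
    # token := 1*<any CHAR except SPACE, CTLs, and tspecials>
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def split_fields(value: str) -> list[str]:
    """Split on ';' outside quoted strings; stop at a '(' comment."""
    fields, buf, in_quote, i = [], [], False, 0
    while i < len(value):
        c = value[i]
        if in_quote:
            if c == "\\" and i + 1 < len(value):  # quoted-pair: keep next char
                buf.append(value[i + 1])
                i += 2
                continue
            if c == '"':
                in_quote = False
            buf.append(c)
        elif c == '"':
            in_quote = True
            buf.append(c)
        elif c == "(":  # start of the trailing comment: ignore the rest
            break
        elif c == ";":
            fields.append("".join(buf).strip())
            buf = []
        else:
            buf.append(c)
        i += 1
    if in_quote:
        raise ValueError("unterminated quoted-string")
    fields.append("".join(buf).strip())
    return fields


def parse_content_type(value: str) -> tuple:
    """Return (type, subtype or None, [parameters]); raise ValueError if malformed."""
    head, *rest = split_fields(value)
    type_, _, subtype = head.partition("/")
    # Type and subtype are not case sensitive, so normalize to lower case.
    type_, subtype = type_.strip().lower(), subtype.strip().lower()
    private = type_.startswith("x-") and len(type_) > 2 and is_token(type_)
    if type_ not in TYPES and not private:
        raise ValueError(f"unknown type: {type_!r}")
    if subtype and not is_token(subtype):
        raise ValueError(f"bad subtype: {subtype!r}")
    params = []
    for field in rest:
        if len(field) >= 2 and field[0] == field[-1] == '"':
            params.append(field[1:-1])  # quoted-string, quotes removed
        elif is_token(field):
            params.append(field)
        else:
            raise ValueError(f"bad parameter: {field!r}")
    return type_, subtype or None, params
```

            For example, 'x-foo/bar; 1.0; "a;b" (note)' yields the type
            "x-foo", the subtype "bar", and the two parameters "1.0" and
            "a;b"; the semicolon inside the quoted string is not treated
            as a separator.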

            An initial set of nine  content-types  is  defined  by  this
            memo.   This  set  of  top-level  names  is  intended  to be
            substantially complete.  It is expected  that  additions  to
            the   larger   set   of   supported  types  can  usually  be
            accomplished by  the  creation  of  new  subtypes  of  these

            initial  types.   In the future, more top-level types should
            be defined by an extension to this standard.

            The only constraint on the definition of  subtype  names  is
            the  desire that their uses not conflict.  That is, it would
            be undesirable  to  have  two  different  communities  using
            "Content-type:  binary/foobar" to mean two different things.
            The process of defining new content-subtypes, then,  is  not
            intended  to  be  a mechanism for imposing restrictions, but
            simply a mechanism for publicizing the usages.   There  are,
            therefore,   two  acceptable  mechanisms  for  defining  new
            content-type subtypes:

                 1.  Private values (starting  with  "X-")  may  be
                      defined  bilaterally  between two cooperating
                      agents   without    outside    approval    or
                      standardization.

                 2.   "Standard"  values  may  be  defined  by  the
                      publication   of   an  Internet  RFC,  or  by
                      registering them with the  Internet  Assigned
                      Numbers  Authority (IANA) at ISI, by email to
                      IANA@ISI.EDU.

            The  nine  standard  initial  predefined  content-types  are
            detailed in section 5 of this memo.  They are:

                 text --  textual information, with character set  given
                      by the subtype
                 text-plus  -- mostly textual information, with embedded
                      formatting  commands.   A  simple  default type is
                      defined, with possible subtypes  including  troff,
                      TeX, and so on.
                 message  --  an  encapsulated  message,  with   initial
                      subtypes for partial messages and privacy-enhanced
                      messages
                 multipart -- a message consisting of multiple parts  of
                      independent  type  values,  with  initial  subtype
                      digest.
                 audio -- a message containing audio data, with  initial
                      subtypes a-law and u-law.
                 image -- a message containing image data, with  initial
                      subtypes G3fax, gif, pbm, ppm, and pgm.
                 video -- a message containing video data.
                 binary -- a  message  containing  some  other  form  of
                      binary data.
                 application  --  a  message  containing  data   to   be
                      processed by a mail-based application.

            If no  Content-type  header  field  is  present,  "text"  is
            generally to be assumed, with the default (US-ASCII) subtype
            as specified later in this memo.   This is  consistent  with
            the  default  message  body  type  as  defined  by  RFC 822.
            However,  this  does  not  mean  that  a  specification   of
            "Content-type:  text/us-ascii"  is optional.  In the absence
            of such a header field, it is impossible to be certain  that
            a  message  is  actually text in the US-ASCII character set,
            because  it  might  well  be  a  message  that,  using   the
            conventions  that  predate  this  memo, includes non-textual
            data in a manner that  cannot  be  automatically  recognized
            (e.g.  a  uuencoded  compressed  UNIX  tar file).   Although
            there is no acceptable alternative to treating such  untyped
            messages  as  "text/us-ascii",  implementors  should  remain
            aware that unless explicitly so marked, they may in practice
            be almost anything.
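            The defaulting rule, together with the caution that follows
            it, might be coded as below.  The dictionary-of-headers
            representation and the function name are hypothetical; the
            returned flag lets a caller treat an untyped body as
            text/us-ascii while remembering that it was never explicitly
            labeled.

```python
def effective_content_type(headers: dict[str, str]) -> tuple[str, bool]:
    """Return (content-type value, explicitly_typed) for a message (a sketch).

    Header names are matched case-insensitively, as in RFC 822.  An
    untyped body is treated as text/us-ascii, but the False flag records
    that it may in practice be almost anything.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.strip(), True
    return "text/us-ascii", False
```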

            It should be noted that  the  list  of  Content-type  values
            given  here  may  be  augmented  in time, via the mechanisms
            described above, and that the set of subtypes is expected to
            grow substantially.  We have simply attempted, in this memo,
            to give as many standard Content-type definitions as were
            possible given the current state of our knowledge.

            3       The Content-Transfer-Encoding Header Field

            Many content-types that users wish to transport via e-mail
            are represented, in their "natural" format, as 8-bit
            character or binary data.  Such data cannot be properly
            transmitted over existing Internet mail mechanisms because
            both RFC 821 and RFC 822 restrict mail messages to 7-bit
            US-ASCII data with lines of at most 1000 characters.

            It is necessary, therefore, to extend the definition of  the
            data types allowed in the RFC 821 and RFC 822 framework, and
            to define a standard mechanism for encoding such data in  an
            acceptable manner.

            This memo specifies that such encodings will be indicated by
            a   new   "Content-Transfer-Encoding"   header  field.   The
            Content-Transfer-Encoding field is used to indicate the type
            of  transformation  that  has  been  used  to  represent the
            message body in an acceptable manner.

            It may seem  that  the  Content-Transfer-Encoding  could  be
            inferred  from  the characteristics of the Content-Type that
            is to be encoded, or, at the very  least,  certain  Content-
            Transfer-Encodings  could  be mandated for use with specific
            Content-Types. There are several reasons why this is not the
            case.  First, given the varying types of transports used for
            mail, some encodings may be appropriate  for  some  Content-
            Type/transport  combinations  and  not  for  others. Second,
            certain  Content-Types  may  require  different   types   of
            transfer   encoding   under   different  circumstances.  For
            example, many PostScript messages might consist entirely  of
            short  lines  of  7-bit  data and hence require little or no
            encoding. Other PostScript messages (especially those  using
            Level  2 PostScript's binary encoding mechanism) may only be
            reasonably represented using  a  binary  transport  encoding.
            Finally,  since Content-Type is intended to be an open-ended
            specification  mechanism,   strict   specification   of   an
            association  between Content-Types and encodings effectively
            couples the specification of an application protocol with  a
            specific  lower-level transport. This is not desirable since
            the developers of a Content-Type should not have to be aware
            of all the transports in use and what their limitations are.

            It  should  be  noted,  also,  that  there  is  considerable
            interest   and  effort  being  expended  on  extending  mail
            transport  to  permit  8-bit  or  binary  data.    If   such
            extensions  ever  become  commonplace, the Content-Transfer-
            Encoding mechanism will quickly become irrelevant, and it is
            therefore  desirable  not  to  "overload"  Content-Transfer-
            Encoding with additional  mechanisms  that  might  still  be
            useful   in  such  a  future.   For  this  reason,  Content-
            Transfer-Encoding is restricted in its  scope  to  refer  to
            nothing  but  the  7-bit encoding question.  Matters such as
            the basic format in which information is "encoded" are to be
            handled by other mechanisms.

            Unlike Content-types, which are expected to proliferate,  it
            is  expected  that  there  will  never  be  more  than a few
            different  Content-Transfer-Encoding  values,  both  because
            there  is  less need for variation and because the effect of
            variation  in  Content-Transfer-Encoding   would   be   more
            problematic.   However,  establishing only a single Content-
            Transfer-Encoding mechanism  does  not  seem  possible.   In
            particular,  there  is  a  tradeoff between the desire for a
            compact and efficient encoding of binary data and the desire
            for  a  readable  encoding  of  data that is mostly, but not
            entirely,  7-bit  data.   For  this  reason,  at  least  two
            encoding mechanisms are necessary, a "readable" encoding and
            a "dense" encoding.
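            To make the readable-versus-dense tradeoff concrete, the
            sketch below estimates encoded sizes.  It assumes a 3-in-4
            dense scheme (4 output characters per 3 input octets, as in
            RFC 1113) and a simplified cost model for the quoted-printable
            rules of section 3.1; both estimates ignore line structure,
            and the function names are hypothetical.

```python
import math

# Octets 33-57 and 59-126 are sent as themselves by the readable encoding.
LITERAL = set(range(33, 58)) | set(range(59, 127))


def dense_size(n: int) -> int:
    # A 3-in-4 scheme emits 4 characters per 3 input octets (line breaks ignored).
    return 4 * math.ceil(n / 3)


def readable_size(data: bytes) -> int:
    # Rough quoted-printable cost: literal octets and SPACE cost 1, the colon
    # costs 2 ("::"), every other octet (CR and LF included) costs 3 (":XX").
    # Soft line breaks and the end-of-line SPACE rule are ignored.
    cost = 0
    for b in data:
        if b in LITERAL or b == 0x20:
            cost += 1
        elif b == 0x3A:
            cost += 2
        else:
            cost += 3
    return cost
```

            For the 12-octet string "Hello, world" this gives 12
            characters in the readable form against 16 in the dense one,
            while for all 256 octet values, once each, it gives 579
            against 344.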

            A third encoding, for compressed  ("super-dense")  data,  is
            also viewed by many as desirable. This memo does not specify
            a "compressed" encoding, due largely to the uncertain  legal
            state   of  the  UNIX  "compress"  command  and  a  lack  of
            certainty, during the drafting of this memo,  regarding  the
            right way to define a standard compression algorithm.  It is
            hoped that a compressed  Content-Transfer-Encoding  will  be
            defined  in a future RFC. Any compression algorithm for such
            a use should be  unambiguously  defined  and  without  legal
            encumbrances.   (Alternate  mechanisms  for compression have
            also been proposed, and might be defined in  ways  that  are
            compatible with this memo.)

            The Content-Transfer-Encoding field is designed  to  specify
            an invertible mapping between the "native" representation of
            a type of data and a  representation  that  can  be  readily
            exchanged using 7 bit mail transport protocols as defined by
            RFC 821 (SMTP). This field  has  not  been  defined  by  any
            previous  RFC. The field's value is a single atom specifying
            the type of encoding, as enumerated below.  Formally:

            Content-Transfer-Encoding:=     "BASE64"/
                                            "QUOTED-PRINTABLE"/
                                            "8BIT"/
                                            "BINARY"/
                                            "7BIT"/
                                            "X-"atom

            These values are not case sensitive.  That  is,  Base64  and
            BASE64  and  bAsE64 are all equivalent.  An encoding type of
            7BIT implies that the message  is  already  in  a  seven-bit
            mail-ready  representation.  This  value  is  assumed if the
            Content-Transfer-Encoding header field is not  present.   If
            the  message  is  stored or transported via a mechanism that
            permits 8-bit data, a  Content-Transfer-Encoding  of  "8bit"
            should be used.  If the message is stored or transported via
            a mechanism that permits arbitrary binary data, a Content-
            Transfer-Encoding  of  "binary" may nonetheless be used.  In
            particular, "8bit" or "binary" must  be  used  in  the  case
            where  there  is  a  possibility that the message may "leak"
            into  a  more  restricted  (7-bit)  transport   environment.
            (DISCUSSION:   The distinction between the Content-Transfer-
            Encoding values of "binary," "8bit,"  and  "7bit"  may  seem
            unimportant  in  an  8-bit  binary  environment,  but  clear
            labeling will be of enormous value to gateways between 8-bit
            and   7-bit  systems.  The  difference  between  "8bit"  and
            "binary" is that "8bit" implies adherence to SMTP limits  on
            line length and CR/LF semantics, whereas "binary" does not.)
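            A sender or gateway can choose the label mechanically.  The
            sketch below returns the narrowest of "7bit", "8bit", and
            "binary" that truthfully describes a body.  It assumes CRLF
            line endings and takes the SMTP limit as 1000 characters per
            line including the CRLF, as in RFC 821; the function name is
            hypothetical.

```python
def choose_label(data: bytes, max_line: int = 1000) -> str:
    """Return "7bit", "8bit" or "binary" for a body (a sketch)."""
    for line in data.split(b"\r\n"):
        # A bare CR or LF, or a line too long once its CRLF is counted,
        # violates SMTP line semantics, so only "binary" is truthful.
        if b"\r" in line or b"\n" in line or len(line) + 2 > max_line:
            return "binary"
    if any(b > 127 for b in data):
        return "8bit"  # SMTP-shaped lines, but octets above 127
    return "7bit"
```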

            Implementors  may,  if  necessary,   define   new   Content-
            Transfer-Encoding  values,  but should prefix them with "x-"
            to  indicate  their  non-standard  status,  e.g.   "Content-
            Transfer-Encoding:   x-my-new-encoding".    However,  unlike
            Content-types and subtypes, the  creation  of  new  Content-
            Transfer-Encoding  values  is  explicitly discouraged, as it
            seems  likely  to  hinder   interoperability   with   little
            potential benefit.
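            Reading the header can be sketched as follows, assuming the
            value has already been extracted from the message header; the
            names are hypothetical.  A missing header means "7bit",
            comparison is case-insensitive, and anything outside the five
            standard values must carry the "x-" prefix.

```python
from typing import Optional

STANDARD = {"base64", "quoted-printable", "8bit", "binary", "7bit"}


def normalize_cte(value: Optional[str]) -> str:
    """Return the canonical (lower-case) Content-Transfer-Encoding name."""
    if value is None:
        return "7bit"  # assumed when the header field is not present
    v = value.strip().lower()  # Base64, BASE64 and bAsE64 are equivalent
    if v in STANDARD or (v.startswith("x-") and len(v) > 2):
        return v
    raise ValueError(f"unrecognized Content-Transfer-Encoding: {value!r}")
```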

            If a Content-Transfer-Encoding header field appears as  part
            of  a message header, it applies to the entire message body,
            whether or not that body is of type "multipart." If it is of
            type  multipart,  the encoding applies recursively to all of
            the encapsulated parts, including their encapsulated headers
            and  the  encapsulation  boundaries.  If a Content-Transfer-
            Encoding header field appears as part of an  encapsulation's
            headers,  it  applies  only  to the body of the encapsulated
            part.   If  the  encapsulated  part  is   itself   of   type
            "multipart",  the encoding applies recursively to all of the
            encapsulated parts within that encapsulated part.
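            One way to read these scoping rules is that every body is
            wrapped in the encodings of all enclosing parts, outermost
            first, which is also the order in which a decoder must undo
            them.  The sketch below models a message as a hypothetical
            tree of parts and treats "7bit", "8bit", and "binary" as
            labels that do not transform the data.

```python
from dataclasses import dataclass, field

# These values describe the data but do not transform it.
IDENTITY = {"7bit", "8bit", "binary"}


@dataclass
class Part:
    """Hypothetical in-memory model of a message or encapsulated part."""
    name: str
    cte: str = "7bit"
    children: list = field(default_factory=list)


def encoding_layers(part, outer=()):
    """Map each leaf part's name to the encodings covering its body, outermost first.

    An encoding declared on a multipart applies to everything inside it,
    so a nested part inherits the encodings of all its ancestors.
    """
    cte = part.cte.lower()
    layers = outer if cte in IDENTITY else outer + (cte,)
    if not part.children:
        return {part.name: layers}
    result = {}
    for child in part.children:
        result.update(encoding_layers(child, layers))
    return result
```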

            It  should  be  noted  that,  because  email  is  character-
            oriented,  the  mechanisms described here are mechanisms for
            encoding arbitrary byte streams, not bit streams.  If a  bit
            stream  is  to  be  encoded  via one of these mechanisms, it
            should first be converted to a byte stream using the network
            standard bit order ("big-endian"), in which the earlier bits
            in a stream become the higher-order bits in a byte.   A  bit
            stream not ending at an 8-bit boundary should be padded with
            zeroes.  This memo does not provide a  mechanism  for  noting
            the  addition of such padding; this information could either
            be encoded into the data stream or noted in some  additional
            header field.
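            The bit-to-byte conversion described above can be sketched
            as follows, with bits given as a hypothetical list of 0/1
            integers.  As the text notes, the padding is not recorded,
            so the true bit count must be conveyed by other means.

```python
def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack bits into octets, earliest bit as the high-order bit ("big-endian")."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # zero-pad a final partial octet
        octet = 0
        for bit in chunk:
            octet = (octet << 1) | (bit & 1)
        out.append(octet)
    return bytes(out)
```

            The three bits 1, 0, 1 therefore become the single octet
            0xA0 (binary 10100000).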

            The following sections will define the two standard encoding
            mechanisms.

            3.1     Quoted-Printable Content-Transfer-Encoding

            The Quoted-Printable encoding is intended to represent  data
            that  largely contains octets less than 127.  It encodes the
            data in such a  way  that  the  resulting  octets  are  both
            unlikely to be modified by mail transport, and, when read as

            ASCII text, are largely recognizable by humans.  A message
            that is entirely ASCII may also be encoded in Quoted-Printable
            to ensure its survival if it is expected to traverse a
            character-translating gateway, such as those onto BITNET.

            In this encoding, ASCII characters 33 (EXCLAMATION POINT)
            through 57 (DIGIT 9), inclusive, and 59 (SEMICOLON) through
            126 (TILDE), inclusive, may be represented as themselves.  All
            other   characters,  including  characters  32  (SPACE),  58
            (COLON), 127 (DEL), and all control characters,  are  to  be
            represented as determined by the following rules:

                 Rule #1:  Any 8-bit value may be represented by a ":"
                 followed by a two-digit hexadecimal representation of
                 the  character's  8-bit  value.   Thus,  for   example,
                 character   12   (control-L,   or   formfeed)   can  be
                 represented by ":0C",  and  the  colon  character  (58)
                 itself  can  be  represented  by  ":3A".   Rule  #1  is
                 optional for characters 10 (control-J, or linefeed), 13
                 (control-M,  or  return),  and  32  (SPACE) through 126
                 (TILDE), and is required for all other characters.

                 Rule #2:  The literal colon character  must  itself  be
                 quoted  by  a  colon  (i.e., as "::") if Rule #1 is not
                 used.  Note that this is not ambiguous with  regard  to
                 Rule  #1,  because  ":"  is not part of the hexadecimal
                 alphabet.

                 Rule #3:  A colon at the end of a line may be  used  to
                 indicate a non-significant line break.  That is, if one
                 needs to include a long line  without  line  breaks,  a
                 message  encoded  with  the  quoted-printable  encoding
                 should include "soft" line breaks  in  which  the  line
                 break  is  preceded by a colon.  Thus if the "raw" form
                 of the line is a single line that says:

                 Now's the time for all men to come to the aid of  their
                 country.

                 This could  be  represented,  in  the  quoted-printable
                 encoding, as

                 Now's the time :
                 for all men to come:
                  to the aid of their country.

                 This provides a mechanism with  which  long  lines  are
                 encoded  in  such  a  way as to be restored by the user
                 agent.    The quoted-printable encoding  REQUIRES  that
                 lines  be  broken  so  that  they  are  no more than 78
                 characters long, using soft line breaks when necessary.

                 Rule #4:  SPACE (32) characters may generally be
                 represented as themselves, but should NOT be so
                 represented at the end of a line, because some MTAs
                 are known to remove "white space" from the end of a
                 line.  At the end of a line, a SPACE MUST therefore be
                 represented either as in rule #1 (as ":20") or as
                 itself, followed by a soft line break followed by a
                 real line break.  Of course, these characters may be so
                 represented within a line as well, if this is  desired,
                 though  this is less readable.  Note that in decoding a
                 quoted-printable message, any trailing white space on a
                 line  should  be  deleted,  as it will necessarily have
                 been added by intermediate transport agents.

                 Rule #5: A CR LF pair normally constitutes a line break
                 and  should  be  represented  by  a  line  break in the
                 quoted-printable  encoding  if  that  is  its  meaning.
                 Isolated   CRs,  LFs,  and  LF  CR  sequences  must  be
                 represented using the :0D, :0A,  and  :0A:0D  notations
                 respectively.  CR LF sequences that are not intended to
                 represent a line break should be encoded as  :0D:0A  to
                 reflect  this  usage.  In other words, the concept "end
                 of  line"  is  represented,  in  the   quoted-printable
                 encoding,  by  CR  LF, although this may be modified in
                 local storage formats.  Literal occurrences of CR or LF
                 that  do  not  occur  as  CRLF  or  are not intended to
                 represent end-of-line markers must  be  represented  in
                 hexadecimal.
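
            An encoder can be sketched in the same spirit.  The sketch
            below treats every CR LF pair as a line break (rule #5),
            escapes a SPACE that would otherwise end a line (rule #4),
            and inserts soft line breaks so that no encoded line exceeds
            78 characters.  Which octets stay literal is an assumption
            here: printable ASCII other than ":" is kept as itself, and
            everything else, including TAB and isolated CR or LF, becomes
            ":XX".  The name qp_encode is hypothetical, and lines are
            joined with a bare newline rather than CR LF for simplicity.

```python
def qp_encode(data: bytes, max_len: int = 78) -> str:
    """Sketch: encode octets in the draft's quoted-printable form (":" escapes)."""
    encoded_lines = []
    for raw_line in data.split(b"\r\n"):              # rule #5: CR LF is a line break
        tokens = []
        for i, byte in enumerate(raw_line):
            at_end = i == len(raw_line) - 1
            # Assumption: printable ASCII except ":" is literal; a SPACE is
            # literal only when it does not end the line (rule #4).
            literal = (33 <= byte <= 126 and byte != 0x3A) or (byte == 0x20 and not at_end)
            tokens.append(chr(byte) if literal else ":%02X" % byte)  # e.g. CR -> ":0D"
        current = ""
        for tok in tokens:
            if len(current) + len(tok) > max_len - 1:  # keep room for the soft-break ":"
                encoded_lines.append(current + ":")
                current = ""
            current += tok
        encoded_lines.append(current)
    return "\n".join(encoded_lines)
```

            Escapes are kept as indivisible tokens, so a soft line break
            never splits a ":XX" sequence.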

            Since the hyphen character ("-") is represented as itself in
            the Quoted-Printable encoding, the usual care must be taken,
            when encapsulating a quoted-printable  encoded  message   or
            body  part  in  a  multipart  message,  to  ensure  that the
            encapsulation boundary  does  not  appear  anywhere  in  the
            message.   See the definition of multipart messages later in
            this memo.

            It  should  be  noted  that  the  quoted-printable  encoding
            represents something of a compromise between readability and
            reliability in transport.  Message bodies encoded  with  the
            quoted-printable  encoding will work reliably over most mail
            gateways, but may not work perfectly over  a  few  gateways,
            notably   those  involving  translation  into  EBCDIC.   (In
            theory, an EBCDIC gateway could  decode  a  quoted-printable
            message  and re-encode it using base64, but such gateways do
            not yet exist.)  A higher level of confidence is offered  by
            the  base64 Content-Transfer-Encoding.  For more information
            about how to ensure  that  messages  are  safe  against  the
            vagaries of mail gateways, see Appendix I.

            3.2     Base64 Content-Transfer-Encoding

            The  Base64   Content-Transfer-Encoding   is   designed   to
            represent arbitrary 8 bit data in a form that is not humanly
            readable.  The encoding and decoding algorithms are simple,
            and the encoded data are only about 33 percent larger than
            the unencoded data.  This encoding is adapted from the one
            used in Privacy Enhanced Mail applications, as defined in
            RFC 1113, with two changes: base64 eliminates the "*"
            mechanism for embedded clear text and defines a new syntax
            for portable end-of-line markers, using the comma character.

            A 66-character subset of International Alphabet IA5 is used,
            enabling  6  bits to be represented per printable character.
            (The extra 65th and 66th characters "=" and "," are used  to
            signify  special  processing functions.) This subset has the
            important property that it is  represented  identically  in
            IA5  and ASCII, and all characters in the subset are part of
            the so-called invariant  subset  of  EBCDIC.  Other  popular
            encodings  such as the encoding used by the UUENCODE utility
            and the  base85  encoding  specified  as  part  of  Level  2
            PostScript  do  not  share these properties, and thus do not
            fulfill the portability requirements  imposed  on  a  binary
            transport encoding for mail.

            The encoding process represents 24-bit groups of input  bits
            as output strings of 4 encoded characters.  Proceeding from
            left to right, a 24-bit input group is formed by
            concatenating three 8-bit input groups, which is then treated
            as four concatenated 6-bit groups.  When encoding a bit stream via
            the base64 encoding, the bit stream should be presumed to be
            ordered with the most-significant-bit first.  That  is,  the
            first  bit  in  the stream will be the high-order bit in the
            first byte, and the eighth bit will be the low-order bit in
            the first byte, and so on.
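
            As a worked example of this bit ordering, the three octets
            of "abc" (hexadecimal 61 62 63) form the 24-bit group
            011000 010110 001001 100011, that is, the 6-bit values 24,
            22, 9 and 35, which Table 1 maps to "YWJj".  The
            hypothetical helper below performs the split with shifts and
            masks.

```python
def split_24_bits(b0: int, b1: int, b2: int) -> list[int]:
    """Sketch: split three octets (most-significant bit first) into four 6-bit values."""
    group = (b0 << 16) | (b1 << 8) | b2        # 24-bit input group, first octet highest
    return [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]


print(split_24_bits(0x61, 0x62, 0x63))  # [24, 22, 9, 35]
```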

            Each 6-bit group is used as an index into  an  array  of  64
            printable  characters. The character referenced by the index
            is placed in the output string. These characters, identified
            in  Table  1,  below,  are  selected so as to be universally
            representable,  and  the  set   excludes   characters   with
            particular  significance to SMTP (e.g., ".", "CR", "LF") and
            to the encapsulation boundaries defined in this memo (e.g.,
            "-").

                                      Table 1

               Value Encoding  Value  Encoding   Value  Encoding   Value  Encoding
                   0 A            17 R            34 i            51 z
                   1 B            18 S            35 j            52 0
                   2 C            19 T            36 k            53 1
                   3 D            20 U            37 l            54 2
                   4 E            21 V            38 m            55 3
                   5 F            22 W            39 n            56 4
                   6 G            23 X            40 o            57 5
                   7 H            24 Y            41 p            58 6
                   8 I            25 Z            42 q            59 7
                   9 J            26 a            43 r            60 8
                  10 K            27 b            44 s            61 9
                  11 L            28 c            45 t            62 +
                  12 M            29 d            46 u            63 /
                  13 N            30 e            47 v
                  14 O            31 f            48 w         (pad) =
                  15 P            32 g            49 x         (eol) ,
                  16 Q            33 h            50 y

            The output stream (encoded bytes)  must  be  represented  in
            lines  of  no more than 76 characters each.  All line breaks
            in the encoded version of the  data  should  be  ignored  by
            decoding software.

            Special processing is performed if fewer than  24  bits  are
            available  at the end of a message or encapsulated part of a
            message.  A full encoding quantum is always completed at the
            end  of  a  message.  When  fewer  than  24  input  bits are
            available in an input group, zero bits  are  added  (on  the
            right)  to  form an integral number of 6-bit groups.  Output
            character positions which  are  not  required  to  represent
            actual  input  data are set to the character "=".  Since all
            input to the encoding is an integral number of octets,
            only the following cases can arise: (1) the final quantum of
            encoding input is an integral multiple of 24 bits; here, the
            final unit of encoded output will be an integral multiple of
            4 characters with no "=" padding, (2) the final  quantum  of
            encoding  input  is  exactly 8 bits; here, the final unit of
            encoded output will be two characters followed  by  two  "="
            padding  characters,  or  (3)  the final quantum of encoding
            input is exactly 16 bits; here, the final  unit  of  encoded
            output  will be three characters followed by one "=" padding
            character.
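
            Combining Table 1, the padding rules and the 76-character
            line limit gives the encoder sketched below.  The name
            b64_encode is illustrative, lines are separated by a bare
            newline rather than CR LF for simplicity, and the comma
            end-of-line extension described next is deliberately not
            handled.

```python
# Table 1 values 0-63, in order.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes, line_len: int = 76) -> str:
    """Sketch: base64-encode octets, padding the final quantum and wrapping lines."""
    chars = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = len(chunk)                                            # 1, 2 or 3 input octets
        group = int.from_bytes(chunk + b"\x00" * (3 - n), "big")  # zero bits on the right
        indices = [(group >> s) & 0x3F for s in (18, 12, 6, 0)]
        used = n + 1                                 # 8 bits -> 2 chars, 16 -> 3, 24 -> 4
        chars.extend(ALPHABET[k] for k in indices[:used])
        chars.append("=" * (4 - used))               # "=" fills the unused positions
    text = "".join(chars)
    return "\n".join(text[j:j + line_len] for j in range(0, len(text), line_len))
```

            The three padding cases correspond to used = 4, 2 and 3
            output characters respectively.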

            One addition is made to the RFC 1113 specification  of  this
            encoding:   The  comma character (",", ASCII 44) may be used
            to represent an "end-of-line" or "end-of-record" marker.  If
            line-oriented data are encoded using base64, it is desirable
            to  restore  end-of-line  markers  according  to  the  local
            convention.   The  RFC  1113  specification, as given above,
            offers  no  way  to  differentiate  between  a  binary  file
            including a CRLF sequence and a portable end-of-line marker.
            This  memo  augments   that   mechanism   to   permit   such
            differentiation,  as  follows.   To represent an end-of-line
            marker:

                 1.  Treat the byte stream preceding the end-of-line
                 as terminating at the end of the line --
                 that is, pad with "=" characters as appropriate to
                 complete the representation of the line.

                 2.  Insert a comma character.

                 3.  Resume the  encoding  starting  a  new  24-bit
                 input  group  with the first character on the next
                 line.

            Thus, while encoding the sequence "a-b-c-CR-LF-a-b-c"   (or,
            in hexadecimal, "61 62 63 0D 0A 61 62 63") yields the octets
            which are represented in ASCII as  "YWJjDQphYmM=",  encoding
            "a-b-c"  followed  by an end-of-line followed by "a-b-c" (in
            hex, "61 62 63" end-of-line "61 62 63") yields "YWJj,YWJj".
            They  will  be  translated  back  into the same thing if the
            local end-of-line convention is  ASCII  "CRLF"  (hexadecimal
            "0D  0A"),  but  they will be translated back differently if
            the end-of-line convention is anything other than ASCII CRLF
            (hexadecimal "0D 0A").  (Note:  The utility of the portable
            end-of-line feature is somewhat limited.  Most line-oriented
            data  are best represented by the quoted-printable encoding.
            A few cases, however, might  benefit  from  this  mechanism,
            notably  line-oriented  textual  data in character sets that
            bear no resemblance to ASCII.)
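
            The end-of-line extension can be sketched on top of an
            ordinary base64 codec, here Python's standard base64 module,
            whose 64-character alphabet and "=" padding match Table 1.
            Each line is encoded as a separate, padded unit and the units
            are joined with commas; the decoder ignores line breaks in
            the encoded text, splits on commas, and rejoins the lines
            with the receiving host's end-of-line convention.  The
            function names and the eol parameter are illustrative.

```python
import base64


def encode_lines(lines: list[bytes]) -> str:
    """Sketch: encode each line as its own padded base64 unit, separated by ","."""
    return ",".join(base64.b64encode(line).decode("ascii") for line in lines)


def decode_lines(encoded: str, eol: bytes = b"\r\n") -> bytes:
    """Sketch: decode comma-separated units, restoring the local end-of-line."""
    compact = "".join(encoded.split())     # line breaks in encoded data are ignored
    return eol.join(base64.b64decode(unit) for unit in compact.split(","))


print(encode_lines([b"abc", b"abc"]))        # YWJj,YWJj
print(decode_lines("YWJj,YWJj", eol=b"\n"))  # b'abc\nabc'
```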

            Note: There is no  need  to  worry  about  quoting  apparent
            encapsulation  boundaries  within  base64-encoded  parts  of
            multipart messages, because no hyphen characters are used in
            the base64 encoding.

            4       Additional Optional Content- Header Fields

            4.1     Optional Content-ID Header Field

            In constructing a high-level user agent, it may be desirable
            to allow one message body-part to make reference to another.
            This may be done using the "Content-ID" header field,  which
            is syntactically identical to the "Message-ID" header field:

            Content-ID := msg-id

            4.2     Optional Content-Description Header Field

            It  may  be  desirable   to   associate   some   descriptive
            information  with  a given body-part. For example, it may be
            useful to mark an "image" body-part as  "a  picture  of  the
            Space Shuttle Endeavour."  Such text may be placed in the
            Content-Description header field.

            Content-Description := *text

            5       The Predefined Content-type Values

            This memo defines nine initial content-type  values  and  an
            extension  mechanism  for  private  or  experimental  types.
            Further types must be defined and published by  a  new  RFC.
            It is expected that most innovation in new types of mail
            will take place as subtypes of the nine types defined here.

            5.1     The TEXT Content-type and the US-ASCII Character Set

            The text content-type is intended for sending textual email,
            and is the default content-type for Internet mail.  For
            text, subtype names are used to indicate character sets;
            the default subtype (character set) is "US-ASCII".

            Alternatively, a different character set subtype may be
            specified,  in  which case the body text is in the specified
            character set.  A recommended  list  of  predefined  subtype
            names can be found at the end of this section.  Note that if
            the  specified  character  set  includes  8-bit  data,   the
            Content-Transfer-Encoding  header field is required in order
            to transmit the message via SMTP.

            The default character set, US-ASCII, has been the subject of
            some  confusion  and  ambiguity  in the past.  Not only were
            there some ambiguities in the definition,  there  have  been
            wide variations in practice.  In order to eliminate such
            ambiguity and variations  in  the  future,  it  is  strongly
            recommended  that  new  user  agents  explicitly  specify  a
            character set via the content-type header field.

            US-ASCII is not an arbitrary seven-bit character  code,  but
            indicates  that  the message body uses character coding that
            uses  the  exact  correspondence  of  codes  to   characters
            specified  in  ASCII.   National  use  variations  of ISO646
            [REF-ISO646] are not ASCII, and neither an explicit  "ASCII"
            character  set, nor "US-ASCII", nor the default (omission of
            a character set) should be used when  characters  are  coded
            using  them.   (Discussion: RFC821 very explicitly specifies
            "ASCII", and references  an earlier version of the  American
            Standard  cited  in [REF-ANSI].  Whether that specification,
            rather than a reference to an  International  Standard,  was
            done  deliberately or out of convenience or ignorance, is no
            longer interesting:  insofar  as  one  of  the  purposes  of
            specifying a content-type and character set is to permit the
            receiver to unambiguously determine how the sender  intended
            the coded message to be interpreted, assuming anything other
            than "strict ASCII" as the default would risk  unintentional
            and  incompatible  changes  to the semantics of messages now
            being  transmitted.     This  also  implies  that   messages
            containing   characters   coded   according    to   national
            variations on ISO646,  or  using  code-switching  procedures
            (e.g.,  those  of  ISO2022),  as  well  as 8-bit or multiple
            octet character encodings MUST use an appropriate  character
            set specification to be consistent with this specification.)

            The complete US-ASCII character set is  listed  in  Appendix
            III.   Note  that  the  control characters (0-31) and delete
            (127) have no defined meaning  apart  from  the  combination
            <CR><LF>  (ASCII  values  13  and 10) indicating a new line.
            Two of the characters have de facto meanings  in  wide  use:
            <FF>  (ASCII   12)  as  the  first character of a line means
            "start this line at the beginning of a new page"; and <TAB>
            (ASCII 9) means "move the cursor to the next column position
            of the form 8n+1" (columns being numbered from 1).  Apart from this any
            use  of  the  control characters or DEL in a message must be
            part  of  a  private  agreement  between  the   sender   and
            recipient.  Such  private  agreements  are  discouraged  and
            should be replaced by the other capabilities of this memo.

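            A receiving user agent that wants to honor the de facto
            <TAB> meaning could expand tabs as sketched below.  The
            sketch assumes the conventional reading of the rule, in
            which tab stops fall at columns 1, 9, 17, ... (numbered
            from 1) and a tab always advances at least one column; on a
            single line this agrees with Python's str.expandtabs(8).
            The function name is hypothetical.

```python
def expand_tabs(line: str) -> str:
    """Sketch: replace each TAB with spaces up to the next column of the form 8n+1."""
    out = []
    col = 1                                   # 1-based column of the next character
    for ch in line:
        if ch == "\t":
            next_stop = ((col - 1) // 8 + 1) * 8 + 1
            out.append(" " * (next_stop - col))
            col = next_stop
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(expand_tabs("ab\tc").index("c") + 1)   # 9: "c" lands in column 9
```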

            Beyond US-ASCII, one can imagine an  enormous  proliferation
            of character sets.  It is the opinion of the authors of this
            memo that a large number of character sets  is  NOT  a  good
            thing.   We  would  prefer to specify a single character set
            that can be used universally for  representing  all  of  the
            world's  languages in electronic mail.  Unfortunately, there
            is no clear choice for such a universal representation,  and
            existing  practice  in several communities seems to point to
            the continuing use of multiple character sets  in  the  near
            future.  For this reason, we define names for a small number
            of character sets for which a strong constituent base exists.
            We recommend the use of ISO-10646 wherever possible.

            The defined subtypes of text, which name alternate character
            sets, are:

                 US-ASCII -- as defined above.

                 ISO-8859-X -- where "X" is to be replaced, as
                      necessary, by the part number of the appropriate
                      ISO-8859 variant [REF-ISO-8859], e.g. ISO-8859-1.
                      Note that the ISO-646 character sets have
                      deliberately been omitted in favor of their 8859
                      replacements, which are the designated character
                      sets for Internet mail.  The ISO-8859 character
                      sets will be rigorously defined, for use in mail
                      and other applications, by a forthcoming RFC.

            Note that the character set used should always be explicitly
            specified in the Content-type field.

            The following three subtypes of  text  are  expected  to  be
            defined   by   forthcoming   documents.  Their  use  is  not
            recommended in advance of those publications:

                 ISO-10646 -- as defined in [REF-ISO-10646].   This
                      standard   is   not,   as  of  this  writing,
                      finalized, and therefore its  use  for  email
                      cannot be fully specified.

                 ISO-2022 -- ISO-2022, as defined in
                      [REF-ISO-2022],  is  problematic for mail use
                      because  it  actually   specifies   ways   of
                      designating  and   accessing  character sets,
                      rather than, itself, being a  character  set.
                      Its  use  in  mail  will probably be strongly
                      desired by communities who are already  using
                      it   locally   to  handle  multiple  sets  of
                      characters  and  multi-byte  characters.   It
                      appears  necessary  to explicitly specify the
                      ISO-2022 methods that will  be  permitted  in
                      text mail so as to avoid the need for private
                      agreements   about,   e.g.,   the    specific
                      character  sets being used in messages. It is
                      expected that those  interested  in  ISO-2022
                      mail   will   devise   and   publish  such  a
                      specification in the future.

                 QUOTED-READABLE -- A format for representing  text
                      in  multiple  character  sets,  as defined in
                      [REF-RFC-QR].

            Implementors are discouraged  from  defining  new  character
            sets for mail use unless absolutely necessary.

            The intent of "text" is to represent "unformatted"  text  in
            an  appropriate  character  set.   Formatted  text,  such as
            multi-font text, should use the "text-plus" content-type.

            5.2     The "Multipart" Content-Type

            In  the  case  of  multiple  part  messages,  a  "multipart"
            Content-type  field  should  appear  in  the RFC 822 message
            header. The message body is then assumed to contain multiple
            parts  separated  by  encapsulation boundaries.  Each of the
            parts is defined, syntactically, as a header area,  a  blank
            line,  and  a body area, similar to the RFC 822 syntax for a
            message.  However body parts are NOT to  be  interpreted  as
            actually  being  RFC 822 messages.  To begin with, NO header
            fields are actually required in body  parts.   A  body  part
            that starts with a blank line, therefore, is a body part for
            which all default values are to be assumed.  In such a case,
            of  course,  the  absence  of  a  Content-type  header field
            implies that the encapsulation is US-ASCII text.   The  only
            header  fields  that have defined meaning for body-parts are
            those the names of which begin with  "Content-".  All  other
            header fields are generally to be ignored in body-parts, and
            may be discarded by gateways.  They are permitted to  appear
            in  body  parts only for ease of conversion between messages
            and body parts.  Of course, "X-" fields may be created for
            experimental  or private purposes, with the recognition that
            the information they contain may be lost at some gateways.
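
            The body-part rules above can be sketched as follows,
            assuming the part has already been cut out of the multipart
            body and uses newline line breaks.  The helper name
            parse_part and its default_type parameter are illustrative:
            the sketch unfolds RFC 822 continuation lines, keeps only
            "Content-" fields, and falls back to US-ASCII text when no
            Content-type field is present.

```python
def parse_part(part: str, default_type: str = "text") -> tuple[dict[str, str], str]:
    """Sketch: split a body part into its Content- header fields and its body."""
    lines = part.split("\n")
    blank = lines.index("") if "" in lines else len(lines)   # header/body separator
    fields: dict[str, str] = {}
    name = None
    for line in lines[:blank]:
        if line[:1] in (" ", "\t") and name is not None:
            fields[name] += " " + line.strip()              # folded continuation line
            continue
        key, sep, value = line.partition(":")
        if sep:
            name = key.strip().lower()
            fields[name] = value.strip()
    # Only fields whose names begin with "Content-" have defined meaning here.
    content = {k: v for k, v in fields.items() if k.startswith("content-")}
    content.setdefault("content-type", default_type)   # absent: US-ASCII text
    return content, "\n".join(lines[blank + 1:])
```

            A part that begins with a blank line therefore gets an empty
            header area and all default values.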

            It must be understood that body parts are NOT messages.  For
            example,  a  gateway between Internet and X.400 mail must be
            able to  tell  the  difference  between  a  body  part  that
            consists of an image and a body part that consists of an
            encapsulated message, the body of which  is  an  image.   In
            order  to  represent  the  latter, the body part should have
            "Content-type: message", and its body (after the blank line)
            should be the encapsulated message, with its own
            "Content-type: image" header field.  Body parts use the same syntax
            as messages because there are many legitimate cases in which
            a body part might be  converted  into  a  message,  or  vice
            versa.   The  identical  syntax makes such conversions easy,
            but must be understood by implementors.   (For  the  special
            case  in  which  all parts are actually messages, a "digest"
            subtype is also defined.)

            As stated previously, each pair of consecutive body parts
            is separated by an encapsulation boundary.  The
            encapsulation boundary MUST NOT appear  inside  any  of  the
            encapsulated  parts.  Thus, it is crucial that the composing
            agent be able to choose and specify the boundary  that  will
            separate the parts.
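
            One way for a composing agent to choose such a boundary is
            sketched below, using Python's secrets module for
            randomness.  The alphabet, the default length of 24
            characters and the function name are assumptions; the only
            requirements taken from this memo are that the boundary stay
            within 70 characters and not occur in any of the parts.  The
            check is deliberately stricter than necessary: it rejects
            any occurrence of the bare boundary string, not only of "--"
            followed by it.

```python
import secrets
import string

# Assumed alphabet: letters and digits only.
_BOUNDARY_CHARS = string.ascii_letters + string.digits


def choose_boundary(parts: list[str], length: int = 24) -> str:
    """Sketch: pick a random boundary string that occurs in none of the parts."""
    if not 1 <= length <= 70:
        raise ValueError("boundary should be 1 to 70 characters long")
    while True:
        candidate = "".join(secrets.choice(_BOUNDARY_CHARS) for _ in range(length))
        # Rejecting the bare string also rules out "--" + candidate.
        if not any(candidate in part for part in parts):
            return candidate
```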

            The Content-type field for multipart messages requires two
            supplementary parameters.  The first is used to specify a version
            number and must be either "1-S" or "1-P".  The two versions
            have  identical  syntax, but the "-P" is intended as a hint,
            to receivers, that the parts are intended to  be  viewed  in
            parallel  rather  than  sequentially.   Implementations that
            can not show the parts in parallel, or that choose not to do
            so, are free to treat all multipart messages of version
            "1-P" as if they were version "1-S".  However, all
            implementations  must  check  the  version number, to ensure
            graceful behavior in the event that an  incompatible  future
            version  of  multipart  messages  is  defined later.  Future
            version numbers will always start with an  integer  for  the
            primary  version number, followed by a hyphen and (possibly)
            some additional text.

            The second parameter, which is always required for multipart
            messages, is used to specify the format of the encapsulation
            boundary.  The encapsulation boundary is defined as  a  line
            consisting  entirely  of two hyphen characters ("-", decimal
            code 45) followed by the second parameter  of  the  Content-
            type  header  field with any leading or trailing white space
            removed.  (DISCUSSION:  The specification that  white  space
            be   removed   is   intended   to   eliminate  the  possible
            introduction of ambiguity caused by the addition or deletion
            of white space by message transport agents.  The hyphens are
            for rough compatibility with the earlier RFC 934  method  of
            message  encapsulation,  and  for  ease of searching for the
            boundaries in some implementations.  However, it  should  be
            noted  that multipart messages are NOT completely compatible
            with RFC 934 encapsulations; in particular, they do not obey
            RFC  934  quoting  conventions for embedded lines that begin
            with hyphens.)

            Thus, a typical multipart content-type  header  field  might
            look like this:

            Content-type: multipart; 1-S; gc0p4Jq0M2Yt08jU534c0p

            This indicates that the message consists of several parts,
            each with the syntax (but not the meaning) of an RFC 822
            message, that the parts are intended to be viewed one at a
            time, and that the parts are
            separated by the line

            --gc0p4Jq0M2Yt08jU534c0p
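
            A receiver has to extract the version and the boundary from
            this field and must refuse versions it does not understand.
            A minimal sketch, assuming the semicolon-separated layout
            shown above (case-insensitive version matching and the
            function name are assumptions), treats absent or
            unrecognized subtypes as "mixed", as this section requires
            further on, and trims white space around the boundary.

```python
def parse_multipart_type(value: str) -> tuple[str, str, str]:
    """Sketch: return (subtype, version, boundary) from 'multipart[/sub]; version; boundary'."""
    fields = [f.strip() for f in value.split(";")]  # trims white space around the boundary
    if len(fields) != 3:
        raise ValueError("expected: multipart[/subtype]; version; boundary")
    type_name, _, subtype = fields[0].partition("/")
    if type_name.strip().lower() != "multipart":
        raise ValueError("not a multipart content-type")
    version, boundary = fields[1].upper(), fields[2]
    if version not in ("1-S", "1-P"):
        # Future versions ("<integer>-<text>") may be incompatible: do not guess.
        raise ValueError(f"unsupported multipart version {fields[1]!r}")
    if not boundary:
        raise ValueError("missing boundary")
    subtype = subtype.strip().lower()
    if subtype not in ("mixed", "digest"):
        subtype = "mixed"                 # absent or unrecognized subtype
    return subtype, version, boundary
```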

            The encapsulation boundaries  must  not  appear  within  the
            encapsulations,  and should be no longer than 70 characters,
            not counting the two leading hyphens.

            The encapsulation  boundary  following  the  last  body-part
            should  be  a distinguished delimiter that indicates that no
            further  body-parts  will  follow.   Such  a  delimiter   is
            identical  to  the previous delimiters, with the addition of
            two more hyphens at the end of the line:

            --gc0p4Jq0M2Yt08jU534c0p--
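
            A composing agent then only has to emit the header field, a
            delimiter line before each part, and the close-delimiter
            after the last one.  The sketch below is illustrative (its
            name and the use of bare newlines instead of CR LF are
            assumptions); it refuses a boundary whose delimiter already
            occurs inside a part, and writes nothing before the first
            boundary or after the last.

```python
def compose_multipart(parts: list[str], boundary: str, version: str = "1-S") -> str:
    """Sketch: build a draft-style multipart message: header field, parts, close-delimiter."""
    delimiter = "--" + boundary
    if any(delimiter in part for part in parts):
        raise ValueError("delimiter occurs inside a body part")
    lines = [f"Content-type: multipart; {version}; {boundary}", ""]
    for part in parts:
        lines.append(delimiter)
        lines.append(part)          # header area, blank line, body
    lines.append(delimiter + "--")  # close-delimiter: no further body parts
    return "\n".join(lines)
```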

            It should be  noted  that  there  appears  to  be  room  for
            additional  information  prior  to  the  first encapsulation
            boundary and following the final such boundary.  For several
            reasons, however, it is specified that these areas should be
            left blank, and that implementations should ignore  anything
            that  appears  before  the  first boundary or after the last
            one. (The most important reasons  are  the  lack  of  proper
            typing  of  these  parts  and  lack  of  clear semantics for
            handling  these  parts  at  gateways,   particularly   X.400
            gateways.)

            The use of  "Content-Type:  Multipart"  as  a  message  part
            within   another  "Content-Type:  Multipart"  is  explicitly
            allowed.   In such cases, for obvious reasons, care must  be
            taken to ensure that each nested multipart message uses a
            different boundary delimiter.  See Appendix II for an
            example of nested multipart messages.

            The use of  content-type  "Multipart"  with  only  a  single
            included  part  may  be  useful  in certain contexts, and is
            explicitly permitted.

            Overall, the body of a multipart message may be specified as
            follows:

            body := 1*encapsulation close-delimiter

            encapsulation := delimiter CRLF message

            delimiter := "--" <delimiter from Content-type resource>

            close-delimiter := delimiter "--"

            message := <as defined in RFC 822, with all header fields
                      optional, and with the specified delimiter not
                      occurring anywhere in the body, either on a line
                      by itself or as a substring anywhere.>
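
            A receiving agent can recover the encapsulations from this
            grammar line by line.  The sketch below (hypothetical name;
            newline or CR LF line breaks accepted) skips everything
            before the first delimiter and after the close-delimiter, as
            specified above, and tolerates white space that a transport
            agent may have appended to a delimiter line.

```python
def split_multipart(body: str, boundary: str) -> list[str]:
    """Sketch: return the encapsulated parts of a multipart body, without delimiters."""
    delimiter = "--" + boundary
    close = delimiter + "--"
    parts = []
    current = None                            # list of lines; None before the first delimiter
    for raw in body.split("\n"):
        line = raw.rstrip("\r")
        marker = line.rstrip()                # white space may have been added in transit
        if marker in (delimiter, close):
            if current is not None:
                parts.append("\n".join(current))
            if marker == close:
                return parts                  # anything after this is ignored
            current = []
        elif current is not None:
            current.append(line)
        # lines before the first delimiter (the preamble) are skipped
    raise ValueError("missing close-delimiter")
```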

            The above description defines the  default  subtype  of  the
            multipart  type,  "mixed", which may be explicitly specified
            with a content-type of "multipart/mixed".    Other  subtypes
            are  possible,  but  should  be  defined to be syntactically
            compatible with the "mixed" subtype.  Unrecognized  subtypes
            should  be treated as being of subtype "mixed." (DISCUSSION:
            Conspicuously missing from the multipart type is a notion of
            structure.   In  general,  it  seems  premature  to  try  to
            standardize structure yet.  It  is  recommended  that  those
            wishing to provide a more structured or integrated multipart
            messaging facility should define a subtype of multipart that
            is  syntactically  identical,  but  that  always expects the
            inclusion of a distinguished part (e.g. with a  content-type
            of "Application/x-my-structure-subtype") that can be used to
            specify the structure and integration of  the  other  parts,
            probably  referring  to  them by their Content-ID field.  If
            this  approach  is  used,  other  implementations  will  not
            recognize  the  subtype,  but  will  treat it as the default
            subtype (multipart/mixed) and will thus be able to show  the
            user the parts that are recognized.)

            This memo defines one particular subtype of  multipart,  the
            "digest"  subtype.   This type is syntactically identical to
            multipart, but the semantics are different.  In  particular,
            in  a  digest,  all  of  the parts are assumed to be of type
            "Message".  That is, each part is implicitly prefixed  by  a
            line  that  says "Content-type: message" followed by a blank
            line.  This is provided in order to allow  a  more  readable
            digest  format  that  is  largely compatible (except for the
            quoting convention) with RFC 934.

            5.3     The "Text-Plus" Content-Type and "Richtext" subtype

            There are many formats for representing what might be  known
            as  "extended  text"  --  text  with embedded formatting and
            presentation information.  An interesting characteristic  of
            most  such  representations  is that they are to some extent
            readable even without the software that interprets them.  It
            is  useful, then, to distinguish them, at the highest level,
            from such non-readable data as images or audio messages.  In
            the  absence  of  appropriate  interpreting  software, it is
            reasonable to show extended text to the user,  while  it  is
            not reasonable to do so with binary data.

            To represent such data,  this  memo  defines  a  "text-plus"
            content-type.  Plausible subtypes of text-plus are typically
            given by the common name of the representation format,  e.g.
            "text-plus/Troff"  or  "text-plus/TeX".   Character sets are
            not specified as subtypes; in general it is assumed that rich
            text formats will have their own mechanisms for representing
            alternate or multiple character sets.   However,  a  subtype
            can  be  defined to permit such a specification, e.g. "text-
            plus/troff; charset=ISO-8859-1".  Initial  subtypes  include
            troff, TeX, and PostScript.

            In order to promote the  wider  interoperability  of  simple
            formatted  text,  this  memo  defines  an  extremely  simple
            subtype  of  "text-plus",  the  "richtext"  subtype.    This
            subtype was designed to meet the following criteria:

                 1.  The syntax is extremely simple  to  parse,  so
                 that   even  teletype-oriented  mail  systems  can
                 easily strip away the formatting  information  and
                 leave only the readable text.

                 2.  The syntax is easily extended to allow for new
                 formatting commands that are deemed essential.

                 3.  The capabilities  are  extremely  limited,  to
                 ensure  that  it  can  represent  no  more than is
                 likely to be representable by the  user's  primary
                 word  processor.   While  this  limits what can be
                 sent, it increases the likelihood that it  can  be
                 properly displayed.

                 4.  The syntax is compatible with SGML,  so  that,
                 with an appropriate DTD (Document Type Definition,
                 the standard mechanism  for  defining  a  document
                 type  using  SGML), a general SGML parser could be
                 made to parse  richtext.   (However,  richtext  is
                 several  orders  of  magnitude  simpler  than full
                 SGML, and no SGML knowledge is required  in  order
                 to understand the richtext specification.)

            The syntax of "richtext" is very simple.  It is assumed,  at
            the  top-level,  to  be  in the US-ASCII character set.  All
            characters represent themselves, with the exception  of  the
            "<"   character  (ASCII  60),  which  is  used  to  begin  a
            formatting  sequence.   Formatting  sequences  consist   of
            formatting  commands  surrounded  by  angle  brackets ("<>",
            ASCII 60 and 62).  Each formatting command may  be  no  more
            than 40 characters in length. Formatting commands that begin
            with a forward slash or solidus (ASCII  47)  are  negations,
            and  such negations must always exist to balance the initial
            opening commands.  Thus, if the formatting sequence "<bold>"
            appears  at  some  point, there must later be a "</bold>" to
            balance  it.   There  are  only  two  exceptions   to   this
            "balancing"  rule:  First, the command "<lt>" may be used to
            represent a literal  "<"  character.   Second,  the  command
            "<nl>"  may be used to represent a line break.  (NOTE:  These
            are intended to be mnemonic: "lt" stands  for  "less  than",
            and "nl" stands for "new line".)
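
            To make the syntax above concrete, here is a minimal C sketch
            of a scanner for a single formatting sequence.  The names
            rt_scan and enum rt_token_kind are hypothetical, not part of
            this memo.  The sketch compares command names without regard
            to case, and it counts a leading "/" toward the 40-character
            limit.  Both choices are our reading of the text rather than
            something it states outright.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define RT_MAX_COMMAND 40  /* limit on a formatting command's length */

enum rt_token_kind { RT_OPEN, RT_CLOSE, RT_LT, RT_NL, RT_INVALID };

/* Scan the formatting sequence starting at s (which must point at '<').
   On success, copy the lower-cased command name, without any leading
   '/', into name and store in *len the number of characters consumed,
   including both angle brackets. */
enum rt_token_kind rt_scan(const char *s, char name[RT_MAX_COMMAND + 1],
                           size_t *len)
{
    const char *p = s + 1;
    size_t limit = RT_MAX_COMMAND;
    size_t n = 0;
    int negated = 0;

    if (*s != '<')
        return RT_INVALID;
    if (*p == '/') {               /* a negation such as </bold> */
        negated = 1;
        limit--;                   /* the '/' counts toward the limit */
        p++;
    }
    while (*p != '>' && *p != '\0') {
        if (n == limit)
            return RT_INVALID;     /* command longer than 40 characters */
        name[n++] = (char)tolower((unsigned char)*p++);
    }
    if (*p != '>' || n == 0)
        return RT_INVALID;         /* unterminated or empty command */
    name[n] = '\0';
    *len = (size_t)(p - s) + 1;

    if (!negated && strcmp(name, "lt") == 0)
        return RT_LT;              /* literal '<' */
    if (!negated && strcmp(name, "nl") == 0)
        return RT_NL;              /* hard line break */
    return negated ? RT_CLOSE : RT_OPEN;
}
```

            A caller would advance its input pointer by *len after each
            successful scan and copy every other character unchanged.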

            Initially defined formatting commands are:

                 Bold -- causes the subsequent text  to  be  in  a  bold
                      font.
                 Italic -- causes the subsequent text to be in an italic
                      font.
                 Fixed -- causes the subsequent text to be  in  a  fixed
                      width font.
                 Smaller -- causes  the  subsequent  text  to  be  in  a
                      smaller font.
                 Bigger -- causes the subsequent text to be in a  bigger
                      font.
                 Underline  --  causes  the  subsequent   text   to   be
                      underlined.
                 Center -- causes the subsequent text to be centered.
                 FlushLeft -- causes the  subsequent  text  to  be  left
                      justified.
                 FlushRight -- causes the subsequent text  to  be  right
                      justified.
                 Indent -- causes the subsequent text to be indented  at
                      both margins.
                 Subscript  --  causes  the  subsequent   text   to   be
                      interpreted as a subscript.
                 Superscript  --  causes  the  subsequent  text  to   be
                      interpreted as a superscript.
                 ISO-10646  --  causes  the  subsequent   text   to   be
                      interpreted  as  text  in  the ISO-10646 character
                      set.
                 ISO-8859-X  (for any registered value of X)  --  causes
                      the  subsequent  text to be interpreted as text in
                      the appropriate character set.
                 US-ASCII  --  causes  the   subsequent   text   to   be
                      interpreted as text in the US-ASCII character set.
                      Although this is the  default  character  set,  it
                      might  be usefully nested inside another character
                      set.
                 Excerpt -- causes the subsequent text to be interpreted
                      as   a   textual   excerpt   from   another  message.
                      Typically this will be displayed using indentation
                      and  an  alternate font, but such decisions are up
                      to the viewer.
                 Comment -- causes the subsequent text to be interpreted
                      as  a  comment, and hence not shown to the reader.
                      (Comments may be used,  among  other  things,  for
                      annotating  richtext  documents  with  information
                      that will be useful  upon  translation  into  some
                      richer document format.)
                 No-op -- has no effect on the subsequent text.

            Each formatting command affects all  subsequent  text  until
            the matching </token>. Such pairs of tokens must be properly
            balanced.  Thus, the proper way to  describe  text  in  bold
            italics is:

                 <bold><italic>the-text</italic></bold>

            and, in particular, the following is illegal richtext:

                 <bold><italic>the-text</bold></italic>

            Implementations should regard  any  unrecognized  formatting
            token  as  equivalent  to  "No-op", thus facilitating future
            extensions  to  "richtext".  Private  extensions  may be defined
            using  formatting tokens that begin with "X-", by analogy to
            Internet mail headers.
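
            The balancing rule can be checked with a simple stack of open
            commands.  The C sketch below is one way to do it.  The
            function name rt_is_balanced and the nesting limit
            RT_MAX_DEPTH are hypothetical choices for illustration only.
            "<lt>" and "<nl>" are exempt.  Unrecognized commands,
            including "X-" extensions, are still required to balance
            even though a viewer treats them as "No-op".

```c
#include <ctype.h>
#include <string.h>

#define RT_MAX_COMMAND 40   /* per-command length limit from the text */
#define RT_MAX_DEPTH   64   /* hypothetical nesting limit for this sketch */

/* Return 1 if every formatting command in text is closed by a matching
   negation in properly nested order, 0 otherwise.  Command names are
   compared case-insensitively; <lt> and <nl> need no negation. */
int rt_is_balanced(const char *text)
{
    char stack[RT_MAX_DEPTH][RT_MAX_COMMAND + 1];
    int depth = 0;
    const char *p = text;

    while ((p = strchr(p, '<')) != NULL) {
        char cmd[RT_MAX_COMMAND + 1];
        size_t n = 0;

        for (p++; *p != '>'; p++) {
            if (*p == '\0' || n == RT_MAX_COMMAND)
                return 0;                 /* unterminated or too long */
            cmd[n++] = (char)tolower((unsigned char)*p);
        }
        if (n == 0)
            return 0;                     /* empty "<>" sequence */
        cmd[n] = '\0';
        p++;                              /* step past '>' */

        if (strcmp(cmd, "lt") == 0 || strcmp(cmd, "nl") == 0)
            continue;                     /* the two exceptions */
        if (cmd[0] == '/') {
            if (depth == 0 || strcmp(stack[depth - 1], cmd + 1) != 0)
                return 0;                 /* closes the wrong command */
            depth--;
        } else {
            if (depth == RT_MAX_DEPTH)
                return 0;
            strcpy(stack[depth++], cmd);  /* remember the open command */
        }
    }
    return depth == 0;                    /* nothing left open */
}
```

            Because the most recently opened command must be the next
            one closed, "<bold><italic>the-text</bold></italic>" is
            rejected at "</bold>".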

            Richtext also differentiates between "hard" and "soft" line
            breaks.  A line break (CR LF) in the richtext data stream is
            interpreted as a "soft" line break,  one  that  is  included
            only for purposes of mail transport, and is to be treated as
            white space by richtext interpreters.  To include  a  "hard"
            line  break  (one  that  should  be  displayed as such), the
            "<nl>" formatting token should be used.

            Putting   all   this   together,   the   following    "text-
            plus/richtext"  body fragment:

                 <bold>Now</bold>     is     the      time      for
                 <italic>all</italic> good men
                  <smaller>(and   <lt>women>)</smaller>   to   come
                 <ignoreme></ignoreme>

                 to the aid of their
                 <nl>
                 beloved <nl><nl>country. <comment>  Stupid  quote!
                 </comment> -- the end

            represents the following  formatted  text  (which  will,  no
            doubt, look cryptic in the text-only version of this memo):

                 Now is the time for all good men (and <women>)  to
                 come to the aid of their
                 beloved

                 country. -- the end

            A  minimal  richtext  implementation  is  one  that   simply
            converts  "<lt>"  to  "<",  converts CRLFs to SPACE, converts
            <nl> to a newline according  to  local  newline  convention,
            removes  everything  between  a <comment> token and the next
            following </comment> token, and removes all other formatting
            tokens (all text enclosed in angle brackets).

            NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML:   Richtext  is
            decidedly  not  SGML,  and  should  not be used to transport
            arbitrary SGML  documents.   Those  who  wish  to  use  SGML
            document  types  as  a mail transport format should define a
            new text-plus subtype,  e.g.  "text-plus/sgml-dtd-whatever".
            Richtext  is  designed  to  be  compatible  with  SGML,  and
            specifically so  that  it  will  be  possible  to  define  a
            richtext  DTD  if  that  is  desired. However, this does not
            imply that arbitrary SGML can be called richtext,  nor  that
            richtext  implementors have any need to understand SGML; the
            description  in  this  memo  is  a  complete  definition  of
            richtext.

            One of the major goals in the design of richtext is to  make
            it  so  simple  that  even text-only mailers would implement
            richtext-to-plain-text  translators,  thus  increasing   the
            likelihood that multifont text will become "safe" to use very
            widely.  To demonstrate this simplicity, what follows  is  a
            short C program that converts richtext input into plain text
            output:

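                 #include <stdio.h>
                 #include <ctype.h>
                 #include <string.h>

                 /* Read the rest of a formatting command, up to '>', into
                    token, lower-casing it and truncating it to fit.
                    Returns 0 if the input ends first, 1 otherwise. */
                 static int read_token(char *token, size_t size) {
                     size_t i = 0;
                     int c;

                     while ((c = getc(stdin)) != '>' && c != EOF) {
                         if (i < size - 1)
                             token[i++] = (char)tolower(c);
                     }
                     token[i] = '\0';
                     return c != EOF;
                 }

                 int main(void) {
                     int c;
                     char token[50];

                     while ((c = getc(stdin)) != EOF) {
                         if (c == '<') {
                             if (!read_token(token, sizeof token))
                                 break;
                             if (!strcmp(token, "lt")) {
                                 putc('<', stdout);
                             } else if (!strcmp(token, "nl")) {
                                 putc('\n', stdout);
                             } else if (!strcmp(token, "comment")) {
                                 /* Discard everything up to </comment>. */
                                 do {
                                     while ((c = getc(stdin)) != '<' && c != EOF)
                                         ;
                                 } while (c != EOF
                                          && read_token(token, sizeof token)
                                          && strcmp(token, "/comment"));
                             } /* Ignore all other tokens */
                         } else if (c == '\n') {
                             putc(' ', stdout);  /* soft line break */
                         } else if (c != '\r') {
                             putc(c, stdout);
                         }
                     }
                     putc('\n', stdout); /* for good measure */
                     return 0;
                 }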

            5.4     The Message Content-Type

            It is frequently desirable, in sending mail, to  encapsulate
            another  mail  message. For this common operation, a special
            content-type, "message", is hereby defined.

            A content-type of "message" with no subtype  indicates  that
            the  body  or body part is an encapsulated message, with the
            syntax of an RFC 822 message, as extended by this memo.

            The special subtype "pem" may be used to indicate  that  the
            body  or  body  part  is a message conforming to the Privacy
            Enhanced Mail protocol  [RFC-1113].

            The special subtype "partial" may be used to  indicate  that
            the  body  or  body  part is a fragment of a larger message.
            Three subfields must be specified in the content-type field:
            The first is a unique identifier, as close to a world-unique
            identifier as possible,  to  be  used  to  match  the  parts
            together.   (In  general, the identifier can be similar to a
            message-id; if placed  in  double  quotes,  it  can  be  any
            message-id, in accordance with the BNF for "parameter" given
            earlier in this memo.)  The second, an integer, is the  part
            number.   The third, another integer, is the total number of
            parts. Thus, part 2 of  a  3-part  message  might  have  the
            following header field:

                 Content-type: Message/Partial;
                         "oc=jpbe0M2Yt4s@thumper.bellcore.com"; 2; 3

            When the parts of a message broken up in this manner are put
            together,  the  result is a complete RFC-822 format message,
            which may have its own Content-type header field,  and  thus
            may contain any other data type.  (EXPLANATION:  The purpose
            of the MESSAGE/PARTIAL type is to allow large objects to  be
            delivered   as   several   separate   pieces   of  mail  and
            automatically reassembled by the receiving user agent.  This
            may  be  desirable  when intermediate transport agents limit
            the size of messages that can be sent.)
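
            The following C sketch shows how a receiving user agent might
            read the three subfields and reassemble the parts.  It
            assumes the subfields appear as an identifier (optionally in
            double quotes), then the part number, then the total, all
            separated by semicolons, for example
            "oc=jpbe0M2Yt4s@thumper.bellcore.com"; 2; 3.  The names
            parse_partial, struct partial_info, and reassemble are
            hypothetical, and the 255-character identifier limit is an
            arbitrary choice for the sketch.

```c
#include <stdio.h>
#include <string.h>

#define PARTIAL_ID_MAX 255   /* arbitrary limit for this sketch */

struct partial_info {
    char id[PARTIAL_ID_MAX + 1];  /* identifier used to match the parts */
    int part;                     /* this part's number, counting from 1 */
    int total;                    /* total number of parts */
};

/* Parse "identifier; part; total".  Returns 1 if all three subfields
   are present and 1 <= part <= total, 0 otherwise. */
int parse_partial(const char *params, struct partial_info *info)
{
    /* First try a quoted identifier, then an unquoted one. */
    int n = sscanf(params, " \"%255[^\"]\" ; %d ; %d",
                   info->id, &info->part, &info->total);
    if (n != 3)
        n = sscanf(params, " %255[^; \t\"] ; %d ; %d",
                   info->id, &info->part, &info->total);
    return n == 3 && info->total > 0
        && info->part >= 1 && info->part <= info->total;
}

/* Join part bodies received in any order.  parts[k] holds the body of
   part k+1, or NULL if it has not arrived yet.  Returns 1 and writes
   the complete message to out, or 0 if a part is missing or out is
   too small. */
int reassemble(const char *parts[], int total, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return 0;
    for (int k = 0; k < total; k++) {
        size_t len;

        if (parts[k] == NULL)
            return 0;                     /* still waiting for a part */
        len = strlen(parts[k]);
        if (used + len + 1 > outsz)
            return 0;
        memcpy(out + used, parts[k], len);
        used += len;
    }
    out[used] = '\0';
    return 1;
}
```

            The reassembled text is itself an RFC 822 message, so it can
            be passed back through the same header and Content-type
            processing as any other incoming mail.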

            Additionally, all the character set  subtypes  of  text  are
            defined as subtypes of "message." If a character set subtype
            is given, it applies to the bodies, though not the names, of
            each  of the encapsulated message's header fields except the
            Content-XXX header fields, which must  be  entirely  in  US-
            ASCII.  Thus it can be used to represent address and subject
            information in non-ASCII character sets.  The character  set
            subtype  does  NOT  apply  to  the  body of the encapsulated
            message.  Thus, to  encapsulate  a  message  with  non-ASCII
            characters  in  both  the header fields and in the body, you
            would need something like the following:

                 From: <ASCII form>
                 Subject:  <ASCII form>
                 Content-type:  message/iso-8859-2

                 From: <iso-8859-2-form>
                 Subject: <iso-8859-2-form>
                 Content-type: text/iso-8859-2

                 Message body in iso-8859-2 character set.

            5.5     The Binary Content-Type

            A content-type of "binary" may be used to indicate that  the
            body  or  body  part  is  binary  data.   A  subtype  may be
            specified, but none are defined here.   The  parameters  for
            type  binary are a set of attribute/value pairs, of the form
            "NAME=VALUE", separated by the usual semicolons.  The set of
            possible  attributes  to  be  defined  includes,  but is not
            limited to:

                 NAME -- a suggested name for the binary data as  a
                 file.

                 TYPE -- the type of binary data

                 CONVERSIONS -- the set  of  operations  that  have
                 been  performed  on  the data before putting it in
                 the mail (and before any Content-Transfer-Encoding
                 that   might   have  been  applied).  If  multiple
                 conversions  have   occurred,   they   should   be
                 specified  in  the  order  they  were applied, and
                 separated by commas.

            The values  for  these  attributes  are  left  undefined  at
            present,  but  may  require specification in the future.  An
            example of a common (though discouraged) usage might be:

                 Content-type:  binary; name=foo.tar.Z.uu; type=tar;
                         "conversions=compress,uuencode"

            However, the use of such mechanisms as uuencode and compress
            is   explicitly   discouraged,  in  favor  of  the  Content-
            Transfer-Encoding mechanism, which is both more standardized
            and more portable across mail boundaries.
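
            For illustration, the C sketch below pulls individual
            attributes out of a binary parameter list such as the one in
            the example above, and it lists the CONVERSIONS in the order
            a receiver would have to undo them (last applied, first
            undone).  The function names are hypothetical.  Matching
            attribute names without regard to case is an assumption.
            The sketch accepts both a whole parameter in double quotes
            ("conversions=...") and a quoted value (name="...").

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the len characters at s with word. */
static int same_word(const char *s, size_t len, const char *word)
{
    size_t i;

    if (strlen(word) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)word[i]))
            return 0;
    return 1;
}

/* Copy the value of attribute attr from semicolon-separated NAME=VALUE
   parameters into out.  Returns 1 if found and it fits, else 0. */
int get_param(const char *params, const char *attr, char *out, size_t outsz)
{
    const char *p = params;

    while (*p != '\0') {
        const char *start, *end, *eq, *v;
        int in_quotes = 0;

        while (isspace((unsigned char)*p) || *p == ';')
            p++;                                /* skip separators */
        start = p;
        while (*p != '\0' && (in_quotes || *p != ';')) {
            if (*p == '"')
                in_quotes = !in_quotes;         /* ';' inside quotes is text */
            p++;
        }
        end = p;
        while (end > start && isspace((unsigned char)end[-1]))
            end--;
        if (end - start >= 2 && *start == '"' && end[-1] == '"') {
            start++;                            /* whole parameter quoted */
            end--;
        }
        eq = memchr(start, '=', (size_t)(end - start));
        if (eq == NULL || !same_word(start, (size_t)(eq - start), attr))
            continue;
        v = eq + 1;
        if (end - v >= 2 && *v == '"' && end[-1] == '"') {
            v++;                                /* only the value quoted */
            end--;
        }
        if ((size_t)(end - v) + 1 > outsz)
            return 0;
        memcpy(out, v, (size_t)(end - v));
        out[end - v] = '\0';
        return 1;
    }
    return 0;
}

/* Reverse a comma-separated CONVERSIONS list (given in the order the
   conversions were applied) into the order they must be undone. */
int undo_order(const char *list, char *out, size_t outsz)
{
    const char *end = list + strlen(list);
    size_t used = 0;

    if (outsz == 0)
        return 0;
    while (end > list) {
        const char *start = end;
        size_t len;

        while (start > list && start[-1] != ',')
            start--;                            /* back up to item start */
        len = (size_t)(end - start);
        if (used + len + (used > 0) + 1 > outsz)
            return 0;
        if (used > 0)
            out[used++] = ',';
        memcpy(out + used, start, len);
        used += len;
        end = (start > list) ? start - 1 : start;   /* skip the comma */
    }
    out[used] = '\0';
    return 1;
}
```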

            The recommended action for an implementation  that  receives
            binary  mail  of  an unrecognized type is to simply offer to
            put the data in a file, with  any  Content-Transfer-Encoding
            undone,  or  perhaps  to use it as input to a user-specified
            process.   To  reduce  the  danger  of  transmitting   rogue
            programs  through  the mail, it is strongly recommended that
            implementations  NOT  implement  a   path-search   mechanism
            whereby  an  arbitrary  program  named  in  the Content-type
            header  field  (e.g.  the  "type="  subfield  of  a   binary
            content-type)  is  found and executed using the mail body as
            input.  In short, the safest course for an implementation
            that receives binary mail of an unrecognized type is to
            decode any Content-Transfer-Encoding and put the data in a
            file for the end-user.

            Among the subtypes that  have  been  suggested  as  suitable
            subtypes   of  "binary"  are  such  document  representation
            formats as "DVI" and "ODA".

            5.6     The Application Content-Type Value

            The "application" content-type is to be used for  mail-based
            applications.   A mail-based application is one that defines
            a standard format for representing intermediate data that is
            to be manipulated by cooperating user agents.  For example, a
            meeting scheduler might define a standard representation for
            information about proposed meeting dates.  An intelligent user
            agent would use this information to conduct a dialog with the
            user, and might then send further mail based on that dialog.

            Such  applications  may  be  defined  as  subtypes  of   the
            "application" content-type.  There is no default subtype for
            application, and this memo defines  only  one  subtype,  the
            "external-reference" subtype.

            The External-Reference subtype indicates that  the  body  or
            body  part is not included, presumably because too much data
            is involved for the underlying mail transport  mechanism  to
            handle.   The  subfields are, as in the case of the "binary"
            content-type, attribute-value  pairs.   In  this  case,  the
            subfields  describe  a  mechanism for accessing the external
            binary data.   The set of possible attributes includes,  but
            is not limited to:

                 FILENAME -- The name of a file that  contains  the
                 external data.

                 SITE -- one or more domain names, comma separated,
                 of  machines  that are known to have access to the
                 data file.  Asterisks may  be  used  for  wildcard
                 matching  to  a  part  of  a  domain name, such as
                 "*.bellcore.com", while a single asterisk  may  be
                 used  to  indicate  a  file that is expected to be
                 universally available,  e.g.  via  a  global  file
                 system.

                 REAL-TYPE -- The real content-type  of  the  data,
                 once retrieved.

                 EXPIRATION -- The date (in RFC  822  date  syntax)
                 after  which the existence of the external data is
                 not guaranteed.
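
            The SITE wildcard rule might be implemented along the lines
            of the C sketch below.  The text says only that asterisks
            match "a part of a domain name".  This sketch assumes that
            "*" matches any run of characters, dots included, and that
            host names compare without regard to case.  The names
            site_matches and glob_match are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Match the first plen characters of pat against all of s.  '*' matches
   any run of characters, including an empty one; case is ignored. */
static int glob_match(const char *pat, size_t plen, const char *s)
{
    if (plen == 0)
        return *s == '\0';
    if (*pat == '*') {
        for (;;) {                        /* try every possible run */
            if (glob_match(pat + 1, plen - 1, s))
                return 1;
            if (*s == '\0')
                return 0;
            s++;
        }
    }
    if (*s == '\0' || tolower((unsigned char)*pat) != tolower((unsigned char)*s))
        return 0;
    return glob_match(pat + 1, plen - 1, s + 1);
}

/* Return 1 if host matches any entry in the comma-separated SITE list. */
int site_matches(const char *sites, const char *host)
{
    const char *p = sites;

    for (;;) {
        const char *start, *end;

        while (*p == ' ' || *p == '\t')
            p++;                          /* skip blanks before an entry */
        start = p;
        while (*p != ',' && *p != '\0')
            p++;
        end = p;
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;                        /* and blanks after it */
        if (end > start && glob_match(start, (size_t)(end - start), host))
            return 1;
        if (*p == '\0')
            return 0;
        p++;                              /* step past the comma */
    }
}
```

            Under this reading, a lone "*" matches every host, which
            corresponds to a file expected to be universally available.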

            With  the  emerging  possibility  of  very  wide-area   file
            systems,  it becomes very hard to know in advance the set of
            machines where a  file  will  and  will  not  be  accessible
            directly  from the file system.  Therefore it makes sense to
            provide both a file name, to be tried directly, and the name
            of  one  or  more  sites  from which the file is known to be
            accessible.  An implementation can try  to  retrieve  remote
            files  using FTP or any other protocol, using anonymous file
            retrieval or prompting the user for the necessary  name  and
            password.   However, the external-reference mechanism is not
            intended to be limited to file retrieval.  One can  imagine,
            for  example,  using  a  LISTSERV mechanism, or using unique
            identifiers and a video server for  external  references  to
            video  clips. However, this memo explicitly defines only the
            FILENAME and SITE attributes for retrieval purposes, as this
            is  the  only  retrieval  method  that  is  currently widely
            applicable. Other attributes may be defined as needed.

            The "REAL-TYPE" attribute may  be  used  to  specify  a  new
            content-type  header  field  to  be applied to the data once
            retrieved, as the data are assumed to be only the body of  a
            message,  not  including  any header information.  Note that
            because of the syntax of parameters, they may be  quoted  by
            enclosing  an  entire  parameter  in  double quotes. Thus an
            external reference to an image in G3FAX  format  might  have
            the following content-type header field:

                 Content-Type: application/external-reference;
                      filename=/usr/local/images/contact.g3;
                      site=thumper.bellcore.com;
                      real-type=image/g3fax;
                      expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

            If  a  message  is  of  content-type  "application/external-
            reference", then the actual body of the message is ignored.

            The distinction between the application and binary  content-
            types  is  more  a difference of intent than of syntax.  The
            application content-type is used to indicate data  that  are
            intended  to  be  interpreted  by a mail-based user agent of
            some sort.  The binary  content-type  is  intended  for  the
            transport  of arbitrary binary data, typically data that are
            used independently of a mail  system,  and  for  which  mail
            transport  is used as a convenient alternative to other file
            and data transport mechanisms.

            5.7     The Audio, Image, and Video Content-Type Values

            This memo defines several more content-type values that are
            defined  only incompletely here, and await further practical
            experience  before  their  values  can  be  more  completely
            specified.   These  are  clearly experimental in nature, and
            are  partially  defined   here   in   order   to   encourage
            experimenters to move in a common direction, recognizing
            that future additional standardization will be needed.

            AUDIO --   Indicates that the body  or  body  part  contains
            audio  data.  The subtype specifies the audio representation
            format.  It is expected that such subtypes will  be  defined
            by future standards.  In the meantime, vendor formats may be
            marked by subtypes such  as  "audio/x-sun",  "audio/x-next",
            and "audio/x-mac".

            IMAGE -- Indicates that the body or body part contains an
            image.   The subtype names the specific image format.  A few
            such case-insensitive values are "G3Fax" for Group Three
            Fax, "jpeg" for the JPEG format, and "pbm", "pgm", and "ppm"
            for the "portable bitmap" formats for black and white,  grey
            scale, or color images.

            VIDEO -- Indicates that the body or  body  part  contains  a
            time-varying-picture   image,   possibly   with   color  and
            coordinated sound.   The  term  "video"  is  used  extremely
            generically,  rather  than  with reference to any particular
            technology or format, and is not meant to preclude  subtypes
            such  as  animated  drawings encoded compactly.  The subtype
            and possible parameter values are  left  undefined  by  this
            memo.

            5.8     Experimental ("X-") Content-Type Values

            A content-type value beginning with the characters "X-"  and
            not defined here or in another RFC is a private value, to be
            used by consenting mail systems  by  mutual  agreement.  Any
            format  without  a  rigorous and public definition should be
            named with an "X-" prefix. Older versions of the widely-used
            Andrew  system  use  the "X-BE2" name, so new systems should
            probably choose a different name.

            6       Conformance With this Memo

            The mechanisms described in this memo are open-ended.  It is
            definitely   not  expected  that  all  implementations  will
            implement all of the content-types described, nor that  they
            will  all  share  the  same extensions.  In order to promote
            interoperability,  however,  it  is  useful  to  define  the
            concept of "XXXX-Conformance" to denote a certain level of
            implementation  that  allows  the  useful  interworking   of
            messages  with  content that differs from US ASCII text.  In
            this  section,  we  specify  the   requirements   for   such
            conformance.

            An XXXX-conformant mail user agent must:

                 1.  Recognize the Content-Transfer-Encoding header
                 field,  and  decode  data  encoded with either the
                 quoted-printable or base64 encodings.  (If a
                 compressed  encoding  is ever agreed to, it should
                 also become part of all conformant user agents.)

                 2.   Recognize  and  interpret  the   Content-type
                 header  field,  and  avoid showing an unsuspecting
                 user raw data that has a content-type field  other
                 than text.

                 3.  Explicitly handle the  following  content-type
                 values, to at least the following extents:

                 Text:
                      -- Recognize  and  display  "text"  mail
                           with the subtype "US-ASCII."
                      -- Recognize other subtypes at least  to
                           the  extent of being able to inform
                           the user about what  character  set
                           the message uses.
                      -- Recognize the "ISO-8859-1" subtype to
                           the extent of being able to display
                           those characters that are common to
                           ISO-8859-1 and US-ASCII.
                      --  Never  compose  text  mail   without
                           including  a  "Content-type" header
                           specifying the appropriate  subtype
                           (character set).
                 Text-plus:
                      -- For unrecognized  subtypes,  show  or
                           offer  to  show  the user the "raw"
                           version of the data.  An ability to
                           convert   "text-plus/richtext"   to
                           plain text is encouraged,  but  not
                           required for conformance.
                 Message:
                      -- Recognize and  display  at  least  the
                           default (simple) encapsulation.
                 Multipart:
                      -- Recognize  and  display  the  default
                           (mixed)  subtype, although parallel
                           parts may be serialized.
                      -- Treat any unrecognized subtypes as if
                           they were "mixed".
                 Binary:
                      --  Offer  the  ability  to  remove  any
                           Content-Transfer-Encoding  and  put
                           the resulting information in a user
                           file.

                 4.  Upon encountering  any  unrecognized  content-
                 type,  an  implementation should treat it as if it
                 had a content-type of "binary" with  no  parameter
                 sub-arguments.   How such data is handled is up to
                 an implementation, but likely options for handling
                 such unrecognized data include offering to write it
                 into a file (decoded from its mail transport format)
                 or letting the user name a program to which the
                 decoded data should be passed
                 as  input.   Unrecognized  predefined types, which
                 might include audio, image, video, or application,
                 should also be treated in this way.
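
            A user agent's top-level dispatch under these rules might
            look like the C sketch below.  The enum values and the name
            choose_action are hypothetical.  The sketch looks only at the
            primary type.  Subtype handling (reporting the character set
            of text, converting richtext, serializing parallel parts) is
            left to the code behind each action.  Every type not listed,
            including audio, image, video, application, and "X-" values,
            falls through to the same treatment as "binary".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum ua_action {
    UA_DISPLAY_TEXT,      /* text: display, noting the character set */
    UA_OFFER_RAW,         /* text-plus: show or offer to show raw data */
    UA_DISPLAY_MESSAGE,   /* message: show the encapsulated message */
    UA_DISPLAY_MIXED,     /* multipart: treat any subtype as "mixed" */
    UA_OFFER_SAVE         /* binary and anything unrecognized */
};

/* Case-insensitive comparison of the len characters at s with word. */
static int word_eq(const char *s, size_t len, const char *word)
{
    size_t i;

    if (strlen(word) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)word[i]))
            return 0;
    return 1;
}

/* Choose a handling action from a Content-type field value. */
enum ua_action choose_action(const char *content_type)
{
    size_t len;

    while (*content_type == ' ' || *content_type == '\t')
        content_type++;
    len = strcspn(content_type, "/; \t");   /* length of the primary type */
    if (word_eq(content_type, len, "text"))
        return UA_DISPLAY_TEXT;
    if (word_eq(content_type, len, "text-plus"))
        return UA_OFFER_RAW;
    if (word_eq(content_type, len, "message"))
        return UA_DISPLAY_MESSAGE;
    if (word_eq(content_type, len, "multipart"))
        return UA_DISPLAY_MIXED;
    return UA_OFFER_SAVE;                   /* treat as "binary" */
}
```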

            A user agent that meets the above conditions is said  to  be
            XXXX-conformant.   The  meaning of this phrase is that it is
            assumed  to  be  "safe"  to  send  virtually  any  kind   of
            properly-marked  data to users of such mail systems, because
            they  will  at  least  be  able  to  treat   the   data   as
            undifferentiated  binary, and will not simply splash it onto
            the screen of unsuspecting  users.    Of  course,  there  is
            another  sense  in  which  it is always "safe" to send XXXX-
            conformant format data, which is that such data will  not
            break  or be broken by any known systems that are conformant
            with RFC 821 and  RFC  822.   User  agents  that  are  XXXX-
            conformant  have the additional guarantee that the user will
            not be shown data that were never intended to be  viewed  as
            text.
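
            As an illustration only, rule 4 might be implemented as a
            lookup table with a "binary" fallback.  The handler names,
            the action strings they return, and the choice to match
            type names case-insensitively are assumptions of this
            sketch, not requirements of this memo.

```python
from typing import Callable, Dict

# Hypothetical handler signature: (decoded data, parameters) -> action taken.
Handler = Callable[[bytes, Dict[str, str]], str]


def display_text(data: bytes, params: Dict[str, str]) -> str:
    # Text in the default US-ASCII character set may be shown directly.
    return "display: " + data.decode("ascii", errors="replace")


def offer_binary(data: bytes, params: Dict[str, str]) -> str:
    # Never put undifferentiated data on the screen; instead offer to
    # write it to a file or to pass it to a program the user names.
    return f"offer to save or pipe {len(data)} bytes"


# Types this particular (hypothetical) user agent recognizes.
HANDLERS: Dict[str, Handler] = {
    "text": display_text,
    "binary": offer_binary,
}


def handle_part(content_type: str, data: bytes, params: Dict[str, str]) -> str:
    """Rule 4: anything unrecognized, including predefined types such as
    audio or image that this agent does not support, is handled as
    "binary" with no parameters."""
    handler = HANDLERS.get(content_type.strip().lower())
    if handler is None:
        return offer_binary(data, {})
    return handler(data, params)
```

            An agent that later learns to play audio would simply add
            an "audio" entry to the table; the fallback keeps every
            other unknown type away from the screen.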

            Appendix I -- Guidelines For Sending Data Via Email

            Internet email is not yet a perfect, homogeneous system.
            Mail may become corrupted at several stages in its travel to
            a final destination. Specifically, email sent throughout the
            Internet  may  travel  across  many networking technologies.
            Many networking and mail technologies  do  not  support  the
            full   functionality   possible   in   the   SMTP  transport
            environment. Mail traversing these systems is likely  to  be
            modified in such a way that it can be transported.

            There exist many widely deployed non-conformant MTAs in the
            Internet.  These MTAs, speaking the SMTP protocol, alter
            messages on the fly to take advantage of the internal data
            structures of the hosts they are implemented on, or are
            just plain broken.

            The following guidelines may be useful to anyone devising  a
            data  format  (content-type)  that  will  survive the widest
            range of networking technologies and known broken MTAs
            unscathed.    Note  that  anything  encoded  in  the  base64
            encoding will satisfy these rules, but that some  well-known
            mechanisms,  notably  the  UNIX uuencode facility, will not.
            Note also that  anything  encoded  in  the  Quoted-Printable
            encoding will survive most gateways intact, but possibly not
            gateways to systems that use the EBCDIC character set.

                 (1) Delimiters other than CR-LF pairs may be  used
                 in  the  local representation of a message on some
                 systems.  The persistence of  CR-LF  pairs  should
                 not be relied on.

                 (2) Isolated CR and LF  characters  are  not  well
                 tolerated   in   general;  they  may  be  lost  or
                 converted to delimiters on some systems, and hence
                 should not be relied on.

                 (3) TAB characters may be misinterpreted or may be
                 automatically  converted  to  variable  numbers of
                 spaces.  This is unavoidable in some environments,
                 notably  those  not  based  on the ASCII character
                 set. Such conversion is STRONGLY DISCOURAGED,  but
                 it  may occur, and users of US-ASCII format should
                 not rely on the persistence of TAB characters.

                 (4) Lines longer than 78 characters may be wrapped
                 or  truncated  in some environments. Line wrapping
                 and line truncation are STRONGLY DISCOURAGED,  but
                 unavoidable  in  some  cases.  Applications  which
                 depend on  lines  not  being  wrapped  should  use
                 mechanisms other than unencoded US-ASCII bodyparts
                 to transmit messages.

                 (5)  Trailing  "white  space"  characters  (SPACE,
                 TAB,  etc.)  on  a  line  may be discarded by some
                 transport agents, while other transport agents may
                 pad  lines with these characters so that all lines
                 in  a  mail  file  are  of  equal  length.     The
                 persistence  of  trailing  white space, therefore,
                 should not be relied on.

                 (6)  Many mail domains use variations on the ASCII
                 character  set,  or  use  character  sets  such as
                 EBCDIC which contain most but not all of  the  US-
                 ASCII  characters.   The  correct  translation  of
                 characters not in the "invariant"  set  cannot  be
                 depended  on across character converting gateways.
                 For example, this  situation  is  a  problem  when
                 sending  uuencoded  information  across BITNET, an
                 EBCDIC system.  Similar problems can occur without
                 crossing  a gateway, since many Internet hosts use
                 character sets other than  ASCII  internally.   In
                 particular,  the only characters that are known to
                 be consistent  across  all  gateways  are  the  62
                 characters  that correspond to the upper and lower
                 case letters A-Z and a-z, the 10 digits  0-9,  and
                 the following eleven special characters:

                                "'"  (ASCII code 39)
                                "("  (ASCII code 40)
                                ")"  (ASCII code 41)
                                "+"  (ASCII code 43)
                                ","  (ASCII code 44)
                                "-"  (ASCII code 45)
                                "."  (ASCII code 46)
                                "/"  (ASCII code 47)
                                ":"  (ASCII code 58)
                                "="  (ASCII code 61)
                                "?"  (ASCII code 63)

                 A maximally portable mail representation, such as
                 the base64 encoding, will confine itself to
                 relatively short lines of text in which the only
                 meaningful characters are taken from this set of
                 73 characters.

            Please note that the above list is NOT a list of recommended
            practices for MTAs.  RFC 821 MTAs are prohibited from
            altering the character of white space or wrapping long
            lines.  These BAD and illegal practices are known to occur
            on established networks, and implementations should be
            robust in dealing with the bad effects they can cause.
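
            The guidelines above can be checked mechanically.  The
            following sketch, whose function names are hypothetical,
            flags isolated CR or LF characters, over-long lines,
            trailing white space, and any character outside the
            73-character invariant set of item (6); read strictly, that
            set excludes TAB and even SPACE, so ordinary prose fails the
            check.  It also encodes the same octets with base64 and with
            uuencode (using Python's binascii module, encoded lines
            only, without begin/end lines) to illustrate why the former
            survives gateways and the latter may not.

```python
import base64
import binascii
import string
from typing import List

# The 73 characters item (6) lists as consistent across all gateways.
INVARIANT = frozenset(string.ascii_letters + string.digits + "'()+,-./:=?")
MAX_LINE = 78  # item (4)


def portability_problems(body: str) -> List[str]:
    """Report departures from the guidelines in a body whose lines are
    delimited by CR-LF.  Strict: SPACE and TAB count as non-invariant."""
    problems = []
    for n, line in enumerate(body.split("\r\n"), start=1):
        if "\r" in line or "\n" in line:
            problems.append(f"line {n}: isolated CR or LF")  # item (2)
        if len(line) > MAX_LINE:
            problems.append(f"line {n}: longer than {MAX_LINE}")  # item (4)
        if line != line.rstrip(" \t"):
            problems.append(f"line {n}: trailing white space")  # item (5)
        # TABs (item 3) and every other non-invariant character (item 6).
        odd = sorted(set(line) - INVARIANT - {"\r", "\n"})
        if odd:
            problems.append(f"line {n}: non-invariant {''.join(odd)!r}")
    return problems


def base64_body(data: bytes, width: int = 76) -> str:
    """Base64-encode data and break it into CR-LF separated lines."""
    text = base64.b64encode(data).decode("ascii")
    return "\r\n".join(text[i:i + width] for i in range(0, len(text), width))


def uuencode_body(data: bytes) -> str:
    """uuencode data; binascii.b2a_uu takes at most 45 bytes per call and
    ends each line with a newline, which we replace with CR-LF joins."""
    lines = [binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n")
             for i in range(0, len(data), 45)]
    return "\r\n".join(lines)
```

            With these helpers, portability_problems applied to
            base64_body(bytes(range(256))) returns an empty list, while
            the uuencoded form of the same octets is flagged, because
            uuencode output draws on characters outside the invariant
            set, such as SPACE and "$".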

            Appendix II -- A Complex Multipart Example

            What follows is the outline of a complex multipart  message.
            This  message  has three parts to be displayed serially:  an
            introductory plain text part, an embedded multipart message,
            and  a  closing  encapsulated  text  message  in a non-ASCII
            character set.  The embedded multipart message has two parts
            to be displayed in parallel: an audio fragment and a
            picture.

                 From: ...
                 Subject: ...
                 Content-type: multipart; 1-s; tweedledum

                 This is a multipart message.
                 Since I've not specified another character set,
                 this "prefix" area is in US ASCII.
                 --tweedledum

                 ...Some more text appears here...
                 [Note that the preceding blank line means
                 no header fields were given and this is text,
                 with charset US ASCII.]
                 --tweedledum
                 Content-type: multipart; 1-p; tweedledee

                 This is a multipart message.
                 If you are reading this text, you might want to
                 consider changing to a user agent that understands
                 how to properly display multipart messages.
                 --tweedledee
                 Content-type: x-NeXT
                 Content-Transfer-Encoding: base64

                 ... base64-encoded NeXT-format audio data goes here....
                 --tweedledee
                 Content-type: image/G3FAX
                 Content-Transfer-Encoding: Base64

                 ... base64-encoded FAX data goes here....
                 --tweedledee--
                 --tweedledum
                 Content-type: message/ISO-8859-1

                 From: (name in ISO-8859-1)
                 Subject: (subject in ISO-8859-1)
                 Content-type: Text/ISO-8859-1
                 Content-Transfer-Encoding: Quoted-printable

                 ... Closing text in ISO-8859-1 goes here ...
                 --tweedledum--
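
            A user agent has to find the parts of a message like the
            one above.  The sketch below is a minimal illustration
            rather than a complete parser: it splits a body on lines
            consisting of exactly "--" followed by the boundary, ignores
            the prefix area, stops at the closing "--boundary--" line,
            and treats a part that begins with a blank line as having no
            header fields.  It assumes, as the example suggests, that
            the boundary is the last semicolon-separated field of a
            multipart Content-type value; it also assumes line breaks
            have already been converted to "\n" and does not handle
            continued header lines.  All function names are
            hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def boundary_of(content_type: str) -> str:
    # Assumption: in this draft's syntax, e.g. "multipart; 1-s; tweedledum",
    # the boundary is the last semicolon-separated field.
    return content_type.split(";")[-1].strip()


def split_parts(body: str, boundary: str) -> List[str]:
    """Return the parts between "--boundary" lines, dropping the prefix
    area and stopping at the closing "--boundary--" line."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    parts: List[str] = []
    current: Optional[List[str]] = None  # None while in the prefix area
    for line in body.split("\n"):
        if line == closing:
            break
        if line == delimiter:
            if current is not None:
                parts.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        parts.append("\n".join(current))
    return parts


def parse_part(part: str) -> Tuple[Dict[str, str], str]:
    """Split a part into (header fields, content).  Header names are
    lowercased; an empty dict means default text in US-ASCII."""
    lines = part.split("\n")
    headers: Dict[str, str] = {}
    i = 0
    while i < len(lines) and lines[i] != "":
        name, _, value = lines[i].partition(":")
        headers[name.strip().lower()] = value.strip()
        i += 1
    return headers, "\n".join(lines[i + 1:])
```

            Because the inner "--tweedledee" lines never match the outer
            delimiter, the embedded multipart stays inside a single
            outer part and can be split again with its own boundary.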

            Appendix III -- The US-ASCII Character Set

            The following table explicitly defines the default character
            set for Internet mail, "US-ASCII":


                 0 nul                   @  64 Commercial at
                 1 soh                   A  65 Latin capital letter a
                 2 stx                   B  66 Latin capital letter b
                 3 etx                   C  67 Latin capital letter c
                 4 eot                   D  68 Latin capital letter d
                 5 enq                   E  69 Latin capital letter e
                 6 ack                   F  70 Latin capital letter f
                 7 bel                   G  71 Latin capital letter g
                 8 bs                    H  72 Latin capital letter h
                 9 ht                    I  73 Latin capital letter i
                10 lf                    J  74 Latin capital letter j
                11 vt                    K  75 Latin capital letter k
                12 np                    L  76 Latin capital letter l
                13 cr                    M  77 Latin capital letter m
                14 so                    N  78 Latin capital letter n
                15 si                    O  79 Latin capital letter o
                16 dle                   P  80 Latin capital letter p
                17 dc1                   Q  81 Latin capital letter q
                18 dc2                   R  82 Latin capital letter r
                19 dc3                   S  83 Latin capital letter s
                20 dc4                   T  84 Latin capital letter t
                21 nak                   U  85 Latin capital letter u
                22 syn                   V  86 Latin capital letter v
                23 etb                   W  87 Latin capital letter w
                24 can                   X  88 Latin capital letter x
                25 em                    Y  89 Latin capital letter y
                26 sub                   Z  90 Latin capital letter z
                27 esc                   [  91 Left square bracket
                28 fs                    \  92 Reverse solidus
                29 gs                    ]  93 Right square bracket
                30 rs                    ^  94 Circumflex accent
                31 us                    _  95 Low line
                32 Space                 `  96 Grave accent
             !  33 Exclamation mark      a  97 Latin small letter a
             "  34 Quotation mark        b  98 Latin small letter b
             #  35 Number sign           c  99 Latin small letter c
             $  36 Dollar sign           d 100 Latin small letter d
             %  37 Percent sign          e 101 Latin small letter e
             &  38 Ampersand             f 102 Latin small letter f
             '  39 Apostrophe            g 103 Latin small letter g
             (  40 Left parenthesis      h 104 Latin small letter h
             )  41 Right parenthesis     i 105 Latin small letter i
             *  42 Asterisk              j 106 Latin small letter j
             +  43 Plus sign             k 107 Latin small letter k
             ,  44 Comma                 l 108 Latin small letter l
             -  45 Hyphen, minus sign    m 109 Latin small letter m
             .  46 Full stop             n 110 Latin small letter n
             /  47 Solidus               o 111 Latin small letter o
             0  48 Digit zero            p 112 Latin small letter p
             1  49 Digit one             q 113 Latin small letter q
             2  50 Digit two             r 114 Latin small letter r
             3  51 Digit three           s 115 Latin small letter s
             4  52 Digit four            t 116 Latin small letter t
             5  53 Digit five            u 117 Latin small letter u
             6  54 Digit six             v 118 Latin small letter v
             7  55 Digit seven           w 119 Latin small letter w
             8  56 Digit eight           x 120 Latin small letter x
             9  57 Digit nine            y 121 Latin small letter y
             :  58 Colon                 z 122 Latin small letter z
             ;  59 Semicolon             { 123 Left curly bracket
             <  60 Less-than sign        | 124 Vertical line
             =  61 Equals sign           } 125 Right curly bracket
             >  62 Greater-than sign     ~ 126 Tilde
             ?  63 Question mark           127 Del
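
            Because every code in the table lies between 0 and 127,
            testing whether data is plain US-ASCII reduces to a range
            check on each octet.  The helper names below are
            illustrative assumptions, not part of this memo.

```python
def is_us_ascii(data: bytes) -> bool:
    # Every code in the table is in the range 0-127, i.e. fits in 7 bits.
    return all(octet <= 127 for octet in data)


def is_graphic_or_space(data: bytes) -> bool:
    # 32 (Space) through 126 (Tilde); 0-31 and 127 (Del) are control codes.
    return all(32 <= octet <= 126 for octet in data)
```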

            Summary

            Using the Content-Type and Content-Transfer-Encoding header
            fields, it is possible to include, in a standardized way,
            arbitrary types of data objects within RFC 822-conformant
            mail messages.  No restrictions imposed by either RFC 821 or
            RFC 822 are broken, and care has been taken to avoid
            problems caused by additional restrictions imposed by the
            characteristics of some Internet mail transport mechanisms
            (see Appendix I).  The "multipart" and "message"
            content-types allow mixing and hierarchical structuring of
            objects of different types in a single message.  Further
            content-types provide a standardized mechanism for tagging
            messages or message parts as audio, image, or several other
            kinds of data.  Additional optional header fields provide
            conventional mechanisms for certain extensions deemed
            desirable by many implementors.  Finally, a number of useful
            content-types are defined for general use by consenting user
            agents.

            Contacts

            For more information, the authors of this  document  may  be
            contacted via Internet mail:

                  Nathaniel Borenstein <nsb@thumper.bellcore.com>
                            Ned Freed <ned@innosoft.com>

            Acknowledgements

            This memo is the result of the collective effort of a  large
            number  of people, at several IETF meetings and on the IETF-
            SMTP and IETF-822 mailing lists.  Although  any  enumeration
            seems   doomed  to  suffer  from  egregious  omissions,  the
            following are among the many contributors  to  this  effort:
            Harald  Alvestrand,  Randall  Atkinson,  Kevin Carosso, Mark
            Crispin, Dave Crocker, Terry Crowley,  Walt  Daniels,  Frank
            Dawson,  Hitoshi Doi, Kevin Donnelly, Johnny Eriksson, Craig
            Everhart, Roger Fajman, Alain  Fontaine,  Philip  Gladstone,
            Thomas Gordon, Phill Gross, David Herron, Bruce Howard, Bill
            Janssen, Risto Kankkunen, Phil Karn, Tim Kehres, Neil Katin,
            Steve  Kille, Anders Klemets, John Klensin, Valdis Kletniek,
            Stev Knowles, Bob Kummerfeld, Vincent  Lau,  Timo  Lehtinen,
            John   MacMillan,   Rick   McGowan,   Leo  Mclaughlin,  Goli
            Montaser-Kohsari,  Keith   Moore,   Mark   Needleman,   John
            Noerenberg,   Mats   Ohrman,   David   J.  Pepper,  Jonathan
            Rosenberg, Jan Rynning, Mark  Sherman,  Keld  Simonsen,  Bob
            Smart, Einar Stefferud, Michael Stein, Peter Svanberg, Steve
            Uhler, Stuart Vance, Erik van der  Poel,  Peter  Vanderbilt,
            Greg  Vaudreuil,  Brian  Wideen,  Glenn  Wright,  and  David
            Zimmerman.  The authors apologize  for  any  omissions  from
            this list, which were certainly unintentional.

            References

            [REF-ISO646]       International       Standard--Information
            Processing--ISO  7-bit  coded  character set for information
            interchange, ISO 646:1983.

            [REF-ISO-2022]      International      Standard--Information
            Processing--ISO  7-bit and  8-bit coded character sets--Code
            extension techniques, ISO 2022:1986.

            [REF-ANSI] Coded Character Set--7-Bit American Standard Code
            for  Information Interchange, ANSI X3.4-1986.

            [REF-X400]  Schicker,  Pietro,  "Message  Handling  Systems,
            X.400",    Message    Handling   Systems   and   Distributed
            Applications, E. Stefferud, O-j. Jacobsen, and P.  Schicker,
            eds., North-Holland, 1989, pp. 3-41.

            [RFC-821]  Postel,  J.B.   Simple  Mail  Transfer  Protocol.
            August, 1982, Network Information Center, RFC-821.

            [RFC-822]   Crocker, D.  Standard for  the  format  of  ARPA
            Internet  text  messages.  August, 1982, Network Information
            Center, RFC-822.

            [RFC-934]   Rose, M.T.; Stefferud, E.A.   Proposed  standard
            for   message    encapsulation.   January,   1985,   Network
            Information Center, RFC-934.

            [RFC-1049]   Sirbu,  M.A.   Content-type  header  field  for
            Internet messages.  March, 1988, Network Information Center,
            RFC-1049.

            [RFC-1113]   Linn,  J.   Privacy  enhancement  for  Internet
            electronic   mail:   Part   I  -  message  encipherment  and
            authentication procedures [Draft].   August,  1989,  Network
            Information Center, RFC-1113.

            [RFC-1154]  Robinson, D.; Ullmann, R.  Encoding header field
            for  internet  messages.  April,  1990,  Network Information
            Center, RFC-1154.

            [REF-RFC-QR]   Prindeville,   Phillipe-Andrew',   and   Keld
            Simonsen,  "A  Portable,  Extensible Message Encoding Format
            for Alphabetic Scripts", Internet RFC, in preparation.

            [REF-ISO-10646] Draft International Standard --  Information
            Technology  --  Universal Coded Character Set (UCS), ISO/IEC
            DIS 10646:1990.

            [REF-ISO-8859] **********

                               Table of Contents


            1   Introduction
            2   The Content-Type Header Field
            3   The Content-Transfer-Encoding Header Field
            3.1 Quoted-Printable Content-Transfer-Encoding
            3.2 Base64 Content-Transfer-Encoding
            4   Additional Optional Content- Header Fields
            4.1 Optional Content-ID Header Field
            4.2 Optional Content-Description Header Field
            5   The Predefined Content-type Values
            5.1 The TEXT Content-type and the US-ASCII Character Set
            5.2 The ...
            5.3 The ...
            5.4 The Message Content-Type
            5.5 The Binary Content-Type
            5.6 The Application Content-Type Value
            5.7 The Audio, Image, and Video Content-Type Values
            5.8 Experimental (...
            6   Conformance With this Memo
                Appendix I -- Guidelines For Sending Data Via Email
                Appendix II -- A Complex Multipart Example
                Appendix III -- The US-ASCII Character Set
                Summary
                Contacts
                Acknowledgements
                References