[comp.mail.mh] repl wraps and indents headers incorrectly, I think

garyo@THINK.COM (Gary Oberbrunner) (09/07/89)

When wrapping long headers with repl(1), it seems to use the length of the
field name as the indent width rather than a fixed number of spaces or a
TAB.  I believe (although I don't have the RFC822 spec here) that indented
header lines are always supposed to be indented with a TAB, or else they
get treated as the beginning of the message.  Repl's indenting breaks my
reply mail to people with long return (or From) addresses, such as the
following message (enclosed within lines of ==='s):

============================================================================
Date: Wed, 6 Sep 89 13:48:25 EDT
From: Bob Doolittle ({gatech,uunet,petsd}!masscomp!rad) <rad@westford.ccur.com>
To: garyo
Subject: this is a test

------------------
This is a test message to see how reply formatting works.
============================================================================

Repl turns this message into an outgoing header like this:

============================================================================
To: Bob Doolittle ({gatech,uunet,
    petsd}!masscomp!rad) <rad@westford.ccur.com>
Fcc: ccs
Subject: Re: this is a test
In-reply-to: Your message of Wed, 06 Sep 89 13:48:25 -0400.
--------
============================================================================

From the source code, it looks like this behavior is hardwired into
fmtscan() (in uip/sbr/formatsbr.c).  And fmtscan() is always called from
Replout() (in uip/replsbr.c), regardless of the -format switch.  So I don't
know what I can do, short of an awk script that parses the headers and
turns any leading spaces into a TAB.  Any suggestions?  Perhaps (I don't
know enough about this to really tell) this is a sendmail config-file issue
instead?  Any help would be appreciated.
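
For reference, the awk-style workaround described above could look roughly like this in Python. This is only a sketch: the name tabify_continuations is made up, and it assumes the header section of a draft ends at the first empty line or at a line made up only of dashes (as in the draft shown above).

```python
def tabify_continuations(draft: str) -> str:
    """Replace the leading blanks of header continuation lines with one TAB.

    Assumption: headers end at the first empty line or at a line that
    consists only of dashes; everything after that is left untouched.
    """
    out = []
    in_headers = True
    for line in draft.split("\n"):
        if in_headers:
            if line == "" or set(line) == {"-"}:
                # End of the header section: stop rewriting from here on.
                in_headers = False
            elif line[0] in " \t":
                # Continuation line: normalize its indent to a single TAB.
                line = "\t" + line.lstrip(" \t")
        out.append(line)
    return "\n".join(out)
```

Fed the draft above, this would leave the "To:" line alone and turn the four-space indent on the next line into one TAB.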

Here's my .mh_profile, for completeness:
============================================================================
Path: Mail
Editor: vi
Signature: Gary Oberbrunner
Alternate-Mailboxes: *garyo*,staff@*,*!staff
Send: -verbose -alias aliases
showproc: mhl
Repl: -nocc me -fcc ccs -annotate
Msg-Protect: 0600
Folder-Protect: 0744
Draft-Folder: /u8/garyo/Mail/drafts
Sequence-Negation: ^
whom: -alias aliases
ali: -alias aliases
Unseen-Sequence: unseen
============================================================================

					Thanks,

					Gary Oberbrunner
					garyo@think.com
					{ames,harvard}!think!garyo

karlton@fudge.sgi.com (Phil Karlton) (09/07/89)

In article <8909062110.AA00932@prometheus.think.com> garyo@THINK.COM (Gary Oberbrunner) writes:
>When wrapping long headers with repl(1), it seems to use the length of the
>field name as the indent width rather than a fixed number of spaces or a
>TAB.  I believe (although I don't have the RFC822 spec here) that indented
>header lines are always supposed to be indented with a TAB, or else they
>get treated as the beginning of the message.

From RFC822, August 13, 1982, page 5:

    ... can be split into a multiple line representation; this
    is called "folding". The general rule is that wherever there
    may be linear-white-space (NOT simple LWSP-chars), a CRLF
    immediately followed by AT LEAST one LWSP-char may instead
    be inserted.

In other words, it doesn't have to be indented with a TAB.
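
A small sketch of what that folding rule means for a reader of the header, in Python. The helper name unfold is made up for illustration; it simply removes every line break that is immediately followed by a SPACE or HTAB, so the kind and amount of indent on the continuation line does not matter once white space is treated as a single space.

```python
import re

def unfold(field: str) -> str:
    """Undo RFC 822 folding: drop each line break that is immediately
    followed by a SPACE or HTAB, keeping the white space itself."""
    return re.sub(r"\r?\n(?=[ \t])", "", field)

# The same field folded with four spaces and with one TAB.
with_spaces = "To: Bob Doolittle ({gatech,uunet,\n    petsd}!masscomp!rad) <rad@westford.ccur.com>"
with_tab = "To: Bob Doolittle ({gatech,uunet,\n\tpetsd}!masscomp!rad) <rad@westford.ccur.com>"

# Runs of white space count as a single SPACE, so collapse them before comparing.
same = " ".join(unfold(with_spaces).split()) == " ".join(unfold(with_tab).split())
```

Here same comes out True: both foldings describe the same logical line.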

PK
--
Phil Karlton                            karlton@sgi.com
Silicon Graphics Computer Systems       415-964-1459, ext. 3018
2011 N. Shoreline Blvd.

mdb@ESD.3Com.COM (Mark D. Baushke) (09/07/89)

On 6 Sep 89 21:10:38 GMT, garyo@THINK.COM (Gary Oberbrunner) said:

Gary> When wrapping long headers with repl(1), it seems to use the
Gary> length of the field name as the indent width rather than a fixed
Gary> number of spaces or a TAB.  I believe (although I don't have the
Gary> RFC822 spec here) that indented header lines are always supposed
Gary> to be indented with a TAB, or else they get treated as the
Gary> beginning of the message.

I think your problem can be viewed as asking the following questions:

	Q1) Is it legal to continue a field with either a SPACE or a
	   TAB [HTAB in RFC822 language]?

	A1) The answer is yes. The continuation should be an LWSP-char.

	Q2) Is it legal to split a parenthetical comment across
	   continuation lines?

	A2) Yes. A parenthetical comment may contain linear-white-space.

	Q3) If repl is generating a legal address, is this a bug in
	    the MH address parsing code?

	A3) In my opinion, yes you have found a bug in MH.
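
To illustrate A2, here is a minimal Python sketch of dropping RFC 822 comments from a field body, with nesting and quoted-pairs handled, so that a comment folded across lines does not hide the address. It is not MH's parser; the name strip_comments is made up, and as a simplification it ignores quoted-strings (inside which parentheses are not comments).

```python
def strip_comments(body: str) -> str:
    """Remove (possibly nested) parenthesized comments from a field body.

    Each top-level comment is replaced by one SPACE, and runs of white
    space (including folded line breaks) are collapsed to one SPACE.
    Simplification: quoted-strings are not treated specially.
    """
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # quoted-pair: the next character is literal, never a delimiter
            if depth == 0:
                out.append(body[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # a comment is equivalent to a single SPACE
        elif depth == 0:
            out.append(ch)
        i += 1
    return " ".join("".join(out).split())

folded = "Bob Doolittle ({gatech,uunet,\n    petsd}!masscomp!rad) <rad@westford.ccur.com>"
cleaned = strip_comments(folded)
```

With the folded body from Gary's draft, cleaned is "Bob Doolittle <rad@westford.ccur.com>", so the addressee is still there after the comment is removed.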

RFC 822, pages 10, 12-13, 15:

     3.3.  LEXICAL TOKENS

                                                 ; (  Octal, Decimal.)
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     CRLF        =  CR LF
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE

     linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE

     comment     =  "(" *(ctext / quoted-pair / comment) ")"

     ctext       =  <any CHAR excluding "(",     ; => may be folded
                     ")", "\" & CR, & including
                     linear-white-space>

     quoted-pair =  "\" CHAR                     ; may quote any char

     [...]

     3.4.2.  WHITE SPACE

        Note:  In structured field bodies, multiple linear space ASCII
               characters  (namely  HTABs  and  SPACEs) are treated as
               single spaces and may freely surround any  symbol.   In
               all header fields, the only place in which at least one
               LWSP-char is REQUIRED is at the beginning of  continua-
               tion lines in a folded field.

        When passing text to processes  that  do  not  interpret  text
        according to this standard (e.g., mail protocol servers), then
        NO linear-white-space characters should occur between a period
        (".") or at-sign ("@") and a <word>.  Exactly ONE SPACE should
        be used in place of arbitrary linear-white-space  and  comment
        sequences.

        Note:  Within systems conforming to this standard, wherever  a
               member of the list of delimiters is allowed, LWSP-chars
               may also occur before and/or after it.

        Writers of  mail-sending  (i.e.,  header-generating)  programs
        should realize that there is no network-wide definition of the
        effect of ASCII HT (horizontal-tab) characters on the  appear-
        ance  of  text  at another network host; therefore, the use of
        tabs in message headers, though permitted, is discouraged.

     3.4.3.  COMMENTS

        A comment is a set of ASCII characters, which is  enclosed  in
        matching  parentheses  and which is not within a quoted-string
        The comment construct permits message originators to add  text
        which  will  be  useful  for  human readers, but which will be
        ignored by the formal semantics.  Comments should be  retained
        while  the  message  is subject to interpretation according to
        this standard.  However, comments  must  NOT  be  included  in
        other  cases,  such  as  during  protocol  exchanges with mail
        servers.

        Comments nest, so that if an unquoted left parenthesis  occurs
        in  a  comment  string,  there  must  also be a matching right
        parenthesis.  When a comment acts as the delimiter  between  a
        sequence of two lexical symbols, such as two atoms, it is lex-
        ically equivalent with a single SPACE,  for  the  purposes  of
        regenerating  the  sequence, such as when passing the sequence
        onto a mail protocol server.  Comments are  detected  as  such
        only within field-bodies of structured fields.

        If a comment is to be "folded" onto multiple lines,  then  the
        syntax  for  folding  must  be  adhered to.  (See the "Lexical
        Analysis of Messages" section on "Folding Long Header  Fields"
        above,  and  the  section on "Case Independence" below.)  Note
        that  the  official  semantics  therefore  do  not  "see"  any
        unquoted CRLFs that are in comments, although particular pars-
        ing programs may wish to note their presence.  For these  pro-
        grams,  it would be reasonable to interpret a "CRLF LWSP-char"
        as being a CRLF that is part of the comment; i.e., the CRLF is
        kept  and  the  LWSP-char is discarded.  Quoted CRLFs (i.e., a
        backslash followed by a CR followed by a  LF)  still  must  be
        followed by at least one LWSP-char.

     [...]

     3.4.8.  FOLDING LONG HEADER FIELDS

        Each header field may be represented on exactly one line  con-
        sisting  of the name of the field and its body, and terminated
        by a CRLF; this is what the parser sees.  For readability, the
        field-body  portion of long header fields may be "folded" onto
        multiple lines of the actual field.  "Long" is commonly inter-
        preted  to  mean greater than 65 or 72 characters.  The former
        length serves as a limit, when the message is to be viewed  on
        most  simple terminals which use simple display software; how-
        ever, the limit is not imposed by this standard.

        Note:  Some display software often can selectively fold lines,
               to  suit  the display terminal.  In such cases, sender-
               provided  folding  can  interfere  with   the   display
               software.

RFC 822, page 40

     B.2.  SEMANTICS

          Headers occur before the message body and are terminated  by
     a null line (i.e., two contiguous CRLFs).

          A line which continues a header field begins with a SPACE or
     HTAB  character,  while  a  line  beginning a field starts with a
     printable character which is not a colon.

          A field-name consists of one or  more  printable  characters
     (excluding  colon,  space, and control-characters).  A field-name
     MUST be contained on one line.  Upper and lower case are not dis-
     tinguished when comparing field-names.
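
Read literally, the B.2 rules quoted above amount to something like the following Python sketch. The name split_fields is made up for illustration, and this is not how MH's post is written; it only shows that a line starting with SPACE or HTAB belongs to the previous field and that the headers stop at the first null line.

```python
def split_fields(message: str):
    """Split the header section of a message into (field-name, body) pairs.

    A line beginning with SPACE or HTAB continues the previous field;
    the first empty line ends the headers.
    """
    fields = []
    for line in message.splitlines():
        if line == "":
            break  # null line: end of headers, the rest is the body
        if line[0] in " \t":
            if not fields:
                raise ValueError("continuation line before any field")
            name, body = fields[-1]
            fields[-1] = (name, body + "\n" + line)
        else:
            name, sep, body = line.partition(":")
            if not sep:
                raise ValueError("header line without a colon: %r" % line)
            fields.append((name, body.lstrip(" \t")))
    return fields

msg = (
    "To: Bob Doolittle ({gatech,uunet,\n"
    "    petsd}!masscomp!rad) <rad@westford.ccur.com>\n"
    "Subject: Re: this is a test\n"
    "\n"
    "body: not a header\n"
)
names = [name for name, _ in split_fields(msg)]
```

For this message names is ["To", "Subject"]: the space-indented second line is taken as part of the To field, not as the start of the message body.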

Gary> Repl's indenting breaks my reply mail to people with long return
Gary> (or From) addresses, such as the following message (enclosed
Gary> within lines of ==='s):

Gary> =========================================================================
Gary> Date: Wed, 6 Sep 89 13:48:25 EDT
Gary> From: Bob Doolittle ({gatech,uunet,petsd}!masscomp!rad) <rad@westford.ccur.com>
Gary> To: garyo
Gary> Subject: this is a test

Gary> ------------------
Gary> This is a test message to see how reply formatting works.
Gary> =========================================================================

Gary> Repl turns this message into an outgoing header like this:

Gary> =========================================================================
Gary> To: Bob Doolittle ({gatech,uunet,
Gary>     petsd}!masscomp!rad) <rad@westford.ccur.com>
Gary> Fcc: ccs
Gary> Subject: Re: this is a test
Gary> In-reply-to: Your message of Wed, 06 Sep 89 13:48:25 -0400.
Gary> --------
Gary> =========================================================================

As near as I can tell, this is a legal continuation per strict RFC 822.

I just tested sendmail directly, and I do not seem to have any problem.

Of course, post(8) is unable to find any addressees.

It looks to me like you found a bug in MH (I am running 6.6).

Gary> From the source code, it looks like this behavior is hardwired
Gary> into fmtscan() (in uip/sbr/formatsbr.c).  And fmtscan() is
Gary> always called from Replout() (in uip/replsbr.c), regardless of
Gary> the -format switch.  So I don't know what I can do, short of an
Gary> awk script that parses the headers and turns any leading spaces
Gary> into a TAB.  Any suggestions?  Perhaps (I don't know enough
Gary> about this to really tell) this is a sendmail config-file issue
Gary> instead?  Any help would be appreciated.

Well, you could look into fixing uip/post.c ...
--
Mark D. Baushke
Internet:   mdb@ESD.3Com.COM
UUCP:	    {3comvax,auspex,sun}!bridge2!mdb