garyo@THINK.COM (Gary Oberbrunner) (09/07/89)
When wrapping long headers with repl(1), it seems to use the length of the
field name as the indent width rather than a fixed number of spaces or a
TAB. I believe (although I don't have the RFC822 spec here) that indented
header lines are always supposed to be indented with a TAB, or else they
get treated as the beginning of the message. Repl's indenting breaks my
reply mail to people with long return (or From) addresses, such as the
following message (enclosed within lines of ==='s):
============================================================================
Date: Wed, 6 Sep 89 13:48:25 EDT
From: Bob Doolittle ({gatech,uunet,petsd}!masscomp!rad) <rad@westford.ccur.com>
To: garyo
Subject: this is a test
------------------
This is a test message to see how reply formatting works.
============================================================================
Repl turns this message into an outgoing header like this:
============================================================================
To: Bob Doolittle ({gatech,uunet,
    petsd}!masscomp!rad) <rad@westford.ccur.com>
Fcc: ccs
Subject: Re: this is a test
In-reply-to: Your message of Wed, 06 Sep 89 13:48:25 -0400.
--------
============================================================================
From the source code, it looks like this behavior is hardwired into
fmtscan() (in uip/sbr/formatsbr.c). And fmtscan() is always called from
Replout() (in uip/replsbr.c), regardless of the -format switch. So I don't
know what I can do, short of an awk script that parses the headers and
turns any leading spaces into a TAB. Any suggestions? Perhaps (I don't
know enough about this to really tell) this is a sendmail config-file issue
instead? Any help would be appreciated.
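A minimal sketch of the workaround described above, written in Python rather than awk. It rewrites the leading white space of each header continuation line as a single TAB and leaves the body alone. The function name retab_continuations is hypothetical, and the sketch assumes the header ends at the first empty line or at a line made up only of dashes, as in the draft shown above. Whether this actually avoids the address problem is not established here.

```python
import re


def retab_continuations(draft: str) -> str:
    """Replace the leading white space of header continuation lines with one TAB.

    Assumption: the header ends at the first empty line or at a line that
    consists only of dashes (the separator seen in the draft above).
    Everything after that point is passed through unchanged.
    """
    out = []
    in_header = True
    for line in draft.split("\n"):
        if in_header and (line == "" or re.fullmatch(r"-+", line)):
            in_header = False                      # end of header reached
        elif in_header and line[:1] in (" ", "\t"):
            line = "\t" + line.lstrip(" \t")       # normalize the indent
        out.append(line)
    return "\n".join(out)
```

Run over the draft before it is sent, this changes only lines that already begin with a SPACE or TAB, so field names and the message body are untouched.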
Here's my .mh_profile, for completeness:
============================================================================
Path: Mail
Editor: vi
Signature: Gary Oberbrunner
Alternate-Mailboxes: *garyo*,staff@*,*!staff
Send: -verbose -alias aliases
showproc: mhl
Repl: -nocc me -fcc ccs -annotate
Msg-Protect: 0600
Folder-Protect: 0744
Draft-Folder: /u8/garyo/Mail/drafts
Sequence-Negation: ^
whom: -alias aliases
ali: -alias aliases
Unseen-Sequence: unseen
============================================================================
Thanks,
Gary Oberbrunner
garyo@think.com
{ames,harvard}!think!garyo
karlton@fudge.sgi.com (Phil Karlton) (09/07/89)
In article <8909062110.AA00932@prometheus.think.com> garyo@THINK.COM (Gary Oberbrunner) writes:
>When wrapping long headers with repl(1), it seems to use the length of the
>field name as the indent width rather than a fixed number of spaces or a
>TAB. I believe (although I don't have the RFC822 spec here) that indented
>header lines are always supposed to be indented with a TAB, or else they
>get treated as the beginning of the message.
From RFC822, August 13, 1982, page 5:
... can be split into a multiple line representation; this is called
"folding". The general rule is that wherever there may be
linear-white-space (NOT simple LWSP-chars), a CRLF immediately
followed by AT LEAST one LWSP-char may instead be inserted.
In other words, it doesn't have to be indented with a TAB.
PK
--
Phil Karlton karlton@sgi.com
Silicon Graphics Computer Systems 415-964-1459, ext. 3018
2011 N. Shoreline Blvd.
mdb@ESD.3Com.COM (Mark D. Baushke) (09/07/89)
On 6 Sep 89 21:10:38 GMT, garyo@THINK.COM (Gary Oberbrunner) said:
Gary> When wrapping long headers with repl(1), it seems to use the
Gary> length of the field name as the indent width rather than a fixed
Gary> number of spaces or a TAB. I believe (although I don't have the
Gary> RFC822 spec here) that indented header lines are always supposed
Gary> to be indented with a TAB, or else they get treated as the
Gary> beginning of the message.
I think your problem can be viewed as asking the following questions:
Q1) Is it legal to continue a field with either a SPACE or a
TAB [HTAB in RFC822 language]?
A1) The answer is yes. The continuation should be an LWSP-char.
Q2) Is it legal to split a parenthetical comment across
continuation lines?
A2) Yes. A parenthetical comment may contain linear-white-space.
Q3) If repl is generating a legal address, is this a bug in
the MH address parsing code?
A3) In my opinion, yes, you have found a bug in MH.
RFC 822, pages 10, 12-13, 15
3.3. LEXICAL TOKENS
                                            ; (  Octal, Decimal.)
CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
CRLF        =  CR LF
SPACE       =  <ASCII SP, space>            ; (     40,      32.)
HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
comment     =  "(" *(ctext / quoted-pair / comment) ")"
ctext       =  <any CHAR excluding "(",     ; => may be folded
                ")", "\" & CR, & including
                linear-white-space>
quoted-pair =  "\" CHAR                     ; may quote any char
[...]
3.4.2. WHITE SPACE
Note: In structured field bodies, multiple linear space ASCII
characters (namely HTABs and SPACEs) are treated as
single spaces and may freely surround any symbol. In
all header fields, the only place in which at least one
LWSP-char is REQUIRED is at the beginning of continua-
tion lines in a folded field.
When passing text to processes that do not interpret text
according to this standard (e.g., mail protocol servers), then
NO linear-white-space characters should occur between a period
(".") or at-sign ("@") and a <word>. Exactly ONE SPACE should
be used in place of arbitrary linear-white-space and comment
sequences.
Note: Within systems conforming to this standard, wherever a
member of the list of delimiters is allowed, LWSP-chars
may also occur before and/or after it.
Writers of mail-sending (i.e., header-generating) programs
should realize that there is no network-wide definition of the
effect of ASCII HT (horizontal-tab) characters on the appear-
ance of text at another network host; therefore, the use of
tabs in message headers, though permitted, is discouraged.
3.4.3. COMMENTS
A comment is a set of ASCII characters, which is enclosed in
matching parentheses and which is not within a quoted-string
The comment construct permits message originators to add text
which will be useful for human readers, but which will be
ignored by the formal semantics. Comments should be retained
while the message is subject to interpretation according to
this standard. However, comments must NOT be included in
other cases, such as during protocol exchanges with mail
servers.
Comments nest, so that if an unquoted left parenthesis occurs
in a comment string, there must also be a matching right
parenthesis. When a comment acts as the delimiter between a
sequence of two lexical symbols, such as two atoms, it is lex-
ically equivalent with a single SPACE, for the purposes of
regenerating the sequence, such as when passing the sequence
onto a mail protocol server. Comments are detected as such
only within field-bodies of structured fields.
If a comment is to be "folded" onto multiple lines, then the
syntax for folding must be adhered to. (See the "Lexical
Analysis of Messages" section on "Folding Long Header Fields"
above, and the section on "Case Independence" below.) Note
that the official semantics therefore do not "see" any
unquoted CRLFs that are in comments, although particular pars-
ing programs may wish to note their presence. For these pro-
grams, it would be reasonable to interpret a "CRLF LWSP-char"
as being a CRLF that is part of the comment; i.e., the CRLF is
kept and the LWSP-char is discarded. Quoted CRLFs (i.e., a
backslash followed by a CR followed by a LF) still must be
followed by at least one LWSP-char.
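The comment rules quoted above can be sketched as a small scanner. This is an illustration only, not MH's parser: the name strip_comments is hypothetical, and the sketch assumes the field body is already in hand as one string (folded or not). It replaces each top-level comment with a single SPACE, follows nesting, honors backslash quoted-pairs, and leaves parentheses inside quoted strings alone.

```python
def strip_comments(field_body: str) -> str:
    """Replace each top-level RFC 822 comment with a single SPACE.

    Handles nested comments and backslash quoted-pairs. Parentheses that
    appear inside a quoted-string ("...") are not treated as comments.
    """
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only toggled at depth 0)
    i = 0
    while i < len(field_body):
        ch = field_body[i]
        if ch == "\\" and i + 1 < len(field_body):
            # quoted-pair: keep it outside comments, drop it inside them
            if depth == 0:
                out.append(field_body[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            if depth == 0:
                out.append(" ")   # the whole comment collapses to one SPACE
            depth += 1
        elif depth > 0 and ch == ")":
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)
```

Applied to the To: field from the example in this thread, the folded comment disappears entirely, line break included, and only the display name and the angle-bracket address remain.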
[...]
3.4.8. FOLDING LONG HEADER FIELDS
Each header field may be represented on exactly one line con-
sisting of the name of the field and its body, and terminated
by a CRLF; this is what the parser sees. For readability, the
field-body portion of long header fields may be "folded" onto
multiple lines of the actual field. "Long" is commonly inter-
preted to mean greater than 65 or 72 characters. The former
length serves as a limit, when the message is to be viewed on
most simple terminals which use simple display software; how-
ever, the limit is not imposed by this standard.
Note: Some display software often can selectively fold lines,
to suit the display terminal. In such cases, sender-
provided folding can interfere with the display
software.
RFC 822, page 40
B.2. SEMANTICS
Headers occur before the message body and are terminated by
a null line (i.e., two contiguous CRLFs).
A line which continues a header field begins with a SPACE or
HTAB character, while a line beginning a field starts with a
printable character which is not a colon.
A field-name consists of one or more printable characters
(excluding colon, space, and control-characters). A field-name
MUST be contained on one line. Upper and lower case are not dis-
tinguished when comparing field-names.
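The B.2 rules above are enough to split a header into fields. The sketch below is a minimal reading of them, not the code in uip/post.c: parse_header is a hypothetical name, and it assumes the header ends at the first empty line, as in the excerpt (an MH draft uses a line of dashes instead, which this sketch would reject). A line that starts with a SPACE or HTAB is appended to the previous field whether it is indented by one character or many.

```python
from typing import List, Tuple


def parse_header(text: str) -> List[Tuple[str, str]]:
    """Split a header block into (field-name, unfolded field-body) pairs.

    A line starting with SPACE or HTAB continues the previous field; any
    other line starts a new field. Parsing stops at the first empty line.
    """
    fields: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if line == "":
            break                                  # null line ends the header
        if line[0] in " \t":
            if not fields:
                raise ValueError("continuation line before any field")
            name, body = fields[-1]
            # Unfold: the line break is dropped, the white space is kept.
            fields[-1] = (name, body + line)
        else:
            name, sep, body = line.partition(":")
            if not sep:
                raise ValueError(f"not a header field: {line!r}")
            fields.append((name, body.lstrip(" \t")))
    return fields
```

With this reading, the space-indented To: line from the example in this thread comes back as a single field, the same as it would with a TAB indent.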
Gary> Repl's indenting breaks my reply mail to people with long return
Gary> (or From) addresses, such as the following message (enclosed
Gary> within lines of ==='s):
Gary> =========================================================================
Gary> Date: Wed, 6 Sep 89 13:48:25 EDT
Gary> From: Bob Doolittle ({gatech,uunet,petsd}!masscomp!rad) <rad@westford.ccur.com>
Gary> To: garyo
Gary> Subject: this is a test
Gary> ------------------
Gary> This is a test message to see how reply formatting works.
Gary> =========================================================================
Gary> Repl turns this message into an outgoing header like this:
Gary> =========================================================================
Gary> To: Bob Doolittle ({gatech,uunet,
Gary>     petsd}!masscomp!rad) <rad@westford.ccur.com>
Gary> Fcc: ccs
Gary> Subject: Re: this is a test
Gary> In-reply-to: Your message of Wed, 06 Sep 89 13:48:25 -0400.
Gary> --------
Gary> =========================================================================
As near as I can tell, this is a legal continuation per strict RFC 822.
I just tested sendmail directly and do not seem to have any problem.
Of course, post(8) is unable to find any addressees.
It looks to me like you found a bug in MH (I am running 6.6).
Gary> From the source code, it looks like this behavior is hardwired
Gary> into fmtscan() (in uip/sbr/formatsbr.c). And fmtscan() is
Gary> always called from Replout() (in uip/replsbr.c), regardless of
Gary> the -format switch. So I don't know what I can do, short of an
Gary> awk script that parses the headers and turns any leading spaces
Gary> into a TAB. Any suggestions? Perhaps (I don't know enough
Gary> about this to really tell) this is a sendmail config-file issue
Gary> instead? Any help would be appreciated.
Well, you could look into fixing uip/post.c ...
--
Mark D. Baushke
Internet: mdb@ESD.3Com.COM
UUCP: {3comvax,auspex,sun}!bridge2!mdb