[comp.mail.misc] binmail vs MMDF mail file format

wisner@mica.Berkeley.EDU (Bill Wisner) (07/05/89)

>                          But the simple fact that ^A is not a printable
>character means that mail MTAs are going to have a major problem in
>mailing that message to someone.

You don't mail \001s, you mail messages. The sequence \001\001\001\001 is
only used in saved mail files, not when sending messages.

\001\001\001\001 is unambiguous. It is a string that is not likely to
appear in a mail message. On the other hand, I see "^From " in messages
quite frequently. Well, actually, it's "^>From " that I see frequently.

UNIX's stoopid mail format ensures that no line which starts with the word
From will survive intact. This mangling of mail is completely unwarranted.
This sorry excuse for a mailbox format should have been retired years ago.
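
For concreteness, the mangling in question can be sketched as follows.
This is a minimal illustration, not code from any actual delivery agent;
the name quote_from_lines and its buffer-based interface are assumptions
made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `body` into `out`, putting '>' in front of
 * every line that begins with "From ".  Returns the number of bytes
 * written (not counting the terminating NUL), or -1 if `out` is too
 * small. */
long quote_from_lines(const char *body, char *out, size_t outsz)
{
    size_t n = 0;
    int at_line_start = 1;

    for (const char *p = body; *p != '\0'; p++) {
        if (at_line_start && strncmp(p, "From ", 5) == 0) {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = '>';         /* the line no longer survives intact */
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = *p;
        at_line_start = (*p == '\n');
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Note that a line such as "Fromage" passes through untouched; only "From"
followed by a space is rewritten.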

Bill Wisner		wisner@mica.berkeley.edu	     ucbvax!mica!wisner
I'm not the NRA either.

nelson@sun.soe.clarkson.edu (Russ Nelson) (07/05/89)

In article <WISNER.89Jul4170048@anableps.berkeley.edu> wisner@mica.Berkeley.EDU (Bill Wisner) writes:

   \001\001\001\001 is unambiguous. It is a string that is not likely to
   appear in a mail message. On the other hand, I see "^From " in messages
   quite frequently. Well, actually, it's "^>From " that I see frequently.

You only think you have a problem with binmail format.  Actually you
have a problem with programs that forget that "From " in the body of
messages has been quoted.  Even /usr/ucb/mail forgets to make a
distinction between outputting a message to a mailbox and a file.  The
same command serves for both purposes, creating your perfectly valid
objection.

I may have a problem with MMDF format if it assumes that you never send

in your mail message(s).  In fact, sending

is not a totally crazy thing to do.  I just did it twice, and since things
always come in threes, here's a third:

Of course, you people reading this via news undoubtedly had no problem with
the ^A lines.  Anyone who had this message mailed to them via MMDF may
not see this.
--
--russ (nelson@clutx [.bitnet | .clarkson.edu])
Democracy needs capitalism like a fish needs a bicycle.

wisner@mica.Berkeley.EDU (Bill Wisner) (07/05/89)

The person I never expected to see posting with a valid address, Russ
Nelson, writes:

>I may have a problem with MMDF format if it assumes that you never send
>
>in your mail message(s).  In fact, sending
>
>is not a totally crazy thing to do.  I just did it twice, and since things
>always come in threes, here's a third:
>
>Of course, you people reading this via news undoubtedly had no problem with
>the ^A lines.  Anyone who had this message mailed to them via MMDF may
>not see this.

Wrong. Totally, utterly wrong. USENET and Internet mail are both equally
incapable of handling control characters. Those three lines, on which you
ostensibly typed \001s, are absolutely empty.

So, you see, sending \001\001\001\001 *is* a totally crazy thing to do.
Neither news nor mail is required to handle control characters according
to the rules, so they don't.


Bill Wisner		wisner@mica.berkeley.edu	     ucbvax!mica!wisner
I'm not the NRA either.

david@ms.uky.edu (David Herron -- One of the vertebrae) (07/05/89)

In article <NELSON.89Jul4220946@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <WISNER.89Jul4170048@anableps.berkeley.edu> wisner@mica.Berkeley.EDU (Bill Wisner) writes:
>    \001\001\001\001 is unambiguous. It is a string that is not likely to
>    appear in a mail message. On the other hand, I see "^From " in messages
>    quite frequently. Well, actually, it's "^>From " that I see frequently.
> 
> You only think you have a problem with binmail format.
> 
> I may have a problem with MMDF format if it assumes that you never send
> 
Hmm...  that line didn't have any ^A's in it by the time it reached meee..
And I read this via news.

> Of course, you people reading this via news undoubtedly had no problem with
> the ^A lines.  Anyone who had this message mailed to them via MMDF may
> not see this.


Considering that a line of 4 ^A's in the middle of a message is very
much more unlikely than a line beginning with From<space>, I have
little problem with MMDF's mailbox format.
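
As a rough sketch of why the delimiter is easy to work with, here is how
a reader might count messages in an MMDF-style folder held in memory.  It
assumes each message sits between an opening and a closing line of four
^A's; the function name and the in-memory interface are invented for the
example and are not taken from MMDF itself.

```c
#include <stddef.h>
#include <string.h>

#define MMDF_SEP "\001\001\001\001\n"   /* assumed delimiter line */

/* Count messages in an MMDF-style buffer.  Each message is assumed to
 * be bracketed by one delimiter line before it and one after it.
 * Returns -1 if the delimiters do not pair up. */
int mmdf_count_messages(const char *buf)
{
    const size_t seplen = strlen(MMDF_SEP);
    int seps = 0;
    int at_line_start = 1;

    for (const char *p = buf; *p != '\0'; ) {
        if (at_line_start && strncmp(p, MMDF_SEP, seplen) == 0) {
            seps++;
            p += seplen;        /* delimiter ends in '\n', so we are */
            continue;           /* still at the start of a line      */
        }
        at_line_start = (*p == '\n');
        p++;
    }
    return (seps % 2 == 0) ? seps / 2 : -1;
}
```

Body lines beginning with "From " need no special treatment here, which
is the whole point.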

A problem with both the ucbmail and mush ports into the MMDF
environment is that neither makes a distinction between saving
a message to a mailbox and sending it out a pipe.  In both cases
the ^A's are sent out...  It's on my list of things to do ;-)

I see two solutions...

One is a directory hierarchy like MH.  It seems fast enough, but the
programs don't always handle >100 mailfolders very well.  See, I
keep a lot of mail around.

Another is a database of some sort.

"Most newsreaders are better than most mail readers."  Is that
how the line goes?
-- 
<- David Herron; an MMDF guy                              <david@ms.uky.edu>
<- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<-
<- New word for the day: Obnoxity -- an act of obnoxiousness

lear@NET.BIO.NET (Eliot Lear) (07/05/89)

Actually, strictly speaking, USENET software can handle a number of
control characters, like ESCAPE.  Don't you ever see those messages
with vt100 underline codes?

Aside from that, I agree with you about the `minimalist' mail format.

There are a number of techniques that could be and have been used to
improve matters, the most common of which seems to be that of simply
prepending a byte count to the message so that you know exactly where
it ends.  This prevents the needless butchering of innocent Froms
(there! I got in some religion!).
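
A minimal sketch of the byte-count idea, assuming a made-up folder layout
in which each message is preceded by its length as a decimal number on a
line of its own.  The layout and the name next_counted_message are
illustrative assumptions, not any particular mailer's format, and the
sketch does not guard against absurdly large counts overflowing.

```c
#include <stddef.h>

/* Hypothetical layout: "<decimal length>\n" followed by exactly that
 * many bytes of message, repeated.  On success, stores the start and
 * length of the message and returns a pointer just past it.  Returns
 * NULL at the end of the buffer or if the count is malformed. */
const char *next_counted_message(const char *p, const char *end,
                                 const char **msg, size_t *len)
{
    size_t n = 0;
    int digits = 0;

    if (p >= end)
        return NULL;
    while (p < end && *p >= '0' && *p <= '9') {
        n = n * 10 + (size_t)(*p - '0');
        p++;
        digits++;
    }
    if (digits == 0 || p >= end || *p != '\n')
        return NULL;
    p++;                                /* skip the newline after the count */
    if ((size_t)(end - p) < n)
        return NULL;                    /* count runs past the folder */
    *msg = p;
    *len = n;
    return p + n;
}
```

Because the reader skips ahead by the count, nothing inside the message
is ever inspected, so a body line starting with "From " needs no quoting.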
-- 
Eliot Lear
[lear@net.bio.net]

schaefer@ogccse.ogc.edu (Barton E. Schaefer) (07/05/89)

Sorry, my inews won't let me post to alt.*.

In article <12057@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
} A problem with both the ucbmail and mush ports into the MMDF
} environment is that neither makes a distinction between saving
} a message to a mailbox and sending it out a pipe.  In both cases
} the ^A's are sent out...  It's on my list of things to do ;-)

Mush 6.5.6 *does* remove the ^A separators when piping, *except* when you
specify a pipe as one of the addresses when sending mail (e.g. if $record
is a pipe).  But that is an incredibly trivial thing to fix.  (Why don't
people *tell* me about these problems?!?!)

[ This is NOT an Official Patch.  There will be no further Official
Patches until the SunView port is out of beta-test. ]

In file mail.c, add one line after line 1416:

  1408      /* First, put the message separator in... */
  1409      for (size = 1; size < next_file; size++)
  1410  #ifndef MSG_SEPARATOR
  1411          {
  1412              time_t t;
  1413              (void) time(&t);
  1414              fprintf(files[size], "From %s %s", login, ctime(&t));
  1415          }
  1416  #else /* MSG_SEPARATOR */
+               if (names[size])
  1417  #ifdef MMDF
  1418          fputs(MSG_SEPARATOR, files[size]);
  1419  #else /* MMDF */
  1420          fprintf(files[size], "%s\n", MSG_SEPARATOR);
  1421  #endif /* MMDF */
  1422  #endif /* MSG_SEPARATOR */

and also move lines 1487-1489 to after 1490:

  1486      for (size = 1; size < next_file; size++) {
- 1487  #ifdef END_MSG_SEP
- 1488          fputs(END_MSG_SEP, files[size]);
- 1489  #endif /* END_MSG_SEP */
  1490          if (names[size]) {
+       #ifdef END_MSG_SEP
+                   fputs(END_MSG_SEP, files[size]);
+       #endif /* END_MSG_SEP */
  1491  #ifndef END_MSG_SEP
  1492              fputc('\n', files[size]);
  1493  #endif /* !END_MSG_SEP */
  1494              close_lock(names[size], files[size]);
  1495              xfree(names[size]);
  1496          } else
  1497              pclose(files[size]);
  1498      }

Line 1491 and the one above it could of course be condensed into #else.
-- 
Bart Schaefer           "And if you believe that, you'll believe anything."
                                                            -- DangerMouse
CSNET / Internet                schaefer@cse.ogc.edu
UUCP                            ...{sequent,tektronix,verdix}!ogccse!schaefer

schaefer@ogccse.ogc.edu (Barton E. Schaefer) (07/06/89)

I can't post to alt.religion.computers, so if anybody really wants that
group included, you'll have to add it to any followups.

In article <113637@sun.Eng.Sun.COM> island!argv@sun.com (Dan Heller) writes:
} I don't want to start a religious war on this (and if anyone cares to
} continue this conversation, start a new message; don't reply to this one
} and discuss this issue).  But the simple fact that ^A is not a printable
} character means that mail MTAs are going to have a major problem in
} mailing that message to someone.  You can't mail MMDF folders without a
} probable headache.

In article <WISNER.89Jul4170048@anableps.berkeley.edu> wisner@mica.Berkeley.EDU (Bill Wisner) writes:
} You don't mail \001s, you mail messages. The sequence \001\001\001\001 is
} only used in saved mail files, not when sending messages.

You've never wanted to mail an entire folder from one place to another?
Why shouldn't you be able to "mua somewhere < folder"?  (I use "mua" as
the generic MUA.)  It seems silly to have to uuencode (or the equivalent)
what should primarily be a text file.

Of course, using MH, you can't do that anyway because a folder is not a
file, so who cares. :-)

} \001\001\001\001 is unambiguous. It is a string that is not likely to
} appear in a mail message. On the other hand, I see "^From " in messages
} quite frequently. Well, actually, it's "^>From " that I see frequently.
} 
} UNIX's stoopid mail format ensures that no line which starts with the word
} From will survive intact. This mangling of mail is completely unwarranted.
} This sorry excuse for a mailbox format should have been retired years ago.

Mush handles lines beginning with "From " correctly.  It also concedes
that there are a lot of broken mailers that don't.  Heck, with a little
work, the phrase "Bill Wisner Slept Here" could be used as the separator.
:-) :-)  But of course that's at least as silly as "From " followed by a
bunch of address and date information.

As has been pointed out before, the question really is whether one cares
about compatibility.  NOT *convertibility* -- that's easy -- but the
ability to switch back and forth between different MUAs without doing
any intermediate work.
-- 
Bart Schaefer           "And if you believe that, you'll believe anything."
                                                            -- DangerMouse
CSNET / Internet                schaefer@cse.ogc.edu
UUCP                            ...{sequent,tektronix,verdix}!ogccse!schaefer

tale@pawl.rpi.edu (David C Lawrence) (07/06/89)

In <Jul.5.02.12.10.1989.3266@NET.BIO.NET> lear@NET.BIO.NET (Eliot Lear) writes:
lear> Actually, strictly speaking, USENET software can handle a number of
lear> control characters, like ESCAPE.  Don't you ever see those messages
lear> with vt100 underline codes?

Strictly speaking, most USENET software doesn't handle \027, at least
not around here.  I had put in a trojan horse to close people's
windows in Suntools when they read my message ... the mail ones
worked, but news wouldn't.  The ESC was being stripped.

Most of the underlining you see being done in news articles is
accomplished because \008 does get passed undisturbed.  To underline
something, you can follow each character to be underlined with a \008_
and it will come out however your environment is configured to handle
this special case.  (In GNUS, I don't get anything but a bunch of
seeming gibberish because the underlined region is all expanded.)
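
The overstrike trick can be sketched like this.  It is only an
illustration of the sequence described above (character, backspace,
underscore); the function name and buffer interface are assumptions, and
the backspace is written with C's '\b' escape.

```c
#include <stddef.h>

/* Hypothetical helper: write each character of `text` followed by a
 * backspace and an underscore.  Returns the number of bytes written
 * (not counting the terminating NUL), or -1 if `out` is too small. */
long underline_overstrike(const char *text, char *out, size_t outsz)
{
    size_t n = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (outsz < 4 || n > outsz - 4)
            return -1;          /* need three bytes plus the final NUL */
        out[n++] = *p;
        out[n++] = '\b';        /* backspace, ASCII 8 */
        out[n++] = '_';
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

How the result is displayed is up to the terminal or pager; a program
that shows the bytes literally will print the expanded sequence.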

Dave
--
 (setq mail '("tale@pawl.rpi.edu" "tale@itsgw.rpi.edu" "tale@rpitsmts.bitnet"))
        "Drinking coffee for instant relaxation?  That's like drinking
               alcohol for instant motor skills."  -- Marc Price

wisner@mica.Berkeley.EDU (Bill Wisner) (07/06/89)

schaefer@ogccse.ogc.edu (Barton E. Schaefer) writes:

>You've never wanted to mail an entire folder from one place to another?

No. I've been known to FTP them or UUCP them. But never would I trust a
plain mail folder -- of any format -- to the mail system.

Bill Wisner		wisner@mica.berkeley.edu	     ucbvax!mica!wisner
I'm not the NRA either.

karl@giza.cis.ohio-state.edu (Karl Kleinpaste) (07/06/89)

tale@pawl.rpi.edu writes:
   ...\008...

"\008"..."\008"?!?!?

Check your radix, yes?

:-),
--Karl

argv%eureka@Sun.COM (Dan Heller) (07/06/89)

In article wisner@mica.Berkeley.EDU (Bill Wisner) writes:
>  On the other hand, I see "^From " in messages
> quite frequently. Well, actually, it's "^>From " that I see frequently.

> UNIX's stoopid mail format ensures that no line which starts with the word
> From will survive intact. This mangling of mail is completely unwarranted.
> This sorry excuse for a mailbox format should have been retired years ago.

Several problems with your observations I think.

1) Any line that starts with "From " will turn into a ">From" line as soon
   as a sendmail system finds it.  You're blaming an MUA for a fault of the
   MTA.  This has nothing to do with Mail, Mush or MH.

2) "From " is not the message separator -- the message separator is:
    "From <address> <date(1)>\n"

Mush, unlike mail, will let From_ lines go and not assume a new message
unless the From_ line meets the above criteria.  While it's true that you
won't see 4 ^A's in a row, chances are also likely that you won't see the
above format -unless- you are mailing a folder.  In both cases, (^A's and
From_), you must prefix the string with something ('>' for example) to
delineate it from being another message.

I have just pointed out that logistically, the two formats are functionally
the same.  The advantage that unix-mail format has over MMDF format is the
fact that you can't send ^A's in mail messages.

As a result, you can't mail MMDF folders to another site any more easily
than you can send a unix-format folder.  In both cases, the folder has to
be modified somehow.
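
The stricter From_ test described above (a line counts as a separator
only if it reads "From <address> <date>") might look roughly like this.
It is a sketch under assumptions, not Mush's actual code: the date is
taken to be in ctime(3) layout with no time zone field, and the names
in_table and is_from_separator are invented for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `s` matches one of the `n` entries in `table`. */
static int in_table(const char *s, const char *const table[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, table[i]) == 0)
            return 1;
    return 0;
}

/* Return 1 only if `line` looks like
 * "From <address> Www Mmm dd hh:mm:ss yyyy".  Anything after the year
 * is ignored, and a date with a time zone name is rejected. */
int is_from_separator(const char *line)
{
    static const char *const days[] =
        { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
    static const char *const months[] =
        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    char addr[256], wday[4], mon[4];
    int day, hh, mm, ss, year;

    if (strncmp(line, "From ", 5) != 0)
        return 0;
    if (sscanf(line + 5, "%255s %3s %3s %d %d:%d:%d %d",
               addr, wday, mon, &day, &hh, &mm, &ss, &year) != 8)
        return 0;
    return in_table(wday, days, 7) && in_table(mon, months, 12)
        && day >= 1 && day <= 31
        && hh >= 0 && hh <= 23
        && mm >= 0 && mm <= 59
        && ss >= 0 && ss <= 61;
}
```

An ordinary sentence such as "From here on, the format changes." fails
the date check and is left alone; only a body line that happens to
imitate the whole separator would still need quoting.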


dan <island!argv@sun.com>
-----
My postings reflect my opinion only -- I represent no company's opinions.

wisner@mica.Berkeley.EDU (Bill Wisner) (07/07/89)

In article <113918@sun.Eng.Sun.COM> argv%eureka@Sun.COM (Dan Heller) writes:

>>  On the other hand, I see "^From " in messages
>> quite frequently. Well, actually, it's "^>From " that I see frequently.

>> UNIX's stoopid mail format ensures that no line which starts with the word
>> From will survive intact. This mangling of mail is completely unwarranted.
>> This sorry excuse for a mailbox format should have been retired years ago.

>Several problems with your observations I think.

No, I don't think so.

>1) Any line that starts with "From " will turn into a ">From" line as soon
>   as a sendmail system finds it.  You're blaming an MUA for a fault of the
>   MTA.  This has nothing to do with Mail, Mush or MH.

I didn't blame the bloody MUA, I blamed the mail format. That was the topic
of my message, got it? Not which interface is best but which format is best.
And UNIX mail format is not best.

>2) "From " is not the message separator -- the message separator is:
>    "From <address> <date(1)>\n"

That is as may be, but none of the delivery agents that use the UNIX mail
format bother to make that little distinction. Lines that begin with "From "
(or even "from ") get munched.

>I have just pointed out that logistically, the two formats are functionally
>the same.  The advantage that unix-mail format has over MMDF format is the
>fact that you can't send ^A's in mail messages.

Baloney. MMDF will never modify a message regardless of what a line begins
with. And you can't mail \001s *period*. Won't fly. Mailers don't do it.
It has nothing to do with the file format you use.

>As a result, you can't mail MMDF folders to another site any more easily
>than you can send a unix-format folder.  In both cases, the folder has to
>be modified somehow.

What does this have to do with anything? Nobody has said anything in this
discussion about mailing folders except for you. Sending a folder via normal
mail paths is insane. You simply can't expect them to arrive intact, although
that might happen if you're lucky.

Bill Wisner		wisner@mica.berkeley.edu	     ucbvax!mica!wisner
I'm not the NRA either.

les@chinet.chi.il.us (Leslie Mikesell) (07/07/89)

In article <113918@sun.Eng.Sun.COM> island!argv@sun.com (Dan Heller) writes:

>Mush, unlike mail, will let From_ lines go and not assume a new message
>unless the From_ line meets the above criteria.  While it's true that you
>won't see 4 ^A's in a row, chances are also likely that you won't see the
>above format -unless- you are mailing a folder.  In both cases, (^A's and
>From_), you must prefix the string with something ('>' for example) to
>delineate it from being another message.

AT&T's PMX mailer products include a new /bin/mail that uses
Content-Type: and Content-Length: headers to avoid the problem.  Anything
within the Content-Length: characters is the body of the message and
is not parsed for additional header information.  There are several
advantages to this scheme.  If Content-Type: is Multipart, then there
are additional Content-Type:, Content-Length: headers inside of the
first Content-Length: characters, allowing multiple attachments to a
single message.  The Content-Type: field may be used to do format
transformations as required by the receiving system or to note that
the attachment is 8-bit-binary which should not be sent unless the
receiving system is known to handle binary files.  The ability to
attach any type of file to a message with encoding done transparently
if necessary is extremely important in office automation environments.
Any chance of seeing this (or Forms mode) in Mush?
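
As a rough illustration of the Content-Length: idea (not the PMX code,
which I have not seen in source form), a reader could locate the end of
a message body like this.  The function names are invented, the header
name is matched case-sensitively for brevity, and the message is assumed
to be a NUL-terminated string, so truly binary bodies would need a
length-based interface instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of the Content-Length: header found in the header
 * block of `msg`, or -1 if there is none or it is malformed. */
long content_length(const char *msg)
{
    static const char name[] = "Content-Length:";
    const char *p = msg;

    while (*p != '\0' && *p != '\n') {      /* a blank line ends the headers */
        if (strncmp(p, name, sizeof name - 1) == 0) {
            const char *val = p + sizeof name - 1;
            char *endp;
            long n = strtol(val, &endp, 10);
            if (endp == val || n < 0)
                return -1;
            return n;
        }
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return -1;
}

/* Return a pointer just past the body of `msg`, or NULL if the length
 * is missing or runs past the end of the buffer. */
const char *skip_body(const char *msg)
{
    long n = content_length(msg);
    const char *body = strstr(msg, "\n\n");

    if (n < 0 || body == NULL)
        return NULL;
    body += 2;                              /* step over the blank line */
    if (strlen(body) < (size_t)n)
        return NULL;
    return body + n;
}
```

The body is skipped by count, so a "From " line inside it is never
examined.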

Les Mikesell

gregg@cbnewsc.ATT.COM (gregg.g.wonderly) (07/07/89)

From article <8887@chinet.chi.il.us>, by les@chinet.chi.il.us (Leslie Mikesell):
> AT&T's PMX mailer products include a new /bin/mail that uses
> Content-Type: and Content-Length: headers to avoid the problem.

This is just plain gross!  Last I knew, the MTA was the one attaching
this information.  What is the poor MUA going to do with a file that has
been altered by an unknowing editor (like vi or emacs)?  For their
/bin/mail, the answer is: if the text following Content-length bytes is
not "From", seek back to Content-length and do normal "look for the From
line" processing.  See, great solution :-(  Once again another format has
been chosen that absolutely solves nothing!

-- 
-----
gregg.g.wonderly@att.com   (AT&T bell laboratories)

les@chinet.chi.il.us (Leslie Mikesell) (07/08/89)

In article <1597@cbnewsc.ATT.COM> gregg@cbnewsc.ATT.COM (gregg.g.wonderly) writes:

>> AT&T's PMX mailer products include a new /bin/mail that uses
>> Content-Type: and Content-Length: headers to avoid the problem.

>This is just plain gross!  Last I knew, the MTA was the one attaching
>this information.  What is the poor MUA going to do with a file that has
>been altered by an unknowing editor (like vi or emacs)?  For their
>/bin/mail, the answer is: if the text following Content-length bytes is
>not "From", seek back to Content-length and do normal "look for the From
>line" processing.  See, great solution :-(  Once again another format has
>been chosen that absolutely solves nothing!

The MTA only attaches it if it isn't already there (i.e. with the
PMX products the MUA does attachments and adds the headers, although
the PC programs are sort-of MTA's also).

Of course you can destroy the structure by using a normal editor on
the mailbox file but why would anyone do that?  You can confuse any
other MUA by adding or deleting some From_ lines.  Why is this any
worse?  Personally, I'd like to see MTAs keep each message in a separate
file which would eliminate the problem of finding the start of the next
message and would allow links to be used for group messages.  Anyway,
the headers *do* solve two problems: (1) how to have multiple attachments
to a mail message and (2) how to identify the content of the message
or attachment in order to resolve possible conversions needed by the
recipient.  For example, text going between unix and dos machines should
have the line endings modified for the local conventions, but binary
attachments should not. 
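
The unix-to-dos case could be sketched as below.  This is an assumption
about how a receiving system might use the header, not a description of
what the PMX software does; the function name and the convention that a
type beginning with "text" marks text are both invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion: for text, turn each bare LF into CR LF;
 * for any other type, copy the bytes unchanged.  Takes an explicit
 * length so binary data containing NUL bytes is handled.  Returns the
 * number of bytes written, or -1 if `out` is too small. */
long convert_for_dos(const char *content_type,
                     const char *in, size_t inlen,
                     char *out, size_t outsz)
{
    int is_text = (strncmp(content_type, "text", 4) == 0);
    size_t n = 0;

    for (size_t i = 0; i < inlen; i++) {
        if (is_text && in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
            if (n >= outsz)
                return -1;
            out[n++] = '\r';
        }
        if (n >= outsz)
            return -1;
        out[n++] = in[i];
    }
    return (long)n;
}
```

Without the type information the converter would have to guess, and a
wrong guess corrupts a binary attachment.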

Les Mikesell

david@ms.uky.edu (David Herron -- One of the vertebrae) (07/17/89)

In article <8887@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
>AT&T's PMX mailer products include a new /bin/mail that uses
>Content-Type: and Content-Length: headers to avoid the problem.  Anything
...
>Any chance of seeing this (or Forms mode) in Mush?

er..  gads!  While that sounds like a possibly nice feature, it's
*EXTREMELY* un-RFC-822-ish.  Mush has to live out here in the world
of RFC-822 where mailing binary files around breaks things all
over the place.
-- 
<- David Herron; an MMDF guy                              <david@ms.uky.edu>
<- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<-
<- WARNING: Hunting season is now open in West Virginia!

les@chinet.chi.il.us (Leslie Mikesell) (07/17/89)

In article <12184@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
>>AT&T's PMX mailer products include a new /bin/mail that uses
>>Content-Type: and Content-Length: headers to avoid the problem.  Anything

>er..  gads!  While that sounds like a possibly nice feature, it's
>*EXTREMELY* un-RFC-822-ish.  Mush has to live out here in the world
>of RFC-822 where mailing binary files around breaks things all
>over the place.

The /bin/mail included with the PMX products uses /usr/lib/binarsys to
look up whether the destination (or next-hop) can handle binary attachments
and will bounce them if not (although it would be easy to encode on an
as-needed basis).  The MUA has a configuration option to encode attachments
using btoa, which is the method I would like to see other MUA's use since
they probably can't provide an alternate /bin/mail. The mail passed to
non-binary hosts should go through normal mailers and is only
un-RFC-822-ish due to the extra headers that do not have the leading X-.
There are several things I don't like about the PMX-mailers but I'll 
refrain from AT&T bashing unless someone asks...

Les Mikesell