[comp.emacs] Babyl Format used in rmail

felix@AI.SRI.COM (Francois Felix INGRAND) (04/12/89)

I am looking for a definition of the Babyl Format used in rmail.

By the way, I am also looking for a definition of REGEXP as used in
GNU Emacs. Is it the same as the grep REGEXP?

Thanks in advance,
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Francois Felix INGRAND                          SRI International, AIC
felix@AI.SRI.COM                                333, Ravenswood Avenue
felix%AI.SRI.COM@UUNET.UU.NET                   MENLO PARK, CA 94025, USA
"Pourquoi tant de haine..." (Edika)      "Read my Lisp... No new syntax" (nil)

mende@athos.rutgers.edu (Bob Mende Pie) (04/12/89)

I got this off the net a long time ago... gee... I guess it is ok to be a
pack-rat :-)

============================================================
From rlk@think.COM (Robert Krawitz) Mon Nov 30 10:56:46 1987

Let's see if I remember my BNF for babyl files; this corresponds to
version 5:


File := <header>
	<message>*	; Some say there must be at least one message.

Header := Babyl Options:\n
	  <header-option>*
	  \^_

Header-option := <header-token>	; See note [5]
		 : *
		 <header-value>

header-token := [^\000-\017:\177-\377]*	; Not these characters [tab is OK]
header-value := ditto, if a list, each element separated by a comma and
		a space.

message := \^L\n
	   [01],	; See note [1] below
	   ( <attribute>,)*	; Note space before and comma after token
	   ,
	   ( <label>,)*		; ditto, see note [4] below
	   \n
	   <header>*	; See note [1] and [2] below
	   *** EOOH ***\n
	   <header>*	; See note [2] below
	   \n
	   <body>
	   \^_

attribute := unseen |
	     last |	; Not all programs implement this.  It
			; generally only gets used internally, and
			; isn't written out to a file.
	     >last |	; Babyl uses this for a deleted message at the
			; end.  It shouldn't be written out to a file.
	     deleted |
	     recent |	; Not all programs implement this.  It refers
			; to a message in the last batch of new mail;
			; thus it probably shouldn't be written out to
			; a file during a normal save although it
			; makes sense to write it out in an emergency save.
	     filed |
	     answered |
	     forwarded |
	     redistributed |
	     badheader |	; Not all programs implement this
	     filed		; Not all programs implement this

label := [^\000-\020,\177-\377]*	; No control chars,
			; whitespace, commas, rubout, or high bit set

header := [^\000-\020:\177-\377]*:
	  <header-line>
	  <header-line>*

header-line := [ \t][^\n]*\n	; Continuation lines must be indented

body := (.*\n)*		; See note [3] below
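
As a rough illustration of the top-level File rule above, here is a
minimal Python sketch that cuts a Babyl file into its options section
and its raw messages. It is not taken from any of the programs
mentioned; the name split_babyl is made up, and it assumes every
terminating ^_ sits at the start of a line (which the body rule and
note [3] suggest) and that the case of the "Babyl Options:" line does
not matter.

```python
import re

def split_babyl(text):
    """Return (options_section, [raw_message, ...]) for Babyl text.

    Hypothetical helper: splits at every ^_ (\\x1f) that begins a line,
    then checks that each following chunk opens with ^L newline.
    """
    parts = re.split(r"(?m)^\x1f", text)
    header, rest = parts[0], parts[1:]
    # Assumption: accept any capitalization of the first line.
    if not header.lower().startswith("babyl options:\n"):
        raise ValueError("missing 'Babyl Options:' line")
    messages = []
    for part in rest:
        if part == "":
            continue  # nothing follows the final ^_
        if not part.startswith("\x0c\n"):
            raise ValueError("message does not start with ^L newline")
        messages.append(part[2:])  # drop the ^L and its newline
    return header, messages
```

Each returned message still begins with its status line (the [01],
line), so the remaining rules can be applied to it separately.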


[1] A zero means that the headers have not been cleaned up,
reprocessed, toggled, or whatever.  In this case there should be no
headers before the EOOH line.  A one means that the headers have been
reprocessed.  In this case, the original headers will typically be
before the EOOH line and the reformatted or whatever subset of headers
that the user should see will be after it.  Note that in this case
it's permissible to garbage collect all headers before the EOOH line.
No one's defined what it means to garbage collect only SOME of the
headers before this line.

[2] It's apparently permissible to add headers of the program's own
choosing before the EOOH line.  Or at least, Rmail does so (it caches
a summary line) and nothing seems to object.  There's no particular
guarantee that something else won't step all over it, though.  Headers
after the EOOH line can be reformatted as the program wishes (e.g.,
indent the header lines to the same distance, canonicalize machine
names) for display to the user.  It's generally best for programs that
read a babyl file to look at the headers before the EOOH line if they
exist, since these should be untouched by the user.  Remember, the
user can edit anything after the EOOH line.
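
Notes [1] and [2] can be sketched in Python as follows. This is only
an illustration: Message, split_message and preferred_headers are
made-up names, and the sketch assumes the text "*** EOOH ***" never
appears inside a header line. The input is one message with the
leading ^L newline and trailing ^_ already removed.

```python
from typing import NamedTuple

EOOH = "*** EOOH ***\n"

class Message(NamedTuple):
    status: str        # the "[01], ..." line without its newline
    reprocessed: bool  # True when the flag is 1 (see note [1])
    original: str      # headers before the EOOH line, possibly empty
    shown: str         # headers after the EOOH line
    body: str

def split_message(raw):
    """Split one raw message into the parts named by the grammar."""
    status, newline, rest = raw.partition("\n")
    if not newline or status[:1] not in ("0", "1"):
        raise ValueError("bad status line")
    original, marker, after = rest.partition(EOOH)
    if not marker:
        raise ValueError("no EOOH line")
    if after.startswith("\n"):
        # No headers after EOOH: the blank line comes immediately.
        shown, body = "", after[1:]
    else:
        shown, blank, body = after.partition("\n\n")
        if blank:
            shown += "\n"  # restore the newline ending the last header
    return Message(status, status[0] == "1", original, shown, body)

def preferred_headers(msg):
    """Use the headers before EOOH when they exist, as note [2] advises."""
    return msg.original or msg.shown
```

A program that only wants to display the message would use shown
instead, since that is the part the user is allowed to edit.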

[3] A \^_ at the beginning of a line should be quoted somehow.  The
normal way seems to be to decompose it into 2 characters: a ^ and a _.
Strictly speaking, it doesn't always have to be, since the following
text would have to be parsable as a message, but some programs don't
try to use that much intelligence.  Oh well.
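
The decomposition described in note [3] is easy to sketch in Python.
The name quote_body is made up, and note that this quoting is lossy:
once written, a quoted ^_ cannot be told apart from a line that
really did begin with the two characters ^ and _.

```python
import re

def quote_body(body):
    """Replace a ^_ (\\x1f) at the start of any line with '^' and '_'.

    A ^_ in the middle of a line is left alone, since only a
    line-initial one could be mistaken for the end of the message.
    """
    return re.sub(r"(?m)^\x1f", "^_", body)
```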

[4] Labels, or keywords as they are often called, are generally
defined by the user, although it's not entirely impermissible for a
program to use these for its own purpose (e.g., a keyword named
RemindMe might be used to automatically find important messages).
Some people also want these used to cache other state implemented by
certain programs; this use is undefined.  Note that all keywords used
should be inserted in a header-option named Keywords:.  Can a keyword
have the same name as an attribute?  Who knows?  It's probably not a
good idea, since some programs use the concept of <labels> =
<keywords> + <attributes>.  Sigh.

[5] Some tokens are standardized in meaning.  Common tokens are Mail
inboxes, babyl file version number, which is currently 5, labels used
in messages, window format for Zmail, anything else you want to be
associated with a file.  Be warned that labels should be a complete
list of all user-defined keywords used in the file, so if you add a
new label to a message, you should add it to this list.  You should
also have a Babyl version: 5 file attribute (look in a babyl file for
details).
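
The bookkeeping in notes [4] and [5] might look like this in Python.
All function names here are made up. Because the notes mention both a
"Keywords:" option and a list of "labels", the option name is left as
a parameter rather than guessed, and labels are compared exactly
(case-sensitively), which is also an assumption.

```python
def parse_options(header):
    """Turn the options section into a dict of token -> value."""
    options = {}
    for line in header.split("\n")[1:]:  # skip the 'Babyl Options:' line
        if not line:
            continue
        token, colon, value = line.partition(":")
        if not colon:
            raise ValueError("option line without a colon: %r" % line)
        options[token] = value.lstrip(" ")
    return options

def declared_labels(options, option_name):
    """Split a comma-separated option value into a list of labels."""
    value = options.get(option_name, "")
    return [item.strip() for item in value.split(",") if item.strip()]

def undeclared_labels(used, options, option_name):
    """Labels used in messages that the file-level list does not mention."""
    declared = set(declared_labels(options, option_name))
    return sorted(set(used) - declared)
```

A writer would add whatever undeclared_labels returns to the option
before saving, so the list stays complete.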

Anyone know if there actually is a "formal" standard?  This was done
quickly from memory and a Zmail manual, but there are at least three
programs around that use Babyl files (zmail, babyl, and emacs/rmail)
and someone at SIPB was going to write a command-based mail reader
similar to Unix Mail but operating on babyl files, and someone (of
course not me :-)) should probably write xbabyl :-)

References:

ITS/Tops-20 INFO file on babyl (who wrote it?  ECC?  GZ?)

Zmail manual (the MIT version was written by RMS; ECC wrote the
section on Babyl file format)
-- 

fuat@cunixc.cc.columbia.edu (Fuat C. Baran) (04/13/89)

In article <Apr.12.09.58.15.1989.2063@athos.rutgers.edu> mende@athos.rutgers.edu (Bob Mende Pie) writes:
>Anyone know if there actually is a "formal" standard?  This was done
>quickly from memory and a Zmail manual, but there are at least three
>programs around that use Babyl files (zmail, babyl, and emacs/rmail)
>and someone at SIPB was going to write a command-based mail reader
>similar to Unix Mail but operating on babyl files, and someone (of
>course not me :-)) should probably write xbabyl :-)

Columbia-MM is another program that operates on babyl format mail
files (and also a few others).  Unfortunately, we didn't have a
document describing the format to work from, so it was based on output
from gnuemacs rmail.  In fact, it was based on a "buggy" version of
rmail, and recently we fixed it (there was a problem with having or
not having spaces after the comma separating keywords/labels).

I would be very interested in seeing a document that describes what
the official babyl format is.  If you have such a document, could you
mail me a copy?  (I don't always get to read news before it
expires...)  Thanks a lot.


						--Fuat
-- 
INTERNET: fuat@columbia.edu          U.S. MAIL: Columbia University
BITNET:   fuat@cunixc.cc.columbia.edu           Center for Computing Activities
USENET:   ...!rutgers!columbia!cunixc!fuat      712 Watson Labs, 612 W115th St.
PHONE:    (212) 854-5128                        New York, NY 10025

rlk@think.com (Robert Krawitz) (04/13/89)

I tried to find an "official" document describing babyl format when I
was working on rmail (I didn't write it initially, but I extensively
modified it over the summer of 1985), but I never had much success.
The only two documents I could find were the two that I referenced,
the Zmail manual written by RMS, and the (unpublished) info file on
Babyl that I grabbed from oz when that machine was still around.

The message of mine that Bob Mende reposted to this newsgroup is the
standard that I used as a reference when working on rmail.  If Fuat
used the rmail distributed with Version 18 of emacs, he was correct
that it was incorrect -- babyl and zmail put spaces after the commas
separating keywords and attributes (labels).  Various people have
corrected this error.  My preferred fix to rmail accepts either
syntax, while using a variable to control which compatibility option
is used.
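
To make the idea concrete, here is a small Python sketch of a reader
that accepts either comma syntax and a writer whose output is chosen
by a flag. It is not the rmail code; parse_status, format_status and
the space_after_comma parameter are made-up names, and the layout
follows the BNF in the reposted message.

```python
def parse_status(line):
    """Parse '[01], attr,, label,' with or without spaces after commas.

    Returns (flag, attributes, labels). The first ',,' is taken as the
    boundary between attributes and labels.
    """
    head, boundary, tail = line.rstrip("\n").partition(",,")
    if not boundary:
        raise ValueError("no ',,' between attributes and labels")
    fields = head.split(",")
    flag = fields[0].strip()
    if flag not in ("0", "1"):
        raise ValueError("flag must be 0 or 1")
    attributes = [f.strip() for f in fields[1:] if f.strip()]
    labels = [f.strip() for f in tail.split(",") if f.strip()]
    return flag, attributes, labels

def format_status(flag, attributes, labels, space_after_comma=True):
    """Write the status line in the chosen compatibility style."""
    pad = " " if space_after_comma else ""
    attr_part = "".join(pad + a + "," for a in attributes)
    label_part = "".join(pad + l + "," for l in labels)
    return flag + "," + attr_part + "," + label_part
```

For example, parse_status reads both "1, answered,, todo," and
"1,answered,,todo," as the same flag, attributes and labels.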

If there is an official document describing babyl format, it would be
useful.  I suspect that the chances of finding one are even less than
they were in the fall of 1987 (or the summer of 1985), but it's always
possible that someone knows something and doesn't know that people are
interested.
-- 
ames >>>>>>>>>  |	Robert Krawitz <rlk@think.com>	245 First St.
bloom-beacon >  |think!rlk	(postmaster)		Cambridge, MA  02142
harvard >>>>>>  .	Thinking Machines Corp.		(617)876-1111

fuat@cunixc.cc.columbia.edu (Fuat C. Baran) (04/15/89)

In article <Apr.12.09.58.15.1989.2063@athos.rutgers.edu> mende@athos.rutgers.edu (Bob Mende Pie) writes:
>References:
>
>ITS/Tops-20 INFO file on babyl (who wrote it?  ECC?  GZ?)
>
>Zmail manual (the MIT version was written by RMS; ECC wrote the
>section on Babyl file format)

Well, last night I got my hands on the TOPS-20 INFO file on babyl
(INFO:BABYL.INFO) thanks to a friend with access to a -20.  It is a
70K+ document describing how to use babyl mode, and has a reference
section for implementors at the end, which describes the format.  If
anyone wants a copy, let me know.  I don't think there are any
copyright restrictions.


						--Fuat
