[comp.mail.misc] "data base" mail system idea

wales@CS.UCLA.EDU (02/06/88)

I recently had an idea for a new kind of User Agent mail program.  I'd
like to see if I can develop it somehow into a research project that
could form the basis for my Ph.D. dissertation.  (The bare idea, all by
itself, doesn't appear to be substantive enough.)

Although this topic doesn't specifically concern mail headers or trans-
port protocols, I am including the "comp.mail.headers" newsgroup because
I want to be particularly sure of reaching people on the Internet (via
the HEADER-PEOPLE mailing list) who are familiar with non-UNIX computing
environments.

First, some background.

Many existing systems (such as Berkeley Mail and MH) work along a "fil-
ing cabinet" model.  That is, the user selects a "folder" of messages
and can then examine the contents of that folder.  (Whether the "fol-
ders" are implemented as single files, as in Berkeley Mail, or as direc-
tories, as in MH, is not really crucial to the concept.)  Incoming mail
goes into a special "in-box" folder, which is generally the folder the
user selects by default.  Mail can be kept in the folder, moved to
another folder, or deleted entirely.

The "filing cabinet" model, I feel, starts to break down when one deals
with large amounts of mail.  The main problem is that there is usually
no good way to locate a given piece of mail, unless the user can remem-
ber which folder he filed it in.

The idea I had was to take all of one's incoming mail and put it into an
information retrieval system.  Messages could then be searched for by
any of a number of criteria (dates, addresses, subject, user-assigned
keywords, etc.).  The kinds of searches available would be limited only
by the resources available to do the indexing and the searching.
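A minimal sketch of the kind of indexing described above, written here for illustration only: it assumes RFC 822 messages parsed with Python's standard `email` package and a crude word tokenizer. The names `MailIndex`, `words`, and `search` are hypothetical. Each criterion (address, subject word, body word, user-assigned keyword) maps to a set of message numbers, and a query is the intersection of those sets, optionally narrowed by a date range.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def words(text):
    """Crude tokenizer: lower-cased runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MailIndex:
    """Hypothetical inverted index: (field, term) -> set of message numbers."""

    def __init__(self):
        self.messages = []                # message number -> parsed message
        self.dates = []                   # message number -> date or None
        self.postings = defaultdict(set)  # (field, term) -> {message numbers}

    def add(self, raw_text, keywords=()):
        msg = message_from_string(raw_text)
        n = len(self.messages)
        self.messages.append(msg)
        try:
            self.dates.append(parsedate_to_datetime(msg["Date"]).date())
        except (TypeError, ValueError):   # missing or unparsable Date header
            self.dates.append(None)
        # Addresses are indexed whole, so one term matches one correspondent.
        for field in ("from", "to", "cc"):
            for _name, addr in getaddresses(msg.get_all(field, [])):
                if addr:
                    self.postings[(field, addr.lower())].add(n)
        for w in words(msg.get("subject", "")):
            self.postings[("subject", w)].add(n)
        body = msg.get_payload()
        for w in words(body if isinstance(body, str) else ""):
            self.postings[("body", w)].add(n)
        for kw in keywords:               # user-assigned keywords
            self.postings[("keyword", kw.lower())].add(n)
        return n

    def search(self, *criteria, since=None, until=None):
        """AND together (field, term) pairs, optionally within a date range."""
        hits = set(range(len(self.messages)))
        for field, term in criteria:
            hits &= self.postings.get((field, term.lower()), set())
        if since or until:
            hits = {n for n in hits if self.dates[n] is not None
                    and (since is None or self.dates[n] >= since)
                    and (until is None or self.dates[n] <= until)}
        return sorted(hits)
```

For example, `idx.search(("subject", "mail"), ("keyword", "thesis"))` returns the numbers of messages whose subject contains "mail" and which the user tagged "thesis". The cost the posting mentions shows up directly: every new kind of search means another family of postings to build and store.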

This concept is similar in some respects to the "keyword-based news"
system proposed by Brad Templeton some years ago (though I'm not saying
that it is the same as Brad's idea).

Especially if one considers recent developments in personal filing sys-
tems (e.g., Hypercard), it seems like an "information-retrieval-model"
mail management system should be feasible.  When I proposed the idea to
a graduate seminar here at UCLA a couple of weeks ago, one participant
commented that my idea could probably be implemented in a couple of days
using Hypercard or other similar tools (and would, as a result, not be
very interesting as original research).  Yet, as far as I've been able
to discover so far in the course of my reading, no one has done this.

Is anyone out there on the net aware of a mail system that does the
kinds of things I am suggesting here?  If in fact it hasn't been done,
is there some major "show-stopper" problem that has kept it from being
done?  Although I wish the answer were simply that no one had ever
thought of doing this kind of thing before me, I doubt this is the case.

I freely confess to a certain amount of "UNIX myopia", and would partic-
ularly like to hear about e-mail management tools that are radically
different from those customarily used on most UNIX systems.

-- Rich Wales // UCLA Computer Science Department // +1 (213) 825-5683
	3531 Boelter Hall // Los Angeles, California 90024-1596 // USA
	wales@CS.UCLA.EDU           ...!(ucbvax,rutgers)!ucla-cs!wales
"Sir, there is a multilegged creature crawling on your shoulder."

blarson@skat.usc.edu (Bob Larson) (02/06/88)

In article <11120@shemp.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
>The idea I had was to take all of one's incoming mail and put it into an
>information retrieval system.  Messages could then be searched for by
>any of a number of criteria (dates, addresses, subject, user-assigned
>keywords, etc.).  The kinds of searches available would be limited only
>by the resources available to do the indexing and the searching.

Tops-20 Mm allows message selection by all of the above-mentioned
criteria, plus some others (message body contents, unseen, flagged,
etc.).  It also supports the multiple-folder model.  It does not use a
general-purpose database.

There are at least two versions of Mm for Unix; the one I have used has
major problems, and I consider it "almost usable".  I also have an Mm
subset for PRIMOS.

>Is anyone out there on the net aware of a mail system that does the
>kinds of things I am suggesting here?  If in fact it hasn't been done,
>is there some major "show-stopper" problem that has kept it from being
>done?

No major show-stopper that I know of.  Implementation problems such as
limited mailbox size do exist.  (Startup is also slower on large
mailboxes.)  Most of the problems with keywords have been rediscovered
by Mm users, although, since the same person both assigns and uses the
keywords, the problems are less severe than in a general keyword system.
--
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%fns1@ecla.usc.edu
			oberon!fns1!info-prime-request

sa@ttidca.TTI.COM (Steve Alter) (02/12/88)

In article <11120@shemp.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
} ...
} Many existing systems (such as Berkeley Mail and MH) work along a "fil-
} ing cabinet" model.  That is, the user selects a "folder" of messages
} and can then examine the contents of that folder.
} ...
} 
} The "filing cabinet" model, I feel, starts to break down when one deals
} with large amounts of mail.  The main problem is that there is usually
} no good way to locate a given piece of mail, unless the user can remem-
} ber which folder he filed it in.

Sorry to rain on your parade, Rich, but there is a widely available
user-agent mailer that provides at least a portion of the facilities
you describe.  Although this system does not automatically handle
addresses and dates (as far as I know) in its "message-perusal"
functions, it does handle user-defined keywords and each message can
be assigned more than one of them.  The user can then peruse all
messages that contain a selected keyword.  This capability replaces
and augments the concept of folders.  Furthermore, if you forget the
exact form/spelling of a keyword that you chose months ago, this
system provides auto-completion and choice-listings just like the
Tenex c-shell (tcsh).

The disadvantage of this specific mailer is that it is quite large and
won't fit on small machines such as PDP-11s.

I am referring to GNU Emacs and its "rmail" package.

In retrospect, it shouldn't be that difficult to augment the MH system
to support "cross-filing" between folders, and use hard-links so that
different message-numbers in different folders point to the same
message.  All it would really need is a clean specification for the
command-line syntax.  Hey RAND, are you listening?

-- Steve Alter
...!{csun,rdlvax,trwrb,psivax}!ttidca!alter  or  alter@tti.com
Citicorp/TTI, Santa Monica CA  (213) 452-9191 x2541

gillies@uiucdcsp.cs.uiuc.edu (02/15/88)

Re: The problem of losing mail somewhere within a myriad of folders

I know of at least one mail system that solves this problem.  It's
called Babar, and runs at Xerox on Smalltalk machines.  It has some
pretty nice database functions, but mainly it provides hierarchical
mail folders, and the ability to enter a message under multiple
categories.

Many Smalltalk users manage 10 megabytes of mail (2000+ messages) with
no problem.  They are running on Dorados, 68020-class workstations.

Every message goes into one huge file, and folders are implemented as
sets of pointers into this file.  Therefore, the text of a message is
only stored once, even if it appears in 25 categories.  There are some
standard categories, like "deleteable", "sent-by-me", "everything",
etc. that the system maintains.

When you delete a message it goes into the deleteable category.  When
you expunge, everything in this category is zapped.  Whenever you send
a message, a copy is saved in the sent-by-me category.  The everything
category references all the messages.  In particular, you can do a
text search through everything and the results are stored in a new
category.  This makes it very easy to relocate a message containing a
unique keyword.
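The single-file-plus-pointers design described above can be sketched in a few lines. This is not Babar's actual file format or code, which the posting does not give; it is a hypothetical `MailStore` that assumes UTF-8 text, represents each pointer as an `(offset, length)` pair, and reproduces the "everything", "sent-by-me", and "deleteable" behaviour as the posting describes it.

```python
import io
from collections import defaultdict


class MailStore:
    """Hypothetical Babar-style store: each message's text is appended once
    to one file, and every category is only a set of (offset, length)
    pointers into that file."""

    def __init__(self, f=None):
        # Any seekable binary file will do, e.g. open("mail.db", "a+b");
        # an in-memory buffer is used when none is given.
        self.f = f if f is not None else io.BytesIO()
        self.categories = defaultdict(set)   # name -> {(offset, length)}

    def add(self, text, *categories, sent_by_me=False):
        data = text.encode("utf-8")
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(data)
        ptr = (offset, len(data))
        self.categories["everything"].add(ptr)   # system-maintained
        if sent_by_me:
            self.categories["sent-by-me"].add(ptr)
        for name in categories:                  # same text, many categories
            self.categories[name].add(ptr)
        return ptr

    def read(self, ptr):
        offset, length = ptr
        self.f.seek(offset)
        return self.f.read(length).decode("utf-8")

    def delete(self, ptr):
        """Deleting only files the pointer under "deleteable"."""
        self.categories["deleteable"].add(ptr)

    def expunge(self):
        """Drop every deleteable pointer from every category.  The bytes stay
        in the file; reclaiming them would need a separate compaction pass."""
        doomed = self.categories.pop("deleteable", set())
        for members in self.categories.values():
            members -= doomed

    def search(self, needle, new_category):
        """Text search through "everything"; hits become a new category."""
        hits = {p for p in self.categories["everything"] if needle in self.read(p)}
        self.categories[new_category] = hits
        return hits
```

Because a category is only a set of pointers, filing a message under 25 categories costs 25 small entries rather than 25 copies of the text. The trade-off is that deleted text lingers in the file until something rewrites it.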

The system has a built-in mail file scavenger.  It also modifies the
mail database using atomic actions.  The atomic actions are
specialized, so you can even mount a remote mail database on an IFS
file server, and access it transparently.

Don Gillies {ihnp4!uiucdcs!gillies} U of Illinois
            {gillies@p.cs.uiuc.edu}

marvit@hpcea.CE.HP.COM (Peter Marvit) (02/18/88)

> In retrospect, it shouldn't be that difficult to augment the MH system
> to support "cross-filing" between folders, and use hard-links so that
> different message-numbers in different folders point to the same
> message.  All it would really need is a clean specification for the
> command-line syntax.  Hey RAND, are you listening?

Well, I'm not RAND, but MH (6.5) already has this feature.  In fact, the
original poster may wish to consider using MH as a "back-end" to his system
-- at least for the prototype stage.  In production, he may wish to rewrite
some parts of the system for efficiency's sake, since MH can be slow for
some functions.

Back to the quoted posting above,

	refile <msg> +<folder1> +<folder2> +<folder3> -link

will hard link the <msg> to a number of folders simultaneously.  The main
problem is going the other way -- deciphering which folders a particular
message belongs to.
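One brute-force answer to "which folders does this message belong to?" is to scan every folder for files that share the message's inode. The sketch below is an illustration, not an MH command. It assumes the MH layout of one numbered file per message in folder directories under a mail root (MH's default is `~/Mail`, but the root is passed in explicitly here), a filesystem with POSIX-style hard links, and hypothetical function names `folders_containing` and `demo`.

```python
import os
import tempfile


def folders_containing(message_path, mail_root):
    """Return sorted (folder, message-number) pairs naming the same file as
    message_path.  Assumes the MH layout: one numbered file per message in
    folder directories under mail_root."""
    target = os.stat(message_path)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(mail_root):
        for name in filenames:
            if not name.isdigit():          # skip .mh_sequences and the like
                continue
            st = os.stat(os.path.join(dirpath, name))
            if os.path.samestat(st, target):
                hits.append((os.path.relpath(dirpath, mail_root), int(name)))
                # The inode knows how many names it has, but not where they
                # are; once all st_nlink names are found, stop searching.
                if len(hits) == target.st_nlink:
                    return sorted(hits)
    return sorted(hits)


def demo():
    """Build a throwaway MH-style tree and link one message into two folders."""
    with tempfile.TemporaryDirectory() as root:
        for folder in ("inbox", "thesis", "misc"):
            os.mkdir(os.path.join(root, folder))
        msg = os.path.join(root, "inbox", "7")
        with open(msg, "w") as f:
            f.write("Subject: data base mail idea\n\n")
        # Roughly the effect of "refile 7 +thesis -link": a second name
        # (thesis/1) for the same inode, leaving inbox/7 in place.
        os.link(msg, os.path.join(root, "thesis", "1"))
        with open(os.path.join(root, "misc", "1"), "w") as f:
            f.write("Subject: unrelated\n\n")
        return folders_containing(msg, root)
```

Here `demo()` should report `inbox` message 7 and `thesis` message 1 but not `misc` message 1, which has the same number but a different inode. The link count lets the scan stop early. However, finding the names still means walking the whole mail tree, which is exactly the one-way limitation noted above.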

-Peter Marvit
 HP Labs
 <marvit@hplabs.hp.com>    or    <{any biggie}!hplabs!marvit>

scott@tekcrl.TEK.COM (Scott Huddleston) (02/22/88)

>In retrospect, it shouldn't be that difficult to augment the MH system
>to support "cross-filing" between folders, and use hard-links ...

"refile -link ..." already does this.

The Unix file system is hardly a candidate for serious database 
capabilities, however.  Its limitations include:
a). keywords are limited to Unix filenames.
b). MH folders give records (mail msgs) per keyword, but not keywords
    per record.  (i.e., associations are only one-way).
c). hard-links can't be made to other file-system partitions (ruling out
    netnews from this "database" mechanism)
d). the space and performance costs of using one inode per relation are
    substantial.
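For contrast with points a) through d), a real database can keep one row per (message, keyword) association and index it in both directions. The sketch below uses Python's built-in `sqlite3` purely for illustration. The table layout and the function names `open_db`, `file_under`, `messages_for`, and `keywords_for` are hypothetical. Messages are identified by a string such as the Message-ID header instead of a file path, so they are not tied to one file-system partition.

```python
import sqlite3


def open_db(path=":memory:"):
    """Hypothetical schema: one row per (message, keyword) association."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS filed (
            msg_id  TEXT NOT NULL,   -- e.g. the Message-ID header, not a path
            keyword TEXT NOT NULL,   -- any string, not just a legal filename
            PRIMARY KEY (msg_id, keyword)
        );
        -- The primary key serves lookups by message; this index serves
        -- lookups by keyword, so associations can be followed both ways.
        CREATE INDEX IF NOT EXISTS filed_by_keyword ON filed (keyword, msg_id);
    """)
    return db


def file_under(db, msg_id, *keywords):
    """Associate a message with keywords; repeats are silently ignored."""
    db.executemany("INSERT OR IGNORE INTO filed VALUES (?, ?)",
                   [(msg_id, kw) for kw in keywords])
    db.commit()


def messages_for(db, keyword):
    """Records per keyword (what an MH folder already gives you)."""
    return [row[0] for row in db.execute(
        "SELECT msg_id FROM filed WHERE keyword = ? ORDER BY msg_id", (keyword,))]


def keywords_for(db, msg_id):
    """Keywords per record (the direction point b says is missing)."""
    return [row[0] for row in db.execute(
        "SELECT keyword FROM filed WHERE msg_id = ? ORDER BY keyword", (msg_id,))]
```

Each association here costs one table row rather than one directory entry plus a link on a shared inode, and a keyword may contain spaces or slashes. Whether that is faster in practice than MH's approach would need to be measured.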