[comp.mail.headers] "data base" mail system idea

wales@CS.UCLA.EDU (02/06/88)

I recently had an idea for a new kind of User Agent mail program.  I'd
like to see if I can develop it somehow into a research project that
could form the basis for my Ph.D. dissertation.  (The bare idea, all by
itself, doesn't appear to be substantive enough.)

Although this topic doesn't specifically concern mail headers or trans-
port protocols, I am including the "comp.mail.headers" newsgroup because
I want to be particularly sure of reaching people on the Internet (via
the HEADER-PEOPLE mailing list) who are familiar with non-UNIX computing
environments.

First, some background.

Many existing systems (such as Berkeley Mail and MH) work along a "fil-
ing cabinet" model.  That is, the user selects a "folder" of messages
and can then examine the contents of that folder.  (Whether the "fol-
ders" are implemented as single files, as in Berkeley Mail, or as direc-
tories, as in MH, is not really crucial to the concept.)  Incoming mail
goes into a special "in-box" folder, which is generally the folder the
user selects by default.  Mail can be kept in the folder, moved to
another folder, or deleted entirely.

The "filing cabinet" model, I feel, starts to break down when one deals
with large amounts of mail.  The main problem is that there is usually
no good way to locate a given piece of mail, unless the user can remem-
ber which folder he filed it in.

The idea I had was to take all of one's incoming mail and put it into an
information retrieval system.  Messages could then be searched for by
any of a number of criteria (dates, addresses, subject, user-assigned
keywords, etc.).  The kinds of searches available would be limited only
by the resources available to do the indexing and the searching.
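
A minimal sketch of what such an index might look like, in Python.  This
is only an illustration of the idea, not a description of any existing
mail program: the MailIndex class, its field names ("sender", "subject",
"date", "keyword"), and the example.org addresses are all hypothetical.
It builds an inverted index from (field, term) pairs to message ids and
answers a query by intersecting the matching sets.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


class MailIndex:
    """Toy inverted index: (field, term) -> set of message ids (hypothetical)."""

    def __init__(self):
        self.messages = {}
        self.index = defaultdict(set)

    def add(self, msg_id, raw_text, keywords=()):
        """Parse one message and index its sender, subject words, date, keywords."""
        msg = message_from_string(raw_text)
        self.messages[msg_id] = msg

        _, addr = parseaddr(msg.get("From", ""))
        if addr:
            self.index[("sender", addr.lower())].add(msg_id)

        for word in msg.get("Subject", "").lower().split():
            self.index[("subject", word)].add(msg_id)

        date_header = msg.get("Date")
        if date_header:
            try:
                day = parsedate_to_datetime(date_header).date().isoformat()
                self.index[("date", day)].add(msg_id)
            except (TypeError, ValueError):
                pass  # unparseable date: leave the message out of the date index

        for kw in keywords:
            self.index[("keyword", kw.lower())].add(msg_id)

    def search(self, **criteria):
        """Return ids of messages matching every field=term criterion."""
        result = None
        for field, term in criteria.items():
            hits = self.index.get((field, term.lower()), set())
            result = set(hits) if result is None else result & hits
        return result if result is not None else set()


idx = MailIndex()
idx.add(1, "From: Alice <alice@example.org>\nSubject: Mail index idea\n"
           "Date: Sat, 06 Feb 1988 10:00:00 +0000\n\nHello\n",
        keywords=["thesis"])
idx.add(2, "From: bob@example.org\nSubject: Re: mail index idea\n\nHi\n")
print(idx.search(subject="idea"))
print(idx.search(sender="alice@example.org", keyword="thesis"))
```

Everything here lives in memory; the cost of keeping such an index on
disk and up to date is exactly the "resources available" question.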

This concept is similar in some respects to the "keyword-based news"
system proposed by Brad Templeton some years ago (though I'm not saying
that it is the same as Brad's idea).

Especially if one considers recent developments in personal filing sys-
tems (e.g., Hypercard), it seems like an "information-retrieval-model"
mail management system should be feasible.  When I proposed the idea to
a graduate seminar here at UCLA a couple of weeks ago, one participant
commented that my idea could probably be implemented in a couple of days
using Hypercard or other similar tools (and would, as a result, not be
very interesting as original research).  Yet, as far as I've been able
to discover in the course of my reading, no one has done this.

Is anyone out there on the net aware of a mail system that does the
kinds of things I am suggesting here?  If in fact it hasn't been done,
is there some major "show-stopper" problem that has kept it from being
done?  Although I wish the answer were simply that no one had ever
thought of doing this kind of thing before me, I doubt this is the case.

I freely confess to a certain amount of "UNIX myopia", and would partic-
ularly like to hear about e-mail management tools that are radically
different from those customarily used on most UNIX systems.

-- Rich Wales // UCLA Computer Science Department // +1 (213) 825-5683
	3531 Boelter Hall // Los Angeles, California 90024-1596 // USA
	wales@CS.UCLA.EDU           ...!(ucbvax,rutgers)!ucla-cs!wales
"Sir, there is a multilegged creature crawling on your shoulder."

blarson@skat.usc.edu (Bob Larson) (02/06/88)

In article <11120@shemp.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
>The idea I had was to take all of one's incoming mail and put it into an
>information retrieval system.  Messages could then be searched for by
>any of a number of criteria (dates, addresses, subject, user-assigned
>keywords, etc.).  The kinds of searches available would be limited only
>by the resources available to do the indexing and the searching.

Tops-20 Mm allows message selection by all of the above-mentioned
things plus some others (message body contents, unseen, flagged,
etc.).  It also supports the multiple-folder model.  It does not use a
general-purpose database.

There are at least two copies of Mm for Unix; the one I have used has
major problems, and I consider it "almost usable".  I also have a Mm
subset for Primos.

>Is anyone out there on the net aware of a mail system that does the
>kinds of things I am suggesting here?  If in fact it hasn't been done,
>is there some major "show-stopper" problem that has kept it from being
>done?

No major show-stopper that I know of.  Implementation problems such as
limited mailbox size do exist.  (Startup is also slower on large
mailboxes.)  Most of the problems with keywords have been rediscovered
by Mm users, although since the same person assigns and uses the
keywords, the problems are less severe than in a general keyword system.
--
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%fns1@ecla.usc.edu
			oberon!fns1!info-prime-request

sa@ttidca.TTI.COM (Steve Alter) (02/12/88)

In article <11120@shemp.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
} ...
} Many existing systems (such as Berkeley Mail and MH) work along a "fil-
} ing cabinet" model.  That is, the user selects a "folder" of messages
} and can then examine the contents of that folder.
} ...
} 
} The "filing cabinet" model, I feel, starts to break down when one deals
} with large amounts of mail.  The main problem is that there is usually
} no good way to locate a given piece of mail, unless the user can remem-
} ber which folder he filed it in.

Sorry to rain on your parade, Rich, but there is a widely available
user-agent mailer that provides at least a portion of the facilities
you describe.  Although this system does not automatically handle
addresses and dates (as far as I know) in its "message-perusal"
functions, it does handle user-defined keywords and each message can
be assigned more than one of them.  The user can then peruse all
messages that contain a selected keyword.  This capability replaces
and augments the concept of folders.  Furthermore, if you forget the
exact form/spelling of a keyword that you chose months ago, this
system provides auto-completion and choice-listings just like the
Tenex c-shell (tcsh).
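
The combination of several keywords per message and completion of
half-remembered keywords can be sketched in a few lines of Python.  This
is not how the mailer in question implements it; the LabelStore class and
its method names are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


class LabelStore:
    """Hypothetical store mapping keywords to messages and back."""

    def __init__(self):
        self.by_label = defaultdict(set)    # keyword -> message numbers
        self.by_message = defaultdict(set)  # message number -> keywords

    def add_label(self, msg_num, label):
        """Attach a keyword to a message; a message may carry many keywords."""
        self.by_label[label].add(msg_num)
        self.by_message[msg_num].add(label)

    def messages_with(self, label):
        """Return the sorted message numbers carrying this keyword."""
        return sorted(self.by_label.get(label, ()))

    def complete(self, prefix):
        """Return every known keyword that starts with prefix, sorted."""
        return sorted(l for l in self.by_label if l.startswith(prefix))


store = LabelStore()
store.add_label(3, "budget")
store.add_label(3, "build")
store.add_label(7, "budget")
print(store.complete("bu"))
print(store.messages_with("budget"))
```

Keeping both maps means the keywords of a given message can be listed
as cheaply as the messages for a given keyword.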

The disadvantage of this specific mailer is that it is quite large and
won't fit on small machines such as PDP-11s.

I am referring to GNU Emacs and its "rmail" package.

In retrospect, it shouldn't be that difficult to augment the MH system
to support "cross-filing" between folders, and use hard-links so that
different message-numbers in different folders point to the same
message.  All it would really need is a clean specification for the
command-line syntax.  Hey RAND, are you listening?
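
The hard-link mechanism itself can be sketched as follows, in Python.
This does not invoke or imitate any MH command; the cross_file helper,
the folder names, and the message numbers are made up for illustration,
and the demo runs in a temporary directory rather than a real mail tree.

```python
import os
import tempfile


def cross_file(src_path, dest_folder, dest_number):
    """Hard-link an existing message file into another folder under a new number."""
    os.makedirs(dest_folder, exist_ok=True)
    dest_path = os.path.join(dest_folder, str(dest_number))
    os.link(src_path, dest_path)  # both names now refer to the same inode
    return dest_path


def demo():
    """Create inbox/12, cross-file it as projects/3, and report what happened."""
    with tempfile.TemporaryDirectory() as mail_root:
        inbox = os.path.join(mail_root, "inbox")
        os.makedirs(inbox)
        original = os.path.join(inbox, "12")
        with open(original, "w") as f:
            f.write("Subject: example\n\nbody\n")

        linked = cross_file(original, os.path.join(mail_root, "projects"), 3)
        same_file = os.path.samefile(original, linked)
        link_count = os.stat(original).st_nlink
        return same_file, link_count


print(demo())
```

Because os.link fails across file systems, both folders have to live on
the same partition for this to work.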

-- Steve Alter
...!{csun,rdlvax,trwrb,psivax}!ttidca!alter  or  alter@tti.com
Citicorp/TTI, Santa Monica CA  (213) 452-9191 x2541

scott@tekcrl.TEK.COM (Scott Huddleston) (02/22/88)

>In retrospect, it shouldn't be that difficult to augment the MH system
>to support "cross-filing" between folders, and use hard-links ...

"refile -link ..." already does this.

The Unix file system is hardly a candidate for serious database
capabilities, however.  Its limitations include:
a). keywords are limited to Unix filenames.
b). MH folders give records (mail msgs) per keyword, but not keywords
    per record.  (i.e., associations are only one-way).
c). hard-links can't be made to other file-system partitions (ruling out
    netnews from this "database" mechanism)
d). the space and performance costs of using one inode per relation are
    substantial.
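
Limitation (b) can be made concrete with a short Python sketch.  Given
one message file, the only way to learn which folders (keywords) it
belongs to is to walk every folder and compare device and inode numbers.
The folders_containing function and the folder names below are
hypothetical, and the demo builds its own temporary mail tree.

```python
import os
import tempfile


def folders_containing(mail_root, message_path):
    """Return sorted names of folders under mail_root holding a link to message_path."""
    target = os.stat(message_path)
    found = []
    for folder in sorted(os.listdir(mail_root)):
        folder_path = os.path.join(mail_root, folder)
        if not os.path.isdir(folder_path):
            continue
        for name in os.listdir(folder_path):
            st = os.stat(os.path.join(folder_path, name))
            # Same device and inode means the same underlying message.
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                found.append(folder)
                break
    return found


def demo():
    """Link one message into three folders, then recover its 'keywords' by scanning."""
    with tempfile.TemporaryDirectory() as mail_root:
        for folder in ("inbox", "urgent", "thesis", "misc"):
            os.makedirs(os.path.join(mail_root, folder))
        original = os.path.join(mail_root, "inbox", "1")
        with open(original, "w") as f:
            f.write("Subject: example\n\nbody\n")
        os.link(original, os.path.join(mail_root, "urgent", "1"))
        os.link(original, os.path.join(mail_root, "thesis", "4"))
        # An unrelated message that must not match.
        with open(os.path.join(mail_root, "misc", "1"), "w") as f:
            f.write("Subject: other\n\nbody\n")
        return folders_containing(mail_root, original)


print(demo())
```

The scan touches every message in every folder, which is the kind of
cost a real database would avoid by storing the reverse association.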