[comp.mail.misc] Centralized mail systems summary

pritch@cheops.cis.ohio-state.edu (Norm Pritchett) (05/23/89)
A few weeks ago I posted a query to find out what people are doing
with centralized mail systems.  I promised to follow up with a summary
of the responses and to let you know what we were going to do at Ohio
State.  The latter will be in my next posting.  Below is a summary of
the responses.  I will mention a contact name for each of the mail
systems listed below, but it might not be a person involved with that
project - merely a user who described the mail system to me.

1) Sun Microsystems.  Of the ones described to me this appears to be
the most well known -- I received messages from 5 people plus Sun
about it.  Contact: Bill Melohn <melohn@eng.sun.com>.

	At Sun we have managed to do this for our roughly 10,000
	employees. We currently use a two-tier method of resolving
	names from a unix user name (like "melohn") to <first initial
	of first name><lastname> (i.e. "bmelohn") syntax. This alias in
	turn points to the username@mailboxhost for the user. When
	conflicts occur in the scheme, we make the alias a pipe to a
	shell script called ambigmail, which sends mail back to the
	sender with the various GCOS field entries that match the
	ambiguous alias.
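
As a rough illustration of the two-tier scheme just described (not Sun's
actual code), the sketch below builds <first initial><lastname> aliases
and marks colliding ones so they can be handed to a bounce script such
as ambigmail.  The function name, the tuple layout and the host names in
the usage note are assumptions made for the example.

```python
from collections import defaultdict

def build_aliases(users):
    """users: iterable of (login, full_name, mailbox_host) tuples."""
    by_alias = defaultdict(list)
    for login, full_name, host in users:
        parts = full_name.lower().split()
        # First tier: first initial of the first name + last name.
        by_alias[parts[0][0] + parts[-1]].append((login, full_name, host))

    table = {}
    for alias, found in by_alias.items():
        if len(found) == 1:
            # Second tier: the alias points at username@mailboxhost.
            login, _, host = found[0]
            table[alias] = ("deliver", f"{login}@{host}")
        else:
            # Conflict: keep the full names so a bounce can list them.
            table[alias] = ("ambiguous", sorted(name for _, name, _ in found))
    return table
```

Called with two people named Smith whose first names both start with
"J", the "jsmith" entry comes back as ambiguous with both full names,
which is the information the bounce message needs.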

	We are in the process of enhancing this scheme by making the
	second alias from above reference a username@Area, which would
	allow us to distribute the alias expansion to a series of
	mailhosts for each area.

	For mail destined for the Internet, we rewrite outgoing mail
	headers to be user@Sun.COM; eventually this function will be
	done on an area basis, with outgoing mail messages that look
	like user@Area.Sun.COM (as mine does today).

2) UC Davis.  Contact: jcgargano@ucdavis.edu (Joan Gargano).

	I am in charge of our mailname system at U.C. Davis.  We have
	about 20,000 faculty and staff, and 20,000 students.  I
	maintain a database of over 20,000 mailnames, first initial,
	middle initial, last name, for faculty and staff which I
	constructed from the payroll files.  We have had a number of
	collisions which I have resolved by altering the middle
	initial of one of the names.  Stanford uses a similar system.

	You may query our database by using whois:

	There is a directory that is accessible via the whois program.
	We have extended the whois program so that it can search our
	local database, either through the program itself or through
	electronic mail.

3) Purdue University.  Contact Dave Stevens (dls@alecto.cc.purdue.edu).

	We've just started work on what we call campus-wide electronic
	mail.  We intend to use name server MR records and Profile for
	the white pages service.  We're thinking about using NeXT
	workstations in a distributed model.

4) Proteon Inc.  Contact Alan Marshall <acm@proteon.com>.

	I can recommend CCMail as the product to work with PC
	networks.  It can be connected to SMTP mail via a public
	domain translator that I have written.  It is available from
	monk.proteon.com as arpagw.arc in the /ftp/pub directory.
	There is another implementation, derived from mine, by
	Mike Morse (mmorse@nsf.com) that serves about 10k users.
	He would be a good one to talk with about your needs.
	I have some code from him too, and he has said to make it
	available.  Perhaps there is a more current version that would
	help.

5) MIT.  Contact Mark Rosenstein <mar@athena.mit.edu>.

	We've been thinking of tackling this problem here at MIT.  Our
	initial planning is as follows:

	*  The full name of every member of the MIT community will be known to
	   the mail hub.  Mail sent to someone's full name will result in
	   one of the following:

	1) The mail is delivered if the name is unique and the person
	   has a mailbox 
	2) An error response is generated saying "[full name] does not
	   have an electronic mail address, please send mail to MIT
	   Room ..., Cambridge MA 02139"
	3) An error response is generated saying "[full name] is
	   ambiguous, please choose one:" followed by a list of people
	   giving the name, title, address, and a unique email
	   identifier.
	4) An error response saying "addressee unknown".

	*  Every member of the MIT community will be given a unique
	   identifier for email purposes.  For most active email users, this
	   will be their login name.  For other people and those with name
	   conflicts, it will be their initials and a number, similar to the
	   NIC's whois database.
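
The four outcomes listed for mail sent to a full name can be sketched
as a small lookup routine.  This is only an illustration under assumed
data structures (a list of dicts with "name", "id" and an optional
"mailbox" key); it is not MIT's mail hub code.

```python
from enum import Enum

class Outcome(Enum):
    DELIVERED = 1    # unique name with a mailbox
    NO_MAILBOX = 2   # unique name, no electronic mail address
    AMBIGUOUS = 3    # several people share the name
    UNKNOWN = 4      # addressee unknown

def _norm(name):
    # Compare names case-insensitively and ignore extra whitespace.
    return " ".join(name.lower().split())

def resolve_full_name(full_name, directory):
    """Return (Outcome, details) for mail addressed to full_name."""
    found = [p for p in directory if _norm(p["name"]) == _norm(full_name)]
    if not found:
        return Outcome.UNKNOWN, []
    if len(found) > 1:
        # The reply to the sender would list these unique identifiers.
        return Outcome.AMBIGUOUS, [p["id"] for p in found]
    person = found[0]
    if person.get("mailbox"):
        return Outcome.DELIVERED, [person["mailbox"]]
    return Outcome.NO_MAILBOX, []
```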

	This information will be kept up-to-date by Moira, the Athena
	Service Management System, and regularly updated on the mailhub.
	Users will be allowed to update some of their own information,
	and to become unlisted if they want to. 

	Moira currently contains all of the necessary information for the
	students here at MIT; only the staff and remaining faculty must
	be added.  The primary development effort will be modifications
	to the mail hub. 

6) NCR. Contact Matt Costello <Matt.Costello@SanDiego.NCR.COM>.

	Well, I can help out some on this.  I designed the system in use
	throughout NCR and it conforms to your requirements.  There are
	~1300 people with email addresses in San Diego.  I believe Dayton
	(Galactic Headquarters) has around 6000 email addresses in it. 

	What I did was to create a database external to the mail system
	and then have the mail router look up certain addresses in this
	external database.  Any mail addressed to the domain name, or
	having a period in the username will be looked up in this
	database.  The database format is a rolodex(tm) format.  My entry
	is simply

	name	Matthew Costello
	phone	2926
	dept	4796
	email	mattc@ncr-sd

	This database format is simple to manipulate and edit using the
	standard unix tools, but there are also 7 programs (rolo,
	roloeach, roloedit, roloenter, rolorev, rolorpt & rolosort) that
	handle it more efficiently.  The command rolo(1) is used to look
	up the local portion of the name using a fuzzy matching
	technique, so the following will all find my entry and get my
	mail to me:
		matthew.costello
		matt.costello
		costello		I'm the only Costello in San Diego
		pat.costello
		m.castelli
		
	[ Accepting the last two was a mistake.  It would be better to
	fail and then return the close matches. ]

	Because of the fuzzy matching the search must be linear through
	the whole file.  To compensate, our mail router is able to cache
	found addresses in a separate file so they only get looked up
	once.  I would recommend using an "initial substring match" which
	is amenable to indexing. 
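
To show why an "initial substring match" is amenable to indexing, here
is a minimal sketch that parses records in the field/value layout shown
above and answers prefix probes from a sorted list by binary search.
The blank-line record separator, the choice of "first.last" and "last"
as keys, and all function names are assumptions for the example; this
is not the rolo(1) implementation.

```python
import bisect

def parse_rolo(text):
    """Parse blank-line-separated records of 'field<whitespace>value' lines."""
    records = []
    for block in text.strip().split("\n\n"):
        rec = {}
        for line in block.splitlines():
            field, value = line.split(None, 1)
            rec[field] = value.strip()
        records.append(rec)
    return records

def build_index(records):
    """Sorted list of (key, email); keys are 'first.last' and 'last'."""
    index = []
    for rec in records:
        tokens = rec["name"].lower().split()
        index.append((tokens[0] + "." + tokens[-1], rec["email"]))
        index.append((tokens[-1], rec["email"]))
    index.sort()
    return index

def prefix_lookup(index, probe):
    """Return the distinct addresses whose key starts with probe."""
    probe = probe.lower()
    # Keys sharing a prefix are contiguous in sorted order, so one
    # binary search finds where they start.
    start = bisect.bisect_left(index, (probe,))
    found = []
    for key, email in index[start:]:
        if not key.startswith(probe):
            break
        if email not in found:
            found.append(email)
    return found
```

With this scheme "costello" and "matthew.costello" still resolve, while
a misspelled probe such as "m.castelli" returns nothing for that entry
instead of being silently accepted.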

	-- 
	Matt Costello       <matt.costello@SanDiego.NCR.COM>      (CSNET)
	+1 619 485 2926     uunet!ncrlnk!ncr-sd!mattc

7) Universite Catholique de Louvain.  Contact Alain FONTAINE
  <FNTA80@BUCLLN11.BITNET>. 

	We have established a unified address scheme here. But we did
	not find any way to allow external correspondents to send mail to
	an individual when only knowing his name, *and* avoid clashes...
	This seems theoretically impossible. The sender must *know* and
	*specify* some more information to guarantee uniqueness.  So the
	addresses used are of the form
	personal-identifier@unit.ucl.ac.be, where 'unit' is the
	standardized three- or four-letter abbreviation ("sigle") of the
	laboratory or service in which the person can be found.  Of
	course, it is difficult for an external correspondent trying to
	contact somebody for the first time to guess the 'unit' to be
	used. On the other hand, clashes are a very low probability
	event, since units never count more than 50 persons. 

	Implementation: the DNS would be a marvelous tool for this,
	since each unit could have and manage its own name server. Alas
	(one of my favorite gripes), the arbitrary division of mail
	addresses into a local and a domain part makes it impossible to
	use the DNS down to the individual level. So the current
	situation is that one centralized machine contains a centralized
	database of mail routing information, and nearly all
	domain-addressed mail goes physically (uh, should we say that
	about zeroes-and-ones on wires and disks and ...) through that
	machine. 

8) Carnegie-Mellon University.  Contact Craig Everhart
  <cfe+@andrew.cmu.edu>. 

	Andrew supports 8500 user names reasonably gracefully, though
	we've given up on making login-names guessable; too many
	collisions.  Instead, we use a White Pages service to map name
	probes to mailboxes, letting it handle any collisions. 

	  My free advice to you would be to forget making names unique;
	they never will be.  Make login-names unique and provide simple
	ways to map from person-names to login-names (and use them for
	delivering incoming mail).  MCI Mail, with a cast of many hundred
	thousand, did the same thing; everybody's mailbox is a number. 

9) University of Illinois at Urbana.  Contact Paul Pomes
  <Paul@uxc.cso.uiuc.edu>. 

	The Computing Services Office at the University of Illinois at
	Urbana is in the process of creating a university-wide mailing
	system.  The system is comprised of three pieces.  The largest is
	the white-pages system created by Steve Dorner of CSO.  It's
	based on the CSnet central name server (qi - Query Interpreter).
	Each student and staff member is assigned a unique alias.  The
	user is allowed to change the issued alias provided it remains
	unique.  Associated with this alias is the user's preferred email
	address, office address, home address, phone numbers, etc.
	Everything that is in the paper phone book is also in the qi
	database. 

	The user client is a program called ph.  It searches on the
	unique alias and can fuzzy match on names.  Providing ancillary
	information such as department or curriculum narrows the search. 

	The second piece is the 5.61+IDA sendmail release.  The
	ida/cf/Sendmail.mc has been very slightly modified to invoke a
	new mailer, phquery, whenever an address resolves to
	<name>@uiuc.edu.  This is configured with the DOMAINMASTER
	option. 

	Phquery is the third piece.  It examines its arguments and calls
	qi to determine the preferred email address for the supplied
	name.  At this point, name can be only the unique qi alias.  This
	restriction will soon be lifted to allow phquery to resolve full
	names (e.g., paul-pomes@uiuc.edu -> paul@uxc.cso.uiuc.edu), and
	amateur radio callsigns (e.g., ka9wgn@uiuc.edu ->
	phil@vmd.cso.uiuc.edu).  In the case of ambiguous matches,
	phquery will return a list of possibilities that includes
	department and/or curriculum information that should allow the
	sender to make the next attempt successful. 

	Future enhancements include automated printing and campus mailing
	of messages to those users without email addresses. 

	Source for the qi (central server) and ph (user client) can be
	obtained via anon-FTP from uxc.cso.uiuc.edu:/net/{ph,qi}.  The
	phquery code, when ready, will be included in the
	/mail/sendmail/uiuc directory. 

	Sorry, we cannot email this code as it is much too large.
	Chocolate chip cookies with a postpaid tape will work wonders
	though. 

10) University of Virginia.  Contact Tom Sigmon <tms@virginia.edu>.


	I am responding to the request in BIG-LAN regarding
	university-wide electronic mail networks.  We here at the
	University of Virginia have created such an environment that
	addresses most of the points that were brought up.  Our
	electronic mail environment currently encompasses over 300
	machines (not PCs, etc.) having many different mailers running on
	many different operating systems.  I'll try to summarize the
	basic points here and if there are follow-up questions, I'd be
	happy to address them. 

	- we use domain addresses only.  If a user wants to send mail to
	someone on a non-domain network, then they must use an appropriate
	"pseudo-domain" within a domain address.  For example, sending mail
	to someone on Bitnet would require an address of the form
	"user@host.bitnet".  Likewise, sending mail to someone on a UUCP
	host requires an address of the form "user@host.uucp" (our mailers
	figure out the best path to the target host).

	- third-level domains within the "virginia.edu" domain are named after
	departments or other University organizations (usually using
	the standard registrar's designation)

	- departments create whatever fourth-level domains or machine
	names (the usual case) that they desire

	- we here in the Academic Computing Center created and maintain
	a database of every faculty, staff, or student associated with
	the University.  The basic data comes from the registrar's database
	and the payroll database from our administrative computing center.

	- as part of this database, we automatically create unique mail ids
	for every single person associated with the University.  These ids
	are also (conveniently) used as the login id on most machines.  The
	format of this unique id is as follows:  person's initials optionally
	followed by a 2-character suffix whose first character is a digit and
	whose second character is alphabetic.  For instance, my mail/login id
	is simply my initials, "tms".  All other people who have the same
	initials as me have a suffix on their mail id, e.g., tms2x, tms4g, etc.
	Obviously, the choice of format and a priori creation of unique
	mail/login ids is the most controversial part of our environment.
	There are advantages and disadvantages of this system which I won't
	go into unless someone is interested.

	- since all of these mail ids are unique, they can be considered to be
	"aliases" in the "virginia.edu" domain.  We support the notion of
	"registration" which creates a mapping between a person's unique
	mail id (in the virginia.edu domain) and the actual account and domain
	where that person reads his mail.  For example, all mail sent to
	"tms@virginia.edu" will be delivered to the place where I actually
	read my mail which is "tms@boole.acc.virginia.edu".  Thus, no one
	needs to know the details of exactly where I read my mail. Every system
	in our environment allows users to set/change their registration
	since it is done via a mail message to one of our main mail servers.
	Most systems wrap a shell script around this registration process
	so that it is very easy for the user to register or make changes.

	- the above "registration" process is very important for mail coming
	to users from networks that don't support domain names (e.g., Bitnet
	and UUCPnet) as well as to present one "name" for the University to
	the outside world.  In these cases, if a user is not registered, then
	our mail servers would not know where to actually deliver the person's
	mail *especially* since we want to present one "name" to the outside
	world (i.e., it shouldn't be necessary for anyone to know the full
	domain name in order to send mail to someone at the University, nor
	should they need to know the internal network configurations, etc.).
	For example, the University has one name/address on both Bitnet and
	UUCPnet.  We are "virginia" on both networks, so someone on Bitnet can
	send mail to me as "tms@virginia" without regard to the actual machine
	I use to read my mail, and likewise, someone on UUCPnet can send mail to
	me as "...!virginia!tms" without regard to the actual machine I use to
	read my mail.

	- we also support user-created aliases at the virginia.edu level.  If
	a user does not like their automatically created unique mail id or
	would simply prefer to have other aliases, then they can request the
	creation of such aliases.  For example, in addition to sending mail
	to me as "tms@virginia.edu", people can also send mail to me as
	"sigmon@virginia.edu" or "9240615@virginia.edu" (which is my phone
	number).  The only restriction that we place on these user-created
	aliases is that they (obviously) must be unique, can not conflict
	with the regular expressions that describe our automatically-generated
	ids (so that they don't preempt future automatically-generated ids),
	and that they be "reasonable" (e.g., we don't allow people to be
	MickeyMouse, nor GeorgeBush, etc.).
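
The restriction that a user-created alias must be unique and must not
match the pattern of the automatically generated ids can be checked
mechanically.  The sketch below is an illustration only: the regular
expression assumes two or three initials (the reply does not state the
length), the function name is made up, and the "reasonable" test is
left to a human.

```python
import re

# Initials, optionally followed by a digit and then a letter,
# e.g. "tms", "tms2x", "tms4g".  The 2-3 letter length is an assumption.
AUTO_ID = re.compile(r"[a-z]{2,3}(?:[0-9][a-z])?")

def alias_allowed(alias, taken):
    """taken: set of lower-case ids and aliases already in use."""
    alias = alias.lower()
    if alias in taken:
        return False    # must be unique
    if AUTO_ID.fullmatch(alias):
        return False    # would preempt a future automatic id
    return True
```

Under these assumptions "sigmon" and "9240615" are accepted, while
"tms4g" is refused even if nobody holds it yet.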

	- of course, none of the above prohibits users from having aliases
	in other domains.  The Academic Computing Center administers ids and
	aliases at the virginia.edu level for the entire University.  Departments
	are free to have their own aliases in their own domains (except that
	mail coming from non-domain-based networks can't access them for obvious
	reasons).

	- the University telephone book has a section that lists the electronic
	mail ids and aliases for all registered mail users at the University.
	In addition, we support a "whois" capability on many of our machines
	that allows users to interactively query our database to determine
	mail ids, phone numbers, department affiliations, etc.

	Hope this helps others establish university-wide mail networks.
	I'm happy to provide more detail or answer questions, just send
	me mail! 

11) Stanford University.  Contact Bob Morgan <morgan@jessica.stanford.edu>.

	Yes, assigning what we have come to call a "unique-id" to a
	campus-full of people is a tricky issue.  We have made a few
	abortive attempts at campus-wide mail delivery
	(unique-id@stanford.edu), but have run into the twin problems of
	a) choosing the unique-id, and b) letting people update their
	unique-id/actual-mailbox mapping without involving great piles of
	paper/bureaucracy/our time. 

	Right now we generate unique-ids for use with a phone-book-type
	service (based on Whois, RFC 954), using the following algorithm,
	moving down the list in case of name clash:

	1) first-initial/last-name (rmorgan), or
	2) first-initial/middle-initial/last-name (rlmorgan), or
	3) as-many-initials-as-necessary/last-name (rlfmorgan), or
	4) as-many-letters-of-first-name-as-necessary/last-name
	   (robmorgan), or
	5) first-name/middle-initial/last-name (robertlmorgan), or
	6) rule 5 with digits appended as necessary (robertlmorgan3).
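
The six rules above amount to walking an ordered list of candidate ids
until one is free.  A minimal sketch, assuming names arrive already
split into first name, a list of middle names and last name, and
assuming the appended digits start at 2 (the reply does not say); the
function names are made up for the example.

```python
def candidate_ids(first, middles, last):
    """Yield candidate ids in the order of the six rules."""
    first, last = first.lower(), last.lower()
    initials = [m[0].lower() for m in middles]
    yield first[0] + last                                    # rule 1
    if initials:
        yield first[0] + initials[0] + last                  # rule 2
    for n in range(2, len(initials) + 1):                    # rule 3
        yield first[0] + "".join(initials[:n]) + last
    for n in range(2, len(first) + 1):                       # rule 4
        yield first[:n] + last
    full = first + (initials[0] if initials else "") + last  # rule 5
    yield full
    n = 2
    while True:                                              # rule 6
        yield f"{full}{n}"
        n += 1

def assign_id(first, middles, last, taken):
    """Return the first free candidate and record it in taken."""
    for cand in candidate_ids(first, middles, last):
        if cand not in taken:
            taken.add(cand)
            return cand
```

Assigning ids to five David Smiths in a row with an initially empty set
gives dsmith, dasmith, davsmith, davismith and davidsmith, matching the
repeated use of rule 4 that the reply goes on to describe.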

	(If you have a Unix "whois" client, you can bang our server with:
	  > whois -h argus.stanford.edu some-string)

	Looking through our 153 Smiths, I see no uses of rule #6, about 5
	cases where #5 was used, and several instances of repeated
	application of #4 (dasmith, davsmith, davismith, davidsmith).  I
	suspect that if we started using this for actual mail delivery
	(or, even more so, for Kerberos-style principal ids), some people
	would complain and insist on choosing their own.  The question
	then becomes, how do you decide what's reasonable?  If Joe
	Student wants to be known as "donaldkennedy" (SU's president), is
	that OK?  Part of the problem is that if these things are
	assigned immediately when people arrive (as they must be) then
	people will be stuck with something before they know what it's
	about (as with real names, I suppose). 

	No solutions, just more questions,

12) Dartmouth College.  Contact Steve Campbell
  <Steve.Campbell@dartmouth.edu>. 

	Dartmouth has a scheme much like the one you describe, called the
	Dartmouth Name Directory.  The DND is a database of about 13,000
	names with corresponding nicknames, password, paper-mail address,
	phone number, department (or undergraduate class), and e-mail
	address.  Mail addressed to Joe.Blow@dartmouth.edu goes to Joe
	Blow's preferred e-mail address, as it is recorded in the DND. 

	People are uniquely identified by the tokens in their name -- the
	name space is small enough that first name + middle initial +
	last name is unique in all but a very few cases.  Those people
	have their middle names entered also.  The names, nicknames, and
	departments are all lookup keys and partial matches are
	supported.  So you can mail to me with
	"James.W.Matthews@dartmouth" (my full name), "James W M"
	(abbreviating the last name) or "Jim Matthews" (matching my last
	name and a nickname).  Only one token match must be exact. 

	If there are multiple matches a bounce message is generated,
	listing the matches (as long as there are fewer than fifteen or
	so).  So it is fairly easy to refine an address to the required
	precision. 
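
The matching rule described above (every token may be a partial match,
but at least one must be exact, and multiple hits produce a bounce
listing them) can be sketched as follows.  The entry layout, function
names and e-mail addresses are assumptions for illustration, and
department keys are left out; this is not the DND code.

```python
def tokens(s):
    # "James.W.Matthews" and "James W Matthews" give the same tokens.
    return s.lower().replace(".", " ").split()

def matches(query, entry):
    """entry: dict with 'name' (str) and 'nicknames' (list of str)."""
    keys = tokens(entry["name"]) + [n.lower() for n in entry["nicknames"]]
    q = tokens(query)
    # Every query token must be an initial substring of some key...
    if not q or not all(any(k.startswith(t) for k in keys) for t in q):
        return False
    # ...and at least one of them must match a key exactly.
    return any(t in keys for t in q)

def lookup(query, directory, limit=15):
    hits = [e for e in directory if matches(query, e)]
    if len(hits) == 1:
        return "deliver", hits[0]["email"]
    # No match, or several: bounce, listing matches if there are few.
    names = [e["name"] for e in hits] if len(hits) < limit else []
    return "bounce", names
```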

	The DND is seeded by the personnel and registration systems, and
	several fields (paper mail address, e-mail address, phone number,
	and nicknames) are user-maintained.  The default e-mail address
	is our paper mail system -- messages are printed out and hand
	delivered. 

13) "Track".

	I suggest you consider a software distribution to control the
	mail software.  See _1989 USENIX Software Management Workshop
	Proceedings_ for a good discussion or two or three on some
	software called Track. 

14) UCSD. Contact Brian Kantor <brian@ucsd.edu>.

	UCSD has such a mail system.  You may query it over the net to
	see what it looks like with the 'whois' command; try
		whois -h ucsd.edu smith
		whois -h ucsd.edu jsmith
	and variations along that line.

	The software to implement this is available in our anonymous FTP
	directory; take file pub/mailreg.tar.Z.  Caveat Emptor: the
	software is continuously being refined and is not documented. 
	
15) University of Kent at Canterbury.  Contact <sjl@ukc.ac.uk>.

	We operate a unified mail system for some 4000 staff and
	students. We use a centralised admin server which allocates a
	unique userid for each user.  In addition it will allocate a
	login (also unique) for the machines they require. 

	On top of the admin server we run a mail database system
	(originally designed at Edinburgh University). The user interface
	to this is the mailhost program.  The user may nominate any
	machine he can log into as his mail machine. He does this by
	typing "mailhost -c". At night all the machines swap lists of
	users. Each entry in the list has a date stamp; if this stamp is
	later than the local machine's recorded date stamp for the user,
	the entry is updated. 
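
The nightly exchange described above is a newest-stamp-wins merge.  A
minimal sketch, assuming each machine keeps a mapping from userid to a
(mailhost, date stamp) pair; the names and the data layout are
assumptions, not the mailhost program's actual format.

```python
def merge_user_lists(local, incoming):
    """Update local with any incoming entry carrying a later stamp.

    Both arguments map userid -> (mailhost, stamp), where stamps
    compare in time order (e.g. seconds since some epoch).
    """
    for user, (host, stamp) in incoming.items():
        if user not in local or stamp > local[user][1]:
            local[user] = (host, stamp)
    return local
```

Because only strictly later stamps replace an entry, machines that swap
the same lists again on the following night leave each other's tables
unchanged.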

	An extension to this is being worked on at the moment which
	allows pseudo domains. A group of workstations (typically) have a
	pseudo domain centered on their fileserver. This is still
	experimental. 

	The system has been in use for about three years now with no
	problems. 


16) UMCP.  Contact Mark Feldman <feldman@umd5.umd.edu>.

	   I know of a few places that do this sort of thing.  First,
	there's the UMAIL project here at UMCP.  One can send mail to
	various user names at host umail.umd.edu, and the mail gets
	delivered, even if the user doesn't have a computer account
	anywhere.  (In that case, mail gets printed out and sent via
	campus mail.) I am by no means fully up on the details of this
	system; you might talk to Mark Feldman (feldman@umd5.umd.edu) to
	get more information.  He may not be the right contact, but he
	can probably name the right people if asked. 

	...
	
Contact Steve <steve@umiacs.umd.edu>

	   Third, I'm implementing something that, while it doesn't do
	everything, does most of what you want (and which can be extended
	to do more).  Basically, it's an automatic method of generating a
	'global' mail address database, made from the union of the
	password and alias files for a particular department.  Getting
	all such files for a whole campus would be hard, but there's also
	a facility for getting just another department's global database
	and merging it in with others.  There's a concept of
	locality-of-reference, too; if I send mail to 'root', I get the
	UMIACS root, even though there's a 'root' in the CS Department's
	imported information.  This locality can be guaranteed either
	manually or automatically. 
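
The idea of a per-department database built from the password and alias
files, merged with imported databases while keeping local names local,
might look roughly like this.  It is a sketch under assumed inputs
(a list of login names and a dict of aliases rather than the real file
formats), with made-up function and domain names; it is not the code
the reply refers to.

```python
def build_department_db(logins, aliases, domain):
    """Union of password-file logins and alias-file entries."""
    db = {name: f"{name}@{domain}" for name in logins}
    for alias, target in aliases.items():
        # An alias for a local login resolves to that login's address;
        # anything else is kept as written.
        db[alias] = db.get(target, target)
    return db

def merge_with_locality(local_db, imported_dbs):
    """Merge imported databases, letting local entries always win."""
    merged = {}
    for db in imported_dbs:
        for name, addr in db.items():
            merged.setdefault(name, addr)
    merged.update(local_db)   # locality of reference: 'root' stays local
    return merged
```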

	   If extended with some software to generate full-name addresses
	(First.MI.Last), and some software to handle duplicates
	differently, my code would probably do everything you need.  The
	only hitch is that the stuff I've written is not yet in extensive
	use (so I'd bet it has, er, misfeatures), and it's essentially
	undocumented.  If you want a copy of the code as it stands, I
	could provide one.  It's solid (it's even been Saberized), but
	it's definitely in need of some major cleaning up... 

17) AT&T Private Mail Exchange.  Contact Rod Hart <hart@cp1.cp.bell-atl.com>.

	Look into the AT&T Private Mail Exchange System (PMX). My
	organization is in the process of installing one right now to
	solve a similar problem. We need X.400 in order to tie the
	various user groups (i.e. DG, PROFS, Wang, PC, and of course
	Unix) together, as well as for document conversion. 
-=-

Norm Pritchett, The Ohio State University College of Engineering Network
Internet: pritchett@eng.ohio-state.edu	BITNET: TS1703 at OHSTVMA
UUCP: pritch@sydney.columbus.oh.us	CCNET: ENG::PRITCHETT (6172::PRITCHETT)