[alt.hackers] Sendmail hack for fullnames

karl@cheops.cis.ohio-state.edu (Karl Kleinpaste) (02/10/90)

I am not fond of the advertisement of loginnames as part of one's mail
address.  That is, "karl@cis.ohio-state.edu" really turns me off.  The
domain is fine, but the arbitrariness of the loginname disturbs me.  I
would rather be able to have a fullname out there, and hide the
frequently grotesque mapping of fullname to loginname.  This
grotesqueness can be especially true for very large sites, or academic
sites which try to summarize one's entire academic career in 8
characters or less.  (E.g., once upon a time, I logged in to certain
systems as ACSTRAQ, a reference to being in the college of Arts and
Sciences, in Computer Science, using a TRaining account, which was
lexicographically item AQ [#17].  Worse, it wasn't really a "training
account" at all, but rather those were the accounts for people
employed by the computer science dept.  Icktooey.)

For example, we have a user population of ~2200 people here in OSU
CIS.  There are many fullnames whose mappings to loginnames get pretty
contorted.  Consider the Millers, all of whom have my sympathy:

ken         Ken F Miller        miller-j    Jeffrey J Miller
kmiller     Karen I Miller      miller-m    Mary E Miller
mille-mc    Mark C Miller       miller-p    Paul S Miller
miller      David Miller        miller-t    Thomas E Miller
miller-c    Charles H Miller    mmiller     Michael J Miller
miller-d    Dale A Miller       rmiller     Randal L Miller
miller-e    Eric J Miller       smiller     Suzanne L Miller

There's no logic for why Ken F Miller didn't end up as kmiller; or why
Michael J Miller ended up as mmiller when Mary E Miller got miller-m;
and there's certainly no justice in having abused Mark C Miller's name
into mille-mc.  There's not even any rhyme or reason why I'm "karl"
when there's a gent here by the name of Doug Karl who has been around
longer than I.  (He's "karl-d.")

I'd rather just advertise fullnames.  Some mailers support this
already: CMU CS mailers tend to advertise Full.Name@Place.CMU.EDU;
there's a couple of mail systems in and near UMich which support it.
But they do it with unusual mailer software that I'd rather not have
to support entirely on my own.  I'd rather do it with sendmail, since
it's The Standard Tool (nasty though it be, I freely admit).

I was not-quite-listening to a rather dull presentation at the recent
USENIX Conference when the inspiration hit me on how to do this.

The problem is that I want to create an alias database for all
usernames to provide guaranteed-unique mappings to fullnames.  I had
to get this alias database glued into sendmail somehow, and I abruptly
realized that I didn't have to hack sendmail source to provide a new
database access to accomplish this.  It's already got it:  The
nameserver interface, with the $[$] operators.

(Now don't go tossing your cookies all over the floor like that.  It's
not polite, y'know. :-)

I've created a new subdomain, name.cis.ohio-state.edu, and a companion
inv-name.cis.ohio-state.edu.  They consist of matched pairs of RRs,
one CNAME and one A RR per person known to the department.  For any
loginname with matching fullname, there are these entries in the zone
file:
	loginname	IN	CNAME	full_name
	full_name	IN	A	0.0.0.0
For the inv-name domain, swap the positions of loginname and fullname.
The use of an A RR for the fullname is merely a tag, to give the $[$]
operator an excuse to terminate.
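
To make the record pairing concrete, here is a minimal sketch (in
Python rather than the awk I actually used) of the four RRs one
loginname/fullname pair turns into.  The function name zone_records is
made up for illustration; only the record layout comes from the
description above.

```python
def zone_records(login, full_name):
    """Return (name_lines, inv_name_lines) for one user.

    Hypothetical helper: the layout mirrors the zone-file entries
    shown above, with the two names swapped for inv-name.
    """
    name_lines = [
        f"{login}\tIN\tCNAME\t{full_name}",
        f"{full_name}\tIN\tA\t0.0.0.0",   # tag record so $[$] can stop
    ]
    inv_name_lines = [
        f"{full_name}\tIN\tCNAME\t{login}",
        f"{login}\tIN\tA\t0.0.0.0",
    ]
    return name_lines, inv_name_lines


if __name__ == "__main__":
    name, inv = zone_records("karl", "karl_kleinpaste")
    print("\n".join(name))
    print("\n".join(inv))
```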

Now, when S3 is busy canonicalizing things, near the end, as it
concludes that this must be a local name, it executes a pair of rules
like this (the pair actually occurs in two places; I'm working on
reducing that to one, but it's difficult):
	R$+<@$D>		$:$[$1.name.$D$]<@$D>	find full name
	R$+.name.$D<@$D>	$:$1<@$D>		get rid of domain name
The first canonicalizes the loginname to the fullname, with the whole
subdomain mess attached to it; e.g., at this point,
"karl<@cis.ohio-state.edu>" gets assaulted into the aliased form of
"karl_kleinpaste.name.cis.ohio-state.edu<@cis.ohio-state.edu>."
(Awesome, huh? :-)  Then the second strips off all that excess domain
nonsense, leaving it as "karl_kleinpaste<@cis.ohio-state.edu>."

This does appropriate things to both source and destination addresses,
of course.  This is fine until one realizes that one is about to hand
a fullname to /bin/mail for $#local delivery.  This is Bad.  But this
is why the corresponding inverted domain exists.  My S0 resolutions to
$#local mail now do it thus:
	R$+<@$D>		$@$>8$#local$:$1	local mail
where S8 is very short and almost identical to the previous rule pair,
except that it uses "inv-name" instead of "name," thus accomplishing
what I want while leaving the headers in the fullname format.
/bin/mail remains content.

A neat aspect of the mechanism is that all this excess $[$] usage is
just a bunch of NOPs in the absence of a defined name.what.ever
domain, or for a loginname which doesn't appear in name.what.ever.
The original loginname flows through unmolested.
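
For anyone who'd rather not read sendmail rules, the following is a
rough simulation of the lookup-then-strip pair in both directions.  It
is a sketch under loud assumptions: a Python dict stands in for the
nameserver, canonify() stands in for $[$] (follow a CNAME if there is
one, else hand the name back unchanged), and none of these function
names exist in sendmail.

```python
DOMAIN = "cis.ohio-state.edu"

# Stand-in for the nameserver: CNAME source -> CNAME target.
CNAMES = {
    f"karl.name.{DOMAIN}": f"karl_kleinpaste.name.{DOMAIN}",
    f"karl_kleinpaste.inv-name.{DOMAIN}": f"karl.inv-name.{DOMAIN}",
}


def canonify(fqdn, cnames):
    """Imitate $[ ... $]: return the canonical name, or the input on a miss."""
    return cnames.get(fqdn, fqdn)


def rewrite_local(user, subdomain, cnames, domain=DOMAIN):
    """Apply the two-rule pair to the user part of user<@domain>."""
    suffix = f".{subdomain}.{domain}"
    looked_up = canonify(user + suffix, cnames)   # first rule: look it up
    if looked_up.endswith(suffix):                # second rule: strip domain
        return looked_up[: -len(suffix)]
    return looked_up


if __name__ == "__main__":
    print(rewrite_local("karl", "name", CNAMES))                 # headers
    print(rewrite_local("karl_kleinpaste", "inv-name", CNAMES))  # delivery
    print(rewrite_local("nobody", "name", CNAMES))               # miss
```

The miss case is the NOP: the lookup returns what it was given, the
strip step removes the suffix it just added, and the loginname comes
out as it went in.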

The problem of how to generate the name and inv-name domains bothered
me for some time.  I solved this in the last couple of days using some
sed and awk work that performs some basic heuristics against
loginnames and GECOS information in /etc/passwd to give me what I
want.  It spits out "loginname<tab>full_name" sequences which I can
then awk into a zone file really easily.  Oh:  It ignores admin
accounts as well as UUCP and SLIP logins (whose names all begin with U
and S, respectively).
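
A rough sketch of that extraction step, in Python instead of sed and
awk.  The heuristics here are assumptions, not the real ones: take the
GECOS field up to the first comma, lowercase it, and join the words
with `_'.  The ADMIN set is likewise a hypothetical stand-in for
whatever counts as an admin account locally.

```python
import re

ADMIN = {"root", "daemon", "bin", "sys"}  # hypothetical admin logins


def full_name_from_gecos(gecos):
    """Turn 'Karl Kleinpaste,Room 1' into 'karl_kleinpaste' (assumed rule)."""
    real_name = gecos.split(",", 1)[0]
    words = re.findall(r"[A-Za-z0-9]+", real_name)
    return "_".join(word.lower() for word in words)


def login_fullname_pairs(passwd_lines):
    """Yield 'loginname<tab>full_name' for each ordinary account."""
    for line in passwd_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 7:
            continue                      # not a well-formed passwd entry
        login, gecos = fields[0], fields[4]
        if login in ADMIN or login[:1] in ("U", "S"):
            continue                      # admin, UUCP, or SLIP login
        full_name = full_name_from_gecos(gecos)
        if full_name:
            yield f"{login}\t{full_name}"


if __name__ == "__main__":
    sample = [
        "root:*:0:0:Operator:/:/bin/sh",
        "karl:*:100:10:Karl Kleinpaste,Room 1:/home/karl:/bin/csh",
        "Uosu:*:200:20:UUCP login:/usr/spool/uucp:/usr/lib/uucp/uucico",
    ]
    for pair in login_fullname_pairs(sample):
        print(pair)
```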

One problem that proved more significant than I anticipated was that
of fullname conflicts.  For example, a number of staff
people have two accounts, one of them for most work, another for
testing things in a more mundane environment.  But the GECOS fullnames
are the same.  In the same vein, there's a number of regular users who
have two accounts because of the way our account creation scheme
mis-interacts with class registration.  Also, there are (only very
occasionally) two different humans with the same name.  I solved this
by creating a little piece of code called deconflict.c, which takes as
its argument the name of a file listing the conflicting fullnames,
reads the source data on stdin, and writes a modified data stream to
stdout.
For each item whose fullname matches one of the conflict cases, the
first occurrence is spit out unmolested (you can use it as-is once,
after all); the second occurrence gets the first name trimmed to an
initial; and the third slices out a `_' or two.  If it gets to 4
conflicts on a single fullname, then something is more deeply in
error, so it complains on stderr and generates no output for that
line.
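
The policy just described can be sketched as follows.  This is not
deconflict.c itself, only a Python illustration of the rule ordering.
In particular, "slices out a `_' or two" is read here as deleting every
`_' from the original fullname, which is a guess at the exact behavior.

```python
import sys


def deconflict(pairs, conflicts, err=sys.stderr):
    """Rewrite repeated fullnames; pairs is a list of (login, fullname)."""
    seen = {}
    out = []
    for login, full in pairs:
        if full not in conflicts:
            out.append((login, full))
            continue
        count = seen[full] = seen.get(full, 0) + 1
        if count == 1:
            out.append((login, full))              # usable as-is, once
        elif count == 2:
            first, sep, rest = full.partition("_")
            out.append((login, first[:1] + sep + rest))  # initial only
        elif count == 3:
            out.append((login, full.replace("_", "")))   # assumed rule
        else:
            # Four or more: something is wrong; complain, emit nothing.
            print(f"deconflict: too many conflicts on {full} ({login})",
                  file=err)
    return out


if __name__ == "__main__":
    data = [("karl", "karl_kleinpaste"), ("karl2", "karl_kleinpaste"),
            ("karl3", "karl_kleinpaste"), ("ken", "ken_f_miller")]
    for login, full in deconflict(data, {"karl_kleinpaste"}):
        print(f"{login}\t{full}")
```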

It's possible to generate new conflicts inadvertently via deconflict,
so the result of deconflict is considered again for conflicts.
Eventually, the loginname/fullname pairs pass muster for uniqueness on
both sides, and the result is fed to awk twice to generate name and
inv-name.
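
The repeat-until-unique loop might look like the sketch below, again
in Python for illustration.  The names find_conflicts and resolve are
invented, the deconflicting step is passed in as a function so the
sketch stands alone, and the max_passes cap is an added safety
assumption rather than something the real pipeline has.

```python
from collections import Counter


def find_conflicts(pairs):
    """Return the set of fullnames claimed by more than one loginname."""
    counts = Counter(full for _, full in pairs)
    return {full for full, n in counts.items() if n > 1}


def resolve(pairs, deconflict, max_passes=10):
    """Re-run deconflict until every fullname is unique."""
    for _ in range(max_passes):
        conflicts = find_conflicts(pairs)
        if not conflicts:
            return pairs                  # ready to become zone files
        pairs = deconflict(pairs, conflicts)
    raise RuntimeError("fullname conflicts did not converge")
```

Loginnames are already unique in the passwd data, so only the fullname
side needs checking on each pass.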

In 2200 usernames, I get about 40 fullname conflicts, only one of
which conflicts 3 times (no 4+ cases), and all resolve without
generating any new conflicts after the first pass.

I'm expecting to regenerate the name and inv-name domains at the
beginning of each quarter.  The stability required for mail receipt
will be preserved by not doing it more frequently than that.  And I'm
not using the usual /etc/passwd, but a YP passwd set which doesn't
allow changes via "passwd -s" and so forth.

I've been working on this haphazardly for a week or so now, and intend
to put it to live usage sometime next week.  I need to do some fairly
severe testing against it to make sure no failure cases remain.

Next task:  Hack inews to do similar resolutions so that we
consistently advertise only fullnames in all electronic
correspondence.  I'm running B 2.11.19, but this is clearly a case
where the shell script nature of C News' inews would win big.  A
couple of quickie host(1) queries and it'd be done...

Just another sendmail hacker,
--Karl
Personification of the Mailer Daemon
Ohio State Computer Science
karl_kleinpaste@cis.ohio-state.edu