karl@cheops.cis.ohio-state.edu (Karl Kleinpaste) (02/10/90)
I am not fond of the advertisement of loginnames as part of one's mail address. That is, "karl@cis.ohio-state.edu" really turns me off. The domain is fine, but the arbitrariness of the loginname disturbs me. I would rather be able to have a fullname out there, and hide the frequently grotesque mapping of fullname to loginname. This grotesqueness can be especially true for very large sites, or academic sites which try to summarize one's entire academic career in 8 characters or less. (E.g., once upon a time, I logged in to certain systems as ACSTRAQ, a reference to being in the college of Arts and sciences, in Computer Science, using a TRaining account, which was lexicographically item AQ [#17]. Worse, it wasn't really a "training account" at all, but rather those were the accounts for people employed by the computer science dept. Icktooey.)

For example, we have a user population of ~2200 people here in OSU CIS. There are many fullnames whose mappings to loginnames get pretty contorted. Consider the Millers, all of whom have my sympathy:

    ken       Ken F Miller        miller-j  Jeffrey J Miller
    kmiller   Karen I Miller      miller-m  Mary E Miller
    mille-mc  Mark C Miller       miller-p  Paul S Miller
    miller    David Miller        miller-t  Thomas E Miller
    miller-c  Charles H Miller    mmiller   Michael J Miller
    miller-d  Dale A Miller       rmiller   Randal L Miller
    miller-e  Eric J Miller       smiller   Suzanne L Miller

There's no logic for why Ken F Miller didn't end up as kmiller; or why Michael J Miller ended up as mmiller when Mary E Miller got miller-m; and there's certainly no justice in having abused Mark C Miller's name into mille-mc. There's not even any rhyme or reason why I'm "karl" when there's a gent here by the name of Doug Karl who has been around longer than I. (He's "karl-d.")

I'd rather just advertise fullnames. Some mailers support this already: CMU CS mailers tend to advertise Full.Name@Place.CMU.EDU; there's a couple of mail systems in and near UMich which support it.
But they do it with unusual mailer software that I'd rather not have to support entirely on my own. I'd rather do it with sendmail, since it's The Standard Tool (nasty though it be, I freely admit).

I was not-quite-listening to a rather dull presentation at the recent USENIX Conference when the inspiration hit me on how to do this. The problem is that I want to create an alias database for all usernames to provide guaranteed-unique mappings to fullnames. I had to get this alias database glued into sendmail somehow, and I abruptly realized that I didn't have to hack sendmail source to provide a new database access to accomplish this. It's already got it: The nameserver interface, with the $[$] operators. (Now don't go tossing your cookies all over the floor like that. It's not polite, y'know. :-)

I've created a new subdomain, name.cis.ohio-state.edu, and a companion inv-name.cis.ohio-state.edu. They consist of matched pairs of RRs, one CNAME and one A RR per person known to the department. For any loginname with matching fullname, there are these entries in the zone file:

    loginname    IN    CNAME    full_name
    full_name    IN    A        0.0.0.0

Invert the presence of loginname and fullname for the inv-name domain. The use of an A RR for the fullname is merely a tag, to give the $[$] operator an excuse to terminate.

Now, when S3 is busy canonicalizing things, near the end as it concludes that this must be a local name, it executes two rules like this (it actually occurs in two places; I'm working on reducing that to one, but it's difficult):

    R$+<@$D>            $:$[$1.name.$D$]<@$D>    find full name
    R$+.name.$D<@$D>    $:$1<@$D>                get rid of domain name

The first canonicalizes the loginname to the fullname, with the whole subdomain mess attached to it; e.g., at this point, "karl<@cis.ohio-state.edu>" gets assaulted into the aliased form of "karl_kleinpaste.name.cis.ohio-state.edu<@cis.ohio-state.edu>." (Awesome, huh?
:-) Then the second strips off all that excess domain nonsense, leaving it as "karl_kleinpaste<@cis.ohio-state.edu>." This does appropriate things to both source and destination addresses, of course.

This is fine until one realizes that one is about to hand a fullname to /bin/mail for $#local delivery. This is Bad. But this is why the corresponding inverted domain exists. My S0 resolutions to $#local mail now do it thus:

    R$+<@$D>    $@$>8$#local$:$1    local mail

where S8 is very short and almost identical to the previous rule pair, except that it uses "inv-name" instead of "name," thus accomplishing what I want while leaving the headers in the fullname format. /bin/mail remains content.

A neat aspect of the mechanism is that all this excess $[$] usage is just a bunch of NOPs in the absence of a defined name.what.ever domain, or for a loginname which doesn't appear in name.what.ever. The original loginname flows through unmolested.

The problem of how to generate the name and inv-name domains bothered me for some time. I solved this in the last couple of days using some sed and awk work that performs some basic heuristics against loginnames and GECOS information in /etc/passwd to give me what I want. It spits out "loginname<tab>full_name" sequences which I can then awk into a zone file really easily. Oh: It ignores admin accounts as well as UUCP and SLIP logins (they all begin with U and S, respectively).

One problem which I found to be more significant than I anticipated was the problem of fullname conflicts. For example, a number of staff people have two accounts, one of them for most work, another for testing things in a more mundane environment. But the GECOS fullnames are the same. In the same vein, there's a number of regular users who have two accounts because of the way our account creation scheme mis-interacts with class registration. Also, there are (only very occasionally) two different humans with the same name.
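
For illustration only, here is a minimal sketch of how a "loginname<tab>full_name" stream could be turned into the record pairs for the name and inv-name zones. It is written in Python rather than awk, and the function names zone_records and parse_pairs are hypothetical; they are not part of the scripts described in this post.

```python
def parse_pairs(text):
    """Parse "loginname<tab>full_name" lines, skipping blank ones."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            login, full = line.split("\t", 1)
            pairs.append((login.strip(), full.strip()))
    return pairs


def zone_records(pairs, invert=False):
    """Return zone-file lines for (loginname, full_name) pairs.

    invert=False gives the "name" layout: loginname is a CNAME for
    full_name, and full_name carries a dummy A RR.
    invert=True swaps the two roles, as described for "inv-name".
    """
    lines = []
    for login, full in pairs:
        alias, target = (full, login) if invert else (login, full)
        lines.append(f"{alias}\tIN\tCNAME\t{target}")
        # The A RR is only a tag so that the $[$] lookup terminates.
        lines.append(f"{target}\tIN\tA\t0.0.0.0")
    return lines


if __name__ == "__main__":
    sample = "karl\tkarl_kleinpaste\n"
    print("\n".join(zone_records(parse_pairs(sample))))
    print("\n".join(zone_records(parse_pairs(sample), invert=True)))
```

Running it on the single sample pair prints the CNAME/A pair for the name zone, followed by the swapped pair for inv-name.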
I solved this by creating a little piece of code called deconflict.c, which takes as argument a filename of conflicting fullnames, with the source data coming on stdin and spitting out a modified data stream on stdout. For each item whose fullname matches one of the conflict cases, the first occurrence is spit out unmolested (you can use it as-is once, after all); the second occurrence gets the first name trimmed to an initial; and the third slices out a `_' or two. If it gets to 4 conflicts on a single fullname, then something is more deeply in error, so it complains on stderr and generates no output for that line.

It's possible to generate new conflicts inadvertently via deconflict, so the result of deconflict is considered again for conflicts. Eventually, the loginname/fullname pairs pass muster for uniqueness on both sides, and the result is fed to awk twice to generate name and inv-name. In 2200 usernames, I get about 40 fullname conflicts, only one of which conflicts 3 times (no 4+ cases), and all resolve without generating any new conflicts after the first pass.

I'm expecting to regenerate the name and inv-name domains at the beginning of each quarter. The stability required for mail receipt will be preserved by not doing it more frequently than that. And I'm not using the usual /etc/passwd, but a YP passwd set which doesn't allow changes via "passwd -s" and so forth.

I've been working on this haphazardly for a week or so now, and intend to put it to live usage sometime next week. I need to do some fairly severe testing against it to make sure no failure cases remain.

Next task: Hack inews to do similar resolutions so that we consistently advertise only fullnames in all electronic correspondence. I'm running B 2.11.19, but this is clearly a case where the shell script nature of C News' inews would win big. A couple of quickie host(1) queries and it'd be done...
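
A rough sketch of the deconflict pass described above, in Python rather than C and not the actual deconflict.c. Two assumptions are baked in: fullnames use `_' between name parts (as in karl_kleinpaste), and "slices out a `_' or two" is read here as removing every `_' from the third occurrence. The helper find_conflicts and the sample names are made up for illustration.

```python
import sys
from collections import Counter


def find_conflicts(pairs):
    """Return the set of fullnames that occur more than once."""
    counts = Counter(full for _, full in pairs)
    return {full for full, n in counts.items() if n > 1}


def deconflict(pairs, conflicts):
    """Rewrite conflicting fullnames in a list of (loginname, fullname)."""
    seen = Counter()
    out = []
    for login, full in pairs:
        if full not in conflicts:
            out.append((login, full))
            continue
        seen[full] += 1
        n = seen[full]
        if n == 1:
            # First occurrence passes through unchanged.
            out.append((login, full))
        elif n == 2:
            # Second occurrence: trim the first name to an initial.
            first, sep, rest = full.partition("_")
            out.append((login, first[:1] + sep + rest))
        elif n == 3:
            # Third occurrence: drop the underscores (assumed reading).
            out.append((login, full.replace("_", "")))
        else:
            # Four or more: complain and emit nothing for this line.
            print(f"deconflict: too many conflicts: {login} {full}",
                  file=sys.stderr)
    return out


if __name__ == "__main__":
    # Hypothetical sample data, not real accounts.
    pairs = [("smith", "pat_smith"), ("psmith", "pat_smith"),
             ("smith-p", "pat_smith"), ("karl", "karl_kleinpaste")]
    for login, full in deconflict(pairs, find_conflicts(pairs)):
        print(f"{login}\t{full}")
```

Since a rewritten name can collide with an existing one, the output would be checked with find_conflicts again and the pass repeated until that set comes back empty.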

Just another sendmail hacker,
--Karl
Personification of the Mailer Daemon
Ohio State Computer Science
karl_kleinpaste@cis.ohio-state.edu