[comp.mail.misc] More routing question information

fair@Apple.COM (Erik E. Fair) (12/31/90)

In the referenced article, oc@vmp.com (Orlan Cannon) writes:
>
>Most sites that run pathalias do so with just the UUCP maps as
>input.  However, pathalias is distributed with the tools to include
>all Internet sites as well.
>
>Here we feed a copy of HOSTS.TXT (from nic.ddn.mil) into arpatxt
>(supplied with the pathalias distribution) to create what we call
>"d.Internet".

arpatxt has been deprecated. It doesn't work correctly unless you
manually maintain a file of "arpa-privates" that declares dead the large
set of Internet sites that do NOT correspond to the UUCP sites with the
same primary name. Bob Swan of U of Lowell was the last person
maintaining this file, and he quit after I posted the enclosed awk
script.

Try it. You'll like it. I do.

	Erik E. Fair	apple!fair	fair@apple.com

#!/bin/awk -f
# MKGLUE: UUCP map post processor
# Idea from Mel Pleasant via Eliot Lear
# Erik E. Fair <fair@apple.com>, August, 1988
#
# revised from domains.txt on December 31, 1990
#
# What we have here is a UUCP map postprocessor. To use:
#	pathalias uucpmaps > /tmp/paths.raw
#	mkglue /tmp/paths.raw > /tmp/glue
#	pathalias uucpmaps /tmp/glue > /tmp/paths.refined
#	do whatever you do with the maps here
#
# what this does is find Internet EQUIVALENCES for UUCP sites, e.g.
#
#	ucbvax=	ucbvax.berkeley.edu
#	apple=	apple.com
#
# and then it reverses them (e.g. ucbvax.berkeley.edu=ucbvax), and puts
# all the domain names it finds into a completely connected network
# called "INTERNET", with COST defined
# below. That cost was determined experimentally on a Cray X/MP-48
# (pathalias will run on such a beast. It takes only 24 seconds to
# process all the maps and the glue file. It's amazing what you can do
# with a supercomputer). Your mileage may vary.
#
# The effect of this is to cause nearly all your paths to take their
# first hop through the Internet. DO NOT USE THIS POSTPROCESSOR unless
# you're actually on the Internet, or you have multiple UUCP neighbors
# on the Internet, all of equivalent call cost to you.
#
# This script will NOT do anything with domain gateway declarations, e.g.
#
#	foo	.bar.com
#
# because these do not provide a mapping between the Internet name and
# the UUCP name of the UUCP host involved. This script makes no
# distinction between "real" Internet hosts and "fake" (MX'd) ones (how
# can I? The information isn't there). Even with an MX host, someone on
# the Internet is accepting mail for them (that's what MX is all about).
#
# Encourage your Internet friends and neighbors to put all the right
# information into the UUCP maps.
#
# Also, your mailer must be able to transform thusly:
#
#	do.main!foo!bar!bazz -> foo!bar!bazz@do.main
#
# since that's what the database will generate. I do it with sendmail,
# and I installed the uunet hacks to 5.59 sendmail to look stuff up in a
# DBM database. I expect that the IDA sendmail stuff can be similarly
# coerced to do this.
#
# If nothing else, you might find the report at the end of the glue file
# interesting.
#
BEGIN{
	COST = "DEMAND+LOW";
#
	domain["arpa"] = 1;	domain["nato"] = 1;
	domain["com"] = 1;	domain["gov"] = 1;
	domain["mil"] = 1;	domain["org"] = 1;
	domain["edu"] = 1;	domain["net"] = 1;
	domain["int"] = 1;

	domain["ar"] = 1;
	domain["at"] = 1;
	domain["au"] = 1;
	domain["be"] = 1;
	domain["br"] = 1;
	domain["ca"] = 1;
	domain["ch"] = 1;
	domain["cl"] = 1;
	domain["cn"] = 1;
	domain["cr"] = 1;
	domain["cs"] = 1;
	domain["de"] = 1;
	domain["dk"] = 1;
	domain["eg"] = 1;
	domain["es"] = 1;
	domain["fi"] = 1;
	domain["fr"] = 1;
	domain["gr"] = 1;
	domain["hk"] = 1;
	domain["hu"] = 1;
	domain["ie"] = 1;
	domain["il"] = 1;
	domain["in"] = 1;
	domain["is"] = 1;
	domain["it"] = 1;
	domain["jp"] = 1;
	domain["kr"] = 1;
	domain["lk"] = 1;
	domain["mx"] = 1;
	domain["my"] = 1;
	domain["ni"] = 1;
	domain["nl"] = 1;
	domain["no"] = 1;
	domain["nz"] = 1;
	domain["ph"] = 1;
	domain["pl"] = 1;
	domain["pr"] = 1;
	domain["pt"] = 1;
	domain["se"] = 1;
	domain["sg"] = 1;
	domain["su"] = 1;
	domain["th"] = 1;
	domain["tr"] = 1;
	domain["tw"] = 1;
	domain["uk"] = 1;
	domain["us"] = 1;
	domain["uy"] = 1;
	domain["yu"] = 1;
	domain["za"] = 1;

	nbad = 0;
	imon_inet = 0;
}

# ignore domain gateways (no clean mapping - we must know the internet name)
/^\./ {next}

$2 == "%s" {
# hopefully only one of these
	if ( $1 !~ /\./ ) {
		localuucpname = $1;
		next;
	}
}

# here's the meat of the matter - find real domains and reverse the
# equivalences so that pathalias will give us paths with internet
# names in them.
$1 ~ /\./ {
	hostname= $1;
	curbad = 0;
# check top of domain name for validity
	i = split(hostname, parts, ".");
	top = parts[i];
	if (domain[top] != 1) {
		printf("# bad domain - %s\n", hostname);
		badtop[top]++;
		nbad++;
		curbad = 1;
	} else domtop[top]++;
	n = split($2, path, "!");
	if (n > 1) {
		uucpname= path[n - 1];
		if (hostname == uucpname)
			next;
# skip two sided dot aliases
		i = split(uucpname, parts, ".");
		if (i < 2) {
			if (! curbad) {
				print hostname "=" uucpname;
				internet[hostname]++;
			}
		} else if (domain[parts[i]] == 1) {
			print uucpname "=" hostname;
			internet[uucpname]++;
		}
	} else if ($2 == "%s") {
		if (imon_inet && localuucpname != "" && !curbad) {
			print localinetname "=" localuucpname;
			internet[localinetname]++;
		}
		if (!curbad) {
			localinetname= $1;
			internet[localinetname]++;
			imon_inet++;
		}
	}
}

# now create a completely connected network of the domain names,
# with a low cost, so that we mostly use the Internet in preference
# to any other path
END{
	if (imon_inet && localuucpname != "") {
		print localinetname "=" localuucpname;
	}
	print "INTERNET={"
	for(hostname in internet) {
		printf("\t%s,\n", hostname);
	}
	printf("\t}(%s)\n", COST);
#
# report on what we found while perusing the map data
#
	printf("# top level domains\n");
	for(top in domtop) {
		printf("#\t%s\t%d\n", top, domtop[top]);
	}
#
	if (nbad > 0) {
		printf("\n# unrecognized summary:\n");
		for(dom in badtop) {
			printf("#\t%s\t%d\n", dom, badtop[dom]);
		}
	}
}
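
The header comment in the script says the mailer must turn
do.main!foo!bar!bazz into foo!bar!bazz@do.main. As a rough illustration
only, here is a minimal sketch of that rewrite as a shell function
wrapping a few lines of awk. The name bang2at is hypothetical, and this
is not the sendmail/DBM setup described in the script. It assumes an
address is rewritten only when its first hop contains a dot and
something follows the first "!".

```shell
# bang2at: hypothetical helper, not part of mkglue or sendmail.
# Rewrites "do.main!rest" as "rest@do.main" when the first hop
# contains a dot; anything else is passed through unchanged.
bang2at() {
	printf '%s\n' "$1" | awk '{
		# position of the first "!" in the address
		i = index($0, "!");
		if (i > 1 && i < length($0)) {
			first = substr($0, 1, i - 1);
			rest = substr($0, i + 1);
			# only a dotted first hop is treated as a domain name
			if (first ~ /\./) {
				print rest "@" first;
				next;
			}
		}
		print;
	}'
}
```

For example, bang2at 'do.main!foo!bar!bazz' prints foo!bar!bazz@do.main,
while a pure UUCP path such as apple!fair is passed through unchanged.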