[comp.mail.headers] Humongous pathalias database

matt@ncr-sd.SanDiego.NCR.COM (Matt Costello) (04/13/88)

In article <472@splut.UUCP> jay@splut.UUCP (Jay Maynard) writes:
>In article <4634@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
>>  The machines are small and fairly loaded as is; I don't want each of
>>them to have to store the names of all the machines in the known universe
>>or to search such a table every time mail is sent.
>
>This is the reason I haven't installed smail on splut yet. I don't want
>to maintain a humongous database (if my 286 machine could even handle
>it, and one neighbor indicates it can't), and I have not one but two
>well-connected neighbors and one fairly-well-connected neighbor that I
>haven't figured out how to use to their fullest (i.e. I want stuff that
>can go through ihnp4 to the destination machine but not through uunet
>easily to go via tness1, while I want stuff that can go directly through
>uunet to go via nuchat...)

Collect the USENET maps on a single machine and run pathalias there to
generate the path files for all the small local machines that need them.
This saves the 2.1MB that the map files require on all but one machine.

Of course the pathalias output file can be rather large; ncr-sd's is 485KB.
There are ways of making the output smaller.  The easiest is to just throw
away all redundant information.  I wrote a program named pathprune a year
back that does just this.  Pathprune takes a sorted path file and throws out
all lines that are unnecessary when using a smart mailer.  The results for
ncr-sd's path file:

                     orig    ncr-sd  ratio    scubed  ratio
domain gateways       598       398   0.67       398   0.67
qualified hosts      1943       564   0.29       564   0.29
uucp hosts           9598      9598   1.00      6440   0.67

size of file (KB)     485       403   0.83       261   0.54

There are three kinds of lines in the pathalias output, and the statistics
give a separate count for each kind.  Needless to say, the simple uucp
hostname predominates.  The "orig" column lists the breakdown for the
raw pathalias file; the "ncr-sd" and "scubed" columns list the number of
lines left after running the file through pathprune with that host as
the gateway for .uucp.  The figure after each count is the fraction of
the original that remains.  Scubed has the best path for about 1/3 of the
uucp hostnames, so specifying it as the .uucp gateway allows that 1/3
(9598 down to 6440) to be dropped.
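
The pruning idea can be sketched roughly as follows.  This is not the
pathprune source, which is not shown in this post; it is a minimal Python
illustration under stated assumptions: entries are (name, route) pairs with
bang-path routes containing "%s", a domain gateway is a name with a leading
dot, and a line counts as redundant when its route begins with the route to
the nearest enclosing gateway (which is assumed to run a smart mailer and
re-route the mail itself).  The function names and the sample routes in the
usage note are hypothetical.

```python
def route_prefix(route):
    """Part of a bang-path route before the '%s' placeholder."""
    return route.split("%s", 1)[0]


def parent_domains(name):
    """Yield enclosing domains, nearest first: a.b.c -> .b.c, .c"""
    parts = name.lstrip(".").split(".")
    for i in range(1, len(parts)):
        yield "." + ".".join(parts[i:])


def prune(entries, uucp_gateway=None):
    """Return the entries that are still needed (assumed rule, see above).

    entries      -- list of (name, route) pairs from a path file
    uucp_gateway -- host to treat as the gateway for .uucp, if any
    """
    table = dict(entries)
    # Assumption: the chosen host stands in for a ".uucp" domain gateway,
    # unless the path file already has an explicit ".uucp" line.
    if uucp_gateway is not None and uucp_gateway in table:
        table.setdefault(".uucp", table[uucp_gateway])

    kept = []
    for name, route in entries:
        # Dotted names look to their enclosing domains; bare uucp
        # hostnames look to the .uucp gateway.
        candidates = parent_domains(name) if "." in name else [".uucp"]
        gateway = next((table[d] for d in candidates if d in table), None)
        prefix = route_prefix(gateway) if gateway is not None else ""
        # Never drop the gateway host's own line, and ignore gateways whose
        # route has nothing before '%s' (not a bang path).
        redundant = (
            prefix != ""
            and name != uucp_gateway
            and route.startswith(prefix)
        )
        if not redundant:
            kept.append((name, route))
    return kept
```

For example, with made-up routes, prune([(".com", "gw!%s"), ("a.b.com",
"gw!a.b.com!%s"), ("foo", "scubed!foo!%s"), ("scubed", "scubed!%s")],
uucp_gateway="scubed") keeps only the ".com" and "scubed" lines, since the
other two already route through a gateway that would be found anyway.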
-- 
Matt Costello	<matt.costello@SanDiego.NCR.COM>
+1 619 485 2926	<matt.costello%SanDiego.NCR.COM@Relay.CS.NET>
		{ucsd,cbosgd,pyramid,nosc.ARPA}!ncr-sd!matt