[comp.mail.uucp] A Bug in "Supersedes:" and some comments on map handling.

clewis@spectrix.UUCP (Chris Lewis) (12/17/87)

First, the bug:

The Supersedes header isn't working because someone is munging it.

We're getting headers in comp.mail.maps articles that look like:

> Path: spectrix!tmsoft!utgpu!water!watmath!clyde!cbosgd!ucbvax!rutgers!pleasant
> From: uucpmap@rutgers.rutgers.edu (UUCP Mapping Project)
> Newsgroups: comp.mail.maps
> Subject: UUCP map for d.usa.ct.1
> Message-ID: <6187@rutgers.rutgers.edu>
> Date: 13 Dec 87 17:49:04 GMT
> Expires: 27 Jan 88 17:49:02 GMT
> Sender: pleasant@rutgers.rutgers.edu
> Lines: 35
> Approved: pleasant@rutgers.rutgers.edu
> Supersedes: <6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Note the contents of the Supersedes header line:

	<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Looking at inews.c and c_cancel()/control.c and hread I see that nobody 
strips off the stuff after the ">".  Thus, the whole string is considered 
to be a Message-ID.  Not surprisingly, inews drops this gem into the log file:

Dec 15 16:13	tmsoft	Can't cancel <6175@rutgers.rutgers.edu> can't \
	open /usr/lib/news/artfile:  non-existent

Which makes perfect sense once you parse it this way:

Dec 15 16:13	tmsoft	Can't cancel "<6175@rutgers.rutgers.edu> can't \
	open /usr/lib/news/artfile":  non-existent

Further, we get this in our history file:

<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile	\
	12/15/87 16:13	cancelled

Which should be parsed thusly:

"<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile"	\
	12/15/87 16:13	cancelled

Which blows the history file format and will cause all sorts of problems for
the news code, particularly expire.
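
The missing step described above (taking only the "<...>" token and dropping
whatever trails it) can be sketched as follows. This is a minimal
illustration, not code from the B news sources: extract_msgid is a
hypothetical helper name, and where such a call would belong in inews.c,
control.c or hread is left open.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the first "<...>" token found in a header value into out.
 * Returns 1 on success, 0 if there is no complete "<...>" token or
 * it does not fit in out.  extract_msgid is a hypothetical helper,
 * not a function from the B news sources.
 */
int extract_msgid(const char *value, char *out, size_t outlen)
{
    const char *start = strchr(value, '<');
    const char *end;
    size_t len;

    if (start == NULL)
        return 0;
    end = strchr(start, '>');
    if (end == NULL)
        return 0;

    len = (size_t)(end - start) + 1;    /* include the closing '>' */
    if (len >= outlen)                  /* leave room for the NUL */
        return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Given the munged header value shown earlier, this would hand c_cancel() only
<6175@rutgers.rutgers.edu>, so the trailing "can't open ..." text never
reaches the log or the history file.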

WHO'S DOING THIS?!  The only thing I know for sure is that the munging
isn't being done here.


Second, my comments on this mess:

Why on earth are the maps being updated this way?  If I've read some of
the commentary correctly, the Map Project is going to repost a whole
chunk of the map once an update to some entries in it occurs (modulo some
small number of days latency).  And, the Supersedes header is there
to allow a new chunk to "cancel" the previous chunk so that comp.mail.maps
doesn't take up so much room.  Eyuck.

The spool space problem is supposedly solved (though NOT here, see above),
but the transport costs will go thru the roof!  I seem to recall someone
saying updates will occur "hopefully within 48 to 72 hours".  I can just
see it - we'll get enough updated map chunks to vastly multiply the total
comp.mail.maps traffic.

First of all: why couldn't someone have reposted an up-to-date version
of uuhosts (or a much simpler map muncher that just unshars comp.mail.maps
postings)?  Then a site sets a short expiry time on comp.mail.maps and/or the
new version of uuhosts deletes the article after unpacking.  All the
comp.mail.maps trickery regarding "Expires:" and "Supersedes:" would be
TOTALLY unnecessary - because once unpacked, what the heck do you need
the article around for?  Secondly, even without unpacking, having the
articles in the spool area isn't terribly useful either (if you don't
bother unpacking them somehow, what use are they?)

Secondly: regarding transport load: Reposting humongous chunks of
map data simply because one entry had a comma misplaced is stupid.

The ideal brute force method would be to have map postings only contain
one site, and uuhosts unpacks it into a file of the same name as the site.  
Then, when a site changes its map entry, you only have to repost that
system's entry.

Obviously, this won't work very well - we don't have that many inodes....
rnews overhead would skyrocket....  Some sites' names are too long...
System V would start saying "Directory too big - get help"...

Two possibilities:
	- How about having two types of map posting: one a "whole chunk"
	  (a la "u.can.on.1") posted once per month to resynchronize everybody,
	  the other "patch" input to edit chunks previously unpacked by uuhosts.
	  If you put the "patch" input in a separate newsgroup (a la
	  comp.mail.maps.patches) you wouldn't even break existing
	  map munchers.  Sneakier still, simply have the "patch"
	  articles carry the invocation of patch in them - anybody running
	  uuhosts then just has to copy "patch" into their MAPSH directory.
	  (Though some thought has to be given to security...)

	- Much better: release a utility to use in place of uuhosts that
	  maintains a database of sites and their map entries.  Then,
	  the keeper of the maps just posts articles which only contain
	  new entries.  The database munger just has to replace the already
	  existing database entry with the new ones.  uuhosts already does
	  half of this (the "Index" file).  Part of this utility would be
	  a mechanism by which the whole database can be dumped thru
	  pathalias (which we've also done to uuhosts).  In fact I've been
	  thinking about such a one and am going to try to build one.
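
The replace-or-add step of the second possibility can be sketched as below.
This is only an in-memory illustration under stated assumptions: the names
map_db, db_find and db_put, the fixed size limits, and keying on the site
name are all hypothetical, and nothing here reflects how uuhosts or its
"Index" file actually works.

```c
#include <string.h>

#define MAX_SITES  64       /* illustrative limits only */
#define MAX_NAME   32
#define MAX_ENTRY  512

struct map_entry {
    char name[MAX_NAME];    /* site name, used as the key */
    char text[MAX_ENTRY];   /* that site's map entry, verbatim */
};

struct map_db {
    struct map_entry sites[MAX_SITES];
    int count;
};

/* Return the index of the named site, or -1 if it is not present. */
int db_find(const struct map_db *db, const char *name)
{
    int i;

    for (i = 0; i < db->count; i++)
        if (strcmp(db->sites[i].name, name) == 0)
            return i;
    return -1;
}

/*
 * Replace the entry for an existing site, or add a new one.
 * Returns 0 on success, -1 if the name or text is too long
 * or the table is full.
 */
int db_put(struct map_db *db, const char *name, const char *text)
{
    int i;

    if (strlen(name) >= MAX_NAME || strlen(text) >= MAX_ENTRY)
        return -1;

    i = db_find(db, name);
    if (i < 0) {
        if (db->count >= MAX_SITES)
            return -1;
        i = db->count++;
        strcpy(db->sites[i].name, name);
    }
    strcpy(db->sites[i].text, text);
    return 0;
}
```

With something along these lines, an update article need only carry the
changed entries: each one is fed to db_put(), and dumping every text field in
turn would produce the input for pathalias.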

Another possible problem has occurred to me:

Since much of the map updating is decentralized (eg: Canadian maps are
done at U of Toronto), won't there be a problem with the area coordinator
trying to Supersede an article posted by Rutgers?  c_cancel() won't like it.

Another inconvenience (and a kludge) - at least until recently our area 
coordinator was kindly posting updated copies of the u.can.on into 
comp.mail.maps with distribution "can".  Quite frequently at one point.  
Problem is that in B news you have problems trying to pump comp.mail.maps 
entries thru the map muncher when the distribution matches a top-level 
newsgroup.  C-news does not have this problem because you can distinguish
between distribution and newsgroup in the sys file.

Eg: we used to have this sys entry:

maps:world,comp.mail.maps:F:.....Batch		(for uuhosts)

Without "world", maps doesn't see anything.

This didn't catch the "can" distribution postings, so I tried:

maps:world,can,ont,tor,comp.mail.maps:....

Silly me!  uuhosts bitched at me about all of the "not a map postings" - 
obviously, uuhosts was being given ALL of the local newsgroup articles.

So, what I did was hack ifuncs.c's function broadcast (ifuncs.c at patch
level 13):



		if (!ngmatch(h.nbuf, srec.s_nbuf))
			continue;
#define	COMPMAILMAPSHACK	/* START CRL 87:12:2 */
#ifdef	COMPMAILMAPSHACK
		if (STRCMP(h.nbuf, "comp.mail.maps") == 0 &&
		    STRCMP(srec.s_name, "maps") == 0)	/* must match sys entry */
			dist = "world";
#endif /* END CRL 87:12:2 */
		if (*dist == '\0')
			dist = "world";
		if (!ngmatch(dist, srec.s_nbuf) && !ngmatch(srec.s_nbuf, dist))
			continue;

(sorry, we don't have diff -c).

What this does is change an internal copy of the distribution field to
world if the newsgroup is comp.mail.maps and the system name is "maps" so
that it'll go to the maps site no matter what the distribution is.
This doesn't seem to affect other site or newsgroup routing.

And finally, somehow the last u.can.on posting didn't have the updates I had
sent to our area coordinator, even though they had made it into his local
postings, which occurred long before the comp.mail.maps flood from rutgers.
I've sent off another copy.  I wonder though, how many other entries have been
lost like this?
-- 
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc}!spectrix!clewis
[Also: lsuc!clewis in a pinch]
Phone: (416)-474-1955

heiby@mcdchg.UUCP (Ron Heiby) (12/19/87)

Chris Lewis (clewis@spectrix.UUCP) writes:
> Secondly, even without unpacking, having the
> articles in the spool area isn't terribly useful either (if you don't
> bother unpacking them somehow, what use are they?)

The main thing the articles are useful for in the spool area is for being
sent on to down-stream sites.  If you delete them as soon as they come in
and are un-packed, then they aren't around for "sendbatch" to find.

Chris makes some interesting observations in his article.  I suspect
that with some kind of "common sense" test, the map updating could be
handled a bit more reasonably than the postings I've seen imply it is.
Changes in the entries of some sites have more wide-reaching effects than
changes in the entries of others.  Some changes have little effect on things,
like changing the value of a connection between DEMAND and WEEKLY.  But,
changing a connection from DEMAND to DEAD may have a much greater effect.
It is probably reasonable for map updates to be held until a "large effect"
change or some period of time has gone by.  Also, in the "monthly posting",
it probably is not necessary to post articles that were updated in the
last N days, where N is probably something like 10-20.
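
One way such a "common sense" test might look is sketched below. Everything
in it is an assumption made for illustration: the function names, the
7-day HOLD_DAYS value, and the rule that only a link going to or from DEAD
counts as a "large effect" change. The real criteria would be up to the map
coordinators.

```c
#include <string.h>

#define HOLD_DAYS 7     /* assumed holding period; no figure is given above */

/*
 * Treat a change as "large effect" only when a link goes to or from DEAD.
 * A change such as DEMAND to WEEKLY is treated as minor.
 */
int is_large_effect(const char *old_cost, const char *new_cost)
{
    int was_dead = strcmp(old_cost, "DEAD") == 0;
    int now_dead = strcmp(new_cost, "DEAD") == 0;

    return was_dead != now_dead;
}

/*
 * Post the update now if the change is large, or if a minor change
 * has already been held for HOLD_DAYS or more.
 */
int should_post(const char *old_cost, const char *new_cost, int days_pending)
{
    return is_large_effect(old_cost, new_cost) || days_pending >= HOLD_DAYS;
}
```

Under these assumptions a DEMAND to DEAD change goes out immediately, while a
DEMAND to WEEKLY change waits until the holding period has run out.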

I am concerned about traffic volume.  I am also concerned about having
my mail routing be as accurate as is practical.  Perhaps the map coordinators
could say a bit more about their thinking on the trade-offs involved, and
their "philosophy".
-- 
Ron Heiby, heiby@mcdchg.UUCP	Moderator: comp.newprod & comp.unix
"Intel architectures build character."