clewis@spectrix.UUCP (Chris Lewis) (12/17/87)
First, the bug: The Supersedes header isn't working because someone is
munging it. We're getting headers in comp.mail.maps articles that look
like:

> Path: spectrix!tmsoft!utgpu!water!watmath!clyde!cbosgd!ucbvax!rutgers!pleasant
> From: uucpmap@rutgers.rutgers.edu (UUCP Mapping Project)
> Newsgroups: comp.mail.maps
> Subject: UUCP map for d.usa.ct.1
> Message-ID: <6187@rutgers.rutgers.edu>
> Date: 13 Dec 87 17:49:04 GMT
> Expires: 27 Jan 88 17:49:02 GMT
> Sender: pleasant@rutgers.rutgers.edu
> Lines: 35
> Approved: pleasant@rutgers.rutgers.edu
> Supersedes: <6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Note the contents of the Supersedes header line:

    <6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile

Looking at inews.c and c_cancel()/control.c and hread I see that nobody
strips off the stuff after the ">". Thus, the whole string is considered
to be a Message-ID. Not surprisingly, inews drops this gem into the log
file:

    Dec 15 16:13 tmsoft Can't cancel <6175@rutgers.rutgers.edu> can't \
        open /usr/lib/news/artfile: non-existent

Which makes perfect sense once you parse it this way:

    Dec 15 16:13 tmsoft Can't cancel "<6175@rutgers.rutgers.edu> can't \
        open /usr/lib/news/artfile": non-existent

Further, we get this in our history file:

    <6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile \
        12/15/87 16:13 cancelled

Which should be parsed thusly:

    "<6175@rutgers.rutgers.edu> can't open /usr/lib/news/artfile" \
        12/15/87 16:13 cancelled

This breaks the history file format and will cause all sorts of
problems for the news code, particularly expire.

WHO'S DOING THIS?! The only thing I know for sure is that the munging
isn't being done here.

Second, my comments on this mess:

Why on earth are the maps being updated this way? If I've read some of
the commentary correctly, the Map Project is going to repost a whole
chunk of the map once an update to some entries in it occurs (modulo
some small number of days latency).
And, the Supersedes header is there to allow a new chunk to "cancel"
the previous chunk so that comp.mail.maps doesn't take up so much room.

Eyuck. The spool space problem is supposedly solved (BUT not here
though, see above), but the transport costs will go thru the roof! I
seem to recall someone saying updates will occur "hopefully within 48
to 72 hours". I can just see it - we'll get enough updated map chunks
to vastly multiply the total comp.mail.maps traffic.

First of all: why couldn't someone have reposted an up-to-date version
of uuhosts (or a much simpler map muncher that just unshars
comp.mail.maps postings)? Then a site sets a short expiry time on
comp.mail.maps and/or the new version of uuhosts deletes the article
after unpacking. All the comp.mail.maps trickery regarding "Expires:"
and "Supersedes:" would be TOTALLY unnecessary - because once unpacked,
what the heck do you need the article around for?
Secondly, even without unpacking, having the articles in the spool area
isn't terribly useful either (if you don't bother unpacking them
somehow, what use are they?)

Secondly: regarding transport load: Reposting humongous chunks of map
data simply because one entry had a comma misplaced is stupid.

The ideal brute force method would be to have map postings only contain
one site, and uuhosts unpacks it into a file of the same name as the
site. Then, when a site changes its map entry, you only have to repost
that system's entry. Obviously, this won't work very well - we don't
have that many inodes.... rnews overhead would skyrocket.... Some
sites' names are too long... System V would start saying "Directory too
big - get help"...

Two possibilities:

  - How about having two types of map postings: one a "whole chunk"
    (ala "u.can.on.1") posted once per month to resynchronize
    everybody, the other "patch" input to edit previously
    uuhosts-unpacked chunks.
    If you put the "patch" input in a separate newsgroup (ala:
    comp.mail.maps.patches) you wouldn't even break existing map
    munchers. Sneakier still, just have the "patch" articles carry the
    invocation of patch in them - anybody running uuhosts just has to
    copy "patch" into their MAPSH directory. (Though some thought has
    to be given to security...)

  - Much better: release a utility to use in place of uuhosts that
    maintains a database of sites and their map entries. Then, the
    keeper of the maps just posts articles which only contain new
    entries. The database munger just has to replace the already
    existing database entries with the new ones. uuhosts already does
    half of this (the "Index" file). Part of this utility would be a
    mechanism by which the whole database can be dumped thru pathalias
    (which we've also done to uuhosts).

    In fact I've been thinking about such a one and am going to try to
    build one.

Another possible problem has occurred to me: Since much of the map
updating is decentralized (eg: Canadian maps are done at U of Toronto),
won't there be a problem with the area coordinator trying to Supersede
an article posted by Rutgers? c_cancel() won't like it.

Another inconvenience (and a kludge) - at least until recently our area
coordinator was kindly posting updated copies of the u.can.on into
comp.mail.maps with distribution "can". Quite frequently at one point.
Problem is that in B news you have problems trying to pump
comp.mail.maps entries thru the map muncher when the distribution
matches a top-level newsgroup. C-news does not have this problem
because you can distinguish between distribution and newsgroup in the
sys file. Eg: we used to have this sys entry:

    maps:world,comp.mail.maps:F:.....Batch (for uuhosts)

Without "world", maps doesn't see anything. This didn't catch the "can"
distribution postings, so I tried:

    maps:world,can,ont,tor,comp.mail.maps:....

Silly me!
uuhosts bitched at me about all of the "not a map postings" -
obviously, uuhosts was being given ALL of the local newsgroup articles.
So, what I did was hack ifuncs.c's function broadcast (ifuncs.c at
patch level 13):

        if (!ngmatch(h.nbuf, srec.s_nbuf))
            continue;
    #define COMPMAILMAPSHACK /* START CRL 87:12:2 */
    #ifdef COMPMAILMAPSHACK
        if (STRCMP(h.nbuf, "comp.mail.maps") == 0 &&
            STRCMP(srec.s_name, "maps") == 0) /* must match sys entry */
            dist = "world";
    #endif /* END CRL 87:12:2 */
        if (*dist == '\0')
            dist = "world";
        if (!ngmatch(dist, srec.s_nbuf) && !ngmatch(srec.s_nbuf, dist))
            continue;

(sorry, we don't have diff -c). What this does is change an internal
copy of the distribution field to world if the newsgroup is
comp.mail.maps and the system name is "maps", so that it'll go to the
maps site no matter what the distribution is. This doesn't seem to
affect other site or newsgroup routing.

And finally, somehow the last u.can.on posting didn't have the updates
I had sent to our area coordinator, which had made it into his local
postings that had occurred long before the comp.mail.maps flood from
rutgers. I've sent off another copy. I wonder though, how many other
entries have been lost like this?
--
Chris Lewis, Spectrix Microsystems Inc,
UUCP: {uunet!mnetor, utcsri!utzoo, lsuc}!spectrix!clewis
[Also: lsuc!clewis in a pinch]
Phone: (416)-474-1955
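
For the Supersedes bug described at the top of this article, one
possible fix is to keep only the first <...> token of the header value
before treating it as a Message-ID. What follows is a minimal sketch in
C, not a patch: extract_msgid() is a hypothetical helper (it is not a
function in the B news sources), and where exactly the header-reading
code would call it is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of B news.
 * Copies the first "<...>" token found in a header value into out,
 * dropping anything that follows the closing '>'.
 * Returns 0 on success, -1 if there is no complete "<...>" token
 * or if the token does not fit in out (outlen bytes, including NUL).
 */
int extract_msgid(const char *value, char *out, size_t outlen)
{
    const char *start, *end;
    size_t len;

    assert(value != NULL && out != NULL);

    start = strchr(value, '<');
    if (start == NULL)
        return -1;

    end = strchr(start, '>');
    if (end == NULL)
        return -1;

    len = (size_t)(end - start) + 1;    /* include the closing '>' */
    if (len + 1 > outlen)               /* need room for the NUL too */
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Given the munged header value shown above, the caller would get back
just <6175@rutgers.rutgers.edu>; a value with no complete <...> token
is rejected instead of being passed along as a Message-ID.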
heiby@mcdchg.UUCP (Ron Heiby) (12/19/87)
Chris Lewis (clewis@spectrix.UUCP) writes:
> Secondly, even without unpacking, having the
> articles in the spool area isn't terribly useful either (if you don't
> bother unpacking them somehow, what use are they?)

The main thing the articles are useful for in the spool area is for
being sent on to down-stream sites. If you delete them as soon as they
come in and are un-packed, then they aren't around for "sendbatch" to
find.

Chris makes some interesting observations in his article. I suspect
that with some kind of "common sense" test, the map updating could be
handled a bit more reasonably than the postings I've seen imply it is.
Changes in the entries of some sites have more wide-reaching effects
than changes in the entries of others. Some changes have little effect
on things, like changing the value of a connection between DEMAND and
WEEKLY. But, changing a connection from DEMAND to DEAD may have a much
greater effect. It is probably reasonable for map updates to be held
until a "large effect" change or some period of time has gone by.
Also, in the "monthly posting", it probably is not necessary to post
articles that were updated in the last N days, where N is probably
something like 10-20.

I am concerned about traffic volume. I am also concerned about having
my mail routing be as accurate as is practical. Perhaps the map
coordinators could say a bit more about their thinking on the
trade-offs involved, and their "philosophy".
--
Ron Heiby, heiby@mcdchg.UUCP
Moderator: comp.newprod & comp.unix
"Intel architectures build character."
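
The "common sense" test suggested above could be sketched roughly as
follows. This is only an illustration in C: the enum, the function
names and the max_hold_days parameter are all hypothetical, and
treating "large effect" as "a link becomes or stops being DEAD" is an
assumption drawn from the DEMAND/WEEKLY/DEAD example, not anything the
map coordinators have stated.

```c
#include <assert.h>

/* Hypothetical link grades, named after the examples in the text. */
enum link_grade { LINK_DEMAND, LINK_WEEKLY, LINK_DEAD };

/*
 * Assumed rule: a change has "large effect" only when a link
 * becomes DEAD or stops being DEAD. Changes between live grades
 * (e.g. DEMAND <-> WEEKLY) are treated as small.
 */
int is_large_effect(enum link_grade old_grade, enum link_grade new_grade)
{
    if (old_grade == new_grade)
        return 0;
    return old_grade == LINK_DEAD || new_grade == LINK_DEAD;
}

/*
 * Post a held update either when it contains a large-effect change
 * or when it has been held for at least max_hold_days.
 */
int should_post_update(int large_effect, long days_held, long max_hold_days)
{
    assert(days_held >= 0 && max_hold_days >= 0);
    return large_effect || days_held >= max_hold_days;
}
```

Under these assumptions a DEMAND to WEEKLY change would wait out the
hold period, while a DEMAND to DEAD change would go out at once.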