xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (05/30/91)
[It is incredible to me that the supposedly professional system administrators of the net are incapable of maintaining a meaningful subject line in long discussions. With such an example ("Really funny jokes being missed" for a news.admin discussion about email warnings on detecting header problems on articles from non-local sites), what hope for the casual news users to learn good habits?]

The net discussion about dropping articles without warning to the originator provoked a local email flurry on the subject, from which one potentially useful idea emerged.

The demurrer has been put forth that automatically mailing warnings back to the originator when news articles are dropped would cause a "flood" of email, more damaging to the site of origin than having articles originating there silently dropped. I think this is just an excuse for not thinking the problem through to a workable solution. I propose the following solution for folks to shoot full of holes or refine to usability, as the mood strikes.

There are, I am told, about 1000 Cnews sites right now, a minority of the 10,000-odd sites in the mail maps, but probably a majority of the larger sites important to net connectivity, so the following proposal would take some parameter tuning.

Instead of mailing a warning back each time an article is dropped, each site should roll a random number generator, with a small chance of mailing a warning and a large chance of dropping the article silently. Each type of news software should consider itself the whole net for purposes of determining what the right fraction of warnings should be, since the choice of articles dropped _may_ be independent for each type.
For 1000 Cnews sites, a 1/500 or 1/1000 chance of sending a warning would mean (if the article actually propagated to all but the Cnews sites by successfully sneaking past the filters of the other news software (Bnews, VMSNEWS(?), Waffle, etc.), and then got passed to each Cnews site and there dropped) that two or one warnings on average would be mailed back _from the whole net_ per article with a mangled header detected by the particular filter in Cnews. If it were also detected by five other kinds of news software, each also mailing one or two warnings, at worst a dozen or so letters would arrive, not an overwhelming burden.

In the more usual case, where Cnews dropping the articles from a site mostly destroys connectivity and the rest of the net never sees the article, the feedback would be slower, but eventually would happen, still providing some level of warning. This is the case for which tuning of the warning fraction should be considered; better to provide too much (but not an overwhelming level of) warning than too little, to see problems discovered and fixed sooner. See also two paragraphs below.

In the nasty case where a site mangles headers in carload lots, and dumps a multi-megabyte slug of old news on the net, a lot of extra mail would go back to article authors (or preferably, "usenet" or "postmaster" or "root" at the same site), but not the thousandfold multiplier that an unrandomized warning generator would provoke. Better yet, by the sampling that would occur across the net, lots of site sysadmins would quickly have in hand sufficient widely distributed looks at paths to the problem to narrow down to a single site creating the mangled headers and get the problem cut off early, as opposed to the current two- or three-day lag while enough postings reach news.admin to do the trick, and the modem at the offending site keeps dumping old news in massive amounts onto the rest of the net.
The case of a leaf site feeding directly into a site potentially dropping articles should probably have a _much_ higher (or unity) chance of a warning being mailed back, but it would be little extra software effort to make the randomizing fraction separate and tunable for each feed a site maintains, so that generic incoming netwide newsfeeds get a low probability of warning per article (lots of other sites get a chance to mail a warning, too) while client sites with no other chance of warning get a high probability of warning, since only the feed site will ever see the offending article.

This proposal cuts down the email to a fraction of "warn about each infraction at each site where detected" rates, while providing a fair level of feedback to each site generating bad headers.

Complaints, problems, improvements?

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
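[Editor's sketch of the randomized-warning idea above, with a separate, tunable probability per feed as Kent proposes. The feed names, probabilities, and the `send_mail` hook are illustrative, not part of any real news software:]

```python
import random

# Hypothetical per-feed warning probabilities, tunable as proposed:
# a generic netwide feed gets a tiny chance (many other sites can warn
# instead), while a leaf client gets unity (no one else will ever see
# its articles, so only this feed site can warn it).
WARN_PROBABILITY = {
    "netwide-feed": 1.0 / 1000,   # ~1 warning from 1000 Cnews sites
    "leaf-client":  1.0,          # only we can warn this site
}

def maybe_warn(feed, article, send_mail):
    """On dropping a bad article, roll the dice before mailing a warning."""
    p = WARN_PROBABILITY.get(feed, 1.0 / 500)  # default for unlisted feeds
    if random.random() < p:
        send_mail(article)
        return True
    return False
```

With these numbers, an article dropped at all 1000 Cnews sites draws about one warning from the whole net, while a leaf site's bad article is always reported by its single feed.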
herman@corpane.uucp (Harry Herman) (06/16/91)
In <1991Jun10.225052.19739@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <34587746@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>... An unparsable date (which
>>might something as innocuous as a new time zone) does not intrinsically
>>cripple the article, as a bad Message-ID does for instance....
>Unfortunately, it does, because it might be a garbled version of a stale
>date. A parsable date is a non-negotiable requirement. (Incidentally,
>a new timezone abbreviation does not make the date unparsable, if I'm
>remembering the code correctly.)
>--
>"We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology
>SunOS 4.1.1 to SunOS 3.5." | henry@zoo.toronto.edu utzoo!henry

Since dates are important, then have C-News recognize the date formats that really exist on the net, not according to what a piece of paper says they should look like. Work with the writers of other news systems to get them to write new code to match that piece of paper. Then, when 99.9% of the news postings match the piece of paper, then CONSIDER dropping the "obsolete" support. Although it would not really be obsolete until there are 0 posts with the old format.

One of the recent releases of the nn news reader claims to have changed its code that splits digests into separate articles to handle both the "standard" format and a particular news group's "non-standard" digest format. The writer of nn did not say "Group x's digests are incorrect so we will ignore it"; the writer essentially said "we will support what is actually out there".

I am a user of news, I am not a system administrator or a news administrator, so I have zero choice in what operating system we use or what news transport, and am highly offended at an earlier posting that was along the lines of "use Unix and C-News if you want to use news". There are other operating systems out there, and there are other news transports out there.
If we are truly going to have an international network for sharing information, then all the people writing the software that makes this happen have to decide to work together and work with what is out there. Add new features, but don't break existing features with a "we are right and the rest of the world is wrong" attitude.

Harry Herman
herman@corpane
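[Editor's sketch of the "support what is actually out there" strategy Harry describes: try the strict RFC 822/1123 form first, then fall back to looser legacy forms, and tolerate an unknown trailing timezone word rather than rejecting the article over it. The format list is illustrative; this is not C News's actual parser:]

```python
from datetime import datetime

# Illustrative format list: strict form first, then looser legacy
# shapes actually seen on the net.
FORMATS = [
    "%a, %d %b %Y %H:%M:%S",   # RFC 1123: Mon, 17 Jun 1991 12:00:00
    "%d %b %y %H:%M:%S",       # two-digit year, no weekday
    "%a %b %d %H:%M:%S %Y",    # ctime(3) style: Mon Jun 17 12:00:00 1991
]

def parse_date(header):
    """Try each known format in turn; return None only if all fail."""
    # Also try with the trailing word (a timezone like GMT, EST, ...)
    # stripped, instead of failing on an unknown abbreviation.
    words = header.strip().split()
    candidates = [" ".join(words), " ".join(words[:-1])]
    for text in candidates:
        for fmt in FORMATS:
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue
    return None
```

The design choice is the one Harry argues for: an unrecognized variant degrades to a fallback rather than to a silently dropped article.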
moraes@cs.toronto.edu (Mark Moraes) (06/18/91)
herman@corpane.uucp (Harry Herman) writes:
>Then, when 99.9% of the news postings...

Last we checked, the C News date parser almost achieved this number. Specifically, from the tests Geoff and I ran in March, courtesy Dave Lawrence @rpi and Dave Alden @zaphod:

> Of the 102385 date headers in articles on rpi.edu, our RFC822/1123
> date parser parses all but 142 correctly. (0.14%) [Our general date
> parser gets all but 5]
>
> Of the 158673 date headers on zaphod, the RFC date parser didn't like
> 194. (0.12%) Again, the general parser dislikes 5. (one different from
> rpi)

These numbers are obviously skewed by the fact that B News sites rewrite headers, so there was clearly a race on to see whether the articles got to zaphod or rpi without touching a B News site. I can understand how the 0.15% of the "victims" feel; we did try to warn people but there were limits in how much effort we were willing to put into it. You know what they say about the last 1% :-)

Mark.
billd@fps.com (Bill Davidson) (06/18/91)
In article <34587746@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>An unparsable date (which might something as innocuous as a new time
>zone) does not intrinsically cripple the article, as a bad Message-ID
>does for instance....

In <1991Jun10.225052.19739@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Unfortunately, it does, because it might be a garbled version of a stale
>date. A parsable date is a non-negotiable requirement. (Incidentally,
>a new timezone abbreviation does not make the date unparsable, if I'm
>remembering the code correctly.)

In article <1991Jun16.154834.114@corpane.uucp> herman@corpane.uucp (Harry Herman) writes:
>Since dates are important, then have C-News recogonize the date formats
>that really exist on the net, not according to what a piece of paper says
>they should look like.

I disagree, but that's irrelevant. I saw something that said over 50% of college students cheat in some way. Does that make it acceptable just because it's common practice?

>Work with the writers of other news systems to get them to write new code
>to match that piece of paper. Then, when 99.9% of the news postings
>match the peice of paper, then CONSIDER dropping the "obsolete" support.
>Although it would not really be obsolete until there are 0 posts with the
>old format.

Guess what, Harry? It's already happened! Yes, it really has. The vast, overwhelming majority of sites are Bnews and Cnews, and guess what? They generate proper dates. In general, on these systems, the date is generated by inews.

>One of the recent releases of the nn news reader claims to have changed its
>code that splits digests into separate articles to handle both the "standard"
>format, and a particular news group's "non-standard" digest format. The
>writer of nn did not say "Group x's digests are incorrect so we will
>ignore it", the writer essentially said "we will support what is actually
>out there".

That's a reader, not transport software.
Readers are meant to please people. Transport software is not for people. Its purpose is to transport the news as efficiently and reliably as possible. Bad headers, and kludges to deal with them, jeopardize both. Readers are supposed to make sure that things get from the transport software to the users and vice versa. If your reader does that job poorly, that's not my problem.

>I am a user of news, I am not a system administrator or a news administrator,
>so I have zero choice in what operating system we use or what news transport,
>and am highly offended at an earlier posting that was along the lines of
>use Unix and C-News if you want to use news. There are other operating
>systems out there, and there are other news transports out there. If we
>are truly going to have an international network for sharing information,
>then all the people writing the software that makes this happen have to
>decide to work together and work with what is out there. Add new features,
>but don't break existing features with a we are right and the rest of
>the world is wrong attitude.

People not running news on something UNIXish make up an incredibly small minority of the news sites. The RFCs have been around quite a while now. There's been lots of time for the admins to fix it. It is possible for a normal user to bitch at the admin to fix things. Most admins will let a lot of things slide until someone complains. The vast majority of the net should not be punished because people running news on a few Atari STs are having their articles dropped. The people whose articles are getting dropped are at a very small minority of sites (probably less than 1000).

--Bill Davidson
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (06/18/91)
tp@mccall.com (Terry Poot) writes:
> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

[about broadcasting notice of dropped articles:]

>> Too much overhead; you are targeting a message at
>> _all_ Usenet sites to inform _one_ site. I really
>> can't see the attraction of that at all.

> Do you have the same problem with Cancel control
> messages, which exhibit the same behavior?

Huh? A cancel goes to all sites because it _must_ go to all sites to be effective. Notice of a dropped article needs to get to only one site, the site of origin, to be effective. Not at all comparable situations.

> The error reports are much more useful to the net
> as a whole than cancels, and after the initial
> rash of these when the facility is first
> introduced, there should be fewer of them than
> cancels flowing around the net. (And as for the
> initial rash of errors, there are proposals out
> there that reduce this problem to a reasonable
> amount as well.)

Not at all true if a month of news gets barfed back onto the net by a mechanism that looks like a bad header problem and evokes this mechanism you defend; at that point, if the Bnews site connectivity from the failing site is enough to get the barf out to the "whole net", the deluge of messages this mechanism you defend would dump on the net would disable the whole net for a significant period of time, like weeks, while sysadmins cleared up the junk while trying to preserve the news and mail. The cost in phone charges alone would be horrendous. I've yet to see a modification to the broadcast notification proposal that addresses this _fatal_ flaw.

News barfs with mangled headers happen several times a year; a mechanism that could take the whole net down several times a year makes the internet worm look like a love pat; it only nailed 6000 machines, and that by accident; the broadcast proposal risks the whole ball of wax, and deliberately.

>>> Regular messages in a specific group seem much
>>> more reliable.
>> Worse and worse; we are right back to the problem
>> of the original posting warning that a change was
>> coming; if you put information where it won't be
>> seen, you might as well not go to the trouble.
>> There are _two_ problems to be solved here:
>> 1) A technical problem: how to get the
>> information back to the author or the site admin
>> at the site where failing messages are being
>> created, without

> Solved by the error report proposal, if you
> consider the WHOLE proposal.

Nope, I've considered the whole proposal; a proposal that breaks the net doesn't solve problems.

>> breaking anything further (Cnews has already gone
>> through a phase of reliably recycling old news,
>> and now one of reliably discarding without notice
>> articles with what used to be working article
>> headers; no sense trying for three disasters in a
>> row),

> Are you trying to tell us that news is too
> unreliable a transport to carry notices of its own
> problems?

First, it isn't much of a software engineering win to depend on a failing mechanism to carry notice of its own failure. Second, designing a mechanism that breaks it worse in the process of "fixing" things is no win at all.

> Do you have a serial line from each of your
> ethernet controllers tying back into your cpu for
> error reports? :-)

Ah, to have a job, where such frills as multiple communicating computers would be accessible.

>> such as drowning the net in excess messages or

> I think you over-estimate. Most of the net runs
> good software, or I doubt even the C news crowd
> would have seen the changes that sparked all this
> as a doable thing.

Most of the net runs _abysmal_ software [a lot of it hides under the name of "email", which has a blind 30% first reply failure rate for naive users]; news failures are relatively public, so squeaking wheels get attention.
Still, the UUCP transport suite has a crash and burn failure mode in uucico, news spools overflow regularly all over the net, sites can't agree on the legality of posting to groups not carried at the site, header lines needed for some news reading software aren't maintained by other news posting software, subject lines get truncated; the list goes on. I sure wouldn't brag on the workability of news anywhere I had a reputation to maintain. Ignoring problems this bad in software doesn't say much for one's professionalism.

>> the site of origin in excess email. Only the
>> second possibility has yet been effectively
>> addressed so far; all the methods of using news
>> to do the job send far too many copies to sites
>> that don't care at all about the information,
>> wasting time, money, and patience.

> Each site will store one copy of each report. He
> might get one from each feed, but he probably
> won't, any more than he gets any other article
> from every feed he has. If he does have 2 fully
> redundant feeds, he probably wants it that way.

Don't know about where you are, but here the dropped news and other problems require redundant feeds, and still we miss parts of source group multi-part postings and such. Using a threaded newsreader like trn gives you a new appreciation of just how much news _never_ gets to your site; I'd guess the raw failure rate is between 5% and 15% of all news going missing with a single feed, 2%-5% with two fairly independent feeds.

>> 2) A human factors problem: how to put the
>> information returned in a place where it is hard
>> to ignore, not lost in a mass of similar, but
>> inapplicable, warnings that condition the
>> intended recipient out of looking for the
>> warnings at all.

> Remember a key facet of the proposal is that there
> will be a small number of sites that pick up all
> the error reports and mail them to the site with
> the problem.
Hardly a "key facet"; it was included as an optional gawdawful nuisance that would be included if the yelling were sufficiently loud, but preferably not. Also, it won't work. The further you get from the path from failing article posting site to failing article detecting site, the less chance you have of getting email through. In reality, the sites you are trying to reach are mostly leaf sites at the end of long cul-de-sacs, running little known or long superseded software, exactly the sites least likely to have good mail map entries; and the longer you make the mail path, the less likely it is to function.

> Thus, you won't have to read the group to find out
> you have a problem, someone will send you mail
> telling you.

Which you are unlikely to receive, the mailing site now being many more hops away.

> Of course, newer news software could have a
> feature whereby it scans the error reports looking
> for errors referring to the current site, so that
> notification would be swifter,

Sure, but the sites running up to date software are not the ones we need to reach, so don't even include them in your planning.

> and would occur even if mail couldn't get through,
> but it will still work quite well for a site not
> running such software, because he'll get
> notification by mail.

Probably not.

>> It's the old "crying wolf" problem. I have yet to
>> see a proposal here that beats sending a small
>> number of _pertinent_ notices to the email box of
>> the author/site admin, a location that a) is
>> regularly perused by the recipient, and b) waits
>> "forever" to be read (no expire == loss of
>> information).

> Unfortunately, it is all too likely that the small
> number in question is precisely zero. The various
> configurations of net connectivity are such that
> I've yet to see a proposal for a probabilistic
> method that hasn't been shot down by a counter
> example of a configuration that would get too many
> or too few messages. Sigh.
Between the idiots who want this whole discussion to be decided by a popularity contest between Mathew and Henry, rather than by the rather profound technical, software engineering, and programming ethics issues, and the people who, as Mathew says, keep moving the goalposts, it's profoundly difficult to make progress here. Given that no one has bothered to make a definitive statement of what constitutes "too many" or "too few" messages, of course the non-contributors have no trouble setting up their own strawmen and then promptly knocking them down. If I can state that at least one article in sixteen posted with a bad header gets a response, and at most sixteen responses are returned per failing article, then I suspect a sufficient two-parameter probabilistic email notification system can be designed, though I'm not going to be the one to do it.

>> Nothing posted to a busy newsgroup satisfies this
>> need at all. Most "inform by news" proposals
>> would require changes at the site of origin to
>> pull out only the pertinent notices and present
>> them to the responsible party, and one of the
>> givens in this situation is that changes at the
>> site of origin won't be made until _after_ notice
>> of a problem is seen, making "inform by news"
>> methods worthless.

> Read the whole proposal. The mail out notification
> was in Neil Rickert's original posting about this
> method of reporting errors.

Yes it was, with the attitude described above, and, as noted above, it simply will fail in too large a proportion of cases. The design goals for a notification method should _not_ require that each site get the _same_ level of notification or have the _same_ limits on notices received, just that each site have _some_ useful level of notification, and that each site have _some_ guarantee of a reasonable limit on the number of notices received.
It should also be a design goal that only Cnews sites require software changes or behavioral changes to make the notification methods work. Lots of sites have "absentee sysadmins" who essentially _never_ read news, and will only know about any of this if a message shows up in their email.

More important, it is probably really worthwhile putting up and agreeing upon the design goals for notification before haranguing about whether one or another proposed design meets the goals; a moving goalposts problem again.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
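[Editor's note: Kent's "at least one response in sixteen, at most sixteen responses per article" bounds can be sanity-checked with simple arithmetic. If N detecting sites each warn independently with probability p, the mean warning count is N*p and the chance of total silence is (1-p)^N. The N and p below are illustrative, not measurements:]

```python
# Back-of-the-envelope check of a two-parameter probabilistic warning
# scheme: N detecting sites, each mailing a warning with probability p.

def expected_warnings(n_sites, p):
    """Mean number of warnings mailed back per failing article."""
    return n_sites * p

def prob_no_warning(n_sites, p):
    """Chance the author hears nothing at all about a dropped article."""
    return (1 - p) ** n_sites

# With 1000 Cnews sites and p = 1/250 (both numbers illustrative):
n, p = 1000, 1 / 250
mean = expected_warnings(n, p)      # 4 warnings on average
silent = prob_no_warning(n, p)      # roughly a 2% chance of total silence
```

With these particular numbers the average is four warnings per failing article, well under a ceiling of sixteen, and some 98% of failing articles draw at least one response, well over one in sixteen, so bounds of the kind Kent states look achievable in principle.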
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (06/19/91)
In article <1991Jun18.113758.16382@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

| Not at all true if a month of news gets barfed back
| onto the net by a mechanism that looks like a bad
| header problem and evokes this mechanism you defend;
| at that point, if the Bnews site connectivity from
| the failing site is enough to get the barf out to
| the "whole net", the deluge of messages this
| mechanism you defend would dump on the net would
| disable the whole net for a significant period of
| time, like weeks, while sysadmins cleared up the
| junk while trying to preserve the news and mail.

I don't think you thought that one through. Start with two statements which I believe are widely if not without exception true:

1. we would generate a reject notice for badly formed articles
2. we would not forward rejected articles

Thus you are unlikely to have anything broken reach "the whole net." I would also question a failure mechanism which would mung old articles in such a way that they would satisfy these criteria to cause problems:

1. munged enough to be rejected elsewhere
2. intact enough to be accepted by news locally
3. intact enough to be sent out

Remember that sendbatch runs from the database at most sites, and just restoring the articles won't send anything. Also, batches are compressed, and I can't believe damage as limited as is needed to satisfy the requirements above could happen in a compressed file.

Finally, bad messages generating warnings would generate a reply from at most every site at which they were received, hardly likely to bring down the net with a few lines per site worst case. Some weeks I get my sys file requested 6-7 times in a week, and that doesn't even make a bobble in the volume curve. I bet some of the postings have generated flames from more readers than one per site, and the net survived. The fact that rejectors don't propagate saves the volume.
In short, I believe your scenario of gigabytes of rejects is an overexaggeration of any worst case failure I can imagine.

| Don't know about where you are, but here the dropped
| news and other problems require redundant feeds and
| still we miss parts of source group multi-part
| postings and such. Using a threaded newsreader like
| trn gives you a new appreciation of just how much
| news _never_ gets to your site; I'd guess the raw
| failure rate is between 5% and 15% of all news goes
| missing with a single feed, 2%-5% with two fairly
| independent feeds.

Depends on the quality of the feed. Some news is dropped at the site, and unless you have a feed right from them you don't get it. The volume dropped in a feed from a site like uunet is probably a lot less than the 2% you quote. I can't measure easily, because a lot of news here comes *first* from another site. I can tell there's a lot of news I didn't get from uunet, but I can't tell if I would have gotten it if I didn't already have the article.

| More important, it is probably really worthwhile
| putting up and agreeing upon the design goals for
| notification, before haranguing about whether one or
| another proposed design meets the goals; a moving
| goalposts problem again.

It is always worth setting the goals before evaluating the solutions. I sense a lack of agreement that letting the poster know about rejections is a goal in C news. I can't help thinking that dropping the news, not saying that it was dropped, and not saying why, is a case of putting the computer before the user.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
GE Corp R&D Center, Information Systems Operation, tech support group
Moderator comp.binaries.ibm.pc and 386-users digest.
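[Editor's sketch of Davidsen's central point: a site that rejects a bad article sends at most one notice and does not forward the article, so neither the article nor the warnings it provokes can flood the whole net. The topology and the set of "strict" sites below are invented purely for illustration:]

```python
from collections import deque

FEEDS = {                # who feeds whom (hypothetical topology)
    "origin":  ["hub1", "hub2"],
    "hub1":    ["leaf1", "leaf2"],
    "hub2":    ["leaf3", "strict1"],
    "strict1": ["leaf4", "leaf5"],
}

STRICT = {"strict1", "leaf2"}   # sites that reject badly formed headers

def propagate_bad_article():
    """Flood-fill from origin; strict sites warn once and stop forwarding."""
    warnings, seen = [], {"origin"}
    queue = deque(["origin"])
    while queue:
        site = queue.popleft()
        for nbr in FEEDS.get(site, []):
            if nbr in seen:
                continue
            seen.add(nbr)
            if nbr in STRICT:
                warnings.append(nbr)      # one reject notice, no forward
            else:
                queue.append(nbr)         # accepted: pass it along
    return warnings, seen

warnings, reached = propagate_bad_article()
# strict1 rejects the article, so leaf4 and leaf5 never see it at all,
# and the warning count is bounded by the number of sites reached.
```

In this toy run the bad article draws exactly two reject notices, and everything downstream of the rejecting hub is spared the article entirely, which is the "rejectors don't propagate saves the volume" argument in miniature.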