jef@well.UUCP (Jef Poskanzer) (11/16/89)
Seems like we could add an end-to-end checksum to netnews articles in
an upward compatible fashion. Add a new header field, "Checksum: ",
based on the entire article except the Path: and Checksum: headers.
Modify the news software to add the checksum to locally-posted
articles, and check it if present on articles from elsewhere.

The only problem would be if the string "\nChecksum: " itself got
munged. But then we are no worse off than we are now.

Participation would be strictly voluntary. If you are interested in
protecting your articles against getting munged, you have an incentive
to add in the checksum; if you are interested in seeing fewer munged
articles, you have an incentive to check the checksum.

Note that this means no more gratuitous header re-writing. Bet Henry
like it for this reason...
---
Jef

Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"Talkers are no good doers." -- Shakespeare, Henry VI
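[A sketch of the scheme Jef describes, in modern Python. The post names no particular algorithm, so CRC-32 and the canonicalization details here are assumptions; the excluded header names are the two from the proposal (later posts in the thread suggest Xref: as well).]

```python
import zlib

# Headers excluded from the checksum, per the proposal above.
EXCLUDED = ("path:", "checksum:")

def article_checksum(article):
    """Checksum an article's headers and body, skipping excluded headers.

    Assumes an RFC-1036-style article: header lines, a blank line, then
    the body.  A continuation line (leading whitespace) belongs to the
    header line above it.  CRC-32 is an assumption; the post specifies
    no particular algorithm.
    """
    head, sep, body = article.partition("\n\n")
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):        # continuation line
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(EXCLUDED)
        if not skipping:
            kept.append(line)
    data = "\n".join(kept) + sep + body
    return "%08x" % (zlib.crc32(data.encode("ascii", "replace")) & 0xffffffff)

# Rewriting Path: at each hop does not disturb the value:
a = "Path: well!jef\nMessage-ID: <14594@well.UUCP>\n\nSeems like...\n"
b = "Path: ucbvax!well!jef\nMessage-ID: <14594@well.UUCP>\n\nSeems like...\n"
assert article_checksum(a) == article_checksum(b)
```

This is what makes the check end-to-end: the value is computed once at posting time and survives every hop, because the only header a relay must rewrite is excluded from it.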
tneff@bfmny0.UU.NET (Tom Neff) (11/16/89)
One problem with checksumming news is that if done wrong, it could
presume too much about the internal representation of articles. If I
have an EBCDIC machine that stores all text files as fixed-length
blank-padded 80-character logical records with no concept of '\n', I
should still be able to run News. Such machines could ignore the
Checksum field, of course. But they're as prey to noise and article
corruption as anyone else, and deserve the benefits of the feature.

Checksumming could work if the following rules apply:

 * Start with the first nonblank line of the article body; do not
   include headers. (Think about the "Path" field.)
 * Only checksum nonblank lines.
 * Only count characters in the graphics-64 set; use their ASCII
   numeric representations. (Non-ASCII machines can just translate
   while computing.)

A new field like Checksum is only worth adding if practically everyone
can get some use out of it. The above suggestions would allow lots of
different machines both to generate and check the field.
--
When I was [in Canada] I found their jokes like their   | Tom Neff
roads -- not very long and not very good, leading to a  | tneff@bfmny0.UU.NET
little tin point of a spire which has been remorselessly
obvious for miles without seeming to get any nearer.
   -- Samuel Butler
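[Tom's three rules, sketched. Two guesses are made at unspecified details: "graphics-64" is taken to be the printable range 0x21-0x5F of the old 64-glyph character set, and a 16-bit additive sum stands in for whatever function would actually be chosen.]

```python
def neff_checksum(article):
    """Body-only checksum per the three rules above (a sketch)."""
    _head, _sep, body = article.partition("\n\n")   # rule 1: skip headers
    total = 0
    for line in body.split("\n"):
        if not line.strip():
            continue                        # rule 2: only nonblank lines
        for ch in line:
            code = ord(ch)                  # rule 3: ASCII numeric value
            if 0x21 <= code <= 0x5F:        # assumed "graphics-64" range
                total = (total + code) & 0xFFFF
    return total

# Blank-line padding and tab/space munging by a transport are invisible:
assert neff_checksum("X: y\n\nHELLO WORLD\n") == \
       neff_checksum("X: y\n\n\nHELLO\tWORLD\n")
```

The point of restricting the counted characters is exactly this invariance: record-oriented or whitespace-mangling transports change nothing the checksum can see.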
henry@utzoo.uucp (Henry Spencer) (11/17/89)
In article <14594@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
>Seems like we could add an end-to-end checksum to netnews articles in
>an upward compatible fashion. Add a new header field, "Checksum: ",
>based on the entire article except the Path: and Checksum: headers.
>Modify the news software to add the checksum to locally-posted
>articles, and check it if present on articles from elsewhere.

Such a scheme existed in an early version of C News. We eventually
abandoned it. The problem is, what do you do when you receive an
article with a bad checksum?

The underlying difficulty is that news not infrequently travels via
networks that corrupt the data in "benign" ways, e.g. substituting
spaces for tabs. Throwing away such articles means you don't see
perfectly-readable news. Keeping them and reporting on them just
increases the noise level in the sysadmin's mailbox, since all too
often the responsible parties won't (or can't) fix their software.

We thought about checksumming only non-blank characters, but that
drives the cost up considerably, and there's still the question of
what to do with a bad article. It just didn't seem worth it.

>Note that this means no more gratuitous header re-writing. Bet Henry
>like it for this reason...

Alas, not so. Since it is *necessary* to rewrite the Path header, the
checksum has to be recomputed every time anyway.
--
A bit of tolerance is worth a  | Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
coolidge@brutus.cs.uiuc.edu (John Coolidge) (11/17/89)
tneff@bfmny0.UU.NET (Tom Neff) writes:
>Checksumming could work if the following rules apply:
> * Start with the first nonblank line of the article body; do not
>   include headers. (Think about the "Path" field.)

I agree with the original proposal here. Checksum all of the standard
headers except Path: (and Xref:). Message-id's seem to be the most
common thing to get mashed in transmission. They're also the most
_important_ thing that gets mashed most of the time. Checksumming
should start with the headers.

>A new field like Checksum is only worth adding if practically everyone
>can get some use out of it. The above suggestions would allow lots
>of different machines both to generate and check the field.

That's true. On the other hand, I don't see anything in Checksum: that
would imply that headers not get checksummed. The most important
reason for me to want Checksum: is to protect against bogus
message-ids. Of course, there _is_ the side benefit that it would stop
gratuitous header re-writing (hurrah!).

--John
--------------------------------------------------------------------------
John L. Coolidge    Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
New NNTP connections always available! Send mail if you're interested.
lmb@vicom.com (Larry Blair) (11/17/89)
In article <14594@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
=Seems like we could add an end-to-end checksum to netnews articles in
=an upward compatible fashion. Add a new header field, "Checksum: ",
=based on the entire article except the Path: and Checksum: headers.
=Modify the news software to add the checksum to locally-posted
=articles, and check it if present on articles from elsewhere.
I brought up this issue a while back and was soundly thrashed. Actually,
a checksum is not the answer. What is needed is a CRC.
My main interest, at the time, was to try to stop the bogus Message-ID's
that were causing problems. I found that there are a large number of
munged articles circulating on Usenet.
Obviously, there is the problem that either the Path: line has to be ignored
or the CRC will have to be recalculated with every hop.
--
Larry Blair ames!vsi1!lmb lmb@vicom.com
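[Larry's checksum-vs.-CRC distinction is concrete: a plain additive checksum cannot see two transposed characters, while a CRC catches any short burst error, including a swap of adjacent bytes. A sketch, with Python's zlib.crc32 standing in for whatever CRC the news software might adopt:]

```python
import zlib

def add_checksum(data):
    """A naive additive checksum -- blind to byte order by construction."""
    return sum(data) & 0xFFFF

good = b"Message-ID: <14594@well.UUCP>"
bad  = b"Message-ID: <14954@well.UUCP>"   # two adjacent digits swapped

assert add_checksum(good) == add_checksum(bad)   # the sum misses it
assert zlib.crc32(good) != zlib.crc32(bad)       # the CRC catches it
```

A transposed digit in a Message-ID is exactly the kind of damage this thread is worried about, and it is invisible to any order-blind sum.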
lmb@vicom.com (Larry Blair) (11/17/89)
In article <14922@bfmny0.UU.NET> tneff@bfmny0.UU.NET (Tom Neff) writes:
=Checksumming could work if the following rules apply:
=
= * Start with the first nonblank line of the article body; do not
= include headers. (Think about the "Path" field.)
Who cares about the article body? It is the headers that need to be protected,
particularly the Message-ID.
--
Larry Blair ames!vsi1!lmb lmb@vicom.com
tneff@bfmny0.UU.NET (Tom Neff) (11/17/89)
Exempting specific header fields (Path, Xref) from checksumming is an
acceptable alternative to checksumming just the body, provided you
don't mind enthroning a specific set of headers in the RFC (this
proposal needs one, by the way).

I should point out that there is a limit to what you can do with a
proven-corrupted message if the corruption includes the header. You
cannot attempt to promulgate error information elsewhere, because you
do not know if the Message-ID is valid.

It seems worthwhile to log corrupted articles carefully, including
path information, so that suspect links can be investigated. The info
could be collected periodically a la Arbitron and collated centrally
for announcement on a new cable TV show, "Usenet's Most Wanted" :-)
--
"Of course, this is a, this is a Hunt, you  | Tom Neff
will -- that will uncover a lot of things.  | tneff@bfmny0.UU.NET
You open that scab, there's a hell of a lot of things... This involves
these Cubans, Hunt, and a lot of hanky-panky that we have nothing to
do with ourselves." -- RN 6/23/72
jef@well.UUCP (Jef Poskanzer) (11/17/89)
In the referenced message, coolidge@cs.uiuc.edu wrote:
}Message-id's seem to be the most
}common thing to get mashed in transmission.

Uh, doesn't this seem a little unlikely to you? Doesn't it seem more
likely that bits all over the articles are getting munged with equal
probability, but we tend to notice when it happens to Message-Id's
since that results in duplicate articles? In fact, precisely this
realization is what led me to propose Checksum: -- if it's true, then
munging is *far* more common than previously thought.
---
Jef

Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"If you give me six lines written by the most honest man, I will find
something in them to hang him." -- Cardinal de Richelieu
tneff@bfmny0.UU.NET (Tom Neff) (11/17/89)
In article <1989Nov16.191837.13850@vicom.com> lmb@vicom.COM (Larry Blair) writes:
>Who cares about the article body? It is the headers that need to be protected,
>particularly the Message-ID.

I think this could officially be called the "News for its own sake"
position. :-)
--
"The country couldn't run without Prohibition.    ][  Tom Neff
 That is the industrial fact." -- Henry Ford, 1929 ][  tneff@bfmny0.UU.NET
coolidge@brutus.cs.uiuc.edu (John Coolidge) (11/17/89)
jef@well.UUCP (Jef Poskanzer) writes:
>In the referenced message, coolidge@cs.uiuc.edu wrote:
>} Message-id's seem to be the most
>}common thing to get mashed in transmission.
>Uh, doesn't this seem a little unlikely to you? Doesn't it seem more
>likely that bits all over the articles are getting munged with equal
>probability, but we tend to notice when it happens to Message-Id's
>since that results in duplicate articles?

Hmm. I've seen plenty of articles with munged Message-id's, and many
of those had munged text too. I don't remember ever seeing an article
with munged text in which the Message-id _wasn't_ munged, though.

This is probably because, if the text gets mashed but the Message-id
doesn't, then downstream feeds don't take the mashed copy but only the
good one (all of the duplicate, mashed Message-id postings I've seen
seem to have been sent twice by the originating site...). This might
imply that checksumming Message-id, while not sufficient to stop all
body-munging, will have the effect of stopping lots of it by halting
propagation.

In any case, checksumming all possible header fields and the entire
text is the right thing to do anyway.

--John
--------------------------------------------------------------------------
John L. Coolidge    Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
New NNTP connections always available! Send mail if you're interested.
coolidge@brutus.cs.uiuc.edu (John Coolidge) (11/17/89)
tneff@bfmny0.UU.NET (Tom Neff) writes:
>Exempting specific header fields (Path, Xref) from checksumming is an
>acceptable alternative to checksumming just the body, provided you don't
>mind enthroning a specific set of headers in the RFC (this proposal
>needs one, by the way).

What really needs to be done is to integrate the Checksum: feature and
fixes for all of the bugs in the RFCs that have come to light in this
group over the last few months, and make up an all-new Standards for
Transmission of Usenet News RFC. Limit specifically the ability of
header-mungers to work their evil voodoo, fix implementation-specific
RFC bogosities, that sort of thing.

>It seems worthwhile to log corrupted articles carefully including path
>information, so that suspect links can be investigated. The info could
>be collected periodically a la Arbitron and collated centrally for
>announcement on a new cable TV show, "Usenet's Most Wanted" :-)

Yup, this sounds like the right thing to do. Shunt the damaged article
off to the side, report the problem, and make sure it doesn't go past
your site.

--John
--------------------------------------------------------------------------
John L. Coolidge    Internet:coolidge@cs.uiuc.edu    UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
New NNTP connections always available! Send mail if you're interested.
jef@well.UUCP (Jef Poskanzer) (11/17/89)
In the referenced message, henry@utzoo.uucp (Henry Spencer) wrote:
}In article <14594@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
}>Note that this means no more gratuitous header re-writing. Bet Henry
}>like it for this reason...
}
}Alas, not so. Since it is *necessary* to rewrite the Path header, the
}checksum has to be recomputed every time anyway.

Huh? I didn't say no header re-writing, I said no gratuitous header
re-writing. Did you read this:

}>based on the entire article except the Path: and Checksum: headers.
                                         ^^^^^^     ^^^^^

? Unless I'm missing something, you compute it once, when an article
is submitted, and each site checks it upon reception. That's the whole
point of an end-to-end check. Sure it means that the Path: is not
protected, but I can live with that.

As for the other issues you mentioned, that's why I proposed that this
be voluntary. Some sites, mine certainly among them, will decide that
reliable transport is more important than getting the minuscule number
of articles that can get here only through EBCDIC links or other
bogosities.
---
Jef

Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"I do not believe that this generation of Americans is willing to
resign itself to going to bed each night by the light of a Communist
moon..." -- Lyndon B. Johnson
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (11/17/89)
Re: checksums or crc for news articles

>Note that this means no more gratuitous header re-writing. Bet Henry
>like it for this reason...

You could use a CRC or checksum that isn't sensitive to header
ordering. For example, do a CRC on each line in the header and sum
those together.
--
Jon Zeeff <zeeff@b-tech.ann-arbor.mi.us>
Branch Technology <zeeff%b-tech@iti.org>
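[Jon's order-insensitive construction, sketched. CRC-32 and the 32-bit wraparound are assumptions; the post just says "a crc on each line".]

```python
import zlib

def header_crc_sum(header_lines):
    """CRC each header line separately and add the results, so a relay
    that reorders headers leaves the value unchanged."""
    total = 0
    for line in header_lines:
        total = (total + zlib.crc32(line.encode())) & 0xFFFFFFFF
    return total

h1 = ["From: a@b", "Subject: test", "Message-ID: <1@b>"]
h2 = ["Message-ID: <1@b>", "From: a@b", "Subject: test"]
assert header_crc_sum(h1) == header_crc_sum(h2)   # reordering is invisible
```

The price is a slightly weaker check than a single CRC over a canonicalized header block, since independent per-line changes can in principle cancel in the sum; the benefit is that no canonical header order ever has to be agreed on.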
allbery@NCoast.ORG (Brandon S. Allbery) (11/19/89)
Checksumming can't be relied upon in the case of ASCII <-> EBCDIC
translations anyway -- EBCDIC is not a single character set, but a
related family of character sets, each slightly different from the
others. Worse, there are characters in EBCDIC which don't map to ASCII
(which may affect postings originating from BITNET sites) and
characters in ASCII which map to two or possibly more characters in
EBCDIC (consider "|").

++Brandon
--
Brandon S. Allbery    allbery@NCoast.ORG, BALLBERY (MCI Mail), ALLBERY (Delphi)
uunet!hal.cwru.edu!ncoast!allbery    ncoast!allbery@hal.cwru.edu    bsa@telotech.uucp
*(comp.sources.misc mail to comp-sources-misc[-request]@backbone.site, please)*
*Third party vote-collection service: send mail to allbery@uunet.uu.net (ONLY)*
expnet.all: Experiments in *net management and organization. Mail me for info.
lemke@radius.UUCP (Steve Lemke) (11/19/89)
In article <14603@well.UUCP> Jef Poskanzer <jef@well.sf.ca.us> writes:
}? Unless I'm missing something, you compute it once, when an article
}is submitted, and each site checks it upon reception. That's the whole
}point of an end-to-end check. Sure it means that the Path: is not
}protected, but I can live with that.

OK, so it gets checked upon reception, but maybe I'm missing something
here -- what happens if the check fails? This doesn't seem like it will
happen real-time (as the transfer is taking place) but rather when
rnews executes, which is usually after it's too late to ask for the
article to be sent again. Well, do you somehow try to request that
article again later? Of course, if it was munched then you don't really
know how to ask for it again, do you? So, do you ignore it? You only
want non-munched news, regardless of what it was that actually caused
the checksum to fail? Just forget about anything that doesn't pass?

I just recently started running bnews and everything seems to be
working fine, but I'm no expert -- I'm just curious as to what happens
in the event that your proposed checksum fails upon receipt.
--
=============================================================================
===== Steve Lemke, Engineering Quality Assurance, Radius Inc., San Jose =====
===== Reply to: radius!lemke@apple.com   (Coming soon: radius.com ...)  =====
===== AppleLink: Radius.QA; GEnie: S.Lemke; Compu$erve: 73627,570       =====
jef@well.UUCP (Jef Poskanzer) (11/19/89)
In the referenced message, radius!lemke@apple.com (Generic Account) wrote:
}You only want non-munched
}news, regardless of what it was that actually caused the checksum to fail?
}Just forget about anything that doesn't pass?

Yep. Log it, especially the path, and drop it in the bit bucket. Or at
least, this is what I intend to do. Other sites are perfectly free to
do otherwise.

I actually considered posting it in junk, but then if a non-munged
version of the article happened along it would get rejected. I could
fix this, but I think it would take more hacking than I'm interested
in doing.

Hmm. I can just see Brad getting back at me for my references lines by
adding bogus checksums to all his articles. Well, if he wants all his
articles to get dropped, I guess that's ok...
---
Jef

Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
"...for DEATH awaits you all, with nasty sharp pointy teeth!" -- Tim
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (11/20/89)
Re: what to do if a checksum on a news article is bad

If you only have one source for news, there is nothing you can do. If
you have multiple sources, you can reject the article in the hope that
someone will offer you a good one. Or, what I would prefer, is to keep
the article along with some indication that it is probably bad and
should be replaced with a good copy if it comes along.
--
Jon Zeeff <zeeff@b-tech.ann-arbor.mi.us>
Branch Technology <zeeff@b-tech.mi.org>
It's 1989. Does your software support the ISO 8859 character sets?
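[Jon's preferred policy -- keep the suspect copy, but let a clean copy of the same Message-ID displace it -- might look like this. The in-memory spool is purely illustrative; a real news system keeps its history and spool on disk.]

```python
# Maps Message-ID -> (article text, is_suspect).
spool = {}

def file_article(msgid, text, checksum_ok):
    """File an article, marking it suspect if its checksum failed.

    A later clean copy replaces a suspect one; otherwise duplicates
    get the normal history-based rejection.
    """
    if msgid in spool:
        _stored, suspect = spool[msgid]
        if suspect and checksum_ok:
            spool[msgid] = (text, False)   # good copy replaces bad one
        return                             # otherwise: ordinary dup rejection
    spool[msgid] = (text, not checksum_ok)

file_article("<1@site>", "munged text", False)
file_article("<1@site>", "clean text", True)
assert spool["<1@site>"] == ("clean text", False)
```

Note this policy only helps if the Message-ID itself survived the munging; a mashed Message-ID makes the good copy look like a different article, which is Henry's and Tom's caveat earlier in the thread.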
blarson@dianne.usc.edu (bob larson) (11/20/89)
In article <1989Nov18.185648.3525@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery) writes:
>Checksumming can't be relied upon in the case of ASCII <-> EBCDIC translations

Protection against non-invertible translations is one of the reasons
for the "checksum" (it should be a CRC) as far as I am concerned.
Remember all the problems caused when one of the RN (?) patches was
mangled for a significant portion of the net due to a nonstandard news
feed (via BITNET) that expanded tabs to spaces?

Invertible transformations shouldn't be a problem; it's just a little
harder to compute the CRC on a system that decides not to store
articles in ASCII.

Having just the sites that have multiple redundant feeds drop mashed
articles would be a big help in reducing the number of such articles
propagating. (usc is such a site.)
--
Bob Larson  blarson@dianne.usc.edu  usc!dianne!blarson
--** To join Prime computer mailing list **--- info-prime-request@ais1.usc.edu
usc!ais1!info-prime-request