[net.news] Suggestion for checksum on articles

jerry@oliveb.UUCP (Jerry Aguirre) (07/20/85)

Considering the number of garbled articles and the constant discussion
about what site using what form of batching clobbered what line,
shouldn't we consider adding a header line containing a checksum?

The checksum would have to include the unchanging header lines and the
body of the article.  The path, forwarding version, xref, etc. headers
would have to be excluded from the checksum.  Or, the checksum could be
recalculated as the headers are modified.
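
A minimal sketch in C of the first option, where headers that change in
transit are skipped when computing the checksum. The header names are
assumptions: "Relay-Version" is taken to be the "forwarding version" header
mentioned above, and "Checksum" is a hypothetical name for the proposed
header itself, which cannot cover its own value.

```c
#include <ctype.h>
#include <stddef.h>

/* Headers left out of the checksum.  The names are assumptions: the
 * first three are modified in transit, and "Checksum" is a hypothetical
 * name for the proposed header itself. */
static const char *const excluded_headers[] = {
    "Path", "Relay-Version", "Xref", "Checksum"
};

/* Return 1 if `line` starts with header `name` followed by ':'.
 * The comparison ignores case and stops at the end of either string. */
static int header_is(const char *line, const char *name)
{
    while (*name != '\0') {
        if (tolower((unsigned char)*line) != tolower((unsigned char)*name))
            return 0;
        line++;
        name++;
    }
    return *line == ':';
}

/* Return 1 if this header line should be fed to the checksum. */
int include_in_checksum(const char *header_line)
{
    size_t i;
    size_t n = sizeof excluded_headers / sizeof excluded_headers[0];

    for (i = 0; i < n; i++) {
        if (header_is(header_line, excluded_headers[i]))
            return 0;
    }
    return 1;
}
```

With this filter a relaying site can rewrite Path or Xref freely without
invalidating the checksum, at the cost of leaving those lines unprotected.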

With a checksum and the sendbad control message we might have some
chance of cleaning up this problem.  The log entry left by the sendbad
control message would also be an indication to the SA that there is
something wrong in the feed.

The only hard part of this is coming up with a checksum that is portable
across the wide variety of machines on the net.

Comments?

				Jerry Aguirre @ Olivetti ATC
{hplabs|fortune|idi|ihnp4|tolerant|allegra|tymix}!oliveb!jerry

guy@sun.uucp (Guy Harris) (07/22/85)

> The only hard part of this is coming up with a checksum that is portable
> across the wide variety of machines on the net.

Given that the vast majority of UUCP connections on USENET run over serial
lines using the checksummed "g" protocol, I think we have an existence proof
for such a protocol.  We may not want to use that protocol, though, due to

	1) trade secret restrictions, although Lauren Weinstein could
	   just tell us what the algorithm is

and

	2) the fact that at least some implementations of it within
	   UUCP don't compile into correct code with some C compilers.

To make the checksum portable:

	1) always checksum bytes, not anything larger - but since we're
	   processing text here, it's unlikely that anybody'd go through
	   the trouble to accumulate two bytes and do checksumming on two-
	   byte quantities.

	2) don't use "native" arithmetic operations like addition and
	   subtraction; there is, I believe, at least one Univac 1100 on
	   the net and it's a one's-complement machine.

	3) don't assume that characters are signed or that they're
	   unsigned - there are lots of signed-character VAXes on the net
	   and there are lots of unsigned-character 3Bs on the net.

	4) don't assume that casts will be done in the order you think
	   they will - I believe that was the cause of lots of porting
	   problems for the UUCP "g" protocol checksum code.
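
The four rules above can be sketched in C as follows. This is an
illustration only: it swaps in a 16-bit CRC with the CCITT polynomial, which
is neither the UUCP "g" protocol checksum nor anything proposed in this
thread, and the name crc16_update is hypothetical. The checksum value is
built from shifts, XOR, and masks alone, each byte is masked to 8 bits so
signed and unsigned characters give the same result, and every conversion is
its own statement so nothing depends on the order of casts.

```c
#include <stddef.h>

#define CRC16_POLY 0x1021UL   /* CCITT polynomial x^16 + x^12 + x^5 + 1 */
#define CRC16_MASK 0xFFFFUL   /* keep only 16 bits, whatever the word size */

/* Fold `len` bytes from `buf` into a running 16-bit CRC.
 * Start with crc = 0 and pass the previous result for later chunks. */
unsigned long crc16_update(unsigned long crc, const char *buf, size_t len)
{
    /* Read the text as unsigned bytes so the sign of char is irrelevant. */
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;
    int bit;

    crc &= CRC16_MASK;
    for (i = 0; i < len; i++) {
        unsigned long byte = p[i];  /* widen first ... */
        byte &= 0xFFUL;             /* ... then force the range 0..255 */
        crc ^= byte << 8;
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000UL)
                crc = ((crc << 1) ^ CRC16_POLY) & CRC16_MASK;
            else
                crc = (crc << 1) & CRC16_MASK;
        }
    }
    return crc;
}
```

Because the running value is passed in and returned, an article can be
checksummed one line at a time, for example
crc = crc16_update(crc, line, strlen(line)) for each line that is covered.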

	Guy Harris

wls@astrovax.UUCP (William L. Sebok) (07/25/85)

I rather wish that there were some sort of article consistency check, like
a checksum or maybe even just the line count, so that if a "good" copy of an
article arrived at a site after a "bad" copy, the "good" copy would replace
the "bad" copy rather than being rejected as a "duplicate article".

The branches in the network give it a potential for redundancy.  However, with
the present arrangement, if an article is garbled somewhere, the garbled copy is
still the one propagated if it arrives at a branch point first.
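
The replacement rule wished for above could be sketched in C roughly like
this. All names are hypothetical, and the sketch assumes that some
consistency check has already classified each copy as good or bad. It also
assumes that a first copy is stored even when it fails the check, as in the
present arrangement.

```c
/* What to do with an incoming copy of an article (hypothetical names). */
enum dup_action {
    STORE_NEW,         /* no copy on file yet: keep the incoming one */
    REPLACE_STORED,    /* stored copy is bad, incoming copy is good */
    REJECT_DUPLICATE   /* nothing to gain: drop the incoming copy */
};

/* have_stored: nonzero if an article with this ID is already on file.
 * stored_ok:   nonzero if the stored copy passed the consistency check.
 * incoming_ok: nonzero if the incoming copy passed the consistency check. */
enum dup_action decide_duplicate(int have_stored, int stored_ok, int incoming_ok)
{
    if (!have_stored)
        return STORE_NEW;
    if (!stored_ok && incoming_ok)
        return REPLACE_STORED;
    return REJECT_DUPLICATE;
}
```

Only the case of a bad stored copy and a good incoming copy differs from
plain duplicate rejection, so a feed that never garbles anything would see
no change in behavior.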
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,cbosgd,decvax,ihnp4,noao,philabs,princeton,vax135}!astrovax!wls