[comp.sources.d] Improving SHAR

denbeste@bgsuvax.UUCP (William C. DenBesten) (06/03/88)

Please note that followups are directed to comp.sources.d

In article <7985@brl-smoke.ARPA> w8sdz@brl.arpa Keith Petersen writes:
> When Usenet can guarantee error-free and non-truncated transmission of clear
> text files, I will agree to posting clear text files.  Until that day
> arrives (is anyone working on it?) I will continue to post them as ARC
> files in the comp.binaries.ibm.pc newsgroup.

Someone else wrote, and I paraphrase:
> The disadvantage is that a uuencoded file really messes up the
> compression that goes on when UUCP is used to transfer news.

From article <2277@rpp386.UUCP>, by jfh@rpp386.UUCP (John F. Haugh II):
> the advantage of arc files is that arc includes crc's, whereas shar doesn't.

The solution that everyone seems to be stabbing near, but not hitting
is that we need a shar-archiver that includes a crc check in it,
rather than a simple character count.  I have faked up a shar below
that demonstrates this.

There is one small problem with this idea, but it is easily solvable.
There is not a standard unix utility that will do a crc.  We
would have to ship one around the net.  Not having the crc program
would not be a real hardship, as one could always unwrap a shar by
creating a fake crc program and ignoring the ensuing crc error warnings.

#! /bin/sh
# This is a shell archive
echo "shar: extracting 'hello.c' (36 characters)"
if test -f 'hello.c'
then
echo shar: "will not over-write existing file 'hello.c'"
else
cat << \SHAR_EOF > 'hello.c'
main()
{
  printf("Hello, World");
}
SHAR_EOF
if test "`crc < hello.c`" != 34f6
then
echo "shar: CRC checksum error in 'hello.c'"
fi
fi
exit 0
#End of shell archive
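
A minimal sketch of how the unpacking side could cope with a site that has no
crc program, instead of requiring a fake one.  The function name verify_crc
and its variables are made up for illustration, `crc' is the hypothetical
utility from the article above (assumed to print only the checksum when fed a
file on standard input), and `command -v' assumes a POSIX shell.

```shell
# verify_crc FILE EXPECTED
# Returns 1 only when a crc program exists and its output differs from
# EXPECTED.  When no crc program is found, it warns and returns 0 so
# that unpacking can continue unverified.
verify_crc() {
    vc_file=$1
    vc_want=$2
    if command -v crc >/dev/null 2>&1
    then
        vc_got=`crc < "$vc_file"`
        if test "$vc_got" != "$vc_want"
        then
            echo "shar: CRC checksum error in '$vc_file'"
            return 1
        fi
    else
        echo "shar: no crc program found, '$vc_file' not verified"
    fi
    return 0
}
```

A shar could emit one call per file, e.g. `verify_crc hello.c 34f6', in place
of the inline test.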

root@cca.ucsf.edu.UUCP (06/03/88)

In article <2349@bgsuvax.UUCP>, denbeste@bgsuvax.UUCP (William C. DenBesten) writes:
>  ...
> The solution that everyone seems to be stabbing near, but not hitting
> is that we need a shar-archiver that includes a crc check in it,
> rather than a simple character count.
> ...
> There is one small problem with this idea, but it is easily solvable.
> There is not a standard unix utility that will do a crc.
> 

We already have it.

The "vitals" utility posted to comp.sources.unix, Volume 11 provides
crc computation in addition to the length check.  There was also an earlier
posting of a crc routine that we have used extensively for such a purpose.

Thos Sumner       (thos@cca.ucsf.edu)   BITNET:  thos@ucsfcca
(The I.G.)        (...ucbvax!ucsfcgl!cca.ucsf!thos)

OS|2 -- an Operating System for puppets.

#include <disclaimer.std>

bd@hpsemc.HP.COM (bob desinger) (06/04/88)

William C. DenBesten (denbeste@bgsuvax.UUCP) writes:
> The solution ...
> is that we need a shar-archiver that includes a crc check in it,
> rather than a simple character count.
> There is not a standard unix utility that will do a crc.

How about the `sum' program?  The System V and Xenix versions offer
a -r option to produce BSD-compatible output, so you can be reasonably
portable (about 95%) with that.

The shar I use emits code to handle this, although I admit it has its
own sum code to compute the checksums.  (If I had written that part of
the code, I probably would have forked the sum program instead of
handcrafting the information myself.  But the code was written already
and it worked, so I left it in.)

The emitted shar looks like this.  Comments added for this article
appear after the "#".

	# Decide if we should use `sum -r' or just plain old `sum'
	if sum -r </dev/null >/dev/null 2>&1
	then	sumopt='-r'
	else	sumopt=''
	fi

	# The wrapped file goes here.
	# I sharred an empty file, so the numbers are 0 in this demo.

	# Now check the checksum.
	set `sum $sumopt <filename`
	if test $1 -ne 0
	then	echo ! filename checksum should be 0, but is $1
	fi

While we're on the subject, you can get more than just a simple
character count out of `wc'.  Here's my shar again:

	set `wc -lwc <filename`
	if test $1 -ne 0 -o $2 -ne 0 -o $3 -ne 0
	then	echo ! filename should have 0 lines, 0 words, and 0 characters
		echo ! but has $1 lines, $2 words, and $3 characters
	fi

This isn't as thorough as a real CRC, but on the other hand I've sent
out perhaps 100 megabytes of sources using only the wc-check and no one
has ever asked me to re-send because something was damaged in transit.
The medium seems reliable.

-- bd
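
For the archiving side, here is a rough sketch of how a shar generator might
produce the checksum test shown above by calling the sum program rather than
computing the checksum itself.  The name emit_sum_check and its variables are
invented for this example; it is not the code from the shar described in the
article, and it assumes file names without spaces or shell metacharacters.

```shell
# emit_sum_check FILE
# Print the shell fragment that re-checks FILE's checksum at unpack time.
# The fragment relies on $sumopt having been set by the probe that the
# archive emits earlier (`sum -r' versus plain `sum').
emit_sum_check() {
    esc_file=$1
    # Same probe as the emitted code: prefer `sum -r' where it works.
    if sum -r </dev/null >/dev/null 2>&1
    then esc_opt='-r'
    else esc_opt=''
    fi
    # After this, $1 is the checksum and $2 the block count.
    set -- `sum $esc_opt <"$esc_file"`
    # Escaped \$ and \` are left for the unpacking shell to evaluate.
    cat <<EOF
set \`sum \$sumopt <$esc_file\`
if test \$1 -ne $1
then	echo ! $esc_file checksum should be $1, but is \$1
fi
EOF
}
```

Running `emit_sum_check hello.c' while building the archive writes the check
with the real checksum filled in where the demo above has 0.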

bart@reed.UUCP (Bart Massey) (06/05/88)

In article <2349@bgsuvax.UUCP> denbeste@bgsuvax.UUCP (William C. DenBesten) writes:
> Please note that followups are directed to comp.sources.d
> 
> In article <7985@brl-smoke.ARPA> w8sdz@brl.arpa Keith Petersen writes:
> > When Usenet can guarantee error-free and non-truncated transmission of clear
> > text files, I will agree to posting clear text files.  Until that day
> > arrives (is anyone working on it?) I will continue to post them as ARC
> > files in the comp.binaries.ibm.pc newsgroup.
> 
> The solution that everyone seems to be stabbing near, but not hitting
> is that we need a shar-archiver that includes a crc check in it,
> rather than a simple character count.  I have faked up a shar below
> that demonstrates this.
> 
> There is one small problem with this idea, but it is easily solvable.
> There is not a standard unix utility that will do a crc.

Why stop at CRC?  What *I'd* really like is ECC, plus some kind of block
numbering, so that the shar can recover from short errors, and one can
request only the mangled/missing blocks in recovering from larger errors,
greatly reducing netwidth for larger errors.  Perhaps a simple line numbering
scheme, with per-line 1-byte ECC, and a simple utility for decoding this?
The same utility could also handle text substitutions for sharing/unsharing,
and non-ascii character substitution/desubstitution...  

Oops.  I'm dreaming again.  Sorry :-)  If I wrote such a program, would
anyone use it?

(UNIX challenge:  using only standard UNIX utilities (no C, Pascal, etc., no
local or system-dependent utils) write a shell script for converting ^
escapes (e.g.  ^Z == 26 decimal, ASCII SUB) and octal escapes (e.g.
<backslash>177 == ASCII DEL (not a chance a backslash would make it intact
in this message :-)) into the appropriate 7-bit ASCII characters within a
text file.  BONUS:  Handle 8-bit octal escapes.  NOTE: use the escape
convention that <backslash>136 is a literal caret, and <backslash>134 is a
literal backslash in the encoded file.)

							Bart
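
One possible take on the challenge, offered as a sketch and not a definitive
answer.  It assumes an awk whose sprintf("%c", n) turns a number into the
character with that code; the name decode_escapes is made up.  It handles the
7-bit case only (octal 000-177) and leaves the 8-bit bonus alone.  A decoded
NUL (^@) may not survive in every awk.  Each decoded character goes straight
to the output and is never rescanned, which is what lets <backslash>136 and
<backslash>134 stand for a literal caret and backslash.

```shell
# decode_escapes: filter stdin to stdout, expanding ^X and \ooo escapes.
decode_escapes() {
    awk '
    BEGIN {
        # Map each 7-bit character to its code; awk has no ord() built in.
        for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i
    }
    {
        s = $0; out = ""
        while (length(s) > 0) {
            c = substr(s, 1, 1)
            n = ord[substr(s, 2, 1)]
            if (c == "^" && (n == 63 || (n >= 64 && n <= 95) || (n >= 97 && n <= 122))) {
                # ^? is DEL; otherwise keep the low five bits (^Z -> 26).
                out = out sprintf("%c", (n == 63) ? 127 : n % 32)
                s = substr(s, 3)
            } else if (c == "\\" && substr(s, 2, 3) ~ /^[01][0-7][0-7]$/) {
                # Three octal digits, limited to 000-177 (7-bit).
                n = substr(s, 2, 1) * 64 + substr(s, 3, 1) * 8 + substr(s, 4, 1)
                out = out sprintf("%c", n)
                s = substr(s, 5)
            } else {
                # Ordinary character: copy it through unchanged.
                out = out c
                s = substr(s, 2)
            }
        }
        print out
    }'
}
```

Usage would be along the lines of `decode_escapes <encoded >decoded'.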

caf@omen.UUCP (Chuck Forsberg WA7KGX) (06/06/88)

In article <2349@bgsuvax.UUCP> denbeste@bgsuvax.UUCP (William C. DenBesten) writes:
:
:There is one small problem with this idea, but it is easily solvable.
:There is not a standard unix utility that will do a crc.  We
:would have to ship one around the net.  Not having the crc program
:would not be a real hardship, as one could always unwrap a shar by
:creating a fake crc program and ignoring the ensuing crc error warnings.
:
Why not use the "crc" program that's part of the recent rzsz (ZMODEM
programs for Unix) posting?  It produces output like:

92C4A5DF    2064 /tmp/article1659

using the Fed standard 32 bit CRC that ZMODEM uses.

Chuck Forsberg WA7KGX          ...!tektronix!reed!omen!caf 
Author of YMODEM, ZMODEM, Professional-YAM, ZCOMM, and DSZ
  Omen Technology Inc    "The High Reliability Software"
17505-V NW Sauvie IS RD   Portland OR 97231   503-621-3406
TeleGodzilla BBS: 621-3746   CIS: 70007,2304    Genie: CAF
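
A small sketch of how a shar might test against output in the format shown
above.  It assumes the CRC is the first whitespace-separated field of what
`crc FILE' prints; the helper name crc_matches is hypothetical, and the
meaning of the remaining fields is not relied on.

```shell
# crc_matches FILE EXPECTED
# Succeeds when the first field printed by `crc FILE' equals EXPECTED.
crc_matches() {
    cm_file=$1
    cm_want=$2
    # Split the output into words; $1 becomes the CRC field.
    set -- `crc "$cm_file"`
    test "$1" = "$cm_want"
}
```

In an archive this could read:
`crc_matches hello.c 92C4A5DF || echo "shar: CRC checksum error in 'hello.c'"'.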