[net.micro] New way to post binaries - discussion on format

csc@watmath.UUCP (Computer Sci Club) (10/21/86)

I mentioned in a previous article that I had written some arbitrary 8 bit
encoding programs that I could easily turn into an archiver specifically
designed for network transfer.  Since there seems to be some interest in
the topic in general, I'll tell you what my programs do.

In the most general case, my encoding routines are given a list of characters
that the transmission medium does not want to see.  They will then never
generate those characters.  They accept a stream of arbitrary 8 bit bytes,
and produce an encoded or decoded stream.

The actual encoder operates in two modes:  eight bit encoding (M8) and
seven bit encoding (M7).  The encoder chooses whichever mode is cheaper
for the data at hand.

In seven bit encoding mode, any seven bit character can be transmitted.  Any
transmittable character is sent as itself.  Any character the encoder has
been told is pathological is mapped into a two character escape code.  This
mode is obviously very cheap for text.
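
A minimal sketch of how such seven bit escape coding might look, in Python.
The escape character "%" and the rule for choosing the second character of
each escape pair are assumptions made for illustration; the post does not
say what the real programs use.  The forbidden set is the one listed for
the experimental version.

```python
ESC = ord("%")  # assumed escape character; it must itself be escaped

# The characters the experimental version never transmits:
# all control characters, del, and <>{}[]^|\~
EXPERIMENTAL_FORBIDDEN = set(range(32)) | {127} | set(b"<>{}[]^|\\~")


def build_tables(forbidden):
    """Pair each untransmittable character with one safe substitute."""
    bad = sorted(set(forbidden) | {ESC})
    safe = [c for c in range(128) if c not in bad]
    if len(bad) > len(safe):
        raise ValueError("too many forbidden characters for two-character escapes")
    enc = {b: safe[i] for i, b in enumerate(bad)}
    dec = {v: k for k, v in enc.items()}
    return enc, dec


def m7_encode(data: bytes, forbidden) -> bytes:
    enc, _ = build_tables(forbidden)
    out = bytearray()
    for b in data:
        if b > 0x7F:
            raise ValueError("M7 accepts seven bit input only")
        if b in enc:
            out += bytes([ESC, enc[b]])  # two character escape code
        else:
            out.append(b)                # sent as itself
    return bytes(out)


def m7_decode(text: bytes, forbidden) -> bytes:
    _, dec = build_tables(forbidden)
    out = bytearray()
    it = iter(text)
    for b in it:
        if b == ESC:
            out.append(dec[next(it)])    # undo the escape pair
        else:
            out.append(b)
    return bytes(out)
```

Ordinary text passes through unchanged, so the cost is one extra character
per forbidden character and nothing otherwise.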

In eight bit mode, the encoder can accept any 8 bit value.  Each group of
three input bytes is turned into four output 6 bit values.  These six bit
values are then mapped onto the ASCII characters "a-zA-Z,.".
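
A rough sketch of the three-bytes-to-four-characters step, in Python.  Note
that "a-zA-Z,." as written gives only 54 characters, while six bit values
need 64; the sketch assumes the digits 0-9 are also part of the set.  That
alphabet, its ordering, and the handling of a short final group are all
assumptions, not the actual program's behaviour.

```python
import string

# Assumed 64 character alphabet: a-z, A-Z, 0-9, comma, period.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + ",."
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def m8_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24 bit integer, zero padded.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A short final chunk of k bytes needs only k + 1 characters.
        out.extend(ALPHABET[v] for v in sextets[:len(chunk) + 1])
    return "".join(out)


def m8_decode(text: str) -> bytes:
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        n = 0
        for c in group.ljust(4, ALPHABET[0]):
            n = (n << 6) | INDEX[c]
        # A group of k characters carries k - 1 bytes.
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)
```

Every three input bytes cost four output characters, so this mode expands
arbitrary binary data by one third before any run length encoding.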

Run length encoding is performed in both modes.
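
The post gives no details of the run length scheme, so the following Python
sketch shows only one common byte-level variant.  The marker byte 0x90, the
minimum run of four, and applying it to the raw bytes before M7 or M8
encoding are all assumptions.

```python
RLE_MARK = 0x90  # assumed marker byte introducing a (count, value) pair


def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= min_run or b == RLE_MARK:
            # Long runs, and any literal marker byte, become MARK count value.
            out += bytes([RLE_MARK, run, b])
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == RLE_MARK:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

Long runs of zeros, which are common in a.out files, shrink to three bytes
per 255 under this scheme.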

The archive format I was considering would have a special archive control
character (something non-controversial) which would never be generated
by the encoder.  The archive control character would signal the beginning
of easily parsed text strings that would describe the beginning and end
of archived files, their CRCs and lengths.  It would be possible to
generate checkpoints in the archive.  The archiver could extract all
undamaged files from a damaged archive.
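
To make the idea concrete, here is a small Python sketch of such a format.
The control character "@", the BEGIN/END line layout, the use of CRC-32 via
zlib, and the hex payload (a stand-in for the real encoder, chosen only
because it never produces "@") are all hypothetical.

```python
import zlib

CTRL = "@"  # assumed archive control character


def pack(files: dict[str, bytes]) -> str:
    lines = []
    for name, data in files.items():
        lines.append(f"{CTRL}BEGIN {name} {len(data)} {zlib.crc32(data):08x}")
        lines.append(data.hex())  # stand-in for the M7/M8 encoder output
        lines.append(f"{CTRL}END {name}")
    return "\n".join(lines) + "\n"


def extract_undamaged(archive: str) -> dict[str, bytes]:
    """Return only the files whose length and CRC still check out."""
    good = {}
    header, body = None, []
    for line in archive.splitlines():
        if line.startswith(CTRL + "BEGIN "):
            header, body = line.split()[1:], []
        elif line.startswith(CTRL + "END") and header is not None:
            try:
                name, length, crc = header
                data = bytes.fromhex("".join(body))
                if len(data) == int(length) and zlib.crc32(data) == int(crc, 16):
                    good[name] = data
            except ValueError:
                pass  # damaged header or payload: skip this file
            header = None
        elif header is not None:
            body.append(line)
    return good
```

Because each file carries its own length and CRC, damage to one file does
not stop the archiver from recovering the others.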

The checkpoints could be used for retransmitting parts of the archive.  If
an archive was damaged, the archiver could tell the user what part of the
archive it needed replaced.  Anyone else who had a complete archive could use
that information to generate just the data needed to repair the broken archive.
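
One way the repair exchange could work, sketched in Python.  The fixed
checkpoint spacing, the use of one CRC-32 per segment, and the assumption
that the list of expected CRCs arrives intact are illustrative choices, not
part of the proposal.  The sketch also assumes damage corrupts or truncates
bytes rather than inserting extra ones.

```python
import zlib

SEG = 1024  # assumed checkpoint spacing in bytes


def checkpoints(archive: bytes, seg: int = SEG) -> list[int]:
    """One CRC per segment of the encoded archive."""
    return [zlib.crc32(archive[i:i + seg]) for i in range(0, len(archive), seg)]


def damaged_segments(received: bytes, expected: list[int], seg: int = SEG) -> list[int]:
    """What the archiver would tell the user it needs replaced."""
    got = checkpoints(received, seg)
    return [k for k, crc in enumerate(expected) if k >= len(got) or got[k] != crc]


def make_patch(complete: bytes, wanted: list[int], seg: int = SEG) -> dict[int, bytes]:
    """Run by anyone holding a complete archive: just the requested data."""
    return {k: complete[k * seg:(k + 1) * seg] for k in wanted}


def apply_patch(received: bytes, patch: dict[int, bytes], seg: int = SEG) -> bytes:
    buf = bytearray(received)
    for k in sorted(patch):
        buf[k * seg:(k + 1) * seg] = patch[k]
    return bytes(buf)
```

Only the damaged segments travel over the network a second time, rather
than the whole archive.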

The archiver would treat a set of unordered files as an archive.  Each
file would be searched for a header.  The header would identify the archive
and be used to order the parts.  The archiver would then read the files in
the correct order.  The archive creation command would automatically generate
a numbered set of files for posting, according to a maximum size constraint.
No more editing and cat'ing of news articles.
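
A short Python sketch of the split-and-reorder idea.  The "@PART" header
layout and the function names are hypothetical, and the size limit here
applies to the payload of each part only, not to the header line.

```python
MAGIC = "@PART"  # assumed header keyword


def split_for_posting(archive_id: str, payload: str, max_size: int) -> list[str]:
    """Cut the encoded archive into numbered parts of at most max_size characters."""
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)] or [""]
    total = len(chunks)
    return [f"{MAGIC} {archive_id} {n} {total}\n{chunk}"
            for n, chunk in enumerate(chunks, 1)]


def reassemble(files: list[str], archive_id: str) -> str:
    """Search each file for a header, then join the parts in numbered order."""
    tag = f"{MAGIC} {archive_id} "
    parts, total = {}, None
    for text in files:
        start = text.find(tag)  # skips any news headers before the part header
        if start < 0:
            continue            # not a part of this archive
        header, _, body = text[start:].partition("\n")
        n, total = (int(x) for x in header[len(tag):].split())
        parts[n] = body
    if total is None or set(parts) != set(range(1, total + 1)):
        raise ValueError("archive parts missing")
    return "".join(parts[n] for n in range(1, total + 1))
```

The saved articles can be handed over in any order, news headers and all,
since the part number comes from the header inside each file.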

My experiments show me that this program encodes a.out files slightly more
cheaply than uuencode does, and that text files are very cheap to encode.
The experimental version
does not transmit any of the following characters:  all control characters,
"<>{}[]^|\\~", and del.

If there is any interest, another fellow here, Mike Gore, and I will
put this together out of code we already have (for encoding, CRC checking)
and we will post the source (for UNIX and Atari) and uuencoded versions for
the Atari ST.

It would be written to be portable.

Comments?

Tracy Tims
mail to ihnp4!watmath!unit36!tracy

csc@watmath.UUCP (Computer Sci Club) (10/22/86)

As a followup to my previous article, I have come across a new encoding
technique (courtesy of the Math Faculty Computing Facility here at
the university) which has the following properties:

	- eight bit input data
	- very restricted character set
	- generally compresses files rather than expanding them
	  (including binaries)
	- needs no bit level twiddling
	- involves only table lookup

The encoding system is easy to implement.  I am whipping one up now.

This should make the transmittable archive format very easy to implement.
My co-developer, Mike Gore, is planning to do a Basic (gack spew) version
after I write one in C.

Still:  comments?

Tracy Tims
mail to ihnp4!watmath!unit36!tracy