csc@watmath.UUCP (Computer Sci Club) (10/21/86)
I mentioned in a previous article that I had written some arbitrary 8 bit encoding programs that I could easily turn into an archiver specifically designed for network transfer. Since there seems to be some interest in the topic in general, I'll tell you what my programs do.

In the most general case, my encoding routines are given a list of characters that the transmission medium does not want to see. The encoder will then never generate those characters. The routines accept a stream of arbitrary 8 bit bytes and produce an encoded or decoded stream.

The actual encoder operates in two modes: eight bit encoding (M8) and seven bit encoding (M7). The encoder decides which mode it should be running in, depending on the cost of each mode.

In seven bit encoding mode, any seven bit character can be transmitted. Any transmittable character is sent as itself. Any character the encoder has been told is pathological is mapped into a two character escape code. This mode is obviously very cheap for text.

In eight bit mode, the encoder can accept any 8 bit value. Each group of three input bytes is turned into four output 6 bit values. These six bit values are then mapped onto the ASCII characters "a-zA-Z,.". Run length encoding is performed in both modes.

The archive format I was considering would have a special archive control character (something non-controversial) which would never be generated by the encoder. The archive control character would signal the beginning of easily parsed text strings that would describe the beginning and end of archived files, their CRCs and lengths.

It would be possible to generate checkpoints in the archive. The archiver could extract all undamaged files from a damaged archive. The checkpoints could be used for retransmitting parts of the archive. If an archive was damaged, the archiver could tell the user what part of the archive it needed replaced.
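[Editor's sketch] To make the two encoder modes described above concrete, here is a minimal Python illustration. It is not the author's code. Several details are assumptions: the escape character `%`, the way escape codes are numbered, and the handling of a trailing partial group are all hypothetical. The post gives the M8 alphabet as "a-zA-Z,.", which is only 54 symbols, so this sketch assumes the digits are included to reach the 64 symbols a 6 bit value needs. Run length encoding and the choice between modes are left out.

```python
import string

# Assumed 64-symbol alphabet: the post lists "a-zA-Z,." (54 symbols),
# so digits are added here to cover all 6 bit values.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + ",."
ESC = "%"  # hypothetical escape character, not taken from the post

# Characters the experimental version never transmits, plus ESC itself
# (ESC must be escaped so the decoder can tell it from data).
FORBIDDEN = sorted(set(range(32)) | {127} | set(b"<>{}[]^|\\~") | {ord(ESC)})


def m7_encode(data: bytes) -> str:
    """Seven bit mode: safe characters pass through, others become ESC + symbol."""
    out = []
    for b in data:
        if b > 127:
            raise ValueError("M7 only carries seven bit characters")
        if b in FORBIDDEN:
            out.append(ESC + ALPHABET[FORBIDDEN.index(b)])
        else:
            out.append(chr(b))
    return "".join(out)


def m7_decode(text: str) -> bytes:
    out = bytearray()
    chars = iter(text)
    for c in chars:
        if c == ESC:
            out.append(FORBIDDEN[ALPHABET.index(next(chars))])
        else:
            out.append(ord(c))
    return bytes(out)


def m8_encode(data: bytes) -> str:
    """Eight bit mode: each 3 input bytes become four 6 bit symbols."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        value = int.from_bytes(chunk.ljust(3, b"\0"), "big")  # 24 bits
        symbols = [ALPHABET[(value >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # A short final chunk of n bytes needs only n + 1 symbols (assumption).
        out.append("".join(symbols[:len(chunk) + 1]))
    return "".join(out)


def m8_decode(text: str) -> bytes:
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        value = 0
        for c in group.ljust(4, ALPHABET[0]):
            value = (value << 6) | ALPHABET.index(c)
        out += value.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)
```

For example, `m7_encode(b"hi\n")` gives `"hi%k"` under these assumptions, and `m8_encode(b"abc")` gives `"ywjJ"`. The cost comparison the post mentions could then be as simple as comparing the lengths of the two encodings for a block, though the post does not say how the real encoder decides.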
Anyone else who had a complete archive could use that information to generate just the data needed to repair the broken archive.

The archiver would treat a set of unordered files as an archive. Each file would be searched for a header. The header would identify the archive and be used to order the parts. The archiver would then read the files in the correct order. The archive creation command would automatically generate a numbered set of files for posting, according to a maximum size constraint. No more editing and cat'ing of news articles.

My experiments show me that this program encodes a.out files slightly more cheaply than uuencode; text files are very cheap. The experimental version does not transmit any of the following characters: all control characters, "<>{}[]^|\\~", and del.

If there is any interest, another fellow here, Mike Gore, and I will put this together out of code we already have (for encoding, CRC checking) and we will post the source (for UNIX and Atari) and uuencoded versions for the Atari ST. It would be written to be portable.

Comments?

Tracy Tims
mail to ihnp4!watmath!unit36!tracy
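[Editor's sketch] The multi-part behaviour described in this article (numbered files limited by a maximum size, then reassembly from an unordered set by searching each file for a header) might look roughly like the Python below. The header syntax, the control character `#`, and the function names `split_parts` and `join_parts` are all hypothetical; the post does not specify any of them. The payload is assumed to be already encoded text that never contains the control character.

```python
import re

CTRL = "#"  # hypothetical archive control character, assumed never emitted by the encoder
# Hypothetical header line: "#ARCHIVE <name> PART <n> OF <total>"
HEADER = re.compile(r"^#ARCHIVE (\S+) PART (\d+) OF (\d+)\n", re.MULTILINE)


def split_parts(name: str, payload: str, max_size: int) -> list[str]:
    """Cut the encoded payload into numbered parts of at most max_size characters."""
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)] or [""]
    total = len(chunks)
    return [f"{CTRL}ARCHIVE {name} PART {n} OF {total}\n{chunk}"
            for n, chunk in enumerate(chunks, start=1)]


def join_parts(name: str, files: list[str]) -> str:
    """Search each file for a header, then read the parts in the correct order."""
    found = {}
    total = 0
    for text in files:
        m = HEADER.search(text)  # the header may follow news headers or other text
        if m is None or m.group(1) != name:
            continue  # not a part of this archive
        found[int(m.group(2))] = text[m.end():]
        total = int(m.group(3))
    missing = [n for n in range(1, total + 1) if n not in found]
    if not found or missing:
        # This is where the archiver could tell the user what needs replacing.
        raise ValueError(f"missing parts: {missing}")
    return "".join(found[n] for n in sorted(found))
```

Usage under these assumptions: `split_parts("demo", "abcdefghij", 4)` yields three parts, and `join_parts("demo", ...)` rebuilds `"abcdefghij"` from them in any order, even when each part is preceded by article headers. A real format would also need an end marker so that trailing signatures are not read as data, plus the CRCs and lengths the post mentions.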
csc@watmath.UUCP (Computer Sci Club) (10/22/86)
As a followup to my previous article, I have come across a new encoding technique (courtesy of the Math Faculty Computing Facility here at the university) which has the following properties:

- eight bit input data
- very restricted character set
- generally compresses files rather than expanding them (including binaries)
- needs no bit level twiddling
- involves only table lookup

The encoding system is easy to implement. I am whipping one up now. This should make the transmittable archive format very easy to implement. My co-developer, Mike Gore, is planning to do a Basic (gack spew) version after I write one in C.

Still: comments?

Tracy Tims
mail to ihnp4!watmath!unit36!tracy