[comp.sys.m6809] CUTS decoder and LISTSERV CoCo list

EWTILENI@pucc.Princeton.EDU (Eric Tilenius) (03/25/88)

Below is a description of the CUTS file encoding system, in use on the
LISTSERV server at pucc.Princeton.EDU/pucc.BITNET.
 
Following that is the CUTS encoder/decoder program itself.  This is a
system we worked out to allow ASCII files with line lengths greater than
80 characters and binary files to flow easily over the mail gateways.
 
It is an excellent method of posting software to LISTSERV or this group.
 
Currently, on the CoCo Mailing List, we have a good deal of CoCo software
being put up in CUTS format.  If you haven't joined us yet, please send
me mail and I will add you to the group.  An OS-9 version of CUTS is
expected from the CoCo Group shortly.
 
The following is from Tim Koonce (program at the bottom of file):
For more information, contact:  koonce@bosco.Berkeley.EDU
----------------------------------------------------------------------
Description of CUTS file encoding.
 
Part 1.  Purpose
 
The CoCo Usenet Transfer System (CUTS) was designed to provide a means
of transferring Binary files and ASCII files with line lengths over 80
characters through Electronic Mail systems.  It has the following
design goals:
 
  - Efficiency.  CUTS should lengthen files as little as possible.  To
accomplish this, CUTS uses a prefix encoding where the most common
printable characters are represented as themselves.
  - Accuracy.  CUTS should accurately encode and decode files.  A
checksum is included to help ensure this.
  - Robustness.  CUTS should ignore any additional lines of text that
may be added in the mail process.
  - Verifiability.  It should be easy for the user to check visually (in
a mail program or text editor) that the file has been completely
transferred.  For this reason, CUTS enforces a uniform line length, and
repeats line 0000 at the end of the file, so that the length can be
easily checked.
  - Simplicity of coding.  It should be easy to write programs to
encode and decode files.  Keeping the encoding simple will help to
encourage the popularity of the format.
 
 
Part 2.  The coding format
 
Line format.
 
A CUTS-encoded file consists of some number of lines, each 79
characters long.  The format of a single line is:
 
   Length    Description
     1        Initial period
     4        4 byte line number.
                 The lines in a CUTS file are
                 numbered consecutively, starting
                 with 0000.
     1        Period
     1        Packet type descriptor.
                 The packet type is always a
                 capital letter.
    71        Packet data
     1        Checksum
 
Note that the first 6 bytes provide a ready way to identify the lines
which are part of the CUTS file.
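For readers who want to experiment off the CoCo, here is a minimal Python
sketch (not part of Tim's program) that splits a line into the fields in
the table above and uses the ".NNNN." lead-in to pick out CUTS lines.  The
names CutsLine and parse_line are made up for illustration.  Like the BASIC
decoder it looks only at the first 79 characters, but where that decoder
pads a short line and asks whether to continue, this sketch simply rejects
it.

```python
import re
from typing import NamedTuple, Optional

# A CUTS line starts with a period, four decimal digits and another period.
CUTS_PREFIX = re.compile(r"\.([0-9]{4})\.")


class CutsLine(NamedTuple):
    number: int       # line number, 0000-9999
    packet_type: str  # one capital letter, e.g. 'I' or 'D'
    data: str         # the 71 characters of packet data
    checksum: str     # the final (79th) character


def parse_line(text: str) -> Optional[CutsLine]:
    """Split one line into CUTS fields, or return None for a non-CUTS line."""
    # Like the BASIC decoder, keep only the first 79 characters.
    text = text.rstrip("\r\n")[:79]
    match = CUTS_PREFIX.match(text)
    if match is None or len(text) != 79 or not "A" <= text[6] <= "Z":
        return None
    return CutsLine(int(match.group(1)), text[6], text[7:78], text[78])
```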
 
The first line of the encoded data, line 0000, is appended to the end
of the file to provide an easy, visual check that a complete transfer
has been accomplished.  This is currently the only mark of the end of
the CUTS file.
 
The packet data should consist only of printable characters which are
preserved through the common mail systems.  Because some mail links
perform character set conversion, not all printable ASCII characters
are usable.  Which characters are safe depends on the systems involved
and can only be determined empirically.  The coding described in this
document has been tested on a transcontinental link through ARPAnet
between a U**X minicomputer using ASCII and an IBM mainframe using
EBCDIC.
 
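The checksum is computed over the first 78 characters of the line (all
but the checksum itself).  In ASCII, add up the codes of those 78
characters; the checksum character is then
     checksum char = char( sum mod 32 + 48)
which is always one of the characters '0' through 'O'.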
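A Python sketch of the checksum, assuming the rule used by the BASIC
program's subroutine at line 10000 (sum mod 32, plus 48); that rule
reproduces the checksum characters in the sample file below.  The function
names are hypothetical.

```python
def cuts_checksum(line: str) -> str:
    """Checksum character for a CUTS line.

    Sums the ASCII codes of the first 78 characters, keeps the low five
    bits (mod 32) and adds 48, so the result is one of '0' through 'O'.
    """
    total = sum(ord(ch) for ch in line[:78])
    return chr(total % 32 + 48)


def checksum_ok(line: str) -> bool:
    """True if a 79-character line ends with the right checksum character."""
    return len(line) >= 79 and line[78] == cuts_checksum(line)
```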
 
 
Packet Types
 
There are currently two packet types in CUTS.  The first type is a
file identifier, which contains basic information about the file.  The
other is the data packet, which contains prefix-encoded binary data.
 
Identifier Packets
 
These packets are identified as packet type 'I', and have the
following format for the data field:
 
    Length    Description
      1         Period
      1         CUTS protocol version
                  This document describes CUTS
                  version 'A'.
      1         Period
      2         Two digit year.
      2         Two digit month.
      2         Two digit day.
                  These should include leading zeros
                  to fill two chars.
      1         Period
      3         File type.
                  Currently, four file types are defined:
                     ASC - any ASCII text file
                     BIN - binary image file
                     RSD - RSDOS binary files
                     OS9 - OS9 binary files
                  Program files should use the 'RSD' or
                  'OS9' file type, as appropriate.
                  Non-program binary image files should use
                  type 'BIN'.
      1         Period
     57         Filename.
                  The filename should be surrounded by quote
                  characters, and the remaining space should
                  be filled with period characters to the
                  full line length.
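As an illustration, line 0000 could be built as in this Python sketch.
identifier_line is a hypothetical helper; the checksum follows the BASIC
routine at line 10000, and the year is reduced to its last two digits as
the format requires.

```python
FILE_TYPES = ("ASC", "BIN", "RSD", "OS9")


def cuts_checksum(line: str) -> str:
    # Sum of the first 78 ASCII codes, mod 32, plus 48 (as in the BASIC program).
    return chr(sum(ord(ch) for ch in line[:78]) % 32 + 48)


def identifier_line(year: int, month: int, day: int,
                    file_type: str, filename: str) -> str:
    """Build line 0000, the 'I' packet, for CUTS version 'A'."""
    if file_type not in FILE_TYPES:
        raise ValueError("unknown CUTS file type: %r" % file_type)
    quoted = '"' + filename + '"'
    if len(quoted) > 57:
        raise ValueError("filename does not fit in the 57-character field")
    # Period, version, period, YYMMDD, period, type, period, quoted name ...
    body = ".0000.I.A.%02d%02d%02d.%s.%s" % (year % 100, month, day,
                                            file_type, quoted)
    body = body.ljust(78, ".")  # ... then periods out to 78 characters
    return body + cuts_checksum(body)
```

Called with year 88, month 3, day 6, type BIN and filename TEST.BIN, it
should produce the first line of the sample file below.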
 
Data Packets
 
Data packets carry the actual file data.  Each data packet is 72
characters long: the 'D' packet type descriptor followed by 71
characters of prefix-encoded data.  Because the prefix encoding uses
both one- and two-character codes, a packet may reach only 71
characters when the next value needs a two-character code.  In this
case, any prefix character may be appended to fill the packet to 72
characters.  This isolated prefix character carries no informational
content; see the '#' just before the checksum on line 0003 in the
sample file below.  Two-character codes are NEVER broken between
packets.
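The packing rule can be sketched in Python as below (pack_codes is a
hypothetical helper).  It uses '#' as the filler, as the sample file and
the BASIC encoder do, and leaves the end-of-data marker to the caller.

```python
def pack_codes(codes, width=71):
    """Pack one- and two-character codes into data fields of `width` chars.

    A two-character code is never split between packets.  When only one
    position is left, it is filled with a lone prefix character, which the
    decoder discards.  '#' is used here because the sample file does so;
    the text allows any prefix character.  The final field is returned
    as-is, since the end-of-data marker is handled separately.
    """
    fields, current = [], ""
    for code in codes:
        if len(current) + len(code) > width:
            fields.append(current.ljust(width, "#"))  # adds at most one '#'
            current = ""
        current += code
    if current:
        fields.append(current)
    return fields
```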
 
 
Part 3. Prefix-encoding
 
The prefix-encoding used in CUTS assigns to each possible 8-bit byte a
distinct one or two byte coding sequence.  The most common printable
ASCII characters represent themselves.  Other values are represented
by a prefix character followed by a translated character.  In ASCII,
the prefix encoding can be succinctly described as:
 
  The space character ($20) and the ASCII characters in the ranges
$2A-$5A and $61-$7A represent themselves as single-character codes.
Other values are represented by a prefix character in the range
$21-$28 and another character in the range $30-$4F according to the
formulas
 
   Prefix = (value div $20) + $21
   character = (value mod $20)+$30
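Here is a Python sketch of the full 256-entry code table and its inverse.
The helper names are hypothetical; space ($20) is coded as itself, as in
the BASIC program (line 11080) and the sample file.

```python
def build_code_table():
    """Return a list mapping each byte value 0-255 to its CUTS code."""
    table = []
    for value in range(256):
        # Space and $2A-$5A, $61-$7A stand for themselves.
        if value == 0x20 or 0x2A <= value <= 0x5A or 0x61 <= value <= 0x7A:
            table.append(chr(value))
        else:
            prefix = chr(value // 0x20 + 0x21)     # '!' .. '('
            char = chr(value % 0x20 + 0x30)        # '0' .. 'O'
            table.append(prefix + char)
    return table


def decode_pair(prefix: str, char: str) -> int:
    """Invert a two-character code: (prefix - $21) * $20 + (char - $30)."""
    return (ord(prefix) - 0x21) * 0x20 + (ord(char) - 0x30)
```

Every value gets a distinct code, so decoding never needs to look more
than one character ahead.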
 
  There are two special coding situations.  One is the odd prefix
character described above, used to fill a 71 character packet to 72
characters when the next value would have a 2-character encoding.  The
other is the case of the end of data.  At the end of the data, the
special prefix sequence <pound sign><period>, '#.', should be
appended, and the last packet should be padded with periods to fill
the full 72 characters.
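Putting the pieces together, a minimal encoder might look like this Python
sketch.  It treats '#.' as one more two-character code, so the rule that
two-character codes are never split also moves the end marker to a fresh
line when needed; the last packet is then padded with periods.  encode_cuts
and code_for are hypothetical names, and the caller is assumed to supply a
finished line 0000.

```python
def cuts_checksum(line):
    # Sum of the first 78 ASCII codes, mod 32, plus 48 (BASIC line 10000).
    return chr(sum(ord(ch) for ch in line[:78]) % 32 + 48)


def code_for(value):
    # Space, $2A-$5A and $61-$7A as themselves; everything else as prefix+char.
    if value == 0x20 or 0x2A <= value <= 0x5A or 0x61 <= value <= 0x7A:
        return chr(value)
    return chr(value // 0x20 + 0x21) + chr(value % 0x20 + 0x30)


def encode_cuts(data: bytes, line0: str) -> list:
    """Encode `data` as 'D' lines between two copies of line 0000."""
    fields, current = [], ""
    for code in [code_for(b) for b in data] + ["#."]:   # '#.' ends the data
        if len(current) + len(code) > 71:
            fields.append(current.ljust(71, "#"))  # lone prefix as filler
            current = ""
        current += code
    fields.append(current.ljust(71, "."))          # pad the last packet
    if len(fields) > 9999:
        raise ValueError("too many lines for a 4-digit line number")
    lines = [line0]
    for number, field in enumerate(fields, start=1):
        body = ".%04d.D%s" % (number, field)
        lines.append(body + cuts_checksum(body))
    lines.append(line0)   # repeat line 0000 as the end mark
    return lines
```

Fed the bytes 0 through 255 and the line 0000 from the sample, it should
reproduce the sample file below line for line.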
 
 
Part 4. Implementation Notes.
 
Encoding
 
   The encoding program should allow the user to specify the file
type if it cannot be determined from the file storage.  This is
especially important in operating systems such as OS9, where it may be
desirable to feed the file to be encoded in through a pipe.
 
Decoding
 
   Note that nothing limits a CUTS-encoded listing to a single data
file.  Each file will begin with an appropriate identifier packet.  The
decoder should be interactive, or provide some method of guarding
against filename collisions.  It may also be desirable to echo
extraneous lines to the screen, so that the user can read comments
that may precede the encoded listing.
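A non-interactive Python sketch of the decoding side, assuming the listing
holds a single file: extraneous lines are collected for display rather
than echoed, and a short line or bad checksum raises an error instead of
asking the user whether to continue, as the BASIC decoder does.
decode_cuts is a hypothetical name.

```python
PREFIXES = "!\"#$%&'("   # the eight prefix characters, $21-$28


def cuts_checksum(line):
    return chr(sum(ord(ch) for ch in line[:78]) % 32 + 48)


def decode_cuts(lines):
    """Decode one CUTS file from an iterable of text lines.

    Returns (identifier data, decoded bytes, extraneous lines).
    """
    expected, line0 = 0, None
    ident, out, extra, finished = None, bytearray(), [], False
    for raw in lines:
        line = raw.rstrip("\r\n")[:79]
        if line0 is not None and line == line0:
            return ident, bytes(out), extra        # repeated line 0000: done
        if not line.startswith(".%04d." % expected):
            extra.append(line)                     # comment, header, etc.
            continue
        if len(line) < 79 or line[78] != cuts_checksum(line):
            raise ValueError("line %04d is short or fails its checksum" % expected)
        if expected == 0:
            line0 = line
        kind, data = line[6], line[7:78]
        if kind == "I":
            ident = data
        elif kind == "D" and not finished:
            i = 0
            while i < len(data):
                ch = data[i]
                if ch not in PREFIXES:             # one-character code
                    out.append(ord(ch))
                    i += 1
                elif i + 1 == len(data):           # lone filler prefix
                    i += 1
                elif "0" <= data[i + 1] <= "O":    # two-character code
                    out.append((ord(ch) - 0x21) * 0x20 + ord(data[i + 1]) - 0x30)
                    i += 2
                elif data[i:i + 2] == "#.":        # end of data
                    finished = True
                    break
                else:
                    raise ValueError("bad code %r in line %04d"
                                     % (data[i:i + 2], expected))
        expected += 1
    raise ValueError("input ended before line 0000 was repeated")
```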
 
 
 
 
Part 5.  Sample CUTS-encoded file
 
Test file consisting of the bytes 0 through 255.
 
.0000.I.A.880306.BIN."TEST.BIN"...............................................=
.0001.D!0!1!2!3!4!5!6!7!8!9!:!;!<!=!>!?!@!A!B!C!D!E!F!G!H!I!J!K!L!M!N!O "1"2"3=
.0002.D"4"5"6"7"8"9*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ#K#L#M#N#O7
.0003.D$0abcdefghijklmnopqrstuvwxyz$K$L$M$N$O%0%1%2%3%4%5%6%7%8%9%:%;%<%=%>%?#F
.0004.D%@%A%B%C%D%E%F%G%H%I%J%K%L%M%N%O&0&1&2&3&4&5&6&7&8&9&:&;&<&=&>&?&@&A&B#L
.0005.D&C&D&E&F&G&H&I&J&K&L&M&N&O'0'1'2'3'4'5'6'7'8'9':';'<'='>'?'@'A'B'C'D'E#<
.0006.D'F'G'H'I'J'K'L'M'N'O(0(1(2(3(4(5(6(7(8(9(:(;(<(=(>(?(@(A(B(C(D(E(F(G(H#L
.0007.D(I(J(K(L(M(N(O#........................................................6
.0000.I.A.880306.BIN."TEST.BIN"...............................................=
 
 
 
Recognition for this project goes to:
 
- Eric Tilenius, who goaded me into doing this, and who suggested the
silly name.
 
 
                               - Tim Koonce
 
 
100 '
110 ' CUTS-format encoder/decoder for Color Computer 3
120 ' Released into Public Domain  23 March, 1988
130 ' by: Tim Koonce
140 '
150 ' The purpose of this program is to translate ascii and binary files
160 ' to/from a format which has been designed for easy transferral over
170 ' electronic mail networks.
180 '
190 '
1000 CLEAR 15000
1010 DIM CD$(255) : AC$="!"+CHR$(34)+"#$%&'("
1020 MT$="JanFebMarAprMayJunJulAugSepOctNovDec"
1030 PRINT:PRINT:PRINT:PRINT" CUTS Encoder/Decoder"
1040 PRINT:PRINT"By Tim Koonce,  April, 1988":PRINT
1050 PRINT"1)Encode a file into CUTS format"
1060 PRINT"2)Decode a CUTS file"
1070 PRINT"3)Print directory"
1080 PRINT"4)End this program"
1090 PRINT:PRINT"Select 1,2,3, or 4.":PRINT
1100 A$=INKEY$:IF A$="" THEN 1100
1110 IF A$="4" THEN END
1120 ON VAL(A$) GOSUB 3000,12000,2000
1130 GOTO 1100
2000 '
2010 'Print a directory
2020 '
2030 PRINT"Drive for directory?"
2040 A$=INKEY$:IF INSTR("0123",A$)=0 OR A$="" THEN 2040
2050 DIR VAL(A$) : PRINT"Free space: "FREE(VAL(A$))
2060 PRINT"Press any key to continue"
2070 A$=INKEY$:IF A$<>"" THEN 2070
2080 A$=INKEY$:IF A$="" THEN 2080
2090 RUN
3000 '
3010 'Encode a file
3020 '
3030 PRINT"Filename of file to encode :":LINEINPUT F$
3040 PRINT"What type of file is this?"
3050 PRINT" 1) ASCII text"
3060 PRINT" 2) RSDOS machine language program"
3070 PRINT" 3) OS9 module"
3080 PRINT" 4) Other BINary file"
3090 A$=INKEY$:IF INSTR("1234",A$)=0 OR A$=""  THEN 3090
3100 TY$=MID$("ASCRSDOS9BIN",VAL(A$)*3-2,3)
3110 PRINT"Filename of CUTS output file :";:LINEINPUT OF$
3120 FF$=OF$ : EX$=".CUT" : GOSUB 15000 : OF$=FF$
4000 '
4010 'Encode a file
4020 '
4030 GOSUB 11000
4040 OPEN"R",1,F$,1 : II=1 : FIELD #1,1 AS CH$ : IL=LOF(1)
4050 OPEN"O",2,OF$
5000 '
5010 'Output comments
5020 '
5030 PRINT"Type in comments to be inserted at beginning of file "
5040 PRINT"A blank line ends comments."
5050 PRINT" :> ";:LINEINPUT A$
5060 IF A$<>"" THEN PRINT#2,A$ : GOTO 5050
6000 '
6010 ' Construct 'I' packet
6020 '
6030 INPUT"Last two digits of Year ";YR
6040 INPUT"Number of current Month ";MN
6050 INPUT"Day of Month ";DY
6060 A$=".0000.I.A"+STRING$(69,".")
6070 MID$(A$,11,2)=RIGHT$("0"+RIGHT$(STR$(YR),LEN(STR$(YR))-1),2)
6080 MID$(A$,13,2)=RIGHT$("0"+RIGHT$(STR$(MN),LEN(STR$(MN))-1),2)
6090 MID$(A$,15,2)=RIGHT$("0"+RIGHT$(STR$(DY),LEN(STR$(DY))-1),2)
6100 MID$(A$,18,3)=TY$
6110 IF INSTR(F$,":")>0 THEN F1$=LEFT$(F$,INSTR(F$,":")-1) ELSE F1$=F$
6120 MID$(A$,22,LEN(F1$)+2)=CHR$(34)+F1$+CHR$(34)
6130 GOSUB 10000:L0$=A$+CS$
6140 PRINT#2,L0$ : PRINT LEFT$(L0$,5);
7000 '
7010 'Now encode rest of file
7020 '
7030 CD$="" ' No code character to be carried over
7040 LN=1 'Start with line number 1
7050 A$=STRING$(78,".")
7060 MID$(A$,2,4)=RIGHT$("000"+RIGHT$(STR$(LN),LEN(STR$(LN))-1),4)
7070 MID$(A$,7,1)="D"
7080 AP=7
7090 MID$(A$,AP+1,LEN(CD$))=CD$ : AP=AP+LEN(CD$)
7095 IF II>IL THEN 8000
7100 GET#1,II : II=II+1
7110 CH=ASC(CH$)
7120 CD$=CD$(CH) : CL=LEN(CD$)
7130 IF AP+CL>78 THEN 7160
7140 MID$(A$,AP+1,CL)=CD$ : AP=AP+CL
7150 IF II<=IL THEN 7100 ELSE 8000
7160 IF AP<78 THEN MID$(A$,78,1)="#"
7170 GOSUB 10000 : A$=A$+CS$
7180 PRINT#2,A$:PRINT LEFT$(A$,5);
7190 LN=LN+1 : GOTO 7050
8000 '
8010 ' Finish off last line
8020 '
8030 IF AP>76 THEN 8070
8040 MID$(A$,AP+1,2)="#."
8050 GOSUB 10000:A$=A$+CS$
8060 PRINT#2,A$:PRINT LEFT$(A$,5); : GOTO 9000
8070 ' Add extra line with EOF marker, if needed
8080 IF AP<78 THEN MID$(A$,78,1)="#"
8085 GOSUB 10000:A$=A$+CS$:PRINT#2,A$:PRINT LEFT$(A$,5);
8090 A$="."+RIGHT$("000"+RIGHT$(STR$(LN+1),LEN(STR$(LN+1))-1),4)+".D"
8100 A$=A$+"#."+STRING$(69,".") : GOSUB 10000:A$=A$+CS$
8110 PRINT#2,A$ : PRINT LEFT$(A$,5);
9000 '
9010 ' Now repeat first line
9020 '
9030 PRINT#2,L0$ : PRINT LEFT$(L0$,5)
9040 CLOSE
9050 RUN
10000 '
10010 'Calculate checksum of A$ and return as CS$
10020 '
10030 CS=0
10040 FOR I=1 TO LEN(A$) : CS=CS+ASC(MID$(A$,I)) : NEXT I
10050 CS$=CHR$( (CS AND 31) + 48 )
10060 RETURN
11000 '
11010 ' Generate coding array
11020 '
11030 FOR I=0 TO 7 : FOR J=0 TO 31
11040 CD$(I*32+J)=CHR$(I+&H21)+CHR$(&H30+J)
11050 NEXT J,I
11060 FOR I=&H2A TO &H5A : CD$(I)=CHR$(I) : NEXT I
11070 FOR I=&H61 TO &H7A : CD$(I)=CHR$(I) : NEXT I
11080 CD$(32)=CHR$(32)
11090 RETURN
12000 '
12010 ' Decode a file
12020 '
12030 PRINT"Name of CUTS-encoded file to decode :":LINEINPUTF$
12040 FF$=F$:EX$=".CUT":GOSUB 15000: OPEN"I",1,FF$
12050 N0$=".0000." : L0$="" : F0=0
12060 LN=0 ' Start with Line 0
12070 LN$="."+RIGHT$("000"+MID$(STR$(LN),2),4)+"."
12080 IF EOF(1) THEN PRINT:PRINT"Warning: Unexpected end of file":CLOSE:RUN
12090 LINEINPUT#1,A$ : A$=LEFT$(A$,79)
12100 IF A$=L0$ AND F0 THEN PRINT:PRINT"Finished decoding":CLOSE:RUN
12110 IF LEFT$(A$,6)<>LN$ THEN PRINT A$:GOTO12080 ELSE PRINT LEFT$(LN$,5);
12120 IF LN=0 THEN F0=-1 : L0$=A$
12130 IF LEN(A$)>=79 THEN 12180
12140 PRINT:PRINT"Warning: Line"LN" is too short.":PRINT
12150 INPUT"Do you wish to continue (Y/N)";II$
12160 IF INSTR("Yy",LEFT$(II$,1)) THEN 12170 ELSE CLOSE: RUN
12170 A$=A$+STRING$(79-LEN(A$),".")
12180 C1$=MID$(A$,79,1)
12190 A$=LEFT$(A$,78) : GOSUB 10000
12200 IF C1$=CS$ THEN 12240
12210 PRINT:PRINT"Warning: Line"LN" has a bad checksum.":PRINT
12220 INPUT"Do you wish to continue (Y/N)";II$
12230 IF INSTR("Yy",LEFT$(II$,1)) THEN 12240 ELSE CLOSE : RUN
12240 PK$=MID$(A$,7) 'Separate Packet
12250 IF LEFT$(PK$,1)="I" THEN GOSUB 13000 : LN=LN+1 : GOTO 12070
12260 IF LEFT$(PK$,1)="D" THEN 12290
12270 PRINT"Unrecognized Packet Type "LEFT$(PK$,1)" in Line"LN
12280 LN=LN+1 : GOTO 12070
12290 PK$=MID$(PK$,2)
12300 IF LEN(PK$)>0 THEN GOSUB 14000 : GOTO 12300
12310 LN=LN+1 : GOTO 12070
13000 '
13010 ' Handle "I" packets
13020 '
13030 PRINT:PRINT"File encoded by CUTS version "MID$(PK$,3,1)
13040 PRINT"Encoded on: "MID$(PK$,9,2)" ";
13050 PRINT MID$(MT$,VAL(MID$(PK$,7,2))*3-2,3)" "MID$(PK$,5,2)
13060 TY$=MID$(PK$,12,3)
13070 PRINT"Filetype :"TY$
13080 F$=MID$(PK$,17) : F$=LEFT$(F$,INSTR(F$,CHR$(34))-1)
13090 PRINT"Stored Filename is :"F$
13100 PRINT"Filename to store file under"
13110 PRINT" (<ENTER> for same filename):"
13120 LINEINPUT F1$ : IF F1$<>"" THEN F$=F1$
13130 FF$=F$ : EX$="." : GOSUB 15000 : F$=FF$
13160 PRINT"Saving decoded CUTS file to "F$
13170 IF TY$="BIN" OR TY$="OS9" OR TY$="RSD" THEN SAVEM F$,0,0,0
13180 OPEN"R",2,F$,1 : II=1 : FIELD #2, 1 AS CH$
13190 RETURN
14000 '
14010 ' Decode first sequence in PK$ and output to file #2.
14020 '
14030 IF LEN(PK$)>1 THEN 14060
14040 IF INSTR(AC$,PK$) THEN PK$="" : RETURN
14050 LSET CH$=PK$ : PUT#2,II : II=II+1 : PK$="" : RETURN
14060 C1$=LEFT$(PK$,1) : C2$=MID$(PK$,2,1)
14070 IF INSTR(AC$,C1$) THEN 14090
14080 PK$=MID$(PK$,2) : LSET CH$=C1$ : PUT#2,II : II=II+1 : RETURN
14090 IF ASC(C2$)<&H30 OR ASC(C2$)>&H4F THEN 14120
14100 LSET CH$=CHR$(ASC(C2$)-&H30+32*(INSTR(AC$,C1$)-1))
14110 PK$=MID$(PK$,3) : PUT#2,II : II=II+1 : RETURN
14120 CD$=C1$+C2$
14130 IF CD$="#." THEN CLOSE 2 : PK$="" : RETURN
14140 PRINT"Unrecognized code sequence "CHR$(34)CD$CHR$(34)
14150 PK$=MID$(PK$,3)
14160 RETURN
15000 '
15010 ' Add extension to filename
15020 '
15030 IF INSTR(FF$,".")<>0 OR INSTR(FF$,"/")<>0 THEN 15060
15040 IF INSTR(FF$,":")=0 THEN FF$=FF$+EX$ : GOTO 15060
15050 FF$=LEFT$(FF$,INSTR(FF$,":")-1)+EX$+MID$(FF$,INSTR(FF$,":"))
15060 RETURN
 
* END OF PROGRAM *
 
-- Join the CoCo Mail Group today!  Send me mail with a return path if
you'd like to join us!
 
- ERIC -
 
*----------------------===>  SPACE IS THE PLACE... <===-----------------------*
*        ewtileni@pucc.Princeton.EDU  //  ewtileni@pucc.BITNET                *
*      rutgers!pucc.bitnet!ewtileni  //  princeton!pucc.bitnet!ewtileni       *
* ColorVenture - Microcomputer Software - "Because Life isn't Black and White"*
*--------------------===> Another proud CoCo 3 owner <===---------------------*