EWTILENI@pucc.Princeton.EDU (Eric Tilenius) (03/25/88)
Below is a description of the CUTS file encoding system, used on the
LISTSERV server at pucc.Princeton.EDU/pucc.BITNET.  Following that is
the CUTS decoder itself.

This is a system we worked out to allow ASCII files with line lengths
greater than 80 and BINARY files to flow easily over the mail gateways.
It is an excellent method of posting software to LISTSERV or this
group.  Currently, at the CoCo Mailing list, we have a good deal of
CoCo software being put up in CUTS format.  If you haven't joined us
yet, please send me mail and I will add you to the group.

An OS-9 version of CUTS is expected from the CoCo Group shortly.

The following is from Tim Koonce (program at the bottom of file):

For more information, contact:  koonce@bosco.Berkeley.EDU

----------------------------------------------------------------------

Description of CUTS file encoding.

Part 1.  Purpose

The CoCo Usenet Transfer System (CUTS) was designed to provide a means
of transferring Binary files and ASCII files with line lengths over 80
characters through Electronic Mail systems.  It has the following
design goals:

  - Efficiency.  CUTS should lengthen files as little as possible.  To
    accomplish this, CUTS uses a prefix encoding where the most common
    printable characters are represented as themselves.

  - Accuracy.  CUTS should accurately encode and decode files.  A
    checksum is included to help ensure this.

  - Robustness.  CUTS should ignore any additional lines of text that
    may be added in the mail process.

  - It should be easy for the user to verify visually (in a mail
    program or text editor) that the file is completely transferred.
    For this reason, CUTS enforces a uniform line length, and repeats
    line 0000 at the end of the file, so that the length can be easily
    checked.

  - Simplicity of coding.  It should be easy to write programs to
    encode and decode files.  Keeping the encoding simple will help to
    encourage the popularity of the format.

Part 2.  The coding format

Line format.
A CUTS-encoded file consists of some number of lines, each 79
characters long.  The format of a single line is:

   Length   Description
     1      Initial period
     4      4 byte line number.  The lines in a CUTS file are numbered
            consecutively, starting with 0000.
     1      Period
     1      Packet type descriptor.  The packet type is always a
            capital letter.
    71      Packet data
     1      Checksum

Note that the first 6 bytes provide a ready way to identify the lines
which are part of the CUTS file.

The first line of the encoded data, line 0000, is appended to the end
of the file to provide an easy, visual check that a complete transfer
has been accomplished.  This is currently the only mark of the end of
the CUTS file.

The packet data should only consist of printable characters which are
preserved through the common mail systems.  Note that, since some mail
links perform character set conversion, not all printable ASCII
characters are allowable.  Which characters can be used depends on the
systems and can only be determined empirically.  The coding described
in this document has been tested on a transcontinental link through
ARPAnet between a U**X minicomputer using ASCII and an IBM mainframe
using EBCDIC.

The checksum covers the first 78 characters of the line.  It is the
sum of their ASCII codes, reduced to one printable character:

     checksum char = char( (sum mod 32) + 48 )

This is the calculation performed by the program at the bottom of this
file, and it reproduces the checksum characters in the sample file in
Part 5.

Packet Types

There are currently two packet types in CUTS.  The first type is a
file identifier, which contains basic information about the file.  The
other is the data packet, which contains prefix-encoded binary data.

Identifier Packets

These packets are identified as packet type 'I', and have the
following format for the data field:

   Length   Description
     1      Period
     1      CUTS protocol version
            This document describes CUTS version 'A'.
     1      Period
     2      Two digit year.
     2      Two digit month.
     2      Two digit day.
            These should include leading zeros to fill two chars.
     1      Period
     3      File type.
            Currently, four file types are defined:
               ASC - any ASCII text file
               BIN - binary image file
               RSD - RSDOS binary files
               OS9 - OS9 binary files
            Program files should use the 'RSD' or 'OS9' file type, as
            appropriate.  Non-program binary image files should use
            type 'BIN'.
     1      Period
    57      Filename.  The filename should be surrounded by quote
            characters, and the remaining space should be filled with
            period characters to the full line length.

Data Packets

Data packets carry the actual file data.  Counting the packet type
letter 'D', each data packet is 72 characters long: the letter itself
plus the 71 characters of prefix-encoded data shown in the line
format.  Because the prefix-encoding consists of both one and two
character codes, it is possible that a 71 character packet will occur.
In this case, any prefix character may be appended to the packet to
fill it to 72 characters.  This isolated prefix character carries no
informational content.  Notice line 0003 in the sample file below.
Two character codes are NEVER broken between packets.

Part 3.  Prefix-encoding

The prefix-encoding used in CUTS assigns to each possible 8-bit byte a
distinct one or two byte coding sequence.  The most common printable
ASCII characters represent themselves.  Other values are represented
by a prefix character followed by a translated character.

In ASCII, the prefix encoding can be succinctly described as:

The ASCII characters in the ranges $2A-$5A and $61-$7A represent
themselves as single-character codes.  The program below also passes
the space character, $20, through unchanged, as can be seen in line
0001 of the sample file.  Other values are represented by a prefix
character in the range $21-$28 and another character in the range
$30-$4F according to the formulas

     prefix    = (value div $20) + $21
     character = (value mod $20) + $30

There are two special coding situations.  One is the odd prefix
character described above, used to fill a 71 character packet to 72
characters when the next value would have a 2-character encoding.  The
other is the case of the end of data.
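Before turning to the end-of-data case, the per-byte rule and the line
checksum can be cross-checked with a short Python sketch.  This is not
the CUTS program itself (that is the BASIC listing at the bottom of the
file); the function names are made up for illustration.  Two details
are taken from the BASIC listing rather than from the prose: the space
character is passed through unchanged (line 11080), and the checksum
offset is 48 (line 10050).  Both agree with the sample file in Part 5.

```python
# Illustrative sketch of the CUTS version 'A' per-byte coding and checksum.
# All names below are hypothetical; they do not come from the original post.

# Byte values that stand for themselves: $2A-$5A, $61-$7A, plus the space
# ($20), which the BASIC program and the sample file pass through unchanged.
SELF_CODED = set(range(0x2A, 0x5B)) | set(range(0x61, 0x7B)) | {0x20}


def encode_byte(value: int) -> str:
    """Return the one or two character CUTS code for a byte value 0..255."""
    if value in SELF_CODED:
        return chr(value)
    # prefix = (value div $20) + $21, character = (value mod $20) + $30
    return chr(value // 0x20 + 0x21) + chr(value % 0x20 + 0x30)


def decode_code(code: str) -> int:
    """Invert encode_byte for a single one or two character code."""
    if len(code) == 1:
        return ord(code)
    prefix, char = code
    return (ord(prefix) - 0x21) * 0x20 + (ord(char) - 0x30)


def checksum_char(first_78: str) -> str:
    """Checksum over the first 78 characters, as in line 10050 of the program."""
    return chr(sum(ord(c) for c in first_78) % 32 + 48)


if __name__ == "__main__":
    # Every byte value survives a round trip through the coding.
    assert all(decode_code(encode_byte(v)) == v for v in range(256))
    print(encode_byte(0x00), encode_byte(0x5B), encode_byte(ord("A")))  # !0 #K A
    # Line 0000 of the sample file, without its final checksum character.
    line0 = '.0000.I.A.880306.BIN."TEST.BIN"' + "." * 47
    print(checksum_char(line0))  # =
```

The codes printed for $00 and $5B are the ones that appear for those
values in sample lines 0001 and 0002, and the checksum printed for line
0000 is the character that ends that line in the sample.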
At the end of the data, the special prefix sequence <pound
sign><period>, '#.', should be appended, and the last packet should be
padded with periods to fill the full 72 characters.

Part 4.  Implementation Notes.

Encoding

The encoding program should allow for specification of the file type,
if it is not discernible from the file storage.  This is especially
important in operating systems such as OS9, where it may be desirable
to feed the file to be encoded through a pipe.

Decoding

Note that there is no built-in limitation to having only one data file
in a CUTS encoded listing.  Each file will begin with an appropriate
identifier packet.  The decoder should be interactive, or provide some
method of guarding against filename collisions.  It may also be
desirable to echo extraneous lines to the screen, so that the user can
read comments that may precede the encoded listing.

Part 5.  Sample CUTS-encoded file

Test file consisting of bytes 0 - 255.

.0000.I.A.880306.BIN."TEST.BIN"...............................................=
.0001.D!0!1!2!3!4!5!6!7!8!9!:!;!<!=!>!?!@!A!B!C!D!E!F!G!H!I!J!K!L!M!N!O "1"2"3=
.0002.D"4"5"6"7"8"9*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ#K#L#M#N#O7
.0003.D$0abcdefghijklmnopqrstuvwxyz$K$L$M$N$O%0%1%2%3%4%5%6%7%8%9%:%;%<%=%>%?#F
.0004.D%@%A%B%C%D%E%F%G%H%I%J%K%L%M%N%O&0&1&2&3&4&5&6&7&8&9&:&;&<&=&>&?&@&A&B#L
.0005.D&C&D&E&F&G&H&I&J&K&L&M&N&O'0'1'2'3'4'5'6'7'8'9':';'<'='>'?'@'A'B'C'D'E#<
.0006.D'F'G'H'I'J'K'L'M'N'O(0(1(2(3(4(5(6(7(8(9(:(;(<(=(>(?(@(A(B(C(D(E(F(G(H#L
.0007.D(I(J(K(L(M(N(O#........................................................6
.0000.I.A.880306.BIN."TEST.BIN"...............................................=

Recognition for this project goes to:

  - Eric Tilenius, who goaded me into doing this, and who suggested
    the silly name.
                                        - Tim Koonce

100 '
110 ' CUTS-format encoder/decoder for Color Computer 3
120 ' Released into Public Domain 23 March, 1988
130 ' by: Tim Koonce
140 '
150 ' The purpose of this program is to translate ascii and binary files
160 ' to/from a format which has been designed for easy transferral over
170 ' electronic mail networks.
180 '
190 '
1000 CLEAR 15000
1010 DIM CD$(255) : AC$="!"+CHR$(34)+"#$%&'("
1020 MT$="JanFebMarAprMayJunJulAugSepOctNovDec"
1030 PRINT:PRINT:PRINT:PRINT" CUTS Encoder/Decoder"
1040 PRINT:PRINT"By Tim Koonce, April, 1988":PRINT
1050 PRINT"1)Encode a file into CUTS format"
1060 PRINT"2)Decode a CUTS file"
1070 PRINT"3)Print directory"
1080 PRINT"4)End this program"
1090 PRINT:PRINT"Select 1,2,3, or 4.":PRINT
1100 A$=INKEY$:IF A$="" THEN 1100
1110 IF A$="4" THEN END
1120 ON VAL(A$) GOSUB 3000,12000,2000
1130 GOTO 1100
2000 '
2010 'Print a directory
2020 '
2030 PRINT"Drive for directory?"
2040 A$=INKEY$:IF INSTR("0123",A$)=0 OR A$="" THEN 2040
2050 DIR VAL(A$) : PRINT"Free space: "FREE(VAL(A$))
2060 PRINT"Press any key to continue"
2070 A$=INKEY$:IF A$<>"" THEN 2070
2080 A$=INKEY$:IF A$="" THEN 2080
2090 RUN
3000 '
3010 'Encode a file
3020 '
3030 PRINT"Filename of file to encode :":LINEINPUT F$
3040 PRINT"What type of file is this?"
3050 PRINT" 1) ASCII text"
3060 PRINT" 2) RSDOS machine language program"
3070 PRINT" 3) OS9 module"
3080 PRINT" 4) Other BINary file"
3090 A$=INKEY$:IF INSTR("1234",A$)=0 OR A$="" THEN 3090
3100 TY$=MID$("ASCRSDOS9BIN",VAL(A$)*3-2,3)
3110 PRINT"Filename of CUTS output file :";:LINEINPUT OF$
3120 FF$=OF$ : EX$=".CUT" : GOSUB 15000 : OF$=FF$
4000 '
4010 'Encode a file
4020 '
4030 GOSUB 11000
4040 OPEN"R",1,F$,1 : II=1 : FIELD #1,1 AS CH$ : IL=LOF(1)
4050 OPEN"O",2,OF$
5000 '
5010 'Output comments
5020 '
5030 PRINT"Type in comments to be inserted at beginning of file "
5040 PRINT"A blank line ends comments."
5050 PRINT" :> ";:LINEINPUT A$
5060 IF A$<>"" THEN PRINT#2,A$ : GOTO 5050
6000 '
6010 ' Construct 'I' packet
6020 '
6030 INPUT"Last two digits of Year ";YR
6040 INPUT"Number of current Month ";MN
6050 INPUT"Day of Month ";DY
6060 A$=".0000.I.A"+STRING$(69,".")
6070 MID$(A$,11,2)=RIGHT$("0"+RIGHT$(STR$(YR),LEN(STR$(YR))-1),2)
6080 MID$(A$,13,2)=RIGHT$("0"+RIGHT$(STR$(MN),LEN(STR$(MN))-1),2)
6090 MID$(A$,15,2)=RIGHT$("0"+RIGHT$(STR$(DY),LEN(STR$(DY))-1),2)
6100 MID$(A$,18,3)=TY$
6105 ' Store the filename without any ":drive" suffix
6110 IF INSTR(F$,":")>0 THEN F1$=LEFT$(F$,INSTR(F$,":")-1) ELSE F1$=F$
6120 MID$(A$,22,LEN(F1$)+2)=CHR$(34)+F1$+CHR$(34)
6130 GOSUB 10000:L0$=A$+CS$
6140 PRINT#2,L0$ : PRINT LEFT$(L0$,5);
7000 '
7010 'Now encode rest of file
7020 '
7030 CD$="" ' No code character to be carried over
7040 LN=1 'Start with line number 1
7050 A$=STRING$(78,".")
7060 MID$(A$,2,4)=RIGHT$("000"+RIGHT$(STR$(LN),LEN(STR$(LN))-1),4)
7070 MID$(A$,7,1)="D"
7080 AP=7
7090 MID$(A$,AP+1,LEN(CD$))=CD$ : AP=AP+LEN(CD$)
7092 ' Nothing left to read if the carried code was the last byte
7095 IF II>IL THEN 8000
7100 GET#1,II : II=II+1
7110 CH=ASC(CH$)
7120 CD$=CD$(CH) : CL=LEN(CD$)
7130 IF AP+CL>78 THEN 7160
7140 MID$(A$,AP+1,CL)=CD$ : AP=AP+CL
7150 IF II<=IL THEN 7100 ELSE 8000
7160 IF AP<78 THEN MID$(A$,78,1)="#"
7170 GOSUB 10000 : A$=A$+CS$
7180 PRINT#2,A$:PRINT LEFT$(A$,5);
7185 ' A code is always carried over at this point, so start a new line
7190 LN=LN+1 : GOTO 7050
8000 '
8010 ' Finish off last line
8020 '
8030 IF AP<77 THEN MID$(A$,AP+1,2)="#." : GOTO 8040
8032 IF AP=77 THEN MID$(A$,78,1)="#"
8034 AP=79 ' No room for "#." here, so 8040 adds an extra line
8040 IF AP>78 THEN 8070
8050 GOSUB 10000:A$=A$+CS$
8060 PRINT#2,A$:PRINT LEFT$(A$,5); : GOTO 9000
8070 ' Add extra line with EOF marker, if needed
8080 A$=LEFT$(A$,78):GOSUB 10000:A$=A$+CS$:PRINT#2,A$:PRINT LEFT$(A$,5);
8090 A$="."+RIGHT$("000"+RIGHT$(STR$(LN+1),LEN(STR$(LN+1))-1),4)+".D"
8100 A$=A$+"#."+STRING$(69,".") : GOSUB 10000:A$=A$+CS$
8110 PRINT#2,A$ : PRINT LEFT$(A$,4);
9000 '
9010 ' Now repeat first line
9020 '
9030 PRINT#2,L0$ : PRINT LEFT$(L0$,5)
9040 CLOSE
9050 RUN
10000 '
10010 'Calculate checksum of A$ and return as CS$
10020 '
10030 CS=0
10040 FOR I=1 TO LEN(A$) : CS=CS+ASC(MID$(A$,I)) : NEXT I
10050 CS$=CHR$( (CS AND 31) + 48 )
10060 RETURN
11000 '
11010 ' Generate coding array
11020 '
11030 FOR I=0 TO 7 : FOR J=0 TO 31
11040 CD$(I*32+J)=CHR$(I+&H21)+CHR$(&H30+J)
11050 NEXT J,I
11060 FOR I=&H2A TO &H5A : CD$(I)=CHR$(I) : NEXT I
11070 FOR I=&H61 TO &H7A : CD$(I)=CHR$(I) : NEXT I
11080 CD$(32)=CHR$(32)
11090 RETURN
12000 '
12010 ' Decode a file
12020 '
12030 PRINT"Name of CUTS-encoded file to decode :":LINEINPUTF$
12040 FF$=F$:EX$=".CUT":GOSUB 15000: OPEN"I",1,FF$
12050 N0$=".0000." : L0$="" : F0=0
12060 LN=0 ' Start with Line 0
12070 LN$="."+RIGHT$("000"+MID$(STR$(LN),2),4)+"."
12080 IF EOF(1) THEN PRINT:PRINT"Warning: Unexpected end of file":CLOSE:RUN
12090 LINEINPUT#1,A$ : A$=LEFT$(A$,79)
12100 IF A$=L0$ AND F0 THEN PRINT:PRINT"Finished decoding":CLOSE:RUN
12110 IF LEFT$(A$,6)<>LN$ THEN PRINT A$:GOTO12080 ELSE PRINT LEFT$(LN$,5);
12120 IF LN=0 THEN F0=-1 : L0$=A$
12130 IF LEN(A$)>=79 THEN 12180
12140 PRINT:PRINT"Warning: Line"LN" is too short.":PRINT
12150 INPUT"Do you wish to continue (Y/N)";II$
12160 IF INSTR("Yy",LEFT$(II$,1)) THEN 12170 ELSE CLOSE: RUN
12170 A$=A$+STRING$(79-LEN(A$),".")
12180 C1$=MID$(A$,79,1)
12190 A$=LEFT$(A$,78) : GOSUB 10000
12200 IF C1$=CS$ THEN 12240
12210 PRINT:PRINT"Warning: Line"LN" has a bad checksum.":PRINT
12220 INPUT"Do you wish to continue (Y/N)";II$
12230 IF INSTR("Yy",LEFT$(II$,1)) THEN 12240 ELSE CLOSE : RUN
12240 PK$=MID$(A$,7) 'Separate Packet
12250 IF LEFT$(PK$,1)="I" THEN GOSUB 13000 : LN=LN+1 : GOTO 12070
12260 IF LEFT$(PK$,1)="D" THEN 12290
12270 PRINT"Unrecognized Packet Type "LEFT$(PK$,1)" in Line"LN
12280 LN=LN+1 : GOTO 12070
12290 PK$=MID$(PK$,2)
12300 IF LEN(PK$)>0 THEN GOSUB 14000 : GOTO 12300
12310 LN=LN+1 : GOTO 12070
13000 '
13010 ' Handle "I" packets
13020 '
13030 PRINT:PRINT"File encoded by CUTS version "MID$(PK$,3,1)
13040 PRINT"Encoded on: "MID$(PK$,9,2)" ";
13050 PRINT MID$(MT$,VAL(MID$(PK$,7,2))*3-2,3)" "MID$(PK$,5,2)
13060 TY$=MID$(PK$,12,3)
13070 PRINT"Filetype :"TY$
13080 F$=MID$(PK$,17) : F$=LEFT$(F$,INSTR(F$,CHR$(34))-1)
13090 PRINT"Stored Filename is :"F$
13100 PRINT"Filename to store file under"
13110 PRINT" (<ENTER> for same filename):"
13120 LINEINPUT F1$ : IF F1$<>"" THEN F$=F1$
13130 FF$=F$ : EX$="." : GOSUB 15000 : F$=FF$
13160 PRINT"Saving decoded CUTS file to "F$
13170 IF TY$="BIN" OR TY$="OS9" OR TY$="RSD" THEN SAVEM F$,0,0,0
13180 OPEN"R",2,F$,1 : II=1 : FIELD #2, 1 AS CH$
13190 RETURN
14000 '
14010 ' Decode first sequence in PK$ and output to file #2.
14020 '
14030 IF LEN(PK$)>1 THEN 14060
14040 IF INSTR(AC$,PK$) THEN PK$="" : RETURN
14050 LSET CH$=PK$ : PUT#2,II : II=II+1 : PK$="" : RETURN
14060 C1$=LEFT$(PK$,1) : C2$=MID$(PK$,2,1)
14070 IF INSTR(AC$,C1$) THEN 14090
14080 PK$=MID$(PK$,2) : LSET CH$=C1$ : PUT#2,II : II=II+1 : RETURN
14090 IF ASC(C2$)<&H30 OR ASC(C2$)>&H4F THEN 14120
14100 LSET CH$=CHR$(ASC(C2$)-&H30+32*(INSTR(AC$,C1$)-1))
14110 PK$=MID$(PK$,3) : PUT#2,II : II=II+1 : RETURN
14120 CD$=C1$+C2$
14130 IF CD$="#." THEN CLOSE 2 : PK$="" : RETURN
14140 PRINT"Unrecognized code sequence "CHR$(34)CD$CHR$(34)
14150 PK$=MID$(PK$,3)
14160 RETURN
15000 '
15010 ' Add extension to filename
15020 '
15030 IF INSTR(FF$,".")<>0 OR INSTR(FF$,"/")<>0 THEN 15060
15040 IF INSTR(FF$,":")=0 THEN FF$=FF$+EX$ : GOTO 15060
15050 FF$=LEFT$(FF$,INSTR(FF$,":")-1)+EX$+MID$(FF$,INSTR(FF$,":"))
15060 RETURN

* END OF PROGRAM *

--
Join the CoCo Mail Group today!  Send me mail with a return path if
you'd like to join us!
                                                             - ERIC -
*----------------------===> SPACE IS THE PLACE... <===-----------------------*
* ewtileni@pucc.Princeton.EDU // ewtileni@pucc.BITNET *
* rutgers!pucc.bitnet!ewtileni // princeton!pucc.bitnet!ewtileni *
* ColorVenture - Microcomputer Software - "Because Life isn't Black and White"*
*--------------------===> Another proud CoCo 3 owner <===---------------------*