jhs@MITRE-BEDFORD.ARPA (05/21/87)
Hot off the presses, here is a user manual for the UUDECODE program in BASIC which I posted a while ago. It is set up for printing on a 12-pitch (80-column) printer. Comments on the manual will be appreciated and I will try to incorporate them into a future release of it. Many thanks to Mark Pierce for his initial comments, which resulted in a significant improvement in this version.

If anyone needs the program listing and/or the assembly language subroutine source listing, let me know and I will either post it or e-mail it to individual requestors.

-John Sangster / jhs@mitre-bedford.arpa

----------------------------c-u-t---h-e-r-e-----------------------------------

U S E R   M A N U A L
f o r
BASIC/Machine Language UUDECODER
Version 1.2a

Manual Revision Date: 5/20/87

by John H. Sangster
jhs@mitre-bedford.arpa
(617) 235-8753 (home)
(617) 271-2000 (work)

1. PURPOSE:

Uudecode is the decoder for uuencode, a program widely used on unix systems to encode binary files into printing ASCII characters for transmission over networks. Because many network components interpret and respond to special "control" characters, attempts to send binary files such as machine-language programs over a network in raw, un-encoded form are usually doomed to failure.

Uuencode has become popular for encoding because it is fairly efficient: it encodes each group of three 8-bit bytes into only four bytes of printable characters. Therefore, using uuencode, a file is expanded by a factor of only about 1.333 to one (four output bytes for every three input bytes, before counting the byte count character and line ending added to each line) as the price of limiting the character set to printing characters only.

2. USING UUDECODE VER. 1.2a:

UUDECODE VER. 1.2a is normally sent by e-mail as an Atari BASIC ".LST" file. This means that it consists of only printing characters itself, and can be mailed, printed, etc. without difficulty. On the other hand, the first time you run it, you will have to "ENTER" it into BASIC with a command like 'ENTER "D:UUDECODE.LST"<RETURN>'.
Then you should SAVE it with the command 'SAVE "D:UUDECODE.BAS"<RETURN>'. Thereafter, you can run it under BASIC with the RUN command instead of the ENTER command. You will immediately notice that RUN works significantly faster. This is because SAVEd files are stored in tokenized form. You can delete the .LST file if you wish, because it can always be reconstructed using the LIST command, but you should be sure to keep a backup copy of UUDECODE in some form on a separate disk from the one you normally run it from. Keeping the .LST file around is handy if you decide to e-mail it to someone else someday.

Once you have UUDECODE running, it will ask you for the input file you wish to uudecode and the filename you want the output to be sent to. You should give it the exact input file specification you want it to use, i.e. including device, filename, and extension. Normally, uuencoded files are given the extension ".UUE", but the program does not assume this, so you must type the extension yourself.

On the output file specification, however, the program will accept either a null input (just a <RETURN>) or a device specifier like D1: without a filename, or you can again specify the full filename and extension. In the first two cases, it will attempt to read the filename from the "begin" line of the uuencoded input file. If you don't even specify a device name, it will assume "D1:" and will so inform you.

If it cannot successfully OPEN the file it thinks you want, it will give you another chance to enter the output filename. Before it gets to this point, however, it will have printed the "begin" line if one was found, so you may have a clue as to what went wrong, e.g. a filename was given which is not valid for your DOS. In that case, you should think up a valid filename that seems suitable.

After finishing a file, UUDECODE will print "Done!" and will ask if you want to decode another file. If you say "Y" or "yes", it will re-initialize and allow you to enter the new input and output filenames.
Otherwise, it will exit to DOS.

NOTE: UUDECODE Ver. 1.2a also prints a byte count when it finishes. You should note this count and compare it with the correct byte count as established when the file was being uuencoded.

That's about all there is to it. Comments on this manual, as well as on any difficulties experienced with UUDECODE, should be directed to the author.

3. WHAT UUENCODE AND UUDECODE REALLY DO TO THE DATA:

The uuencoding process, used to create ASCII files of the type that UUDECODE is designed to decode, is easy to describe. Each group of three input bytes, with any 8-bit pattern whatever in them, is broken up into four sets of six bits. (Note that three times 8 and four times 6 each give exactly 24 bits, so no information is lost.) Each 6-bit pattern is put in the low-order 6 bits of an 8-bit byte, and decimal 32 is added to give a final value in the range 32 (ASCII blank) through 95. All the characters in this range (decimal 32 through 95) are printing characters.

Traditionally, uuencode programs take 45 bytes at a time from the input file as long as bytes are available, and encode them into 60 output bytes which are sent as one "line". Each line is made up of an encoded byte count, the 60 encoded bytes of data, and an end-of-line character or characters. The encoded byte count is the actual number of input bytes encoded on that line, plus decimal 32. For all lines but the last, this gives 45 plus 32, or 77, which translates into ASCII uppercase "M". That is why all lines but the last in a uuencoded file begin with "M". The final line of data begins with a character which is between "blank" (32) and "M" (77) in the ASCII "collating sequence". The exact value depends on how many bytes were left in the input file after the last full line of 45 was used up.

Uuencode programs usually sandwich these lines of data between a "begin" line and an "end" sequence.
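
The encoding of a single data line, as just described, can be sketched in Python. This is only an illustration and not code from any particular uuencode program: the function name uuencode_line is hypothetical, and padding a short final group with zero bytes is an assumption, since this manual does not say how leftover bytes are handled.

```python
def uuencode_line(chunk: bytes) -> str:
    """Encode up to 45 raw bytes as one uuencoded line (without a line ending)."""
    if len(chunk) > 45:
        raise ValueError("a line carries at most 45 input bytes")
    # Assumption: pad a short final group with zero bytes so that every
    # group of three bytes yields exactly four 6-bit values.
    padded = chunk + b"\x00" * (-len(chunk) % 3)
    out = [chr(len(chunk) + 32)]  # encoded byte count: real count plus 32
    for i in range(0, len(padded), 3):
        # Join three 8-bit bytes into one 24-bit group.
        group = (padded[i] << 16) | (padded[i + 1] << 8) | padded[i + 2]
        # Split the group into four 6-bit values and add 32 to each.
        for shift in (18, 12, 6, 0):
            out.append(chr(((group >> shift) & 0x3F) + 32))
    return "".join(out)
```

With this sketch, a full 45-byte chunk produces a 61-character line beginning with "M", and a 6-bit value of zero comes out as a blank, which is exactly the character that some hosts strip from line ends.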
The "begin" line consists of the word "begin", a single space, a 3-digit "protection code" as used by unix systems, another single space, and finally the filename which should be used for the file into which the data is decoded at the far end. The "end" sequence is supposed to consist of a line containing a blank in the first character position, denoting zero encoded bytes on that line, followed by a line beginning with the word "end". Some unix-based uudecoders seem to require additional blanks following these minimum fields.

Uudecode is supposed to be the exact inverse of uuencode, i.e. after decoding a uuencoded file, you should have the exact binary file that was originally encoded. This is essential, because the whole purpose of uuencode and uudecode is to let you transmit machine language object programs around on networks. If even one bit is changed, all bets are off!

The basic idea of uudecode is therefore to take each line in, subtract 32 from each of the 60 or fewer bytes, pack each group of 6 bits in the low-order portion of each byte back into 8-bit packed binary form, and write the re-packed bytes out to a binary output file.

4. PITFALLS:

Unfortunately, as uuencode has been adopted wholesale for use in transmitting binary files across networks, it has turned out that not all network hosts are as careful about what they do with files as are most unix hosts. IBM hosts are among the most notoriously callous about changing byte values to suit their own preconceived notions. Most of the time, the changes consist of things like stripping off trailing blanks on lines that happen to end in a blank. This can be embarrassing if not handled properly by the decoder. Another favorite trick is to change caret into accent grave, or tilde into caret, or what-have-you.
The only reliable way to handle this sort of problem seems to be for the uuencode program to send an encoding translation table at the beginning, which lists all the output characters from decimal 32 to 95, and for the uudecode program to capture the values received in their place and decode accordingly. If they have not been mapped one-to-one, of course it can only throw up its hands in dismay and so inform you.

5. CAPABILITIES OF UUDECODE VER. 1.2a:

UUDECODE Ver. 1.2a as described by this manual includes masking to correct the most common character translation problems, i.e. those which result in encoded values greater than 95. It correctly decodes files in which a "sentinel" character (usually "a" or "x") has been added to the end of the uuencoded lines to prevent stripping of trailing blanks, as well as files in which trailing blanks have actually been stripped! It does NOT process the "translation table" preamble added by some uuencodes; this may be added in a future revision.

This version also is capable of reading the output filename from the uuencoded file. This will be done if you specify only a DEVICE NAME, e.g. "D1:", or respond with just a <RETURN> to the output file prompt.

Finally, Ver. 1.2a is smart enough to ignore extraneous lines in the input file either before the "begin" line or after the "end" line. This means that it can be used to decode uuencoded files which are preceded by explanatory comments in an e-mail message. On the other hand, this version can only decode ONE uuencoded file per input file. If you receive a message containing multiple files, you will have to break up the input file into separate files for uudecoding. A future release may allow more flexibility.

UUDECODE Ver. 1.2a is fairly fast. An assembly-language subroutine is used to do the "bit-picking" dirty work.
This routine is quite efficient, despite the fact that it includes such conveniences as checking the DIMensioning of the string used as the output buffer and setting its LENgth parameter correctly. These features cost a small amount in assembly language but they save the BASIC calling program from having to worry about such details, which would be far more costly to implement in BASIC.

The main program of UUDECODE.LST is also optimized for speed, mainly by putting the inner loop "up front", where BASIC doesn't have to search very far for statement numbers, and by keeping the loop short, especially on the most frequently used path.

To give you an idea of just HOW fast this program is, a fairly knowledgeable programmer coded uudecode up in C in an effort to get better performance and then noted afterward that Ver. 1.2a in BASIC and assembly language ran approximately FIVE TIMES as fast as his C version! I think you will find Ver. 1.2a highly satisfactory with regard to execution speed.
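
For readers who want to see the "bit-picking" of Section 3 and the tolerant decoding of Section 5 spelled out, here is a rough sketch in Python. It is not the assembly-language routine shipped with UUDECODE, whose source is not part of this manual. The function name uudecode_line is hypothetical, and reading "masking" as keeping only the low-order six bits after subtracting 32 is an assumption about what the manual means.

```python
def uudecode_line(line: str) -> bytes:
    """Decode one uuencoded data line back into its original bytes."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    # The first character is the byte count plus 32.
    count = (ord(line[0]) - 32) & 0x3F
    # Four encoded characters are needed for every three output bytes.
    needed = 4 * ((count + 2) // 3)
    # Drop anything past the data (e.g. a sentinel character) and put back
    # trailing blanks that a host may have stripped.
    body = line[1:1 + needed].ljust(needed)
    out = bytearray()
    for i in range(0, needed, 4):
        # Assumed masking: subtract 32 and keep six bits, so that values
        # above 95 (such as accent grave, 96) fold back into range.
        a, b, c, d = ((ord(ch) - 32) & 0x3F for ch in body[i:i + 4])
        out += bytes((
            (a << 2) | (b >> 4),
            ((b & 0x0F) << 4) | (c >> 2),
            ((c & 0x03) << 6) | d,
        ))
    # The byte count says how many decoded bytes are real data.
    return bytes(out[:count])
```

In this sketch the line "#0V%T" decodes to the three bytes "Cat", and the same line with a sentinel "x" appended gives the same result. A line consisting of just "M" is treated as 45 zero bytes whose blanks were stripped in transit.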