[comp.sys.atari.8bit] uuendecode/hexbin for the 8-bits...

btb@ncoast.UUCP (01/29/87)

I have d/l'd the hexbin and binhex programs, and they work, but, as we all
know, they are very slow, and the binhex process doubles the size of your
file for transmitting...

The uudecode program has apparently been causing some problems, too... I have
d/l'd it, but I haven't tried it yet...

I would like to propose that if somebody posts the uudecode algorithm,
I will write uudecode/encode in Deep Blue C and binhex it to the net so
everyone can get it... this way at least it will be fairly fast, and I
could also make the DBC source available for others to monkey with...

Please send the algorithm description to me... Thanks.


-- 
			Brad Banko
			...!decvax!cwruecmp!ncoast!btb
			Cleveland, Ohio

"The only thing we have to fear on this planet is man."
			-- Carl Jung, 1875-1961

jhs@MITRE-BEDFORD.ARPA.UUCP (01/30/87)

Brad:

Attached is the uuencode/decode definition and a C program for it.

I have fixed the uudecode that I posted so I think it now will work correctly.
This version is in BASIC, so even those who don't have a C compiler can use
it.  Also, the inner workings are in machine language, so it is already fairly
fast, possibly faster than your C version will be.

I am (slowly) working on a version which does encoding as well.  It may be
a couple of months until I finish it, though, at the present rate of
distractions.

Uuencoding has its problems.  Some of the characters in the character set get
changed by some hosts.  Lines ending in blanks sometimes get shortened in
transit.  I have a file of reports of such problems which you should probably
read if you are serious about doing another version.

-John Sangster
jhs@mitre-bedford.arpa


----------------i-n-f-o---o-n---u-u-e-n-c-o-d-e-/-d-e-c-o-d-e-----------------
From: randy@NLM-VAX.arpa (Rand Huntzinger)
Organization: National Library of Medicine, Bethesda, Md
------------
Uudecode reads a file of the following format:

header line (1)   -> begin <mode> <filename>
data lines (many) -> <length> <data>
trailer line (1)  -> end

where:
The header line fields contain:

<mode> is a three digit number specifying a Unix file mode (specifies
who can read and write the file).  You can ignore this on a
non-Unix machine.

<filename> is the name of the file encoded below.  You can use this
if it is compatible with file names on your system.


The data lines contain:

<length> is one character generated by adding the number of bytes
encoded on this line to 32 (ASCII space).

<data> is the <length>-32 bytes of binary data encoded into text
to produce 4 bytes of text for every 3 bytes of binary
data as follows:

Input:    Byte 1      Byte 2      Byte 3

    7  6  5  4  3  2  1  0    7  6  5  4  3  2  1  0    7  6  5  4  3  2  1  0
    \              /  \                /  \                /  \              /
     \            /    \              /    \              /    \            /
      ----- + ----      ----- + ------      ------ + -----      ----- + ----
    |                 |                    |                  |

  + 32               + 32                 + 32               + 32

    |                 |                    |                  |
    V                 V                    V                  V

Output:  Byte 1    Byte 2               Byte 3             Byte 4


In other words, encoding involves taking 3 bytes, breaking them
into 4 six-bit chunks, adding 32 to each six-bit value to make
it an ASCII character, and outputting the results.
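
As a minimal sketch of the encoding step described above (not taken from
any posted program), the following C function packs one group of 3 bytes
into 4 printable characters.  The names ENC and uu_encode_triple are made
up for illustration.

```c
/* Map a 6-bit value (0..63) to a printable character by adding 32. */
#define ENC(v) ((char)(((v) & 077) + ' '))

/*
 * Encode 3 input bytes into 4 output characters.
 * uu_encode_triple is a hypothetical name, not from the original post.
 */
void uu_encode_triple(const unsigned char in[3], char out[4])
{
    out[0] = ENC(in[0] >> 2);                          /* top 6 bits of byte 1 */
    out[1] = ENC(((in[0] & 003) << 4) | (in[1] >> 4)); /* low 2 of byte 1, top 4 of byte 2 */
    out[2] = ENC(((in[1] & 017) << 2) | (in[2] >> 6)); /* low 4 of byte 2, top 2 of byte 3 */
    out[3] = ENC(in[2] & 077);                         /* low 6 bits of byte 3 */
}
```

For example, the three bytes of "Cat" come out as the four characters 0V%T.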

To decode, you reverse this.  Strip the parity bit, subtract
32 from each byte and repack into three bytes by shifting and
or'ing the pieces.  I don't think I'll spell this out, since
it is an obvious reversal of the above steps.
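
For readers who would like the reversal spelled out anyway, here is a
small sketch in C.  It assumes the input group is four intact characters;
uu_decode_quad is a made-up name.  The 6-bit mask in DEC also takes care
of a parity bit in bit 7.

```c
/* Undo the +32 offset and keep only the low 6 bits. */
#define DEC(c) (((c) - ' ') & 077)

/*
 * Decode 4 input characters into 3 output bytes.
 * uu_decode_quad is a hypothetical name used for illustration.
 */
void uu_decode_quad(const char in[4], unsigned char out[3])
{
    out[0] = (unsigned char)((DEC(in[0]) << 2) | (DEC(in[1]) >> 4));
    out[1] = (unsigned char)((DEC(in[1]) << 4) | (DEC(in[2]) >> 2));
    out[2] = (unsigned char)((DEC(in[2]) << 6) | DEC(in[3]));
}
```

Feeding it the characters 0V%T gives back the bytes of "Cat".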


The last line in the section simply says 'end'.

The file may contain junk before the begin statement and after the end
statement.  So the decoder usually skips until it sees begin, extracts
the file name and decodes until it sees the end line.  Be sure when you
implement it that you use the length byte to determine the length of the
decoded text, since stuff going through news sometimes gets padding added
to the text.  Also, you need to do this if you want to get the file
length correct.
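
To illustrate the point about the length character, the sketch below
works out, from the first character of a data line, how many bytes the
line decodes to and how many encoded characters should follow.  The
function names are invented for this example.

```c
/* Undo the +32 offset and keep only the low 6 bits. */
#define DEC(c) (((c) - ' ') & 077)

/* Number of decoded bytes a data line claims to carry. */
int uu_line_bytes(const char *line)
{
    return DEC(line[0]);
}

/* Number of encoded characters expected after the length character:
 * every 3 bytes (or final partial group) occupy 4 characters. */
int uu_line_chars(const char *line)
{
    return 4 * ((DEC(line[0]) + 2) / 3);
}
```

A full line starting with M therefore carries 45 bytes in 60 characters;
anything in the line beyond that is padding and can be ignored.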

I've included a posted uudecode source for the Atari 520 ST below.  I've
never used it, but it does give you something to look at.  I don't know
whether you read C, but it might still be of help.  There is nothing which
says it's copyrighted, so I assume it's public domain.  If not, the
copyright was removed before I saw it.  Some of the cruft in it indicates
it was originally written for Unix.  I seem to have lost the credits on
this one, I don't see who posted it.

===========================================================================


/*
 * uudecode input
 * Modified for the ST - cannot use putc because of CR/LF problems
 *
 * create the specified file, decoding as you go.
 * used with uuencode.
 */
#include <stdio.h>
#include <osbind.h>

int _isconio;
#define NULL 0

/* single character decode */
#define DEC(c) (((c) - ' ') & 077)

int outfile;            /* File descriptor of output file */

char outbuf[BUFSIZ];    /* Output buffer for my character out code */
char *nextc = outbuf;   /* Pointer to the next character */
#define lputc(outchar) {*nextc++ = outchar; if (nextc >= &outbuf[BUFSIZ]) do_write();}

main(argc, argv)
char **argv;
{
    FILE *in;
    FILE *fopen();
    int mode;
    char dest[128];
    char buf[80];

    _isconio = 1;
    /* mandatory input arg */
    if (argc != 2) {
        printf("Usage: uudec filename\n");
        exit(1);
    }
    if ((in = fopen(argv[1], "r")) == NULL) {
        printf("Cannot open: %s\n", argv[1]);
        exit(1);
    }
    _isconio = 0;
    /* search for header line */
    for (;;) {
        if (fgets(buf, sizeof buf, in) == NULL) {
            printf("No begin line\n");
            exit(3);
        }
        if (strncmp(buf, "begin ", 6) == 0)
            break;
    }
    sscanf(buf, "begin %o %s", &mode, dest);

    /* handle ~user/file format */
    if (dest[0] == '~') {
        printf("Cannot handle user formats\n");
        exit(1);
    }
    /* create output file; report dest, since argv[2] does not exist here */
    if ((outfile = Fcreate(dest, 0)) < 0) {
        printf("Cannot create: %s\n", dest);
        exit(1);
    }

    decode(in);

    if (fgets(buf, sizeof buf, in) == NULL || strcmp(buf, "end\n")) {
        printf("No end line\n");
        exit(5);
    }
    exit(0);
}

/*
 * copy from in to outfile, decoding as you go along.
 */
decode(in)
FILE *in;
{
    char buf[80];
    char *bp;
    int n;

    for (;;) {
        /* for each input line */
        if (fgets(buf, sizeof buf, in) == NULL) {
            printf("Short file\n");
            exit(10);
        }
        n = DEC(buf[0]);
        if (n <= 0)
            break;

        bp = &buf[1];
        while (n > 0) {
            outdec(bp, n);
            bp += 4;
            n -= 3;
        }
    }
    do_write();
}

/*
 * output a group of 3 bytes (4 input characters).
 * the input chars are pointed to by p, they are to
 * be output to file f.  n is used to tell us not to
 * output all of them at the end of the file.
 */
outdec(p, n)
char *p;
{
    int c1, c2, c3;

    c1 = DEC(*p) << 2 | DEC(p[1]) >> 4;
    c2 = DEC(p[1]) << 4 | DEC(p[2]) >> 2;
    c3 = DEC(p[2]) << 6 | DEC(p[3]);
    if (n >= 1)
        lputc(c1);
    if (n >= 2)
        lputc(c2);
    if (n >= 3)
        lputc(c3);
}

do_write()
{
    long bytect;

    if (nextc != &outbuf[0]) {
        bytect = nextc - &outbuf[0];
        if (Fwrite(outfile, bytect, outbuf) < 0) {
            printf("Write error on output file\n");
            exit(1);
        }
        nextc = outbuf;
    }
}

jhs@MITRE-BEDFORD.ARPA.UUCP (01/30/87)

Here are some of the comments I have collected on uuencode/decode problems.
Many of them come from the atari16 news group, which is actively using
uuencoding/decoding.  I have ended the file with a "wish list" for what
a uuencode/decode program should ideally do.  Comments are solicited --
If somebody is going to write a really GOOD uuencode/decode package, it
might as well handle all known problems in the best way we can think of.

-John S.
------------------------------------------------------------------------------
From: Mike Vederman  <ACS19%UHUPVM1.BITNET@WISCVM.WISC.EDU>
Subject:      finally, i've figured uudecode ...
To: ST Users <INFO-ATARI16@SU-SCORE.ARPA>

After many, many problems with BITNET, IBM machines and uudecode, I have
found the true answer to having all of my files uudecode.  Every single file
which has bombed in the past, now decodes perfectly.

The normal sequence that I do is to send the file over from the IBM machine
(actually we have an AS/9000N) to the AT&T 3B20 and uudecode on the 3B20.
Previously, the only two files which worked were UNITERM and uEmacs.

Now, however, I have found the solution, but I am uncertain as to the exact
cause.  Here is what I do.  First, while in Xedit, I set the logical record
length to 61, then I set the record format to fixed.  The following commands
do this:

set lrecl 61
set recfm f

now, when I send the file to the 3B20, I enter the command:

sf file uue to acs19 at uhnix1

to receive on the 3B20, I enter the command:

receive $job/pnch6 >file.uue

(since the 3B20 is hooked up as a punch machine (yuck).  What you do may be
different, but I know setting the logical record length and the record format
have made all of my files work.)
When I get to the 3B20, I do have to fix the file a little bit.  I go to the
last line and delete the blanks to end of line, after 'end'.  Then I go up one
line and make sure that there is only 1 (one) blank on the second to the last
line.  Then I save the file and uudecode it.  All the files I have tried thus
far have worked, and most of these have always bombed.

Mike
------------------------------------------------------------------------------
From: <MHD@slacvm.bitnet>
Reply-To: MHD%SLACVM.BITNET@forsythe.stanford.edu

Like others, I had little success at first in decoding the uuencoded
files on this net.  After looking closely at the files I noticed that
the tilde character (~) was present when it should not have been.  Upon
further inspection I found that the caret character (^) was missing in
all files.  I tried a global change of tilde to caret and had no further
trouble.

Perhaps many others have been screwed up in the same way by IBM machines
in the net.  Perhaps at other sites just one other character is screwed
up.  Give it a try, preferably do it first on a .ARC file where deARCing
will show errors on the CRC checks.

The only characters that should be in the uuencoded files are space ( )
through underline (_), ASCII hex 20 through 5F.  All of these should be
present in a file of reasonable length.

Note:  The tilde and caret may get screwed up in this file.  They are
ASCII hex 7E and 5E respectively.
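
A rough sketch of this check in C follows.  It is only meant for data
lines (the begin and end lines legitimately contain lower-case letters),
it repairs only the one tilde-for-caret substitution reported here, and
the name uu_check_line is made up for the example.

```c
/*
 * Repair one known corruption (~ received in place of ^) and count the
 * characters that fall outside the legal range hex 20 through 5F.
 * Stops at the end of the string or at a newline.
 * Returns the number of illegal characters that remain.
 * uu_check_line is a hypothetical name used for illustration.
 */
int uu_check_line(char *line)
{
    int bad = 0;
    char *p;

    for (p = line; *p != '\0' && *p != '\n'; p++) {
        if (*p == '~')
            *p = '^';               /* undo the reported substitution */
        else if (*p < 0x20 || *p > 0x5F)
            bad++;                  /* not legal uuencode output */
    }
    return bad;
}
```

A nonzero return value suggests the file was damaged in some other way
and is worth inspecting by hand before decoding.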

------------------------------------------------------------------------------
From: XBR1Y049%DDATHD21.BITNET@WISCVM.WISC.EDU  (Stephan Leicht c/o
  HRZ TH Darmstadt, Germany  )
Subject:  UUDECODE difficulties / Translation tables

I found the cause of my trouble running some uuencoded programs
coming over the net.

The translation of the not-sign/circumflex (^) character is not unique.

When it is translated to EBCDIC 5F hex and sent through some gateways,
it does not always return as EBCDIC 5F hex; sometimes it returns
as EBCDIC 71 hex.  (I saw this when sending it to ucbvax and back.)
Now I have told our translation table to recognize 71 hex as 5F hex too.

Strange!

Maybe this is a hint for those having no success in uudecoding.

      Stephan

        Name : Stephan Leicht
Organisation : Computer Center of Technical University Darmstadt, Germany
      Bitnet : XBR1Y049@DDATHD21
                                 insert all usual & unusual disclaimers here -->
------------------------------------------------------------------------------
From: mcvax!ukc!dcl-cs!bath63!pes@seismo.css.gov  (Paul Smee)
Organization: Bath University, England
Subject: Re: Lattice C and UUDECODE

I've found several problems with uudecode, which are caused by
the uuencoded file being munged by 'terminal handlers' enroute to me.
The symptoms sound like some I've seen complained about, specifically
that the file appears to decode, but bombs.  On looking at the files,
I found 2 different sorts of problem.

First, spurious 'control chars' get inserted in some cases.  In fact,
these appear to always be NULs (0x0) put in as 'delay padding' by
someone's terminal-handling software.

Second, trailing spaces get stripped.

It is fairly trivial to modify uudecode to ignore all control chars
(less than ASCII space) -- and harmless, as they should not be in
a uuencoded file.  Then, pad the line out to an arbitrarily large length
using spaces (I simply tack 64 spaces on the end, in my buffer, before
decoding a line).  Appending 64 spaces is crude and inelegant, I know, but
it ensures that there will be enough in all circumstances, and is a much
simpler mod than actually looking to see how many are needed -- especially
since the line may contain as yet unprocessed control chars to be thrown away.

Since making these trivial changes I've had no problems with uudecoded files.
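
The two changes might look roughly like the sketch below.  This is not
the poster's actual modification, only one way to do what is described;
uu_clean_line and UU_PAD are invented names.  The line is passed with an
explicit length because embedded NULs would otherwise end a C string early.

```c
#include <stddef.h>
#include <string.h>

#define UU_PAD 64   /* number of spaces appended, as in the post */

/*
 * Copy len characters from src to dst, dropping every control character
 * (anything below ASCII space, including NUL, CR and LF), then append
 * UU_PAD spaces and a terminating NUL.
 * dst must have room for len + UU_PAD + 1 characters.
 * Returns the number of characters kept from src.
 */
size_t uu_clean_line(const char *src, size_t len, char *dst)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if ((unsigned char)src[i] >= ' ')
            dst[n++] = src[i];
    }
    memset(dst + n, ' ', UU_PAD);
    dst[n + UU_PAD] = '\0';
    return n;
}
```

Because a blank decodes to zero, the padding is harmless as long as the
decoder stops at the byte count given by the length character.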

Hope this helps someone...
------------------------------------------------------------------------------
From: <RDROYA01%ULKYVX.BITNET@WISCVM.WISC.EDU> (Robert
  Royar)
Organization:   University of Louisville
Subject:        Another version of uue/decode offered (repost)

I have a version of uue/decode that I have been testing and that seems
to avoid some of the problems with the present setup.  I added some
code to the original programs to do some error checking.  The format
of the files the encoder uses is similar to the current version.  If
you have the current version, you can delete the extra information
with your word processor.  My version adds the following things:

        1. It breaks up long files so that each uuencoded file is no
        more than 300 lines long.

        2. It places an include directive at the end of each file at
        the break so the companion program can reassemble the files
        into a program.

        3. It adds a key table at the beginning of each file so that
        the decoder can index into this table to find the value for
        letters in the file.  I discovered that the errors I had with
        uudecode were often because a mailer had changed each instance
        of a character to something else.  This solves that problem.

        4. It checks the key table for integrity before it decodes the
        file.  This way if both 'H' and 'Z' have been changed to ' ',
        it will exit and tell you.

        5. If it finds a letter in the file that was not in the key
        table, it issues a warning telling what the character was,
        the filename, and the current file position.

        6. It appends a 'part' letter (a-z) at the end of each output
        line to avoid truncation problems.
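
The layout of the key table is not given in this post, so the following
is only a guess at how such a check could work.  It assumes the table is
the 64 legal characters sent in order (value 0 first), so that whatever
arrives in position i stands for value i.  The name uu_build_map is
hypothetical.

```c
/*
 * Build a decode map from a received key table of 64 characters, where
 * position i holds whatever character now stands for 6-bit value i.
 * On return, map[c] is the value for character c, or -1 if c is unused.
 * Returns 0 on success, or -1 if two values arrived as the same
 * character, which means the table was damaged in transit.
 * ASSUMPTION: this table layout is illustrative, not the real format.
 */
int uu_build_map(const unsigned char table[64], int map[256])
{
    int i;

    for (i = 0; i < 256; i++)
        map[i] = -1;
    for (i = 0; i < 64; i++) {
        if (map[table[i]] != -1)
            return -1;              /* duplicate entry */
        map[table[i]] = i;
    }
    return 0;
}
```

With a map like this, a consistent substitution such as tilde for caret
decodes correctly without special handling, and a character that maps to
-1 in the data can be reported as in item 5.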

This program makes uue/decoding somewhat simpler than it has been
because it breaks the files up into parts automatically and
reassembles them.  It actually is smaller and runs faster than the net
version of the program because I used freopen() to cut down on file
pointers and I used register variables where possible.  If you are
interested in this program, drop me a line and I will post it to you.

Robert Royar
rdroya01@ulkyvx.bitnet
<NOTE (jhs) - This program runs on the ST and would need work to port to
 the XL/XE/800 series.>
------------------------------------------------------------------------------
Summary of desirable uudecode features (J. Sangster 30 Jan 1987):

1.  Should check for characters outside the range $20 to $5F, which are
    not legal uuencode output and indicate a corrupted file.

2.  If found, $7E (~) should be changed to $5E (^).

3.  If found, $71     should be changed to $5F (acc. to Stephan Leicht)

4.  The decoder should be insensitive to loss of trailing blanks at the
    end of a line.  I.e. if the line is shorter than the indicated number
    of bytes (determined from the first character M or whatever), it should
    assume that any missing characters were blanks, $20, which will
    uudecode to nulls ($00) after subtracting $20.

5.  If any nulls ($00) are found in the uuencoded file, they should be
    deleted before processing.

6.  The encoder should never generate files longer than some maximum
    size.  One person suggested 300 lines.  I think 32K is acceptable
    to most mailers but room should be left for comments at the top.

7.  The decoder should reassemble the parts of a file automatically if
    it was broken into pieces.

8.  The decoder should ignore comments before the begin line and after
    the end line.

9.  The "translation table" idea -- inserting a list of all possible
    legal characters and checking what is received for uniqueness --
    is a terrific idea!

10. Since unix systems name the decoded output file from the info in the
    begin line, and also set protection codes from it, we Atarians should
    adopt some conventions related to these procedures.  I suggest:

	 (i) If the protect code is less than 700, lock the output file.
	     (On a unix system, the first digit is owner rights and 7 is
	      "all rights" so this convention makes you think if the owner
	       is not supposed to have all rights.)

	 (ii) The decoder should display the filename embedded in the file
	      and ask if it is OK.  User can enter a <RETURN> to accept it
	      or a new filename if desired.

11. While we are at it, it would be nice to be able to handle sector encoding
    and reconstruction of whole disks.
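
As a sketch of convention 10(i), the C function below reads the begin
line and reports whether the output file should be locked.  It assumes
"less than 700" means less than octal 700, since the mode is an octal
number; uu_should_lock is a made-up name, and how the file actually gets
locked is left to whatever DOS is in use.

```c
#include <stdio.h>

/*
 * Parse a "begin <mode> <filename>" line.
 * Returns 1 if the mode is below octal 700 (file should be locked),
 * 0 if it is not, and -1 if the line is not a valid header.
 * name must have room for at least 128 characters.
 * uu_should_lock is a hypothetical name used for illustration.
 */
int uu_should_lock(const char *header, char *name)
{
    unsigned int mode;

    if (sscanf(header, "begin %o %127s", &mode, name) != 2)
        return -1;
    return mode < 0700;
}
```

So a header of "begin 644 FOO.COM" would ask for a locked file, while
"begin 755 FOO.COM" would not.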

That's all I can think of.  Anybody have any additional ideas?

-John S.