[comp.binaries.ibm.pc.d] Freely-distributable uudecode for Unix

w8sdz@WSMR-SIMTEL20.ARMY.MIL (Keith Petersen) (12/16/89)

Some readers of this list have reported problems uudecoding files sent
by LISTSERV (or TRICKLE in Europe) which contain an "M" at the end of
each line and on the final line just before the "end" statement.

The first character in each line defines the number of bytes of data
which follow on that line.  If the uudecoder is working correctly the
trailing "M" should be ignored.  It is added by LISTSERV to get around
problems of some mailers dropping trailing blanks.
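
As a rough illustration of the length rule described above, here is a minimal sketch in C. The helper names uu_data_bytes and uu_encoded_chars are made up for this example and are not part of any posted program; the sketch assumes the usual encoding in which each character carries six bits offset from a blank.

```c
/* Decode one uuencoded character to its 6-bit value. */
#define UU_DEC(c) (((c) - ' ') & 077)

/* Number of data bytes a line claims to carry, taken from its
 * first character.  (Hypothetical helper, for illustration only.) */
int uu_data_bytes(const char *line)
{
    return UU_DEC(line[0]);
}

/* Number of encoded characters that follow the length character:
 * every 3 data bytes travel as 4 characters, rounded up. */
int uu_encoded_chars(const char *line)
{
    int n = uu_data_bytes(line);
    return ((n + 2) / 3) * 4;
}
```

For a full line starting with "M" this gives 45 data bytes and 60 encoded characters, so anything after the 61st character of the line (such as the extra "M") lies outside the data and can be skipped by a decoder that trusts the length character.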

Here is a freely distributable uudecode for Unix that does the "right
thing", courtesy of UC Berkeley.

Keith
--
Keith Petersen
Maintainer of SIMTEL20's CP/M, MSDOS, & MISC archives [IP address 26.2.0.74]
Internet: w8sdz@WSMR-SIMTEL20.Army.Mil, w8sdz@brl.arpa  BITNET: w8sdz@NDSUVM1
Uucp: {ames,decwrl,harvard,rutgers,ucbvax,uunet}!wsmr-simtel20.army.mil!w8sdz

---cut-here---
/*
 * Copyright (c) 1983 Regents of the University of California.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms are permitted
 * provided that the above copyright notice and this paragraph are
 * duplicated in all such forms and that any documentation,
 * advertising materials, and other materials related to such
 * distribution and use acknowledge that the software was developed
 * by the University of California, Berkeley.  The name of the
 * University may not be used to endorse or promote products derived
 * from this software without specific prior written permission.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 */

#ifndef lint
static char sccsid[] = "@(#)uudecode.c	5.5 (Berkeley) 7/6/88";
#endif /* not lint */

/*
 * uudecode [input]
 *
 * create the specified file, decoding as you go.
 * used with uuencode.
 */
#include <stdio.h>
#include <pwd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* single character decode */
#define DEC(c)	(((c) - ' ') & 077)

main(argc, argv)
char **argv;
{
	FILE *in, *out;
	int mode;
	char dest[128];
	char buf[80];

	/* optional input arg */
	if (argc > 1) {
		if ((in = fopen(argv[1], "r")) == NULL) {
			perror(argv[1]);
			exit(1);
		}
		argv++; argc--;
	} else
		in = stdin;

	if (argc != 1) {
		fprintf(stderr, "Usage: uudecode [infile]\n");
		exit(2);
	}

	/* search for header line */
	for (;;) {
		if (fgets(buf, sizeof buf, in) == NULL) {
			fprintf(stderr, "No begin line\n");
			exit(3);
		}
		if (strncmp(buf, "begin ", 6) == 0)
			break;
	}
	(void)sscanf(buf, "begin %o %s", &mode, dest);

	/* handle ~user/file format */
	if (dest[0] == '~') {
		char *sl;
		struct passwd *getpwnam();
		struct passwd *user;
		char dnbuf[100], *index(), *strcat(), *strcpy();

		sl = index(dest, '/');
		if (sl == NULL) {
			fprintf(stderr, "Illegal ~user\n");
			exit(3);
		}
		*sl++ = 0;
		user = getpwnam(dest+1);
		if (user == NULL) {
			fprintf(stderr, "No such user as %s\n", dest);
			exit(4);
		}
		strcpy(dnbuf, user->pw_dir);
		strcat(dnbuf, "/");
		strcat(dnbuf, sl);
		strcpy(dest, dnbuf);
	}

	/* create output file */
	out = fopen(dest, "w");
	if (out == NULL) {
		perror(dest);
		exit(4);
	}
	chmod(dest, mode);

	decode(in, out);

	if (fgets(buf, sizeof buf, in) == NULL || strcmp(buf, "end\n")) {
		fprintf(stderr, "No end line\n");
		exit(5);
	}
	exit(0);
}

/*
 * copy from in to out, decoding as you go along.
 */
decode(in, out)
FILE *in;
FILE *out;
{
	char buf[80];
	char *bp;
	int n;

	for (;;) {
		/* for each input line */
		if (fgets(buf, sizeof buf, in) == NULL) {
			fprintf(stderr, "Short file\n");
			exit(10);
		}
		n = DEC(buf[0]);
		if (n <= 0)
			break;

		bp = &buf[1];
		while (n > 0) {
			outdec(bp, out, n);
			bp += 4;
			n -= 3;
		}
	}
}

/*
 * output a group of 3 bytes (4 input characters).
 * the input chars are pointed to by p, they are to
 * be output to file f.  n is used to tell us not to
 * output all of them at the end of the file.
 */
outdec(p, f, n)
char *p;
FILE *f;
{
	int c1, c2, c3;

	c1 = DEC(*p) << 2 | DEC(p[1]) >> 4;
	c2 = DEC(p[1]) << 4 | DEC(p[2]) >> 2;
	c3 = DEC(p[2]) << 6 | DEC(p[3]);
	if (n >= 1)
		putc(c1, f);
	if (n >= 2)
		putc(c2, f);
	if (n >= 3)
		putc(c3, f);
}

kim@uts.amdahl.com (Kim DeVaughn) (12/16/89)

In article <KPETERSEN.12550482989.BABYL@WSMR-SIMTEL20.ARMY.MIL>, w8sdz@WSMR-SIMTEL20.ARMY.MIL (Keith Petersen) writes:
> Some readers of this list have reported problems uudecoding files sent
> by LISTSERV (or TRICKLE in Europe) which contain a "M" at the end of
> each line and on the final line just before the "end" statement.
> 
> The first character in each line defines the number of bytes of data
> which follow on that line.  If the uudecoder is working correctly the
> trailing "M" should be ignored.  It is added by LISTSERV to get around
> problems of some mailers dropping trailing blanks.

Several of the newer flavors of the uutwins (uuencode/uudecode) add a couple
of characters beyond the specified line length (the length character is
usually "M", meaning 45 data bytes) to provide line-by-line checksums.  If
these chars aren't present, the uudecode just assumes the .uue was created
by an older uuencode, and doesn't do any checksumming.

If there are characters beyond the encoded line length, they do assume these
char(s) represent a checksum, which is what's happening in this case.  Some
also do an overall length check at the EOF.
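
A decoder can tell the two cases apart by comparing the actual line length with the length implied by the first character. The following is only a sketch under that assumption; uu_trailer_len is a hypothetical name, and it says nothing about how any particular uudecode interprets the extra characters.

```c
#include <string.h>

/* Decode one uuencoded character to its 6-bit value. */
#define UU_DEC(c) (((c) - ' ') & 077)

/* Return how many characters sit beyond the encoded data on one line.
 *   0  -> old-style line, nothing after the data
 *  >0  -> a trailer is present (a checksum, or LISTSERV's extra "M")
 *  -1  -> line is empty or shorter than its length character claims
 *         (for example because a mailer stripped trailing blanks)
 * Hypothetical helper, for illustration only. */
int uu_trailer_len(const char *line)
{
    size_t len = strcspn(line, "\r\n");   /* ignore the line terminator */
    size_t expected;
    int n;

    if (len == 0)
        return -1;
    n = UU_DEC(line[0]);
    expected = 1 + (size_t)((n + 2) / 3) * 4;
    if (len < expected)
        return -1;
    return (int)(len - expected);
}
```

With the three-byte line "#0V%T" (the encoding of "Cat") the result is 0; with "#0V%TM" it is 1, which is exactly the situation a checksumming decoder would take for a checksum character.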

The problem with trailing blanks is also eliminated, as they use a non-blank
char in place of a blank (a ` I believe).  So there are NO blanks anywhere
in the .uue.  This non-blank char maps to the same decode value, so this 
scheme is backwardly compatible with older versions of the uutwins, and doesn't
break anything (or at least not any of the many flavors I've ever come across).
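
The compatibility claim can be checked against the DEC() macro in the Berkeley source posted earlier. A small sketch follows; uu_enc_noblank is a made-up name for an encoder step that avoids blanks, not a function from any of the programs discussed.

```c
/* Same decode rule as DEC() in the posted uudecode.c. */
#define UU_DEC(c) (((c) - ' ') & 077)

/* Encode a 6-bit value, emitting '`' instead of a blank for zero.
 * (Hypothetical helper, for illustration only.) */
int uu_enc_noblank(int v)
{
    v &= 077;
    return v ? v + ' ' : '`';
}
```

A blank is 040 and ` is 0140 (octal), so subtracting the blank leaves 0 and 0100 respectively, and the & 077 mask reduces both to 0. An old decoder therefore reads either character as the same value.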

Would you be interested in converting to the newer versions, as they do provide
some additional error checking?  If so, I'll be happy to email them to you or
whomever the "right" person is at LISTSERV.

/kim
-- 
UUCP:  kim@amdahl.amdahl.com
  or:  {sun,decwrl,hplabs,pyramid,uunet,oliveb,ames}!amdahl!kim
DDD:   408-746-8462
USPS:  Amdahl Corp.  M/S 249,  1250 E. Arques Av,  Sunnyvale, CA 94086
BIX:   kdevaughn     GEnie:   K.DEVAUGHN     CIS:   76535,25

usenet@cps3xx.UUCP (Usenet file owner) (12/17/89)

In article <f9ug02QP74xw01@amdahl.uts.amdahl.com> kim@uts.amdahl.com (Kim DeVaughn) writes:
%Several of the newer flavors of the uutwins (uuencode/uudecode) add a couple
%of characters beyond the specified line length (which is usually "M") to 
%provide line-by-line checksums.  If these chars aren't present, the uudecode
%just assumes the .uue was created by an older uuencode, and doesn't do any
%checksumming.
%
%If there are characters beyond the encoded line length, they do assume these
%char(s) represent a checksum, which is what's happening in this case.  Some
%also do an overall length check at the EOF.
%
%The problem with trailing blanks is also eliminated, as they use a non-blank
%char in place of a blank (a ` I believe).  So there are NO blanks anywhere
%in the .uue.  This non-blank char maps to the same decode value, so this 
%scheme is backwardly compatible with older versions of the uutwins, and doesn't
%break anything (or at least not any of the many flavors I've ever come across).
%
%Would you be interested in converting to the newer versions, as they do provide
%some additional error checking?  If so, I'll be happy to email them to you or
%whomever the "right" person is at LISTSERV.

Yes, I am very interested.  I looked around on several machines, and
none of them has the newer better flavours that do checksumming.  Could
you please post the source here?  Thank you.

%/kim
%-- 
%UUCP:  kim@amdahl.amdahl.com
%  or:  {sun,decwrl,hplabs,pyramid,uunet,oliveb,ames}!amdahl!kim

Neither of these addresses works from Mich State Univ or Univ of So Calif.

In the rare case that original ideas   Kenneth J. Hendrickson    N8DGN
are found here, I am responsible.      Owen W328, E. Lansing, MI 48825
Internet: kjh@usc.edu                  UUCP: ...!uunet!usc!pollux!kjh

kevin@kosman.UUCP (Kevin O'Gorman) (12/18/89)

In article <KPETERSEN.12550482989.BABYL@WSMR-SIMTEL20.ARMY.MIL> w8sdz@WSMR-SIMTEL20.ARMY.MIL (Keith Petersen) writes:
>Some readers of this list have reported problems uudecoding files sent
>by LISTSERV (or TRICKLE in Europe) which contain a "M" at the end of
>each line and on the final line just before the "end" statement.
>
>The first character in each line defines the number of bytes of data
>which follow on that line.  If the uudecoder is working correctly the
>trailing "M" should be ignored.  It is added by LISTSERV to get around
>problems of some mailers dropping trailing blanks.

Some clarification is necessary, because this doesn't sound like the whole
story.  The UUDECODE and UUENCODE that I have on my clone work very
well with the one on my UNIX machine.  Neither of them is very happy
with the files from SIMTEL20.

The ones on MSDOS have a bunch of help screens that include this additional
information:

After explaining the basic encoding technique, they comment that since
some mailers attempt to munge space characters in one way or another, all
spaces are converted to back-tick (`) characters, which are not otherwise
used in the encoding.  The SIMTEL encoder does not seem to do this, and
instead ends every line with an "M", which at least avoids problems with
mailers that strip trailing blanks.

They also comment that the last character of each line is a checksum, and
explain a bit about how that is computed.  The checksum falls in the same
position used by the trailing "M" in the SIMTEL encodings, thus the decoders
that I use think there's a checksum error on nearly every line.  The one
on MSDOS at least notices that this is happening very quickly and asks if
I want to just ignore the checksum problems.
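
To make the clash concrete, here is a sketch of a per-line check. The checksum rule used (sum of the 6-bit values of the data characters, modulo 64) is an ASSUMPTION chosen only for illustration; the help screens are not quoted here and real encoders may compute it differently. All names are hypothetical.

```c
#include <string.h>

/* Decode one uuencoded character to its 6-bit value. */
#define UU_DEC(c) (((c) - ' ') & 077)

enum uu_check { UU_NO_CHECKSUM, UU_CHECKSUM_OK, UU_CHECKSUM_BAD };

/* ASSUMED checksum rule, for illustration only: sum of the 6-bit
 * values of the encoded data characters, modulo 64. */
static int uu_assumed_sum(const char *data, int nchars)
{
    int i, sum = 0;

    for (i = 0; i < nchars; i++)
        sum += UU_DEC(data[i]);
    return sum & 077;
}

/* Classify one line: no trailer, trailer matching the assumed
 * checksum, or trailer that does not match it. */
enum uu_check uu_check_line(const char *line)
{
    size_t len = strcspn(line, "\r\n");   /* ignore the line terminator */
    int n, nchars;

    if (len == 0)
        return UU_NO_CHECKSUM;
    n = UU_DEC(line[0]);
    nchars = ((n + 2) / 3) * 4;           /* encoded data characters */
    if (len <= (size_t)nchars + 1)
        return UU_NO_CHECKSUM;            /* nothing after the data */
    if (UU_DEC(line[1 + nchars]) == uu_assumed_sum(line + 1, nchars))
        return UU_CHECKSUM_OK;
    return UU_CHECKSUM_BAD;
}
```

Under this assumed rule a constant trailing "M" matches only when the data happens to sum to 45, which is why a checksumming decoder would complain about nearly every line.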

I only have these two decoders to go by.  I have no idea what's common in
the BSD world, I can only guess how much like SYSV my software set is,
so I'm not going to claim that this is the best and most modern stuff there
is, but it sure sounds like a more robust scheme than what's being used
at SIMTEL20.  If it is also pretty standard, I would hope that SIMTEL20
would begin to use this scheme.

w8sdz@smoke.BRL.MIL (Keith Petersen) (12/19/89)

There has been some discussion about "the uuencode method that SIMTEL20
uses."  SIMTEL20 does not uuencode files.  LISTSERV uuencodes files.

--> SIMTEL20 does NOT run the LISTSERV and TRICKLE netmail servers! 

Please complain to the administrators of the LISTSERV or TRICKLE you are
using.

For VM1.NODAK.EDU:  Info@VM1.NODAK.EDU
For VM.ECS.RPI.EDU: FISHER@VM.ECS.RPI.EDU

If you are on BITNET:

For NDSUVM1:  Info@NDSUVM1
For RPIECS:   FISHER@RPIECS

This won't get changed unless you complain to the right people!

Keith

saj@chinet.chi.il.us (Stephen Jacobs) (12/19/89)

Keith Petersen suggested that people take up TRICKLE-related problems with the
administrators of the machines running TRICKLE.  As the victim of some
really massive screw-ups originating in a nearby BITNET-internet gateway, I'd
add that you may want to check the path things reach you by, and keep the
postmasters of intermediate machines informed of things that may be happening
to mail at their sites.  EBCDIC-ASCII translation still messes things up now
and then.                                 Steve J.
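
One cheap way to spot some translation damage before decoding is to check that every character on a data line falls inside the uuencode alphabet, blank through `. This is only a sketch with a hypothetical function name. It catches damage that produces characters outside that range (for example { or ~) and cannot detect one valid character being swapped for another.

```c
#include <string.h>

/* Return the index of the first character outside ' '..'`' on the
 * line, or -1 if every character is in the uuencode alphabet.
 * The line terminator is not examined.
 * (Hypothetical helper, for illustration only.) */
int uu_first_bad_char(const char *line)
{
    size_t i, len = strcspn(line, "\r\n");

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];

        if (c < ' ' || c > '`')
            return (int)i;
    }
    return -1;
}
```

Running this over the lines between "begin" and "end" and reporting the first offending line number may help decide which gateway to complain to.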

bob@atom.OZ (Bob Backstrom) (12/22/89)

One thing to watch out for with the C version of uudecode posted
a little while back is how you open the output file under Turbo-C.

Change the "w" to "wb" to open in BINARY mode.  Otherwise, every 0Ah
byte is written as 0Dh, 0Ah, and you'll have the great fun of executables
that run for about a millisecond before jumping off to never-never land.
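
A small sketch of the difference, using only standard C I/O. The function names and the idea of checking the file size afterwards are mine, not part of the posted program.

```c
#include <stdio.h>

/* Write len bytes to path in binary mode.
 * Returns 0 on success, -1 on error. */
int write_binary(const char *path, const unsigned char *data, size_t len)
{
    FILE *out = fopen(path, "wb");   /* "wb": no newline translation */

    if (out == NULL)
        return -1;
    if (fwrite(data, 1, len, out) != len) {
        fclose(out);
        return -1;
    }
    return fclose(out) == 0 ? 0 : -1;
}

/* Return the size in bytes of the file at path, or -1 on error.
 * The file is read in binary mode so nothing is translated back. */
long file_size(const char *path)
{
    FILE *in = fopen(path, "rb");
    long n = 0;

    if (in == NULL)
        return -1;
    while (getc(in) != EOF)
        n++;
    fclose(in);
    return n;
}
```

Writing the four bytes 4Dh 0Ah 5Ah 0Ah this way gives a four-byte file on any system. Opened with plain "w" under a DOS compiler, the same write would be expected to give six bytes, one extra 0Dh per 0Ah; on Unix the two modes behave the same.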

Cheers.

-- 
* ACSNET: bob@atom.oz    Bob Backstrom,
* Phone:  (02) 543-3092  Australian Nuclear Science & Technology Organisation,
*                        Private Mailbag 1, Menai,
*                        New South Wales, Australia, 2234.