[alt.sources] "btoa Classic", "tarmailchunky", and comments on "new btoa"

per@parrot.Philips.Com (Paul E. Rutter) (06/14/89)

  This posting contains commentary, and shar source for the btoa-tarmail package.


It has come to my attention that a "new" version of btoa (binary to ASCII)
from Stefan Parmark d84sp@efd.lth.se is making the rounds.  As the original
author of btoa, I have a few comments about it:

1)  I hope people find the new version useful in their own work.  I CAN
imagine (non-unix) situations where somebody might benefit from the repair
features.  I am sure the author's intention and code are entirely well
meaning.  However...

2)  I object strenuously to the new version being called "btoa", as it
changes a lot of things, and although it can read and write the "old"
version, it defaults to writing the "new" way, which I AM SURE WILL CAUSE
UNNECESSARY CONFUSION FOR CURRENT USERS of btoa.  (I would not be posting
this "clarification" if a different name, say "mailcoder", had been used).

3)  Personally, I could not benefit from the "features" of the new version.
I will continue to distribute the original version, which is identical to
that distributed with "compress", and has been widely and successfully used
for years now.

-------------

I know the way the net works, and I know that this posting will lead to more
postings and more mail.  I intend to post this now, and respond no further to
anyone.  I have better things to do.  Each person who cares about this sort
of net minutiae will have to decide what they want to do for themselves
(there are lots of "mailcoders" out there).  Since the first and only time I
heard about the entirely rewritten "new" version (where I am still "nicely"
listed as first author) was from a friend forwarding the posting in
comp.sources.unix to me -- I will act similarly by (ab)using this news
group.  (In fairness, it is of course entirely possible that mail was sent to
me long ago and I never got it).


A little history:

I wrote btoa/atob as an alternative to uuencode.  While btoa is a bit more
efficient than uuencode, my real reason for writing it in the first place was
a dislike for uuencode/uudecode doing two things at once: it serves as a
coding/decoding filter, AND it insists on creating a file with a specific
name, owner, and mode.  This violation of the philosophy "do one thing" often
led to frustrations with "permission denied".  So, I specifically wrote
btoa/atob to be simple, optionless filters that only did encoding/decoding.
When I posted the source to the net years ago, I included two very simple
shell scripts: "tarmail" and "untarmail", that just piped tar to btoa to
mail.  Soon after, the authors of the excellent "compress" program wanted to
bundle btoa in their distribution, and the pipeline in the tarmail script
became:

  tar cvf - $* | compress | btoa | mail

After one early patch to get around a "feature" of bitnet (bitnet did weird
things to blank lines), the only problem since has been occasional claims of
bugs that have always turned out to be caused by other people "improving" the
program and passing it on.  Mr. Parmark says in his readme:

> Btoa is in the public domain. You may use it, give it away, and
> make improvements, as long as the names of the developers are
> mentioned and you don't use it to earn money. It may NOT be used
> commercially without my permission.

As the original author I hereby give permission to use my stuff in any way you
want, even commercially, but PLEASE, issue any improvement or change under
other program names, and without my name.


About the "new" version, I note in passing from the "new" man page:

> KNOWN BUGS
>   Btoa will not work properly unless the input is a true  file
>   or a redirected one. This is because file positions are col-
>   lected during diagnosis for later reference  when  producing
>   the  diagnosis  file.   The bug is actually in fseek() which
>   only can reposition 'real' files.

I do not consider this to be a "bug in fseek" (whose return value, by the
way, the "new" source code never checks for an error).  fseek will never be
able to seek arbitrarily on a true pipe, and since shell scripts like
"tarmail" do use pipes -- good luck.

There is also the comment that:

> I removed the feature to exit with no output if there was an error in the
> archive. ...  I hope all realize that you shouldn't run a file that was
> created from a corrupted archive.

Well, the reason I did it that way is that most people using a script like
tarmail DO NOT understand that sort of thing in the least.
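
The "no output on error" behaviour amounts to holding the decoded bytes back
until every check has passed.  A minimal sketch of that pattern follows; it
is not the code from atob.c (which appears in the shar below), and
"release_if_verified" and its parameters are hypothetical names chosen for
illustration.

```c
#include <stdio.h>

/* Copy the first n bytes of tmp to out, but only if checks_ok is nonzero.
 * Returns 0 on success, -1 on failure.  When checks_ok is zero, nothing
 * at all is written to out. */
int release_if_verified(FILE *tmp, FILE *out, long n, int checks_ok)
{
    int c;

    if (!checks_ok)
        return -1;                      /* verification failed: stay silent */
    if (fseek(tmp, 0L, SEEK_SET) == -1)
        return -1;                      /* holding file must be seekable */
    while (n-- > 0) {
        if ((c = getc(tmp)) == EOF)
            return -1;                  /* holding file shorter than claimed */
        putc(c, out);
    }
    return 0;
}
```

With this arrangement a downstream "uncompress | tar xvpf -" sees either the
complete data or an empty stream, never a truncated archive.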

If I were going to write another binary-to-mail encoder (and I am not!), I
would probably do a few things differently.  But I sure would not call the new
one "btoa".  (Actually, I wish people would spend their time working on X.400
mail, so btoa and uuencode would become obsolete...)

--------------------- (clarification off) ----------------------------------

Even though this is comp.sources.d, my original source to btoa is short
enough that I will violate net protocol and put it here now (rsalz is welcome
to put this part in comp.sources.unix if he sees fit).  (Indeed, if one
removes whitespace, the source to the decoding program "atob" is short enough
that some people have scripts that tack it on to the front of their tarmail,
enabling "self-decoding".)
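
For readers who want the flavor of the encoding before unpacking the shar,
here is a minimal sketch of the base-85 step described in the btoa.c header
comment: four bytes form one 32-bit word, which is printed as five characters
between '!' and 'u', with 'z' standing for an all-zero word.  This is not the
code from the package.  "encode_word85" is a hypothetical name, and the
sketch assumes a compiler with <stdint.h> and unsigned 32-bit arithmetic,
which the original deliberately does not rely on.

```c
#include <stdint.h>

/* Encode one 32-bit word in btoa-style base 85.
 * Writes "z" for zero, otherwise 5 characters in the range '!'..'u',
 * and NUL-terminates the result.  out must have room for 6 bytes.
 * Returns the number of characters written (1 or 5). */
int encode_word85(uint32_t word, char *out)
{
    int i;

    if (word == 0) {
        out[0] = 'z';
        out[1] = '\0';
        return 1;
    }
    for (i = 4; i >= 0; i--) {          /* least significant digit last */
        out[i] = (char)('!' + word % 85);
        word /= 85;
    }
    out[5] = '\0';
    return 5;
}
```

Five characters per four bytes is a 25% expansion; uuencode spends four
characters on every three bytes, roughly 33%, before counting its per-line
overhead.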

As a bonus for those who have waded this far, there is a new "tarmailchunky"
script (from Mark Baushke, thanks) that will help when sending large tars
through machines with low mail size limits.  I did not want to build this
chunky feature into the original "tarmail" script, as I do not want to
further encourage brain damaged 64K limits; for example, between machines
equipped with 19200 baud modems, it is a costly mistake to arbitrarily break
mail into a whole bunch of < 1 minute phone calls.  There is an
"untarmailchunky" script that puts the pieces back together without you
having to strip off headers (however it is up to you to feed it all the
pieces in the right order).  Fancier scripts for chunking can and have been
written -- be my guest.
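
The chunk size can be worked out from two numbers in the package: the script
splits at 750 lines ("split -750"), and btoa.c writes at most 78 characters
per line (MAXPERLINE).  The sketch below does that arithmetic and estimates
the time on the wire.  The function names are hypothetical, and the figure of
10 bits per byte (8 data bits plus start and stop bits) is an assumption
about the modem link, ignoring mail headers and protocol overhead.

```c
/* Upper bound on the bytes in one chunk: each of `lines` lines carries
 * at most `chars` characters plus a newline. */
long chunk_bytes(long lines, int chars)
{
    return lines * (chars + 1L);
}

/* Rough transfer time in seconds, ASSUMING 10 bits on the wire per byte. */
double transfer_seconds(long bytes, long baud)
{
    return (double)bytes * 10.0 / (double)baud;
}
```

chunk_bytes(750, 78) gives 59250, comfortably under 65536, and
transfer_seconds(59250, 19200) comes to about 31 seconds, consistent with
the "< 1 minute phone calls" above.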

If you are already using "btoa Classic":  Nothing other than the "chunky"
scripts (and related changes to the man page), has been added in the
following shar package.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

# The rest of this file is a shell script which will extract:
# Makefile btoa.c atob.c tarmail untarmail tarmailchunky untarmailchunky btoa.man
# Suggested restore procedure:
# Edit off anything above these comment lines,
# save this file in an empty directory,
# then say: sh < file
echo x - Makefile
cat >Makefile <<'!Funky!Stuff!'
# makefile for  btoa/atob/tarmail
# Paul E. Rutter
# per@philabs.philips.com
#
# do whatever you want with these programs, but please do not make any
# changes and distribute "new" versions under the same program names.
#
# You need to make BINDIR, SHELLDIR, and MANDIR correct for your situation

BINDIR   = /usr/local/bin
SHELLDIR = /usr/local/bin
MANDIR   = /usr/man/manl

CC       = cc -O

BINS   = btoa atob
SHELLS = tarmail untarmail tarmailchunky untarmailchunky
MANS   = btoa.man


install:	clean man all
		strip $(BINS)
		chmod 755 $(BINS) $(SHELLS)
		cp -p $(BINS)   $(BINDIR)
		cp -p $(SHELLS) $(SHELLDIR)
		make clean

all:		$(BINS) $(SHELLS)

man:		$(MANS)
		chmod 644 $(MANS)
		cp -p btoa.man $(MANDIR)/btoa.l
		cp -p btoa.man $(MANDIR)/tarmail.l

btoa:		btoa.c
		$(CC) -o btoa btoa.c

atob:		atob.c
		$(CC) -o atob atob.c


shar:		Makefile btoa.c atob.c $(SHELLS) $(MANS)
		shar btoa.shar Makefile btoa.c atob.c $(SHELLS) $(MANS)

clean:		
		rm -f $(BINS) btoa.shar
!Funky!Stuff!
echo x - btoa.c
cat >btoa.c <<'!Funky!Stuff!'
/* btoa: version 4.0
 * stream filter to change 8 bit bytes into printable ascii
 * computes the number of bytes, and three kinds of simple checksums
 * incoming bytes are collected into 32-bit words, then printed in base 85
 * exp(85,5) > exp(2,32)
 * the ASCII characters used are between '!' and 'u'
 * 'z' encodes 32-bit zero; 'x' is used to mark the end of encoded data.
 *
 * do whatever you want with these programs, but PLEASE do not make any
 * changes and distribute "new" versions under the same program names.
 *
 * Paul Rutter Joe Orost
 */

#include <stdio.h>

#define reg register

#define MAXPERLINE 78

long int Ceor = 0;
long int Csum = 0;
long int Crot = 0;

long int ccount = 0;
long int bcount = 0;
long int word;

#define EN(c)	(int) ((c) + '!')

encode(c) reg c;
{
  Ceor ^= c;
  Csum += c;
  Csum += 1;
  if ((Crot & 0x80000000)) {
    Crot <<= 1;
    Crot += 1;
  } 
  else {
    Crot <<= 1;
  }
  Crot += c;

  word <<= 8;
  word |= c;
  if (bcount == 3) {
    wordout(word);
    bcount = 0;
  } 
  else {
    bcount += 1;
  }
}

wordout(word) reg long int word;
{
  if (word == 0) {
    charout('z');
  } 
  else {
    reg int tmp = 0;

    if (word < 0)
    { /* Because some don't support unsigned long */
      tmp = 32;
      word = word - (long)(85L * 85 * 85 * 85 * 32);
    }
    if (word < 0) {
      tmp = 64;
      word = word - (long)(85L * 85 * 85 * 85 * 32);
    }
    charout(EN((word / (long)(85L * 85 * 85 * 85)) + tmp));
    word %= (long)(85L * 85 * 85 * 85);
    charout(EN(word / (85L * 85 * 85)));
    word %= (85L * 85 * 85);
    charout(EN(word / (85L * 85)));
    word %= (85L * 85);
    charout(EN(word / 85));
    word %= 85;
    charout(EN(word));
  }
}

charout(c) {
  putchar(c);
  ccount += 1;
  if (ccount == MAXPERLINE) {
    putchar('\n');
    ccount = 0;
  }
}

main(argc,argv)
char **argv;
{
  reg c;
  reg long int n;

  if (argc != 1) {
    fprintf(stderr,"bad args to %s\n", argv[0]);
    exit(2);
  }
  printf("xbtoa Begin\n");
  n = 0;
  while ((c = getchar()) != EOF) {
    encode(c);
    n += 1;
  }
  while (bcount != 0) {
    encode(0);
  }
  /* n is written twice as crude cross check*/
  if (ccount == 0) /* ccount == 0 means '\n' just written in charout() */
    ; /* this avoids bug in BITNET, which changes blank line to spaces */
  else
    putchar('\n');
  printf("xbtoa End N %ld %lx E %lx S %lx R %lx\n", n, n, Ceor, Csum, Crot);
  exit(0);
}
!Funky!Stuff!
echo x - atob.c
cat >atob.c <<'!Funky!Stuff!'
/* atob
 * stream filter to change printable ascii from "btoa" back into 8 bit bytes
 * if bad chars, or Csums do not match: exit(1) [and NO output]
 *
 * do whatever you want with these programs, but PLEASE do not make any
 * changes and distribute "new" versions under the same program names.
 *
 * Paul Rutter Joe Orost
 */

#include <stdio.h>

#define reg register

#define streq(s0, s1)  strcmp(s0, s1) == 0

#define times85(x)	((((((x<<2)+x)<<2)+x)<<2)+x)

long int Ceor = 0;
long int Csum = 0;
long int Crot = 0;
long int word = 0;
long int bcount = 0;

fatal() {
  fprintf(stderr, "bad format or Csum to atob\n");
  exit(1);
}

#define DE(c) ((c) - '!')

decode(c) reg c;
{
  if (c == 'z') {
    if (bcount != 0) {
      fatal();
    } 
    else {
      byteout(0);
      byteout(0);
      byteout(0);
      byteout(0);
    }
  } 
  else if ((c >= '!') && (c < ('!' + 85))) {
    if (bcount == 0) {
      word = DE(c);
      ++bcount;
    } 
    else if (bcount < 4) {
      word = times85(word);
      word += DE(c);
      ++bcount;
    } 
    else {
      word = times85(word) + DE(c);
      byteout((int)((word >> 24) & 255));
      byteout((int)((word >> 16) & 255));
      byteout((int)((word >> 8) & 255));
      byteout((int)(word & 255));
      word = 0;
      bcount = 0;
    }
  } 
  else {
    fatal();
  }
}

FILE *tmp_file;

byteout(c) reg c;
{
  Ceor ^= c;
  Csum += c;
  Csum += 1;
  if ((Crot & 0x80000000)) {
    Crot <<= 1;
    Crot += 1;
  } 
  else {
    Crot <<= 1;
  }
  Crot += c;
  putc(c, tmp_file);
}

main(argc, argv) char **argv;
{
  reg c;
  reg long int i;
  char tmp_name[100];
  char buf[100];
  long int n1, n2, oeor, osum, orot;

  if (argc != 1) {
    fprintf(stderr,"bad args to %s\n", argv[0]);
    exit(2);
  }
  sprintf(tmp_name, "/usr/tmp/atob.%x", getpid());
  tmp_file = fopen(tmp_name, "w+");
  if (tmp_file == NULL) {
    fatal();
  }
  /* Make file disappear */
  if (unlink(tmp_name) == -1) {
    fatal();
  }
  /*search for header line*/
  for (;;) {
    if (fgets(buf, sizeof buf, stdin) == NULL) {
      fatal();
    }
    if (streq(buf, "xbtoa Begin\n")) {
      break;
    }
  }

  while ((c = getchar()) != EOF) {
    if (c == '\n') {
      continue;
    } 
    else if (c == 'x') {
      break;
    } 
    else {
      decode(c);
    }
  }
  if (scanf("btoa End N %ld %lx E %lx S %lx R %lx\n", &n1, &n2, &oeor, &osum, &orot) != 5) {
    fatal();
  }
  if ((n1 != n2) || (oeor != Ceor) || (osum != Csum) || (orot != Crot)) {
    fatal();
  } 
  else {
    /* Now that we know everything is OK, copy tmp file to stdout */
    if (fseek(tmp_file, 0L, 0) == -1) {
      fatal();
    }
    for (i = n1; --i >= 0;) {
      putchar(getc(tmp_file));
    }
  }
  exit(0);
}
!Funky!Stuff!
echo x - tarmail
cat >tarmail <<'!Funky!Stuff!'
#!/bin/sh
if test $# -lt 3; then
  echo "Usage:  tarmail mailpath \"subject-string\" directory-or-file(s)"
  exit
else
  mailpath=$1
  echo "mailpath = $mailpath"
  shift
  subject="$1"
  echo "subject-string = $subject"
  shift
  echo files = $*
  tar cvf - $* | compress | btoa | Mail -s "$subject" $mailpath
fi
!Funky!Stuff!
echo x - untarmail
cat >untarmail <<'!Funky!Stuff!'
#!/bin/sh
if test $# -ge 1; then
   atob < $1 | uncompress | tar xvpf -
   mv $1 /tmp/$1.$$
   echo tarmail file moved to: /tmp/$1.$$
else
   atob | uncompress | tar xvpf -
fi
!Funky!Stuff!
echo x - tarmailchunky
cat >tarmailchunky <<'!Funky!Stuff!'
#!/bin/sh
# "tarmailchunky" takes a file or list of files and creates a "tar file"; it
# then compresses this data (using compress) and converts it to an ascii
# form (using btoa). If it is "too large" to fit into typical mail
# transport systems (some uucp sites break at 64K bytes), it will split
# the image into multiple parts and send them using the standard "mail"
# command.
if test $# -lt 3; then
  echo "Usage:  tarmailchunky mailpath \"subject-string\" directory-or-file(s)"
  echo
  echo "tarmailchunky is a shell script that uses tar, compress, btoa, and split"
  echo "to send arbitrary hierarchies by mail.  It sends things as one or"
  echo "more < 64K pieces.  (see shell script to change this size)."
  exit
else
  mailpath=$1
  echo "mailpath = $mailpath"
  shift
  subject="$1"
  echo "subject-string = $subject"
  shift
  echo files = $*
  tar cvf - $* | compress | btoa | split -750 - /tmp/tm$$
  n=1
  set /tmp/tm$$*
  for f do
    {
	echo '---start beef'
	cat $f
	echo '---end beef'
    } | Mail -s "$subject - part $n of $#" $mailpath
    echo "part $n of $# sent (" `wc -c < $f` "bytes)"
    n=`expr $n + 1`
  done
  rm /tmp/tm$$*
fi
!Funky!Stuff!
echo x - untarmailchunky
cat >untarmailchunky <<'!Funky!Stuff!'
#!/bin/sh
# "untarmailchunky" takes an ordered list of mail messages (if they were in
# multiple parts, they must be fed to untarmailchunky in order) and recreates
# the data stored by the original "tarmail" reversing each step along
# the way.
if test $# -ge 1; then
   sed '/^---end beef/,/^---start beef/d' $* | atob | uncompress | tar xvpf -
   echo remember to remove the tarmail files: $*
else
   sed '/^---end beef/,/^---start beef/d' | atob | uncompress | tar xvpf -
fi
!Funky!Stuff!
echo x - btoa.man
cat >btoa.man <<'!Funky!Stuff!'
.TH BTOA 1 local
.SH NAME
btoa, atob, tarmail, untarmail, tarmailchunky, untarmailchunky \- encode/decode binary to printable ASCII
.SH SYNOPSIS
.B btoa < anything > ASCII
.br
.B atob < btoafile > anything
.br
.B tarmail
who subject-string files ...
.br
.B untarmail
[ file ]
.br
.B tarmailchunky
who subject-string files ...
.br
.B untarmailchunky
[ file ... ]
.SH DESCRIPTION
.I btoa
is a filter that reads anything from the standard input, and encodes it into
printable ASCII on the standard output.  It also attaches a header and checksum
information used by the reverse filter 
.I atob 
to find the start of the data and to check integrity.
.PP
.I atob
reads an encoded file, strips off any leading and trailing lines added by
mailers, and recreates a copy of the original file on the standard output.
.I atob
gives NO output (and exits with an error message) if its input is garbage or
the checksums do not check.  (The checksum is at the end; giving no output on
checksum error guarantees that no "partial things" will be created by pipe
scripts like untarmail if there was an error in transit).
.PP
.I tarmail
is a shell script that tar's up all the given files, pipes them 
through 
.IR compress ","
.IR btoa ","
and mails them to the given person.  For
example:
.PP
.in 1i
tarmail ralph "two files for you"  foo.c a.out
.in -1i
.PP
Will package up files "foo.c" and "a.out" and mail them to "ralph", with a mail
subject line of "two files for you".
.PP
.I tarmail
with no arguments will print a short message reminding you what the required
args are.  When the mail is received at the other end, that person should use
mail to save the message in some temporary file name (say "xx").  Then,
executing
.PP
.in 1i
untarmail xx
.in -1i
.PP
will decode the message and untar it.  (In general, you will want to be in an
empty directory, or the "right" directory when you execute this, since the
"untar" will be creating new files).
.I untarmail
can also be used as a filter.  By using
.IR tarmail ","
binary files and entire directory structures can be easily transmitted
between machines.  Naturally, you should understand what tar itself does
before you use
.IR tarmail "."
.PP
.I tarmailchunky
is a shell script similar to tarmail, but it uses split to break the message
into one or more pieces, each less than 64 Kbytes long.  Use it when faced
with mail size limits.  On the receiving end, save the pieces as "xx.01",
"xx.02", ... as they come in (they are numbered for you in the subject line
by tarmailchunky).  Then, use
.PP
.in 1i
untarmailchunky xx.??
.in -1i
.PP
to decode the message and untar it.  untarmailchunky uses sed to strip off
mail headers and trailers on the pieces, so you do not have to do that
manually.  You DO have to give it the files in numerical order.
.PP
Other uses for btoa:
.PP
compress < secrets | crypt | btoa | mail ralph
.PP
will mail the encrypted contents of the file "secrets" to ralph.  If ralph
knows the encryption key, he can decode it by saving the mail (say in "xx"),
and then running:
.PP
atob < xx | crypt | uncompress
.PP
(crypt requests the key from the terminal,
and the "secrets" come out on the terminal).
.SH AUTHOR
Paul Rutter (with thanks to Joe Orost and Mark Baushke)
.SH FEATURES
.I btoa
uses a compact base-85 encoding so that
4 bytes are encoded into 5 characters (file is expanded by 25%).
As a special case, 32-bit zero is encoded as one character.  This encoding
produces less output than
.IR uuencode "(1)."
.SH NOTE
The source for btoa is freely available.  Use it any way you want, but please
do not distribute changed versions under these program names.
.SH "SEE ALSO"
compress(1), crypt(1), uuencode(1), mail(1), split(1), sed(1)
!Funky!Stuff!
Paul Rutter    Philips Labs     per@philabs.philips.com    uunet!philabs!per