[comp.sources.d] An idea for safer and portable unshar-ing

john@chance.UUCP (John R. MacMillan) (10/01/89)

In order to make it easier for unshar programs to work without
using /bin/sh, perhaps we should agree (hah!) upon some keyword
directives that shar programs would include as comments. E.g.

# FILE filename
# NOEXIST
# DATA prefix end_delimiter
# SIZE [l LINES] [w WORDS] [c CHARS]
# CONTINUE filename
# EXIST
# DECODE how
# SUBDIR directory

This is just a first shot, but you get the idea.  The unpacker
could be as paranoid as it wants.  If the shar wants to get
trickier than the keywords allow, it could, and the unpacker
would just not do the tricky parts (perhaps a SKIP or WARN
directive).  But notice that the above could handle a uuencoded
compressed file split across two parts that ends up in a
subdirectory.
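The directive lines listed above are regular enough that recognizing them needs only a table lookup, not a shell parser. What follows is a minimal sketch in C. It assumes each directive appears exactly as "# KEYWORD args", with one space after the '#'. The enum directive type, the table, and classify() are hypothetical names and are not part of any existing unshar:

```c
#include <string.h>

/* Hypothetical classifier for the "# KEYWORD args" comment lines proposed
   above.  The keyword spellings come from the post; the names used here
   are illustrative only. */
enum directive { D_NONE, D_FILE, D_NOEXIST, D_DATA, D_SIZE,
                 D_CONTINUE, D_EXIST, D_DECODE, D_SUBDIR };

static const struct {
    const char *name;
    enum directive d;
} directives[] = {
    { "FILE", D_FILE },         { "NOEXIST", D_NOEXIST },
    { "DATA", D_DATA },         { "SIZE", D_SIZE },
    { "CONTINUE", D_CONTINUE }, { "EXIST", D_EXIST },
    { "DECODE", D_DECODE },     { "SUBDIR", D_SUBDIR },
};

/* Returns the directive on LINE, or D_NONE for anything else (ordinary
   shell text, which a cautious unpacker just skips).  On a match, *ARGS
   is set to the text following the keyword. */
enum directive classify(const char *line, const char **args)
{
    size_t i, n;

    if (strncmp(line, "# ", 2) != 0)
        return D_NONE;
    line += 2;
    for (i = 0; i < sizeof directives / sizeof directives[0]; i++) {
        n = strlen(directives[i].name);
        /* require a whole word: "# FILES" must not match FILE */
        if (strncmp(line, directives[i].name, n) == 0 &&
            (line[n] == ' ' || line[n] == '\n' || line[n] == '\0')) {
            *args = line + n + (line[n] == ' ');
            return directives[i].d;
        }
    }
    return D_NONE;
}
```

A driver would read the shar line by line with fgets(), act on the directives it recognizes, and ignore everything else. Because of that, the same file stays runnable by /bin/sh.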

The tough part would be getting people to make their shar
programs generate it.
-- 
John R. MacMillan           "Don't you miss it...don't you miss it...
john@chance.UUCP             Some of you people just about missed it."
...!utcsri!hcr!chance!john        -- Talking Heads

djm@wam.UMD.EDU (10/02/89)

In article <1989Sep30.171114.12550@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:
>In order to make it easier for unshar programs to work without
>using /bin/sh, perhaps we should agree (hah!) upon some keyword
>directives that shar programs would include as comments. E.g.

This suggestion seems to be moving in the direction of making archives
that plain old /bin/sh can't unpack at all.  Perhaps it's not a bad
idea.  An easier to parse, more standardized pure-ASCII archiving
format than a shell archive would certainly be more appropriate for
Amiga, MS-DOS, VMS, etc. postings, and would allow the packing and
unpacking programs more versatility, security and control on Unix
systems as well.

Right now there is a profusion of shar programs that generate all kinds
of codes to split up the included files, using sed, cat, wc, etc. and
starting some or all lines with 'X' or '|' or '\tx' or who knows what
else; secure unshar programs written in C have to simulate all of that.  As
is becoming clear in this discussion, interpreting all of those formats
requires implementing a substantial subset of /bin/sh syntax, which is far
more work than unpacking an ASCII archive should require.  Of course, only
one unshar program really needs to exist,
as long as it is comprehensive and portable.  Perhaps Rich Salz's new
release will satisfy everyone.

I would like to see a replacement for shell archives that would have a
simple-to-parse format similar to the one John suggested.  It would have
a header section for the whole archive, giving information like:

# PARTS total number of parts
# PART number of this part
# CREATED date of creation (ctime format would do, I guess)
# CONTAINS names of the files it contains

There would be another header section for each file extracted, with
information like:

# FILE file name
	or
# DIRECTORY directory name
# OFFSET starting offset of this part, to allow continuation of long files
# BYTES file length
# CHECKSUM checksum for original file
# MODIFIED last modification date of file
# ENCODING encoding method: ASCII, atob, others?

Comments could have the same format as shell comments.
The ASCII encoding could be accomplished by adding an extra '#' at the
start of all lines in enclosed files that start with a '#', and then
changing an initial '##' back to '#' when unpacking.
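The '#'-doubling rule just described is easy to implement, and it keeps the format unambiguous: any line beginning with exactly one '#' is a header or comment, never file data. Below is a minimal sketch in C. The function names and the output-buffer convention are hypothetical:

```c
#include <string.h>

/* Copies LINE into OUT, which must hold at least strlen(LINE) + 2 bytes,
   adding an extra '#' when the line starts with '#'. */
void encode_line(const char *line, char *out)
{
    if (line[0] == '#')
        *out++ = '#';
    strcpy(out, line);
}

/* Returns the file-data text for an archive line, or NULL when the line
   starts with a single '#' (a header or comment, not file data). */
const char *decode_line(const char *line)
{
    if (line[0] != '#')
        return line;            /* ordinary data line, unchanged */
    if (line[1] == '#')
        return line + 1;        /* "##..." was an escaped '#' line */
    return NULL;
}
```

Lines that already begin with several '#' characters work too. "##x" is stored as "###x" and comes back as "##x".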

I haven't decided whether this format should require the presence of
external programs to do part of the work, like atob, compress, and sum.

>The tough part would be getting people to make their shar
>programs generate it.

I think the harder part would be getting the programs that generate the
suggested format into the hands of everyone who wants to distribute
source code, and the programs that decode it into the hands of everyone
who wants to use programs distributed in that format.  In addition to
tar, cpio, uu*code, [ab]to[ba], compress, arc, zip, and perhaps unshar,
people would need to have another archiver/unarchiver.  Tower of Babel!
-- 
David J. MacKenzie <djm@wam.umd.edu>

ok@cs.mu.oz.au (Richard O'Keefe) (10/02/89)

In article <8910020054.AA08811@cscwam.UMD.EDU>, djm@wam.UMD.EDU writes:
> This suggestion seems to be moving in the direction of making archives
> that plain old /bin/sh can't unpack at all.  Perhaps it's not a bad
> idea.  An easier to parse, more standardized pure-ASCII archiving
> format than a shell archive would certainly be more appropriate for
> Amiga, MS-DOS, VMS, etc. postings, and would allow the packing and
> unpacking programs more versatility, security and control on Unix
> systems as well.

Let's not forget why we use sharchives in the first place.
The point was to have a format for distributing sources which could
be used by people who HAVEN'T got any specialised "unshar".  If I am
away from the net for a couple of months and find when I get back that
all the sources are in some new format that I can't process, I am not
going to be very happy.  And saying that something is held on an
archive somewhere is not very helpful either; lots of people have no
FTP access.

It would be ok to go over to a new format IF each of the source groups
that used it posted a fresh copy of the decoding program every month,
along with the index for the previous month.

MS-DOS people can get a shell for a small sum.  VMS people can get
DEC/Shell; and if they haven't got it, you should remember that a posting
in C is useless to many VMS sites anyway.

On the other hand, if you're interested in "more standardised" stuff,
don't forget that ASCII is (a) a *national* standard, not an international
one, and (b) superseded by the ISO 8859 family, and (c) a pain for BITNET
mail links.  Your new format should let an MS-DOS-using donor mail text
containing e-acute and other such characters to a Mac-using recipient
with no harm resulting from an intermediate passage through EBCDIC.  Get
_that_ right first, and then worry about shar.

lhf@aries5.uucp (Luiz H de Figueiredo) (10/02/89)

How about starting with arc from Software Tools?
I think there are a number of implementations already available.
It might need some modification to allow for automatic decompression, but
should serve fine as a start.

-------------------------------------------------------------------------------
Luiz Henrique de Figueiredo		internet: lhf@aries5.uwaterloo.ca
Computer Systems Group			bitnet:   lhf@watcsg.bitnet
University of Waterloo
-------------------------------------------------------------------------------

karl@ficc.uu.net (Karl Lehenbauer) (10/02/89)

I think a portable archiver that writes headers in character format is the
way to go as a long-term replacement for shar, something along the line
of "cpio -oc".

It is still very desirable to be able to examine the contents of an archive
without having to uudecode, unzoo, etc.  This argues for a textual archive,
continuing current practice.
-- 
-- uunet!ficc!karl	"The last thing one knows in constructing a work 
			 is what to put first."  -- Pascal

dhesi@sun505.UUCP (Rahul Dhesi) (10/03/89)

The development of my "rap" archiver was suspended due to my move.
When I get a chance I will complete it.

The idea is to create an archive that is extractable by feeding it to
/bin/sh, but which has a rigid enough format that a relatively simple
program written in C can also extract it.  The archive includes 32-bit
CRCs for error checking, can be split into pieces at arbitrary points
and later concatenated without worrying about message headers, and
escapes tabs and other troublesome characters so they survive IBM
mainframe-based networks.
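For the 32-bit CRCs Rahul mentions, a bitwise routine is short enough to carry in any unpacker. This sketch assumes the common reflected polynomial 0xEDB88320, which is the CRC-32 used by zip and Ethernet. The post does not say which CRC "rap" uses, and crc32_update() is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF).  Pass 0 to start a new checksum, or a
   previous result to continue one over more data. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int k;

    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (k = 0; k < 8; k++)  /* shift out one bit; XOR in poly if it was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Because a result can be fed back in, an unpacker can accumulate the checksum across parts as it unpacks each one. The standard check value for the nine bytes "123456789" is 0xCBF43926.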

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/03/89)

In article <923@cirrusl.UUCP>, dhesi@sun505.UUCP (Rahul Dhesi) writes:
|  
|  The development of my "rap" archiver was suspended due to my move.
|  When I get a chance I will complete it.

  Since Rahul has a good reputation for being able to write portable
software, I think it would be good to wait and see this, as opposed to
having a number of standards.

  Rahul: if you want a copy of the latest shar2 for inspiration I'll
mail it to you. Since Rich has expressed dislike for it I suspect that
the source I submitted in March is not going to be posted, along with
all the subsequent stuff I packaged with it. You might be able to use
some of the file splitting or binary file code as a model.

-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

jm36+@andrew.cmu.edu (John Gardiner Myers) (10/04/89)

I came up with an archive format that is rigid enough to be extracted
by a relatively simple but secure unpacker, yet can also be fed to
/bin/sh.

The trick was to include a small C program in the archive to allow
those who hadn't obtained the unpacking program to extract the files.
A sample archive follows:

#! /bin/sh
# This is a mail archive.  To unpack it, use the 'unmar' program from
# comp.sources.unix.  Alternatively, you can remove anything before this
# line, then unpack it by saving it into a file and typing "sh file".
# Contents:  TEST
# Wrapped by jm36@beak.andrew.cmu.edu on Tue Oct  3 16:32:24 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
cat >,sunmar.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *mystrchr(char *p, int c)
  { while (*p && *p != c) p++; return *p ? p : 0;}
int main(void) {
  char *p, buf[4096]; FILE *ofile = NULL;
  while (fgets(buf, sizeof(buf), stdin)) {
    if (ofile) {
      if (!strcmp(buf, "END\n")) {
	fclose(ofile); ofile = NULL;
      } else fputs(buf+(buf[0]=='X'), ofile);
    } else {
      if (!strncmp(buf, "BEGIN ", 6)) {
	if ((p = mystrchr(buf+6, ' '))) *p = '\0';
	if (!(ofile = fopen(buf+6, "w"))) {
	  perror(buf+6);
	} else printf("Extracting file %s\n", buf+6);
      } else if (!strncmp(buf, "DIRECTORY ", 10)) {
	if ((p = mystrchr(buf+10, ' '))) *p = '\0';
	strncpy(buf+4, "mkdir", 5);
	system(buf+4);
      }}}
  return 0;
}
EOF
cc -o ,sunmar ,sunmar.c
./,sunmar <<'END_OF_ARCHIVE'
BEGIN TEST - 16
XThis is a test.
END
END_ARCHIVE
END_OF_ARCHIVE
rm -f ,sunmar ,sunmar.c
exit 0

I haven't released this format because it would give any reasonable
implementation of "unshar" a severe case of indigestion.  The format
would only be worth releasing if it had a decent chance of becoming
more common than the shar format.  It would only become the standard
if the moderators of the sources groups adopted it.  The moderators
will only adopt it if it becomes the de-facto standard.  Catch-22.

-- 
_.John G. Myers		Internet: John.G.Myers@andrew.cmu.edu
(412) 268-2984		LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up

john@chance.UUCP (John R. MacMillan) (10/04/89)

In article <2270@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
|In article <8910020054.AA08811@cscwam.UMD.EDU>, djm@wam.UMD.EDU writes:
|> This suggestion seems to be moving in the direction of making archives
|> that plain old /bin/sh can't unpack at all.  Perhaps it's not a bad
|> idea.
|
|Let's not forget why we use sharchives in the first place.
|The point was to have a format for distributing sources which could
|be used by people who HAVEN'T got any specialised "unshar".

That's why I suggested what I did; it still works for everyone who's
happy with shar format, and it makes it easier on people without
/bin/sh or who don't trust running /bin/sh on someone else's shars.
(I'm neither, by the way).
-- 
John R. MacMillan           "Don't you miss it...don't you miss it...
john@chance.UUCP             Some of you people just about missed it."
...!utcsri!hcr!chance!john        -- Talking Heads

allbery@NCoast.ORG (Brandon S. Allbery) (10/04/89)

As quoted from <2270@munnari.oz.au> by ok@cs.mu.oz.au (Richard O'Keefe):
+---------------
| In article <8910020054.AA08811@cscwam.UMD.EDU>, djm@wam.UMD.EDU writes:
| > This suggestion seems to be moving in the direction of making archives
| > that plain old /bin/sh can't unpack at all.  Perhaps it's not a bad
| > idea.  An easier to parse, more standardized pure-ASCII archiving
| 
| On the other hand, if you're interested in "more standardised" stuff,
| don't forget that ASCII is (a) a *national* standard, not an international
| one, and (b) superseded by the ISO 8859 family, and (c) a pain for BITNET
| mail links.  Your new format should let an MS-DOS-using donor mail text
| containing e-acute and other such characters to a Mac-using recipient
| with no harm resulting from an intermediate passage through EBCDIC.  Get
| _that_ right first, and then worry about shar.
+---------------

This sounds a lot like Brad's ABE, which is already in the c.s.misc archives;
and I think it can be configured to produce archives with a small dearchiver
prepended.

Of course, your proposed PC-to-Mac transfer will still have the small problem
that Apple and IBM disagree on where to put e-acute....

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@NCoast.ORG
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
bsa@telotech.uucp, 161-7070 BALLBERY (MCI), ALLBERY (Delphi), B.ALLBERY (GEnie)
Is that enough addresses for you?   no?   then: allbery@uunet.UU.NET (c.s.misc)

peter@ficc.uu.net (Peter da Silva) (10/04/89)

The software tools archiver has apparently been suggested before. It can be
found in Volume 4 of comp.sources.unix.
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
``I feel that any [environment] with users in it is "adverse".''           'U`
	-- Eric Peterson <lcc.eric@seas.ucla.edu>

jm36+@andrew.cmu.edu (John Gardiner Myers) (10/05/89)

tron!moran@umbc3 (Harvey R Moran) writes:
>    You have a decent idea, but your implementation leaves something to
> be desired. [...]
>    Your program assumes a working sh to prime it.  It also does a
> compile, one of the things which would raise my paranoia level.  Worse
> yet, it deletes the thing that was compiled so I "can't" see what was
> done.

You miss the point.  Anyone with half an interest in security would
use the "unmar" program which I would have published in
comp.sources.unix.  This program would ignore everything before the
first "BEGIN", could only create files and directories, would not
allow absolute pathnames and "..", would handle the "Part M of N"
foolishness, etc.  I believe the version I have is portable to
non-unix systems, but I haven't actually gone through the trouble of
beta-testing it.
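Two of the unmar restrictions listed above, no absolute pathnames and no "..", amount to a check on each member name before the file is created. Below is a minimal sketch in C that assumes Unix-style '/' separators. safe_name() is a hypothetical function and is not taken from unmar, whose source was never posted here:

```c
#include <string.h>

/* Returns 1 if NAME is acceptable as an archive member name, 0 if not.
   Rejects empty names, absolute paths, and any path component that is
   exactly "..".  A name like "a..b" is allowed, since ".." is not a
   whole component there. */
int safe_name(const char *name)
{
    const char *p = name;
    size_t len;

    if (*name == '\0' || *name == '/')
        return 0;
    while (*p) {
        len = strcspn(p, "/");          /* length of this component */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 0;
        p += len;
        if (*p == '/')
            p++;                        /* skip the separator */
    }
    return 1;
}
```

A non-Unix unpacker would also need to reject its own system's absolute-path and drive syntax. This check covers only the Unix case.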

The short C program in the archive is only for people who don't want
to hunt down the "unmar" program.  In that case, the format is no less
secure than the shar format.  People on systems where the compiler is
not invokable as "cc" can simply cut out the small program, compile
it themselves, and feed it the archive.

-- 
_.John G. Myers		Internet: John.G.Myers@andrew.cmu.edu
(412) 268-2984		LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up

jmm@eci386.uucp (John Macdonald) (10/06/89)

In article <1989Oct3.225620.17825@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:

>That's why I suggested what I did; it still works for everyone who's
>happy with shar format, and it makes it easier on people without
>/bin/sh or who don't trust running /bin/sh on someone else's shars.
>(I'm neither, by the way).

I'm also neither: if I'm going to compile and run somebody else's C
program, the added danger of also running their shar program to unpack it
seems minimal.  The same nasty trojan effects can be put in either
place by a dastardly villain, so closing the "sh" door does not do
much to improve safety.

Of course, I rarely have enough time for net activities to get a new set
of source from the net, unpack it, and try to run it all in the same
session.  Thus, I have the benefit of expecting that by the time I *do*
get around to trying something out, the lack of flames on the net
implies a lack of trojans in the
source.  (Thank you to the brave pioneers who offer their file systems
up in sacrifice to the Trojan demons.  May your offerings never be
accepted.)
-- 
"Software and cathedrals are much the same -          | John Macdonald
first we build them, then we pray" (Sam Redwine)      |   jmm@eci386