[comp.sources.d] Poll on shar formats

rsalz@bbn.com (Rich Salz) (06/01/88)

The shar stuff I just released only puts out the leading 'X' when
the first character is a non-alphabetic.  I decided to do this to
save space, and sort of as a compromise between shar's that put
lots of stuff (I particularly hate "<tab>X") and those that put
nothing, and just use cat.

As a minimum, for safety when going through mail systems, etc.,
the lines I protect need that protection.  (They shouldn't, but
such is reality.)

For example, manpages come out looking like this:

X.TH SHELL 1l
X.\" $Header: shell.man,v 2.0 88/05/27 13:28:55 rsalz Exp $
X.SH NAME
shell \- Interpreter for shell archives
X.SH SYNOPSIS
X.B shell
X[ file...  ]
X.SH DESCRIPTION
This program interprets enough UNIX shell syntax, and command usage,
to enable it to unpack many different types of UNIX shell archives,
or ``shar's.''
It is primarily intended to be used on non-UNIX systems that need to
unpack such archives.
X.PP
X.I Shell
does
X.B not
check for security holes, and will blithely execute commands like
X.RS

On text files, the savings is significant.

C code ends up looking like this:
X/*
X**  Return current working directory.  Something for everyone.
X*/
X/* LINTLIBRARY */
X#include "shar.h"
X#ifdef	RCSID
static char RCS[] =
X	"$Header: lcwd.c,v 2.0 88/05/27 13:26:24 rsalz Exp $";
X#endif	/* RCSID */
X
X
X#ifdef	PWDGETENV
X/* ARGSUSED */
char *
Cwd(p, i)
X    char	*p;
X    int		 i;
X{
X    char	*q;
X
X    return((q = getenv(PWDGETENV)) ? strcpy(p, q) : NULL);
X}
X#endif	/* PWDGETENV */

On C code the savings is usually minuscule.
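
The rule described above (an X only before a non-alphabetic first
character) can be sketched in a few lines.  This is an illustration in
Python, not the released shar code: the name protect_selectively is made
up for the example, and treating an empty line as "non-alphabetic" is
inferred from the bare "X" lines in the C sample.

```python
def protect_selectively(lines):
    """Prefix 'X' only when a line does not start with a letter."""
    out = []
    for line in lines:
        if line[:1].isalpha():
            out.append(line)        # left bare to save a byte
        else:
            out.append("X" + line)  # also covers empty lines
    return out


sample = [".SH NAME", "shell \\- Interpreter for shell archives", "", "{"]
protected = protect_selectively(sample)
# One byte is saved for every line that was left bare.
saved = sum(1 for a, b in zip(sample, protected) if a == b)
print("\n".join(protected))
print("bytes saved versus an X on every line:", saved)
```

The saving is one byte per line that starts with a letter, which is why
running text benefits more than C source.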

Which do people prefer?  Putting X only where needed seems esthetically
pure (i.e., Gilmore's "Just the files, ma'am" quote from some time ago),
and can save space.  Putting a leading X all the time means things like
RN's [TAB] work nicely, and supposedly makes things easier on some
text editors (delete all first chars).  I suppose if you use unshar,
the text editor isn't an issue...

I'd like to see some discussion here, thanks.

	/rich $alz
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

bob@acornrc.UUCP (Bob Weissman) (06/02/88)

In article <868@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
} The shar stuff I just released only puts out the leading 'X' when
} the first character is a non-alphabetic.  I decided to do this to
} ...
} Which do people prefer?  Putting X only where needed seems esthetically
} pure (i.e., Gilmore's "Just the files, ma'am" quote from some time ago),
} and can save space.  Putting a leading X all the time means things like
} RN's [TAB] work nicely, and supposedly makes things easier on some
} ...

I like to preview the commands a shell script is going to execute,
and "grep -v ^X" works nicely if all the included files' lines start
with X.
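
A rough Python equivalent of that "grep -v ^X" preview, as a sketch only.
The archive text and the END_OF_FILE marker are invented for the example;
real shars differ in the commands they emit.

```python
def preview_commands(shar_text):
    """Return the lines that do not start with 'X', like grep -v '^X'."""
    return [line for line in shar_text.splitlines()
            if not line.startswith("X")]


# Hypothetical archive in which every file line carries an X.
archive = """echo x - hello.c
sed 's/^X//' > hello.c << 'END_OF_FILE'
X#include <stdio.h>
Xint main() { return 0; }
END_OF_FILE
"""

for command in preview_commands(archive):
    print(command)
```

If the "int main" line had been left without its X, it would show up in
the preview alongside the real shell commands.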

-- 
Bob Weissman
Internet:	bob@acornrc.uucp
UUCP:		...!{ ames | decwrl | oliveb | pyramid }!acornrc!bob
Arpanet:	bob%acornrc.uucp@ames.arc.nasa.gov

root@cca.ucsf.edu (Computer Center) (06/02/88)

In article <868@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
: The shar stuff I just released only puts out the leading 'X' when
: the first character is a non-alphabetic.  I decided to do this to
: ...
: I'd like to see some discussion here, thanks.
: 

Any line beginning with an X gets scrod.

It also messes up alignments if you are visually inspecting before
unshar-ing.

Please change it to be consistent.

Thanks,

Thos Sumner       (thos@cca.ucsf.edu)   BITNET:  thos@ucsfcca
(The I.G.)        (...ucbvax!ucsfcgl!cca.ucsf!thos)

OS|2 -- an Operating System for puppets.

#include <disclaimer.std>

obrien@aerospace.aero.org (Michael O'Brien) (06/03/88)

In article <1273@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes:
>Any line beginning with an X gets scrod.
>Thos Sumner       (thos@cca.ucsf.edu)   BITNET:  thos@ucsfcca

Perfectly true.  Any line which already starts with an "X" will lose
it on extraction.  Inserting an "X" only before non-alphabetics leaves
the format ambiguous; the old format is unambiguous.  As far as
I'm concerned that should end the argument right there.

It might be that the new format could be fixed by also adding an
"X" to any line that already begins with one.  This seems unclean.
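
The ambiguity, and the proposed fix, can be shown with a small Python
sketch.  The function names are hypothetical, and decode() only models
the effect of sed 's/^X//'.

```python
import re


def encode_selective(line):
    """X only before a non-alphabetic first character."""
    return line if line[:1].isalpha() else "X" + line


def encode_fixed(line):
    """Same rule, but also protect lines that already begin with X."""
    if line[:1].isalpha() and not line.startswith("X"):
        return line
    return "X" + line


def decode(line):
    """Strip one leading X, as sed 's/^X//' would."""
    return re.sub(r"^X", "", line)


original = "Xerox machines"
print(decode(encode_selective(original)))  # the leading X is lost
print(decode(encode_fixed(original)))      # round-trips intact
```
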
-- 
Mike O'Brien
obrien@aerospace.aero.org
{sdcrdcf,trwrb}!aero!obrien

danno@microsoft.UUCP (Daniel A. Norton) (06/03/88)

OK, so the "X" goes before any line with a ".".  I'm sure that the "X"
must also precede any line that already had an "X".  Are there other
characters in the first position that systems in the net will be
sensitive to?

One way to find out is to try it your (rs) way and see what pops up.
I'd prefer to hear more, however, from anyone else who might know if
any systems are sensitive to characters in the first position.
---
nortond@microsoft.UUCP	{decvax,decwrl,uw-beaver,hp-pcd}!microsoft!nortond

Daniel A. Norton		Any opinions expressed herein are strictly
c/o Microsoft Corp		mine,  and do not  represent  those  of my
Box 97017			employer  (who is probably unaware of this
Redmond, WA   98073-9717	message, anyway).

gsk@khaki.SGI.COM (George S. Kong) (06/03/88)

i'll vote for an X on every line.

X's only on some lines is very distracting if you're
just skimming the shar file, deciding whether or not to unpack it.

to me, always having the X seems much more esthetically pure.


George S. Kong,  Silicon Graphics, Inc.,  (415)962-3281
gsk@sgi.com
...{decwrl,allegra,sun,adobe,ucbvax,pyramid,ames}!sgi!gsk

karl@mstar.UUCP (Karl Fox) (06/03/88)

In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic.
...
>As a minimum, for safety when going through mail systems, etc.,
>the lines I protect need that protection.

Lines beginning with "From" will get a ">" stuck in front of them when
passing through some mail systems if the "X" is omitted.
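
A toy Python model of that failure, for illustration.  The mailer here is
an assumption (it escapes any line starting with "From "); real mail
systems vary in exactly what they match.

```python
def mailer_escape(lines):
    """Toy mail system: put '>' in front of lines starting with 'From '."""
    return [">" + l if l.startswith("From ") else l for l in lines]


def strip_x(lines):
    """Remove one leading X where present, like sed 's/^X//'."""
    return [l[1:] if l.startswith("X") else l for l in lines]


text = ["From here on, the code is untested."]

bare = strip_x(mailer_escape(text))                        # no X in transit
protected = strip_x(mailer_escape(["X" + l for l in text]))  # X in transit

print(bare)       # arrives with a stray '>'
print(protected)  # arrives unchanged
```
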
-- 
Karl Fox, Morning Star Technologies ...!{att,cbosgd,osu-cis,pyramid}!mstar!karl

mark@jhereg.Jhereg.MN.ORG (Mark H. Colburn) (06/03/88)

In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic [...]  
>
What is going to happen if one of the lines in the shar files happens
to start with X. in the text?  For a contrived example:

X.TH ROUTE 8l
X.SH NAME
route \- X.400 Nameserver router
X.SH SYNOPSIS
X.B route
X.400-host	<------ This line should be "X.400-host", not ".400-host"
X.SH DESCRIPTION

I assume that shar would detect this and make it:

X.TH ROUTE 8l
X.SH NAME
route \- X.400 Nameserver router
X.SH SYNOPSIS
X.B route
XX.400-host
X.SH DESCRIPTION

By the way, it took me a bit to come up with something that would fit
into the category, it is not all that common a sequence.

-- 
Mark H. Colburn           mark@jhereg.Jhereg.MN.ORG
                          ..!ihnp4!chinet!jhereg!mark

jpn@teddy.UUCP (John P. Nelson) (06/03/88)

[Discussion of shar prefixing algorithms]

Back when I was mod.sources moderator, I modified my version of "shar"
to pre-scan each file for "dangerous" character sequences.  If no
"dangerous" sequences were found, then no prefixes were used at all,
and "cat" was used to extract the files, not "sed".  This both runs
faster, and makes it easier to extract the file when a "dumb" editor is
the only available means of extracting the files.

In a year and a half of moderating, not one file actually needed
prefixing.  Of course, I didn't repack any shars that I didn't need to,
but I ended up repacking about 1/4 of the submissions.  I did not get
ANY complaints from people about strange sequences causing corruption
of shars.

For some reason, people assume that a dot beginning a line is a
"dangerous sequence":  It is NOT!  What they are thinking of is a dot
ALONE on a line:  This causes some mailers to terminate reading the
mail.  It is silly to prefix EVERY line starting with a dot (nroff
source) because of this.

Other dangerous sequences are a line starting with "From", or a line
starting with the here-document end-of-file marker.  There may be
others, but I cannot recall any.  Note that a leading 'X' is not 
dangerous unless you are already using sed 's/^X//' to extract.
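
A pre-scan along those lines might look like the following Python sketch.
It checks only the three cases named above; the function name and the
END_OF_FILE default are assumptions, not taken from any actual shar.

```python
def needs_prefix(lines, eof_marker="END_OF_FILE"):
    """Return True if any line matches a 'dangerous' sequence."""
    for line in lines:
        if line == ".":                  # a dot ALONE on a line
            return True
        if line.startswith("From"):      # may be rewritten by mailers
            return True
        if line.startswith(eof_marker):  # would end the here-document early
            return True
    return False


nroff_source = [".TH SHELL 1l", ".SH NAME", "shell \\- Interpreter"]
print(needs_prefix(nroff_source))      # dots with text after them are fine
print(needs_prefix(["text", ".", "more"]))
```

When needs_prefix() is false the file could be packed bare and extracted
with cat; otherwise the packer would fall back to prefixing and sed.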

-- 
     john nelson

UUCP:	{decvax,mit-eddie}!genrad!teddy!jpn
smail:	jpn@genrad.com

darrylo@hpsrli.HP.COM (Darryl Okahata) (06/03/88)

In comp.sources.d, rsalz@bbn.com (Rich Salz) writes:

> The shar stuff I just released only puts out the leading 'X' when
> the first character is a non-alphabetic.  I decided to do this to
> save space, and sort of as a compromise between shar's that put
> lots of stuff (I particularly hate "<tab>X") and those that put
> nothing, and just use cat.
     [ ... ]
> 	/rich $alz
> -- 
> Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

     I'd like to see "X"s on all lines, just to handle the (admittedly rare)
case of placing shar files within shar files.

     -- Darryl Okahata
	{hplabs!hpccc!, hpfcla!} hpsrla!darrylo
	CompuServe: 75206,3074

Disclaimer: the above is the author's personal opinion and is not the
opinion or policy of his employer or of the little green men that
have been following him all day.

rsalz@bbn.com (Rich Salz) (06/04/88)

 >For some reason, people assume that a dot beginning a line is a
 >"dangerous sequence":  It is NOT!  What they are thinking of is a dot
 >ALONE on a line:  This causes some mailers to terminate reading the
 >mail.  It is silly to prefix EVERY line starting with a dot (nroff
 >source) because of this.
This is true in the Arpa-mail and UUCP-mail worlds when all the programs
are working correctly.

Folks on bitnet have reported problems to me.

I've gotten bit by broken software.

An X on all lines seems to me playing lowest-common-denominator safety,
in the same league as <64K postings.  Unfortunately it has a cost.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

kaufman@polya.UUCP (06/04/88)

In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:

>An X on all lines seems to me playing lowest-common-denominator safety,
>in the same league as <64K postings.  Unfortunately it has a cost.

Yes, but all the same, I would urge you NOT to do otherwise.  I have had a
*NIX sort-of-look-alike for some time that DID NOT have a working 'sh' shell.
All my unsharing has to be with a text editor or sed.  Complex prefix rules
are difficult to cope with if you can't run the 'official' tools.

If you are really concerned about all the extra characters sent, just remove
the comments to compensate [NO, NO, I don't mean that, really, ... it's just
an example... I actually did have one person who took over maintenance of a
program I wrote delete all the comments 'because they made the assembly longer']

Marc Kaufman (kaufman@polya.stanford.edu)

jbuck@epimass.UUCP (06/05/88)

In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>An X on all lines seems to me playing lowest-common-denominator safety,
>in the same league as <64K postings.  Unfortunately it has a cost.

When used with compress, the cost of the X is essentially zero,
because the sequence "\nX" in the shar file has the same frequency
that "\n" would if you used cat.  By making the statistical frequency
of patterns in the file more erratic, you might actually INCREASE the
size of compressed shar files with your scheme.
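
One way to check this claim on a given file is to measure it.  The Python
sketch below uses zlib (DEFLATE) as a stand-in, which is an assumption:
compress uses LZW, so the numbers will differ and no particular outcome
is promised here.

```python
import zlib


def sizes(lines):
    """Raw and compressed sizes for always-X and selective-X encodings."""
    always = "\n".join("X" + l for l in lines).encode()
    selective = "\n".join(
        l if l[:1].isalpha() else "X" + l for l in lines
    ).encode()
    return {
        "always_raw": len(always),
        "selective_raw": len(selective),
        "always_compressed": len(zlib.compress(always)),
        "selective_compressed": len(zlib.compress(selective)),
    }


# Replace this list with the lines of a real file to get a useful answer.
result = sizes(["foo", ".bar", "baz"] * 50)
for name, size in result.items():
    print(name, size)
```
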

I asked in the past that shar format not be used for patches so the
patch program would work directly from news.  I didn't get my way
on that one.  Then, in patch #10 to patch (just out), Larry Wall
fixed patch to be able to work with patches with an X at the
beginning of each line.  Now you want to break that too!  Let's
keep things as simple as possible.

-- 
- Joe Buck  {uunet,ucbvax,pyramid,<smart-site>}!epimass.epi.com!jbuck
jbuck@epimass.epi.com	Old Arpa mailers: jbuck%epimass.epi.com@uunet.uu.net

lwall@devvax.UUCP (06/05/88)

In article <2993@polya.Stanford.EDU> kaufman@polya.Stanford.EDU (Marc T. Kaufman) writes:
: In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
: 
: >An X on all lines seems to me playing lowest-common-denominator safety,
: >in the same league as <64K postings.  Unfortunately it has a cost.
: 
: Yes, but all the same, I would urge you NOT to do otherwise.  I have had a
: *NIX sort-of-look-alike for some time that DID NOT have a working 'sh' shell.
: All my unsharing has to be with a text editor or sed.  Complex prefix rules
: are difficult to cope with if you can't run the 'official' tools.

Also, Rich, I just sent out a patch to patch so that it will extract patches
from shar files AS LONG AS the number of X's is consistent.  So you don't need
to apologize for sending out patches in shar files any more.
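
Why consistency matters can be seen in a short Python sketch.  This is
not how patch itself does it; it is a hypothetical helper that strips a
prefix only when every line carries it.

```python
def strip_consistent_prefix(lines, prefix="X"):
    """Strip the prefix if every line has it; otherwise give up (None)."""
    if lines and all(l.startswith(prefix) for l in lines):
        return [l[len(prefix):] for l in lines]
    return None


always_x = ["X*** old.c", "X--- new.c", "X! int i;"]
mixed = ["X*** old.c", "X--- new.c", "int i;"]

print(strip_consistent_prefix(always_x))  # prefix removed everywhere
print(strip_consistent_prefix(mixed))     # None: cannot tell what to strip
```
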

Larry "just-one-more-feature" Wall

lmb@vsi1.UUCP (Larry Blair) (06/05/88)

In article <2201@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>When used with compress, the cost of the X is essentially zero,
>because the sequence "\nX" in the shar file has the same frequency
>that "\n" would if you used cat.  By making the statistical frequency
>of patterns in the file more erratic, you might actually INCREASE the
>size of compressed shar files with your scheme.

Joe has brought out a point that is often overlooked in all discussions
of how to reduce net traffic.  Since it can be assumed that nearly
all net traffic is compressed, any changes to reduce the quantity of
data must be aimed at reducing the total *compressed* data.

I'm not forgetting that Rich's proposed shar format would save disk
space.  It's just that the amount saved would be unlikely to completely
fill a filesystem that wouldn't otherwise run out of room.
-- 
*   *   O     Larry Blair
  *   *   O   VICOM Systems Inc.     sun!pyramid----\
    *   *   O 2520 Junction Ave.     uunet!ubvax----->!vsi1!lmb
  *   *   O   San Jose, CA  95134    ucbvax!tolerant/
*   *   O     +1-408-432-8660

skl@van-bc.UUCP (Samuel Lam) (06/05/88)

In article <887@fig.bbn.com>, rsalz@bbn.com (Rich Salz) wrote:
>
> >For some reason, people assume that a dot beginning a line is a
> >"dangerous sequence":  It is NOT!  What they are thinking of is a dot
> >ALONE on a line:  This causes some mailers to terminate reading the
> >mail. ...
>
>This is true in the Arpa-mail and UUCP-mail worlds when all the programs
>are working correctly.
>
>Folks on bitnet have reported problems to me.

I have seen lines beginning with a dot (but with more stuff on it)
being treated as the end-of-message-body indicator by a *non*-BITNET
mailer as well.

Needless to say I think that mailer is broken, but getting someone
else's broken software fixed isn't necessarily a pleasant task
when it isn't on your machine, and the people who run that machine
think they are experts in electronic mail...

-- 
Samuel Lam     {ihnp4!alberta,watmath,uw-beaver,ubc-vision}!ubc-cs!van-bc!skl

rsalz@bbn.com (Rich Salz) (06/05/88)

For what my opinion is worth, based on the stuff I've read here and gotten
through the mail, I think there are only two ways to handle leading X's
(or whatever) on shar scripts:
	either Always do it
	or Never do it
Protecting only "dangerous" characters:
	Makes it harder for humans to follow
	Makes it hard to do "grep -v '^X'" to find trojan horses
	Fails to take advantage of the new feature in patch 
	Makes it harder on some hand unpackers
	Gains no space for those who do compressed batches
	Fails to note that not everyone knows what the pessimistic
	    set of "dangerous" characters is.
(Sorry about some of those sentences; I was going for parallel structure.)

From my experience as the moderator of a newsgroup (perennially
second-most-popular, sigh :-) that is gatewayed into a mailing list
received by a couple of hundred sites, the "Never do it" philosophy is
naive.

I haven't yet gotten to the point where I'll repackage anything that does
not come in an "Always do it" shar, but I'm getting there.  If only I had
the time...

Thanks, folks, for all the comments and feedback.
	/rich $alz
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

glee@cognos.uucp (Godfrey Lee) (06/06/88)

In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic.

>Which do people prefer?

Just a caution, if you are going to change the shar scheme to be more
"intelligent", please provide an unshar program that can cope with it. Shar is
used on PCs and other non-Unix systems that do not have /bin/sh.

The current version of unshar (the one I am using anyways) just counts
the number of characters in the "sed" parameter and chops that many
characters off the start of the line. In fact, it can't cope with the
shars that have a sed ... /^@/ and ends up chopping the first character
off every line. I know I can fix it, but those shar files are rare,
right now. Your scheme would certainly break the version of unshar I have.
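
A guess at how such an unshar behaves, as a Python sketch.  The parsing
is an assumption and handles only the plain s/^X// form; the function
names are made up.

```python
import re


def chop_width(sed_command):
    """Count the characters after '^' in a sed s/^.../ / command."""
    m = re.search(r"s/\^([^/]*)//", sed_command)
    return len(m.group(1)) if m else 0


def naive_unshar(lines, sed_command):
    """Blindly chop that many characters off the start of every line."""
    n = chop_width(sed_command)
    return [l[n:] for l in lines]


selective = ["X.SH NAME", "shell archives"]
print(naive_unshar(selective, "sed 's/^X//'"))
```

The second line was never prefixed, so it comes out as "hell archives".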


-- 
Godfrey Lee                                      P.O. Box 9707
Cognos Incorporated                              3755 Riverside Dr.
VOICE:  (613) 738-1440   FAX: (613) 738-0002     Ottawa, Ontario
UUCP: decvax!utzoo!dciem!nrcaer!cognos!glee      CANADA  K1G 3Z4

rwl@uvacs.CS.VIRGINIA.EDU (Ray Lubinsky) (06/07/88)

In article <1494@microsoft.UUCP>, danno@microsoft.UUCP (Daniel A. Norton) writes:
> OK, so the "X" goes before any line with a ".".  I'm sure that the "X"
> must also precede any line that already had an "X".  Are there other
> characters in the first position that systems in the net will be
> sensitive to?

Well, with the BSD mail program, '~' (tilde) in the first column makes the
line a mail command (like "~s blah" to set the subject line).

How's about putting an 'X' in front of every line which begins with a
non-alphanumeric, non-space character (other than 'X'),
i.e. [^a-zA-Z0-9 \t\n]?
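
That character class can be tried out directly.  A Python sketch, with a
hypothetical protect() helper:

```python
import re

# The class from the post: anything except letters, digits,
# space, tab and newline.
NEEDS_X = re.compile(r"^[^a-zA-Z0-9 \t\n]")


def protect(line):
    """Prefix 'X' when the first character falls in the class."""
    return "X" + line if NEEDS_X.match(line) else line


for line in ["~s blah", ".TH SHELL 1l", "plain text", "  indented", "Xerox"]:
    print(repr(protect(line)))
```

Taken literally, the class leaves lines that begin with 'X' unprefixed,
so the lost-X problem raised earlier in the thread would still need
separate handling.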

-- 
| Ray Lubinsky,                    UUCP:      ...!uunet!virginia!uvacs!rwl    |
| Department of                    BITNET:    rwl8y@virginia                  |
| Computer Science,                CSNET:     rwl@cs.virginia.edu  -OR-       |
| University of Virginia                      rwl%uvacs@uvaarpa.virginia.edu  |

levy@ttrdc.UUCP (Daniel R. Levy) (06/08/88)

In article <889@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
# For what my opinion is worth, based on the stuff I've read here and gotten
# through the mail, I think there are only two ways to handle leading X's
# (or whatever) on shar scripts:
# 	either Always do it
# 	or Never do it
# Protecting only "dangerous" characters:
# 	Makes it hard to do "grep -v '^X'" to find trojan horses

That's not foolproof.  Someone could package a trojan horse in an apparently
X-protected shell script very easily.  Use your imagination.
-- 
|------------Dan Levy------------|  THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
|    AT&T  Data Systems Group    |  Weinberg's Principle:  An expert is a
|        Skokie, Illinois        |  person who avoids the small errors while
|-----Path:  att!ttbcad!levy-----|  sweeping on to the grand fallacy.

dv@unicom.UUCP (Ivade Deviz @ Vern) (06/16/88)

One more advantage to putting 'X' at the beginning of every line is
that the rn <TAB> command (Search for next line beginning with
a different character) works to find the end of the current source
file.  If you only put 'X' in front of certain lines, then <TAB> won't
skip to the end of the file, only to the next non-X line.
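
The difference is easy to see in a small Python model of that search.
This follows only the description above, not rn's source; the sample
lines and the END_OF_FILE marker are invented.

```python
def next_different(lines, start):
    """Index of the next line whose first character differs, or None."""
    first = lines[start][:1]
    for i in range(start + 1, len(lines)):
        if lines[i][:1] != first:
            return i
    return None


always_x = ["X.TH SHELL 1l", "Xshell", "X.SH NAME", "END_OF_FILE"]
selective = ["X.TH SHELL 1l", "shell", "X.SH NAME", "END_OF_FILE"]

print(next_different(always_x, 0))   # lands on the end-of-file marker
print(next_different(selective, 0))  # stops at the first bare line
```
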
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
David W. Vezie, Systems Hacker         |  "I support Star Wars (tm),
{{sun,ucbvax}!pixar,pacbell}!unicom!dv |        it's SDI I can't stand"  --Me