rsalz@bbn.com (Rich Salz) (06/01/88)
The shar stuff I just released only puts out the leading 'X' when
the first character is a non-alphabetic. I decided to do this to
save space, and sort of as a compromise between shar's that put
lots of stuff (I particularly hate "<tab>X") and those that put
nothing, and just use cat.

As a minimum, for safety when going through mail systems, etc.,
the lines I protect need that protection. (They shouldn't, but such
is reality.)

For example, manpages come out looking like this:
    X.TH SHELL 1l
    X.\" $Header: shell.man,v 2.0 88/05/27 13:28:55 rsalz Exp $
    X.SH NAME
    shell \- Interpreter for shell archives
    X.SH SYNOPSIS
    X.B shell
    X[ file... ]
    X.SH DESCRIPTION
    This program interprets enough UNIX shell syntax, and command usage,
    to enable it to unpack many different types of UNIX shell archives,
    or ``shar's.''
    It is primarily intended to be used on non-UNIX systems that need to
    unpack such archives.
    X.PP
    X.I Shell
    does
    X.B not
    check for security holes, and will blithely execute commands like
    X.RS
On text files, the savings is significant.

C code ends up looking like this:
    X/*
    X** Return current working directory. Something for everyone.
    X*/
    X/* LINTLIBRARY */
    X#include "shar.h"
    X#ifdef RCSID
    static char RCS[] =
    X    "$Header: lcwd.c,v 2.0 88/05/27 13:26:24 rsalz Exp $";
    X#endif /* RCSID */
    X
    X
    X#ifdef PWDGETENV
    X/* ARGSUSED */
    char *
    Cwd(p, i)
    X    char *p;
    X    int i;
    X{
    X    char *q;
    X
    X    return((q = getenv(PWDGETENV)) ? strcpy(p, q) : NULL);
    X}
    X#endif /* PWDGETENV */
On C code the savings is usually minuscule.

Which do people prefer? Putting X only where needed seems esthetically
pure (i.e., Gilmore's "Just the files, ma'am" quote from some time ago),
and can save space. Putting a leading X all the time means things like
RN's [TAB] work nicely, and supposedly makes things easier on some
text editors (delete all first chars). I suppose if you use unshar,
the text editor isn't an issue...

I'd like to see some discussion here, thanks.
	/rich $alz
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
bob@acornrc.UUCP (Bob Weissman) (06/02/88)
In article <868@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
} The shar stuff I just released only puts out the leading 'X' when
} the first character is a non-alphabetic. I decided to do this to
} ...
} Which do people prefer? Putting X only where needed seems esthetically
} pure (i.e., Gilmore's "Just the files, ma'am" quote from some time ago),
} and can save space. Putting a leading X all the time means things like
} RN's [TAB] work nicely, and supposedly makes things easier on some
} ...

I like to preview the commands a shell script is going to execute, and
"grep -v ^X" works nicely if all the included files' lines start with X.
--
Bob Weissman
Internet: bob@acornrc.uucp
UUCP:     ...!{ ames | decwrl | oliveb | pyramid }!acornrc!bob
Arpanet:  bob%acornrc.uucp@ames.arc.nasa.gov
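A minimal Python sketch of the preview described above, roughly what "grep -v ^X" does to an archive. The two sample archives and the SHAR_EOF marker are made up for illustration and are not taken from any real posting. The sketch only shows why the preview depends on every packed line carrying an X: with selective prefixing, lines of the packed file leak into the list of "commands".

```python
def preview_commands(shar_lines):
    """Rough equivalent of `grep -v '^X'`: keep lines not starting with X."""
    return [line for line in shar_lines if not line.startswith("X")]


# Hypothetical archive with an X on every line of the packed file.
always_x = [
    "sed 's/^X//' > hello.c << 'SHAR_EOF'",
    "X#include <stdio.h>",
    "Xint main() { return 0; }",
    "SHAR_EOF",
]

# Same archive with X only before non-alphabetic first characters.
selective_x = [
    "sed 's/^X//' > hello.c << 'SHAR_EOF'",
    "X#include <stdio.h>",
    "int main() { return 0; }",
    "SHAR_EOF",
]

if __name__ == "__main__":
    print(preview_commands(always_x))     # only the shell commands remain
    print(preview_commands(selective_x))  # a line of hello.c shows up too
```

With the always-X archive the preview contains only the sed command and the end marker; with the selective archive the unprefixed C line appears alongside them.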
root@cca.ucsf.edu (Computer Center) (06/02/88)
In article <868@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
: The shar stuff I just released only puts out the leading 'X' when
: the first character is a non-alphabetic. I decided to do this to
: ...
: I'd like to see some discussion here, thanks.
:
Any line beginning with an X gets scrod.
It also messes up alignments if you are visually inspecting before
unshar-ing.
Please change it to be consistent.
Thanks,
Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca
(The I.G.) (...ucbvax!ucsfcgl!cca.ucsf!thos)
OS|2 -- an Operating System for puppets.
#include <disclaimer.std>
obrien@aerospace.aero.org (Michael O'Brien) (06/03/88)
In article <1273@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes:
>Any line beginning with an X gets scrod.
>Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca

Perfectly true. Any line which already starts with an "X" will lose it
on extraction. Inserting an "X" only before non-alphabetics leaves the
format ambiguous; the old format is unambiguous. As far as I'm concerned
that should end the argument right there.

It might be that the new format could be fixed by also adding an "X" to
any line that already begins with one. This seems unclean.
--
Mike O'Brien
obrien@aerospace.aero.org
{sdcrdcf,trwrb}!aero!obrien
danno@microsoft.UUCP (Daniel A. Norton) (06/03/88)
OK, so the "X" goes before any line with a ".". I'm sure that the "X"
must also precede any line that already had an "X". Are there other
characters in the first position that systems in the net will be
sensitive to?

One way to find out is to try it your (rs) way and see what pops up.
I'd prefer to hear more, however, from anyone else who might know if
any systems are sensitive to characters in the first position.
---
nortond@microsoft.UUCP   {decvax,decwrl,uw-beaver,hp-pcd}!microsoft!nortond
Daniel A. Norton         Any opinions expressed herein are strictly
c/o Microsoft Corp       mine, and do not represent those of my
Box 97017                employer (who is probably unaware of this
Redmond, WA 98073-9717   message, anyway).
gsk@khaki.SGI.COM (George S. Kong) (06/03/88)
i'll vote for an X on every line. X's only on some lines is very
distracting if you're just skimming the shar file, deciding whether
or not to unpack it.

to me, always having the X seems much more esthetically pure.

George S. Kong, Silicon Graphics, Inc., (415)962-3281
gsk@sgi.com
...{decwrl,allegra,sun,adobe,ucbvax,pyramid,ames}!sgi!gsk
karl@mstar.UUCP (Karl Fox) (06/03/88)
In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic.
...
>As a minimum, for safety when going through mail systems, etc.,
>the lines I protect need that protection.

Lines beginning with "From" will get a ">" stuck in front of them when
passing through some mail systems if the "X" is omitted.
--
Karl Fox, Morning Star Technologies
...!{att,cbosgd,osu-cis,pyramid}!mstar!karl
mark@jhereg.Jhereg.MN.ORG (Mark H. Colburn) (06/03/88)
In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic [...]
>

What is going to happen if one of the lines in the shar files happens to
start with X. in the text? For a contrived example:

    X.TH ROUTE 8l
    X.SH NAME
    route \- X.400 Nameserver router
    X.SH SYNOPSIS
    X.B route
    X.400-host  <------ This line should be "X.400-host", not ".400-host"
    X.SH DESCRIPTION

I assume that shar would detect this and make it:

    X.TH ROUTE 8l
    X.SH NAME
    route \- X.400 Nameserver router
    X.SH SYNOPSIS
    X.B route
    XX.400-host
    X.SH DESCRIPTION

By the way, it took me a bit to come up with something that would fit into
the category; it is not all that common a sequence.
--
Mark H. Colburn
mark@jhereg.Jhereg.MN.ORG
..!ihnp4!chinet!jhereg!mark
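The ambiguity in this example can be reproduced with a short Python sketch. It is not the code of the released shar: `encode`, `decode` and the `protect_x` switch are hypothetical names, the prefix rule is taken from the description quoted above (an X before any non-alphabetic first character, blank lines included), and `decode` stands in for sed 's/^X//'.

```python
def encode(lines, protect_x=False):
    """Prefix 'X' when the first character is non-alphabetic.

    With protect_x=True, a line that already starts with 'X' also gets
    a prefix, which is the fix proposed in the thread.
    """
    out = []
    for line in lines:
        first = line[:1]  # empty string for a blank line
        if not first.isalpha() or (protect_x and first == "X"):
            out.append("X" + line)
        else:
            out.append(line)
    return out


def decode(lines):
    """Stand-in for sed 's/^X//': drop one leading 'X' if present."""
    return [line[1:] if line.startswith("X") else line for line in lines]


page = [
    ".TH ROUTE 8l",
    ".SH NAME",
    "route \\- X.400 Nameserver router",
    ".SH SYNOPSIS",
    ".B route",
    "X.400-host",
    ".SH DESCRIPTION",
]

if __name__ == "__main__":
    print(decode(encode(page))[5])                  # the damaged line
    print(decode(encode(page, protect_x=True))[5])  # the intact line
```

Without the extra rule the sixth line comes back as ".400-host"; with it the archive carries "XX.400-host" and the page round-trips unchanged.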
jpn@teddy.UUCP (John P. Nelson) (06/03/88)
[Discussion of shar prefixing algorithms]

Back when I was mod.sources moderator, I modified my version of "shar" to
pre-scan each file for "dangerous" character sequences. If no "dangerous"
sequences were found, then no prefixes were used at all, and "cat" was used
to extract the files, not "sed". This both runs faster, and makes it easier
to extract the file when a "dumb" editor is the only available means of
extracting the files.

In a year and a half of moderating, not one file actually needed prefixing.
Of course, I didn't repack any shars that I didn't need to, but I ended up
repacking about 1/4 of the submissions. I did not get ANY complaints from
people about strange sequences causing corruption of shars.

For some reason, people assume that a dot beginning a line is a
"dangerous sequence": It is NOT! What they are thinking of is a dot
ALONE on a line: This causes some mailers to terminate reading the
mail. It is silly to prefix EVERY line starting with a dot (nroff
source) because of this.

Other dangerous sequences are a line starting with "From", or a line
starting with the here-document end-of-file marker. There may be others,
but I cannot recall any. Note that a leading 'X' is not dangerous unless
you are already using sed 's/^X//' to extract.
--
john nelson

UUCP:  {decvax,mit-eddie}!genrad!teddy!jpn
smail: jpn@genrad.com
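A rough Python sketch of the pre-scan idea, using only the three sequences named in the post: a dot alone on a line, a line starting with "From", and a line starting with the here-document end marker. The function names and the default marker SHAR_EOF are assumptions for illustration; this is not the modified shar itself.

```python
def is_dangerous(line, eof_marker):
    """True for the sequences the post lists as dangerous."""
    return (
        line == "."                      # a dot ALONE on a line
        or line.startswith("From")       # may get a '>' stuck in front
        or line.startswith(eof_marker)   # would end the here-document early
    )


def choose_extractor(lines, eof_marker="SHAR_EOF"):
    """Pick 'sed' (prefix every line) only if the file needs it, else 'cat'."""
    if any(is_dangerous(line, eof_marker) for line in lines):
        return "sed"
    return "cat"


if __name__ == "__main__":
    print(choose_extractor([".TH SHELL 1l", ".SH NAME"]))  # leading dots are fine
    print(choose_extractor(["some text", "."]))            # a lone dot is not
```

Under this rule nroff source with leading dots still goes out unprefixed, which matches the point made above about dots.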
darrylo@hpsrli.HP.COM (Darryl Okahata) (06/03/88)
In comp.sources.d, rsalz@bbn.com (Rich Salz) writes:
> The shar stuff I just released only puts out the leading 'X' when
> the first character is a non-alphabetic. I decided to do this to
> save space, and sort of as a compromise between shar's that put
> lots of stuff (I particularly hate "<tab>X") and those that put
> nothing, and just use cat.
[ ... ]
> /rich $alz
> --
> Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

I'd like to see "X"s on all lines, just to handle the (admittedly rare)
case of placing shar files within shar files.

-- Darryl Okahata
{hplabs!hpccc!, hpfcla!} hpsrla!darrylo
CompuServe: 75206,3074

Disclaimer: the above is the author's personal opinion and is not the
opinion or policy of his employer or of the little green men that
have been following him all day.
rsalz@bbn.com (Rich Salz) (06/04/88)
>For some reason, people assume that a dot beginning a line is a
>"dangerous sequence": It is NOT! What they are thinking of is a dot
>ALONE on a line: This causes some mailers to terminate reading the
>mail. It is silly to prefix EVERY line starting with a dot (nroff
>source) because of this.

This is true in the Arpa-mail and UUCP-mail worlds when all the programs
are working correctly.

Folks on bitnet have reported problems to me. I've gotten bit by broken
software.

An X on all lines seems to me playing lowest-common-denominator safety,
in the same league as <64K postings. Unfortunately it has a cost.
	/r$
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
kaufman@polya.UUCP (06/04/88)
In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:

>An X on all lines seems to me playing lowest-common-denominator safety,
>in the same league as <64K postings. Unfortunately it has a cost.

Yes, but all the same, I would urge you NOT to do otherwise. I have had a
*NIX sort-of-look-alike for some time that DID NOT have a working 'sh' shell.
All my unsharing has to be with a text editor or sed. Complex prefix rules
are difficult to cope with if you can't run the 'official' tools.

If you are really concerned about all the extra characters sent, just remove
the comments to compensate [NO, NO, I don't mean that, really, ... it's just
an example... I actually did have one person who took over maintenance of
a program I wrote delete all the comments 'because they made the assembly
longer']

Marc Kaufman (kaufman@polya.stanford.edu)
jbuck@epimass.UUCP (06/05/88)
In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>An X on all lines seems to me playing lowest-common-denominator safety,
>in the same league as <64K postings. Unfortunately it has a cost.

When used with compress, the cost of the X is essentially zero,
because the sequence "\nX" in the shar file has the same frequency
that "\n" would if you used cat. By making the statistical frequency
of patterns in the file more erratic, you might actually INCREASE the
size of compressed shar files with your scheme.

I asked in the past that shar format not be used for patches so the
patch program would work directly from news. I didn't get my way on
that one. Then, in patch #10 to patch (just out), Larry Wall fixed
patch to be able to work with patches with an X at the beginning of
each line. Now you want to break that too!

Let's keep things as simple as possible.
--
- Joe Buck
{uunet,ucbvax,pyramid,<smart-site>}!epimass.epi.com!jbuck
jbuck@epimass.epi.com
Old Arpa mailers: jbuck%epimass.epi.com@uunet.uu.net
lwall@devvax.UUCP (06/05/88)
In article <2993@polya.Stanford.EDU> kaufman@polya.Stanford.EDU (Marc T. Kaufman) writes:
: In article <887@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
:
: >An X on all lines seems to me playing lowest-common-denominator safety,
: >in the same league as <64K postings. Unfortunately it has a cost.
:
: Yes, but all the same, I would urge you NOT to do otherwise. I have had a
: *NIX sort-of-look-alike for some time that DID NOT have a working 'sh' shell.
: All my unsharing has to be with a text editor or sed. Complex prefix rules
: are difficult to cope with if you can't run the 'official' tools.

Also, Rich, I just sent out a patch to patch so that it will extract
patches from shar files AS LONG AS the number of X's is consistent. So
you don't need to apologize for sending out patches in shar files any more.

Larry "just-one-more-feature" Wall
lmb@vsi1.UUCP (Larry Blair) (06/05/88)
In article <2201@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>When used with compress, the cost of the X is essentially zero,
>because the sequence "\nX" in the shar file has the same frequency
>that "\n" would if you used cat. By making the statistical frequency
>of patterns in the file more erratic, you might actually INCREASE the
>size of compressed shar files with your scheme.

Joe has brought out a point that is often overlooked in all discussions of
how to reduce net traffic. Since it can be assumed that nearly all net
traffic is compressed, any changes to reduce the quantity of data must be
aimed at reducing the total *compressed* data.

I'm not forgetting that Rich's proposed shar format would save disk space.
It's just that the amount saved would be unlikely to completely fill a
filesystem that wouldn't otherwise run out of room.
--
* * O Larry Blair
* * O VICOM Systems Inc.      sun!pyramid----\
* * O 2520 Junction Ave.      uunet!ubvax----->!vsi1!lmb
* * O San Jose, CA 95134      ucbvax!tolerant/
* * O +1-408-432-8660
skl@van-bc.UUCP (Samuel Lam) (06/05/88)
In article <887@fig.bbn.com>, rsalz@bbn.com (Rich Salz) wrote:
>
> >For some reason, people assume that a dot beginning a line is a
> >"dangerous sequence": It is NOT! What they are thinking of is a dot
> >ALONE on a line: This causes some mailers to terminate reading the
> >mail. ...
>
>This is true in the Arpa-mail and UUCP-mail worlds when all the programs
>are working correctly.
>
>Folks on bitnet have reported problems to me.

I have seen lines beginning with a dot (but with more stuff on it)
being treated as the end-of-message-body indicator by a *non*-BITNET
mailer as well.

Needless to say I think that mailer is broken, but getting someone
else's broken software fixed isn't necessarily a pleasant task when
it isn't on your machine, and the people who run that machine think
they are experts in electronic mail...
--
Samuel Lam
{ihnp4!alberta,watmath,uw-beaver,ubc-vision}!ubc-cs!van-bc!skl
rsalz@bbn.com (Rich Salz) (06/05/88)
For what my opinion is worth, based on the stuff I've read here and gotten
through the mail, I think there are only two ways to handle leading X's
(or whatever) on shar scripts:
	either	Always do it
	or	Never do it

Protecting only "dangerous" characters:
	Makes it harder for humans to follow
	Makes it hard to do "grep -v '^X'" to find trojan horses
	Fails to take advantage of the new feature in patch
	Makes it harder on some hand unpackers
	Gains no space for those who do compressed batches
	Fails to note that not everyone knows what the pessimistic set
	    of "dangerous" characters are.
(Sorry about some of those sentences; I was going for parallel structure.)

From my experience as the moderator of a newsgroup (perennially
second-most-popular, sigh :-) that is gatewayed into a mailing list
received by a couple of hundred sites, the "Never do it" philosophy is
naive.

I haven't yet gotten to the point where I'll repackage anything that
does not come in an "Always do it" shar, but I'm getting there. If only I
had the time...

Thanks, folks, for all the comments and feedback.
	/rich $alz
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
glee@cognos.uucp (Godfrey Lee) (06/06/88)
In article <868@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>The shar stuff I just released only puts out the leading 'X' when
>the first character is a non-alphabetic.
>Which do people prefer?

Just a caution, if you are going to change the shar scheme to be more
"intelligent", please provide an unshar program that can cope with it.

Shar is used on PCs and other non-Unix systems that do not have /bin/sh.
The current version of unshar (the one I am using anyways) just counts the
number of characters in the "sed" parameter and chops that many characters
off the start of the line. In fact, it can't cope with the shars that have
a sed ... /^@/ and ends up chopping the first character off every line. I
know I can fix it, but those shar files are rare, right now.

Your scheme would certainly break the version of unshar I have.
--
Godfrey Lee                                   P.O. Box 9707
Cognos Incorporated                           3755 Riverside Dr.
VOICE: (613) 738-1440  FAX: (613) 738-0002    Ottawa, Ontario
UUCP: decvax!utzoo!dciem!nrcaer!cognos!glee   CANADA K1G 3Z4
rwl@uvacs.CS.VIRGINIA.EDU (Ray Lubinsky) (06/07/88)
In article <1494@microsoft.UUCP>, danno@microsoft.UUCP (Daniel A. Norton) writes:
> OK, so the "X" goes before any line with a ".". I'm sure that the "X"
> must also precede any line that already had an "X". Are there other
> characters in the first position that systems in the net will be
> sensitive to?

Well, with the BSD mail program, '~' (tilde) in the first column makes the
line a mail command (like "~s blah" to set the subject line).

How's about putting an 'X' in front of every line which begins with a
non-alphanumeric, non-space character (other than 'X'), i.e.
[^a-zA-Z0-9 \t\n]?
--
| Ray Lubinsky,            UUCP:   ...!uunet!virginia!uvacs!rwl     |
| Department of            BITNET: rwl8y@virginia                   |
| Computer Science,        CSNET:  rwl@cs.virginia.edu  -OR-        |
| University of Virginia           rwl%uvacs@uvaarpa.virginia.edu   |
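The character class suggested above can be tried out with a small Python sketch. `protect` is a hypothetical helper, and the class is applied to the first character exactly as written in the post; note that 'X' itself falls inside a-zA-Z, so a line that already begins with X is left alone here and would still need the separate rule discussed earlier in the thread.

```python
import re

# First character is neither alphanumeric nor space, tab or newline.
NEEDS_X = re.compile(r"[^a-zA-Z0-9 \t\n]")


def protect(line):
    """Prefix 'X' when the line's first character matches the class."""
    return "X" + line if NEEDS_X.match(line) else line


if __name__ == "__main__":
    for sample in ["~s blah", ".TH SHELL 1l", "\tint i;", "char *q;", "X.400-host"]:
        print(repr(protect(sample)))
```

Compared with the "non-alphabetic" rule, this version leaves lines that start with a digit or with whitespace unprefixed, so indented C code would mostly go out as-is.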
levy@ttrdc.UUCP (Daniel R. Levy) (06/08/88)
In article <889@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
# For what my opinion is worth, based on the stuff I've read here and gotten
# through the mail, I think there are only two ways to handle leading X's
# (or whatever) on shar scripts:
#	either	Always do it
#	or	Never do it
# Protecting only "dangerous" characters:
#	Makes it hard to do "grep -v '^X'" to find trojan horses

That's not foolproof. Someone could package a trojan horse in an apparently
X-protected shell script very easily. Use your imagination.
--
|------------Dan Levy------------|  THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
|    AT&T Data Systems Group     |  Weinberg's Principle: An expert is a
|        Skokie, Illinois        |  person who avoids the small errors while
|-----Path: att!ttbcad!levy------|  sweeping on to the grand fallacy.
dv@unicom.UUCP (Ivade Deviz @ Vern) (06/16/88)
One more advantage to putting 'X' at the beginning of every line is that
the rn <TAB> command (search for next line beginning with a different
character) works to find the end of the current source file. If you only
put 'X' in front of certain lines, then <TAB> won't skip to the end of the
file, only to the next non-X line.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
David W. Vezie, Systems Hacker            | "I support Star Wars (tm),
{{sun,ucbvax}!pixar,pacbell}!unicom!dv    |  it's SDI I can't stand" --Me