[alt.sources.d] Read this if you're having trouble unpacking Tcl

karl@sugar.hackercorp.com (Karl Lehenbauer) (12/25/90)

Enclosed is the help file for anyone who is having trouble unpacking Tcl.

It has been pointed out to me that "cpio" is a SysV-ism.  However, "pax"
is available from various archive sites and can read these archives.  Most
BSD systems seem to have some way to read cpio archives already.


You received this file because you had a problem unpacking the Tcl 4.0
archive.

This file should help you figure out what went wrong, and possibly
identify a piece of it that got corrupted.

Note that your editor must not strip trailing blanks or it will corrupt
the uuencoded archive.

If you have properly stripped the headers and trailers off the 12 parts,
they should have the following lengths:

-rw-r--r--   1 karl     cool       53281 Dec 19 19:02 tcl40.01
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.02
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.03
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.04
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.05
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.06
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.07
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.08
-rw-r--r--   1 karl     cool       53320 Dec 19 19:02 tcl40.09
-rw-r--r--   1 karl     cool       53320 Dec 19 19:03 tcl40.10
-rw-r--r--   1 karl     cool       53320 Dec 19 19:03 tcl40.11
-rw-r--r--   1 karl     cool       46804 Dec 19 19:03 tcl40.12


After you concatenate them, you should get a 633285-byte file.

-rw-r--r--   1 karl     cool      633285 Dec 19 19:03 uufile

After uudecode, tcl40.cpio.Z should be:

-rw-r--r--   1 karl     cool      459619 Dec 19 19:03 tcl40.cpio.Z

after "compress -d", tcl40.cpio should be:

-rw-r--r--   1 karl     cool     1191424 Dec 19 19:03 tcl40.cpio
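
For reference, the whole sequence boils down to something like this (a
sketch only; your cpio may want different option letters):

    cat tcl40.01 tcl40.02 tcl40.03 tcl40.04 tcl40.05 tcl40.06 \
        tcl40.07 tcl40.08 tcl40.09 tcl40.10 tcl40.11 tcl40.12 > uufile
    uudecode uufile                 # produces tcl40.cpio.Z
    compress -d tcl40.cpio.Z        # produces tcl40.cpio
    cpio -icd < tcl40.cpio          # extract; or: pax -r < tcl40.cpio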

-- 
-- uunet!sugar!karl
-- Usenet access: (713) 438-5018

tneff@bfmny0.BFM.COM (Tom Neff) (12/26/90)

In article <7372@sugar.hackercorp.com> karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
>Enclosed is the help file for anyone who is having trouble unpacking Tcl.

If people would just post source to the source newsgroups, instead of
this unreadable binary crap, no help file would be necessary.

sean@ms.uky.edu (Sean Casey) (12/27/90)

tneff@bfmny0.BFM.COM (Tom Neff) writes:

|In article <7372@sugar.hackercorp.com> karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
|>Enclosed is the help file for anyone who is having trouble unpacking Tcl.

|If people would just post source to the source newsgroups, instead of
|this unreadable binary crap, no help file would be necessary.

I went to all the trouble of editing off the headers, concatenating
the files, and uudecoding the thing. Compress told me it was messed up
(but still uncompressed some of it). I figured hey, I'll extract what I
can just to have a look. The man page for cpio is undecipherable, but I
eventually figured out the right options. Unfortunately, cpio wouldn't
unpack *any* of it, telling me "Out of phase, get help."

Now maybe Ken Arnold would like that sort of message, but it didn't
help me a whit.

If Tcl had been posted as a shell archive of source files as is the
norm, neither I nor anyone else on here would have had much trouble.

I agree with Tom. Just post source.

Sean
-- 
***  Sean Casey <sean@s.ms.uky.edu>

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/27/90)

tneff@bfmny0.BFM.COM (Tom Neff) writes:
> karl@sugar.hackercorp.com (Karl Lehenbauer) writes:

>>Enclosed is the help file for anyone who is having trouble unpacking Tcl.

>If people would just post source to the source newsgroups, instead of
>this unreadable binary crap, no help file would be necessary.

Well, "unreadable" is a bit much, Karl was very helpful in email helping
me find the tools to unpack tcl.

The packaging was justified I think by the more than 50% savings in the
size of the compressed, uuencoded file over the uncompressed original;
tcl unpacks into nearly 1200 1K blocks of files.

Lots of software doesn't transit the news system well in source form, even
in shars; the extra long lines promoted by both C and awk programming
styles, embedded control characters in the clear text version, and transit
between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
by software problems in the news software (and one must be careful in the
choice of uuencodes to survive the third danger intact).  As the net becomes
wider and the gateways more diverse, naked or shar-ed source has less and
less chance of arriving intact, so probably more and more source files will 
transit the net in compressed encoded form as time goes on.  No sense getting
abusive about that.

I don't think complaining about the packaging is fair if the product
arrives intact because of it, but Karl's choice of cpio over tar was
unfortunate. At any rate, as he indicated in his posting, the
comp.sources.unix archive pax, in volume 17, does indeed allow
compilation of a cpio (clone?) that successfully unpacks tcl; I just
finished doing just that.

Remember, almost nowhere on the net do the *.sources.* files arrive
without having been compressed somewhere along the way; seeing them
delivered to you in a compressed format merely defers the final
unpacking to you, at some cost in convenience but benefit in size and
robustness of transport. No one was going to eyeball that whole
1.2Mbytes plus packaging before deciding whether to save it off and
unpack it in any case, and Karl did provide an introduction of sorts to
the software's purpose.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

tneff@bfmny0.BFM.COM (Tom Neff) (12/27/90)

In article <1990Dec27.071632.7272@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>If people would just post source to the source newsgroups, instead of
>>this unreadable binary crap, no help file would be necessary.
>
>Well, "unreadable" is a bit much, Karl was very helpful in email helping
>me find the tools to unpack tcl.

Karl is a great guy, and this argument isn't intended to impugn his
character in any way.  By 'unreadable' I mean 'you cannot read it,' not
'you cannot somehow decode or transform it into something readable.'

The following is readable:
------------------------------------------------------------
News is for text, and source newsgroups are for source text.
------------------------------------------------------------

The following is unreadable gibberish:
------------------------------------------------------------
begin 0 gibberish
M'YV03LK<F0,B#4$S;^2 H%,&#QT6(,*X(0-BSILZ<L:4 >%&X)PS<B["(1A&
.SD:$"BUBU+BP(1T7"@" 
 
end
------------------------------------------------------------
...even though it's the same sentence uuencoded and compressed.

>The packaging was justified I think by the more than 50% savings in the
>size of the compressed, uuencoded file over the uncompressed original;
>tcl unpacks into nearly 1200 1K blocks of files.

This savings is illusory for most of Usenet.  The final gibberish
articles may occupy less space by themselves in the news spool directory
than their appropriate, readable cleartext counterparts would, but
that's all.  Anyone with a compressed news feed (that's most of us)
spent MORE, not less, time receiving them, as benchmarks published here
have repeatedly shown.  Anyone with a UNIX, DOS or VMS system will spend
MORE, not less, disk space on intermediate files and such to do all the
concatenating, decoding and decompressing necessary to turn the
gibberish into real information than they would have simply extracting
text from a shar.

>Lots of software doesn't transit the news system well in source form, even
>in shars; the extra long lines promoted by both C and awk programming
>styles, embedded control characters in the clear text version, and transit
>between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
>by software problems in the news software (and one must be careful in the
>choice of uuencodes to survive the third danger intact).  

Sure, the phone book or a complete core dump of my system wouldn't
transmit well over Usenet either.  There is an issue of appropriateness
here.  Not EVERY piece of software in any form whatsoever, however
willfully neglectful of net.user convenience, ought automatically to be
considered suitable for posting to source newsgroups.  Specifically, IF
YOU CODE something you intend to post to the net, DON'T use superlong
lines!!  So what if C allows it, or even (as one might manage to
convince oneself) 'promotes' it?  The proliferation of net host
environments DISCOURAGES it, and that ought to be the overriding
consideration for software which aspires to worldwide distribution.

>                                                          As the net becomes
>wider and the gateways more diverse, naked or shar-ed source has less and
>less chance of arriving intact, so probably more and more source files will 
>transit the net in compressed encoded form as time goes on.  No sense getting
>abusive about that.

Leave 'abusive' out of it for a minute -- I am standing up for a
principle, and owe it nothing less than my strongest advocacy.  Nobody's
a bad guy here.  Next case.

Yes, the net is more diverse, but resorting to various Captain Midnight
coder-ring techniques to try and assure an 'intact' final file is a
Pyrrhic triumph!  The transformations that news gateways perform have a
PURPOSE, remember.

Joe has a Macintosh system where his C source files all have little
binary headers up front and a bunch of ^M-delimited text lines followed
by binary \0's to fill out a sector boundary.  (Just an example, not
necessarily real.)  Janet has an IBM CMS system where all her C source
files are fixed-length-80 EBCDIC records.  The news gateway between Joe
and Janet's systems automatically transforms article text lines from one
format to the other, so that both of them can read each other's
cleartext happily.  But now Joe gets this idea that the REALLY cool way
to distribute source is as compressed uuencoded stuffit files which
carefully 'preserve' all that precious, delicate Mac text structure --
after all, that's how his Mac BBS buddies prefer to exchange stuff -- so
he publishes his new hello-world program, HELLO.C to comp.sources.foo as
gibberish.  After mucho email exchange, Janet gets hold of something to
uncompress and uudecode the posting.  What does she end up with?
Another binary file with little ^M's sprinkled through it!  The
faithfully 'intact' Mac format is garbage to her!  She now has to find
or write something else to make into a real source file.  Some
convenience.

The point is that an overly fussy attention to retaining the precise
bitstream that appeared on the author's computer is MISPLACED.  Material
for multi-host distribution should be text.  The precise file structure
of that text should be permitted to vary normally from host to host.

Where the material to be distributed has been written so as to make
platform-independent representation impossible, the content should be
questioned first!  Somebody posted something here recently consisting of
source code and documentation; the usual thing.  But they put EPSON
control codes in the doc file!!  Little ESC-this and ESC-that scattered
all over it for bold, underline etc.  I mean, really now.  Of course
some host along the way obligingly stripped the \033 characters, leaving
weird letters glued to the subheads.  When this was pointed out in
sources.d, what did someone suggest?  You got it -- let's compress and
uuencode the doc files, or heck, the whole thing!!  BULL.  Don't put
silly escape sequences in your doc files in the first place!  Use a
portable representation like 'nroff -man' or stick to plaintext.  Usenet
is not an Epson peripheral.

>Remember, almost nowhere on the net do the *.sources.* files arrive
>without having been compressed somewhere along the way; seeing them
>delivered to you in a compressed format merely defers the final
>unpacking to you, at some cost in convenience but benefit in size and
>robustness of transport. 

This would be true if the news batchers, recognizing that a particular
set of articles was already compressed, could say 'OK skip this one, no
compression needed.'  But that's not how it works.  All news articles
are compressed on their way to you.  Gibberish articles are compressed
TWICE, for a net gain in transmission size and delay.  Delivering
gibberish articles doesn't *defer* unpacking to you: it *adds* another
layer of unpacking which you must do.  Nor is this more robust: a
gibberish article in your spool directory is no likelier to be undamaged
than a plaintext article.  The difference is that when unpacking a
plaintext shar (with word or line counting), you will discover that you
have a bad "arraydef.h" and you must fix it yourself (often possible
from context) or get a short replacement file or patch from the author;
whereas a dropped or mangled line in a uuencoded compressed gibberish
article brings everything to a screeching halt while you wait for tens
or hundreds of K to be reposted before you can unpack.

>                           No one was going to eyeball that whole
>1.2Mbytes plus packaging before deciding whether to save it off and
>unpack it in any case, and Karl did provide an introduction of sorts to
>the software's purpose.

It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
whether you want it.  One can eyeball the first few components, or
search for something specific you want or dislike (socket calls, VMS
references, the regexp parser, etc).  One can just scan to get a general
idea of the code quality.  These are real considerations.  Perverting
the source groups with encoded gibberish ignores them.

It is wrong to piggyback an alien distribution scheme onto Usenet.

-- 
"It has come to my attention that there is more  !!!  Tom Neff
than one Jeffrey Miller." -- Jeffrey Miller      ! !  tneff@bfmny0.BFM.COM

slamont@network.ucsd.edu (Steve Lamont) (12/28/90)

In article <97261639@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
>whether you want it.  One can eyeball the first few components, or
>search for something specific you want or dislike (socket calls, VMS
>references, the regexp parser, etc).  ....

I don't really have much to add to Tom's discussion re: uuencoded vs.
cleartext files in source groups, other than I agree wholeheartedly with him
that it seems that cleartext is probably the way to go.

However, I'd like to make a small request of those generous souls who share
the fruits of their labors with the net.  That request is to please include a
reasonably full synopsis of what the posted source is supposed to do.  While I
can usually figure out what a piece of code does by some amount of study, it
would be preferable not to have to do so.  Even putting the man page as the
first shar'ed file would be helpful.

Thanx.

							spl (the p stands for
							pity the poor reader)
-- 
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                       - Director/producer John Amiel, heard on NPR

tneff@bfmny0.BFM.COM (Tom Neff) (12/28/90)

In article <1990Dec28.075123.6114@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>Uuencoded files have nice, regular, short lines, free of control
>characters, that transit gateways and news software well. I don't want
>to tell someone with a 132 character wide screen who's trying to decide
>whether it's worth the pain and torment to publish their code for the
>benefit of the net that s/he can only write code in the left 3/5ths or
>so of the screen because the USENet news software is braindead.
>
>Allowing programmers to transport the code in a manner that will survive
>the real world net without a prior hand reformat is a must.


This seems to be where we disagree.  I claim that *starting out* with a
portable source format is far more in the interest of the net, than
imposing magic preservation techniques designed to leave non-portable
formats 'intact' for future headscratching.

I don't deny for a minute that uuencoding is the only safe way to pass
someone's 132-wide ^M-delimited C code through netnews.  What I am
saying is that such stuff SHOULDN'T BE PASSED!  It's not generally
useful.  The more disparate the net gets, the LESS useful
platform-specific source formats become.

I also think the burden of portability should be on the shoulders of the
author.  It takes ONE session with a reformatter to render a program
net-portable; it takes THOUSANDS of cumulative sessions, successful or
otherwise, at thousands of user desks worldwide if we make the readers
do the reformatting.  It also promotes variant versions.

>Moreover, uuencoded files of the more modern kind do a line by line
>validity check, much more robust than shar's character count.  I've
>unpacked many damaged source files from the net that had correct
>character counts, but damaged bytes in the files.  This leads to
>subtle and time consuming debugging, since you can easily get errors
>that don't cause compiler errors by trashing just a byte or two,
>especially if you get lucky and hit an operator and convert it to
>a different operator.

This is true, but again, a validation failure on a compress+uuencode
posting hits everyone like a 16 ton weight!  Nothing useful can be
salvaged until the author reposts all or most of the archive.

>The transit from ASCII to EBCDIC and back irreversibly destroys some of
>the bracket characters, I forget which ones. This is not a trivial
>problem to fix in the source code. Sending the code with a uuencode
>variant that avoids characters that don't exist in both character sets
>avoids that damage.

This is a problem with porting any C code to EBCDIC environments.
Freeze-drying the original ASCII isn't going to be of particular help to
the EBCDIC end-user.

>The savings of 600Kbytes of spool storage space for tcl as sent means
>about 300 news articles can escape the expire cleaver until that
>distribution expires. On a small system like the home hobbyist system on
>which I have an account, that is a great benefit. 

But there are other options for such home hobbyist systems, including
running an aggressive expire on the source groups, or even doing local
compression in-place on the articles (replacing the original '12345'
files with small pointers to the real 12345.Z).
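
A rough sketch of that in-place idea, with purely illustrative paths (a
real setup would also need the news reader taught to follow the pointer):

    cd /usr/spool/news/comp/sources/misc
    compress 12345                 # article now lives in 12345.Z
    echo "see 12345.Z" > 12345     # leave a small pointer in its place
    zcat 12345.Z | more            # read the article back on demand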

>Your expressed concern is that the files do not meet the "USENet way" of
>distributing source code. This is probably not a surprise to you, but
>we're not just USENet any more; we have subscribers on BITNET, EUnet,
>FidoNet, and many other networks, even CompuServe. 

No no noooo.  We ARE Usenet -- by definition.  Usenet is whoever gets
news.  Don't confuse it with Internet or other specific networks.  We
will always be Usenet.  (Actually, Usenet plus Alternet plus the other
non-core hierarchies, but whatever.)  What's happening is that more and
more disparate kinds of networks are becoming part of Usenet.  They
sport many architectural weirdnesses, but they all benefit from what
should be the Usenet source style: short lines, no control characters,
short hunks, simple (e.g. shar) collection mechanisms, no overloading
lower layer functions (e.g. compression) onto the basic high level
message envelope.

>                                                     Getting source
>material intact through all the possible protocols is a non-trivial
>challenge, but the regularity and limited character set of uuencoded
>data sure helps.  Paying a penalty (around 10%) in communication time is
>at least arguably worth it to be able to tie so much bigger a world
>together.

There are two paradigms at work here.  One is someone writing
BLACKJACK.C and wanting as many different people on as many different
systems as possible all over the world to be able to get it, compile it
and run it.  The other is two PC wankers exchanging Sound Blaster
samples across ten thousand miles of interconnecting networks which
happen to know nothing about PC binary format.  Usenet started out
serving the former paradigm, but has been increasingly used to serve the
latter.  Whether it should is a policy matter.  Encoding the source
newsgroups should NOT be done unless ABSOLUTELY necessary.  My concern
is the growing frequency with which it is done UN-necessarily.

-- 
"The country couldn't run without Prohibition.       ][  Tom Neff
 That is the industrial fact." -- Henry Ford, 1929   ][  tneff@bfmny0.BFM.COM

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/28/90)

Many of your points are good ones; however, I cannot let you so lightly
dismiss a couple of them.

Uuencoded files have nice, regular, short lines, free of control
characters, that transit gateways and news software well. I don't want
to tell someone with a 132 character wide screen who's trying to decide
whether it's worth the pain and torment to publish their code for the
benefit of the net that s/he can only write code in the left 3/5ths or
so of the screen because the USENet news software is braindead.

Allowing programmers to transport the code in a manner that will survive
the real world net without a prior hand reformat is a must.

Moreover, uuencoded files of the more modern kind do a line by line
validity check, much more robust than shar's character count.  I've
unpacked many damaged source files from the net that had correct
character counts, but damaged bytes in the files.  This leads to
subtle and time consuming debugging, since you can easily get errors
that don't cause compiler errors by trashing just a byte or two,
especially if you get lucky and hit an operator and convert it to
a different operator.

The transit from ASCII to EBCDIC and back irreversibly destroys some of
the bracket characters, I forget which ones. This is not a trivial
problem to fix in the source code. Sending the code with a uuencode
variant that avoids characters that don't exist in both character sets
avoids that damage.

The savings of 600Kbytes of spool storage space for tcl as sent means
about 300 news articles can escape the expire cleaver until that
distribution expires. On a small system like the home hobbyist system on
which I have an account, that is a great benefit. With most traffic
volume now passing through the Internet, communications is no longer the
overwhelming bottleneck for USENet that it was four years ago, the kind
of bottleneck that dismissed every other consideration; disk space,
however, is in even shorter supply: a posting in another newsgroup
mentions that net news volume doubles every eighteen months, which is
faster than spinning storage halves in price.

Attention to efficient storage methods in news spools is thus a valid
and ongoing issue (in fact, I wish news were stored compressed and
accessed with a pipe through zcat or by a "copy and uncompress to /temp,
read, and discard the copy" strategy; I'd willingly pay the wait time to
have a longer expire time), so receiving _and_ _storing_ until its
expiration time a more space efficient format such as the compressed,
uuencoded tcl distribution helps every site's expire times and helps
avoid news spool overflow.

Your expressed concern is that the files do not meet the "USENet way" of
distributing source code. This is probably not a surprise to you, but
we're not just USENet any more; we have subscribers on BITNET, EUnet,
FidoNet, and many other networks, even CompuServe. Getting source
material intact through all the possible protocols is a non-trivial
challenge, but the regularity and limited character set of uuencoded
data sure helps.  Paying a penalty (around 10%) in communication time is
at least arguably worth it to be able to tie so much bigger a world
together.

Like you, I prefer to see nice neat open ASCII shars posted, but I grow
more and more willing to tolerate ever stranger formats as my own coping
skills for them increase, especially when the alternatives on offer,
such as damaged receipt or news spool space crunches, are worse.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

tneff@bfmny0.BFM.COM (Tom Neff) (12/28/90)

In article <MEISSNER.90Dec28123513@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
>On the other hand, ever since I've switched to compress + uuencode +
>shar for shipping out large sets of patches, I've had fewer people
>complain about news/mail trashing the file or subsequent patches
>failing, since some 'helpful' intermediary decided to put a newline in
>column 79, or change tabs to spaces, or....

This is not surprising at all, since the overwhelming urge on seeing one
of these massive little goobers is to 'K' the whole posting rather than
slog through unpacking it and possibly finding problems...


-- 
"How can a man of integrity get along    ///  Tom Neff
in Washington?" -- Richard Feynman      ///   tneff@bfmny0.BFM.COM

meissner@osf.org (Michael Meissner) (12/29/90)

In article <4410@network.ucsd.edu> slamont@network.ucsd.edu (Steve
Lamont) writes:

| In article <97261639@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
| >It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
| >whether you want it.  One can eyeball the first few components, or
| >search for something specific you want or dislike (socket calls, VMS
| >references, the regexp parser, etc).  ....
| 
| I don't really have much to add to Tom's discussion re: uuencoded vs.
| cleartext files in source groups, other than I agree wholeheartedly with him
| that it seems that cleartext is probably the way to go.
| 
| However, I'd like to make a small request of those generous souls who share
| the fruits of their labors with the net.  That request is to please include a
| reasonably full synopsis of what the posted source is supposed to do.  While I
| can usually figure out what a piece of code does by some amount of study, it
| would be preferable not to have to do so.  Even putting the man page as the
| first shar'ed file would be helpful.

On the other hand, ever since I've switched to compress + uuencode +
shar for shipping out large sets of patches, I've had fewer people
complain about news/mail trashing the file or subsequent patches
failing, since some 'helpful' intermediary decided to put a newline in
column 79, or change tabs to spaces, or....
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

slamont@network.ucsd.edu (Steve Lamont) (12/29/90)

In article <MEISSNER.90Dec28123513@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
>On the other hand, ever since I've switched to compress + uuencode +
>shar for shipping out large sets of patches, I've had fewer people
>complain about news/mail trashing the file or subsequent patches
>failing, since some 'helpful' intermediary decided to put a newline in
>column 79, or change tabs to spaces, or....

Good point.  This isn't a religious issue with me; however, as a previous
poster pointed out, the whole world isn't running UN*X.  For postings that
make sense *only* in a UN*X environment, compress and uuencode clearly make
sense, since they're common within that environment.  For other sources that
have applicability to other environments (graphics ray tracers, for instance),
it would seem that a cleartext approach would achieve the widest
dissemination most easily.

My gut feeling is that rather than go to compression and uuencoding, for most
cases, fixing the shar wrapper to deal with long lines would be the most
straightforward answer.

Again, however, the reason why I've poked my nose into this issue is somewhat
orthogonal to the compression/nocompression discussion.  I'm more interested
in encouraging those who do post sources to include some cleartext explanatory
note identifying just what it is that they're posting.  On more than one
occasion, I've gone through the effort to unshar and even build something that
I found was of no interest.  I'm certain that I've passed by postings that
were of interest but whose names were too cryptic for me to decipher.

Anyhow, thanks for the consideration.

							spl (the p stands for
							previous poster
							pointed??? I think
							I've just overdrawn my
							'p' ration for
							today...)
-- 
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                       - Director/producer John Amiel, heard on NPR

sean@ms.uky.edu (Sean Casey) (12/29/90)

Why don't we vote on which method people prefer?

Sean

-- 
***  Sean Casey <sean@s.ms.uky.edu>

slamont@network.ucsd.edu (Steve Lamont) (12/29/90)

In article <sean.662422746@s.ms.uky.edu> sean@ms.uky.edu (Sean Casey) writes:
>Why don't we vote on which method people prefer?

... because that would eliminate the fun of endless posting and counter
posting. :-)

							spl (the p stands for
							probably new to the
							net...)
-- 
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                       - Director/producer John Amiel, heard on NPR

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/29/90)

As quoted from <1990Dec27.071632.7272@zorch.SF-Bay.ORG> by xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan):
+---------------
| tneff@bfmny0.BFM.COM (Tom Neff) writes:
| > karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
| >>Enclosed is the help file for anyone who is having trouble unpacking Tcl.
| >If people would just post source to the source newsgroups, instead of
| >this unreadable binary crap, no help file would be necessary.
| 
| I don't think complaining about the packaging is fair if the product
| arrives intact because of it, but Karl's choice of cpio over tar was
| unfortunate. At any rate, as he indicated in his posting, the
| comp.sources.unix archive pax, in volume 17, does indeed allow
| compilation of a cpio (clone?) that successfully unpacks tcl; I just
| finished doing just that.
+---------------

There's also afio.  Also, since you're so hip on getting savings, cpio doesn't
waste space making sure everything is on a 512-byte boundary.  (In a large
archive, this *can* make a difference.)

+---------------
| Remember, almost nowhere on the net do the *.sources.* files arrive
| without having been compressed somewhere along the way; seeing them
| delivered to you in a compressed format merely defers the final
| unpacking to you, at some cost in convenience but benefit in size and
| robustness of transport. No one was going to eyeball that whole
+---------------

No benefit in size, I fear.  Compress in a pipeline can't do its usual
sanity check of refusing to compress when the result would be larger than
the input; "compress file; uuencode file.Z file.Z | compress > file.uu.Z"
actually results in file.uu.Z being considerably larger than file.Z,
because an already-compressed file *expands* when compressed again, and
uuencode also expands the file somewhat (but doesn't make it compressible
again).  Compressed uuencodes actually waste considerable space/time.

I suspect that the only real solution to "robustness" issues will be X.400 or
an equivalent.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

ed@dah.sub.org (Ed Braaten) (12/29/90)

In article <1990Dec28.075123.6114@zorch.SF-Bay.ORG> Kent Paul Dolan writes:

>Uuencoded files have nice, regular, short lines, free of control
>characters, that transit gateways and news software well. I don't want
>to tell someone with a 132 character wide screen who's trying to decide
>whether it's worth the pain and torment to publish their code for the
>benefit of the net that s/he can only write code in the left 3/5ths or
>so of the screen because the USENet news software is braindead.

That "braindead" USENET news software happens to connect a world 
full of disparate computer systems and their users.  It is this
connectivity that makes USENET so wonderful.  Good *source* files
have nice, regular, short lines, free of control characters, that
transit gateways and news software well. ;-)
 

--------------------------------------------------------------------
      Ed Braaten        |  "... Man looks at the outward appearance, 
Work: ed@de.intel.com   |  but the Lord looks at the heart."              
Home: ed@dah.sub.org    |                        1 Samuel 16:7b
--------------------------------------------------------------------

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/30/90)

ed@dah.sub.org (Ed Braaten) writes:

> Kent Paul Dolan writes:

>> Uuencoded files have nice, regular, short lines, free of control
>> characters, that transit gateways and news software well. I don't
>> want to tell someone with a 132 character wide screen who's trying to
>> decide whether it's worth the pain and torment to publish their code
>> for the benefit of the net that s/he can only write code in the left
>> 3/5ths or so of the screen because the USENet news software is
>> braindead.

> That "braindead" USENET news software happens to connect a world full
> of disparate computer systems and their users. It is this connectivity
> that makes USENET so wonderful. Good *source* files have nice,
> regular, short lines, free of control characters, that transit
> gateways and news software well. ;-)

All as unarguable as motherhood and apple pie. Now you go tell Joe or
Suzie GreatSoftwareHacker that that spiffy 132 character wide terminal
s/he bought to write code is such a hazard to the net that we _insist_
s/he stop using the right 52 character positions so that _we_ aren't
inconvenienced dealing with the free efforts of his/her skullsweat. If I
were Kent GreatSoftwareHacker, I'd suggest you write the damn code
yourself, if you can't cope with my coding style, or tolerate posting
methods that will transport it.

(And notice I _do_ format my postings narrow to support followup
widening; I work on an 80 character screen myself; my last 132 character
wide screen was at a job I held in 1979.)

Fact is, software comes from folks with _lots_ of coding styles and
environments, and we either drop our parochialism, or do without their
code.

Back to the origin of this dispute: the tcl software as posted packs
into about 392Kbytes as a single file in an lharc archive, 444Kbytes
with the files shredded out separately and packed in an lharc archive,
and something like 1200Kbytes (from "du") as an unarchived directory
tree in a BSD 4.3 file system, and probably 10% more than that as a
shar. If I weren't so fond of pulling out single files from an archive,
I'd start packing them with cpio or tar and lharcing that; 12% better
storage efficiency from saved header overhead, and from being able to
continue common string packing across files, isn't chickenfeed, when, as
here, you're talking about .6Gbyte of floppy disk archives.

As I noted in an earlier posting, it is a _large_ benefit to have the
sources sit in the news spool file at half the size, which was the case
with tcl as a uuencoded, compressed cpio archive.  The 600K saved is a
big chunk of our local, hmmm

Filesystem            kbytes    used   avail capacity  Mounted on
/dev/gd4f             231133   99129  108890    48%    /usr/spool/news

231Mbyte spool file; replicate that 20 or so times with other source
group postings, and you're talking a day saved for expires. I'd like that.


Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

tneff@bfmny0.BFM.COM (Tom Neff) (12/31/90)

In article <1990Dec30.151724.20808@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>All as unarguable as motherhood and apple pie. Now you go tell Joe or
>Suzie GreatSoftwareHacker that that spiffy 132 character wide terminal
>s/he bought to write code is such a hazard to the net that we _insist_
>s/he stop using the right 52 character positions so that _we_ aren't
>inconvenienced dealing with the free efforts of his/her skullsweat. 

They can use 700 columns if it makes them feel better, but when it comes
time to take something they've written and post it to Usenet, it's much
easier and more considerate for THEM to reformat it portably, ONCE, than
it is for thousands of cursing recipients to have to compensate for
their laziness after the fact.

Nor are UUENCODE-style subterfuges to preserve the precise original
bitstream through the netnews channel really a solution, since unless
the recipient's text architecture happens to match the author's, he has
TWO decode passes to try and make it through: one to undo the UUENCODE
and another to turn the decoded alien data into something he can
compile.  The hard work of the people behind the scenes who make this
second transformation happen transparently in his news gateway is thrown
away, of course, since UUENCODE deliberately denies the gateway access
to the real source text.

>If I
>were Kent GreatSoftwareHacker, I'd suggest you write the damn code
>yourself, if you can't cope with my coding style, or tolerate posting
>methods that will transport it.

Au contraire.  With all the source that's posted to the net every year,
a user can stuff his disk many times over JUST with what appears in
appropriate cleartext format.  When a huge glop of encoded gibberish
shows up in a "source" newsgroup, many people just sigh and 'K' it.  In
the Darwinian order of things, the cleartext will always tend to
prevail.  As someone else has pointed out, it is Netnews whose text
format interoperability should be improved, transparently to the user,
rather than promulgating egregious hacks at the expense of news
readability.

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (01/01/91)

tneff@bfmny0.BFM.COM (Tom Neff) writes:
> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

>> All as unarguable as motherhood and apple pie. Now you go tell Joe or
>> Suzie GreatSoftwareHacker that that spiffy 132 character wide
>> terminal s/he bought to write code is such a hazard to the net that
>> we _insist_ s/he stop using the right 52 character positions so that
>> _we_ aren't inconvenienced dealing with the free efforts of his/her
>> skullsweat.

> They can use 700 columns if it makes them feel better, but when it
> comes time to take something they've written and post it to Usenet,
> it's much easier and more considerate for THEM to reformat it
> portably, ONCE, than it is for thousands of cursing recipients to have
> to compensate for their laziness after the fact.

Sorry?  You seem to be bringing a lot of emotional baggage, and not much
logic, to this discussion.

My compiler handles code wider than 80 columns just fine, as do my text
displayers and editors. Where's the problem? It is much easier for me to
unpack, join, and download a uuencoded split zoo or lharc archive than a
shar file, since I'm going to store the source in archive format in
any case, and I have a much better chance of having it arrive intact and
prove itself to be intact. If part arrives munged, I seek it out from
another site, just as I would if part of a shar got clobbered. I don't
see any more effort on my part, and I do this several times a day.

> Nor are UUENCODE-style subterfuges to preserve the precise original
> bitstream through the netnews channel really a solution, since unless
> the recipient's text architecture happens to match the author's, he
> has TWO decode passes to try and make it through: one to undo the
> UUENCODE and another to turn the decoded alien data into something he
> can compile. The hard work of the people behind the scenes who make
> this second transformation happen transparently in his news gateway is
> thrown away, of course, since UUENCODE deliberately denies the gateway
> access to the real source text.

Well, the only case where I really have a problem is with source shar-ed
on an MS-DOS carriage return/ line feed style box and used on a newline
style box. This fails miserably in clear text, since the ever so helpful
intermediate sites change the carriage return/line feeds to newlines,
and now the shar character counts are all wrong, destroying any chance
to confirm even in a limited way the correct transmission of the data. My
most recent unpleasant experience with this is all of 16 hours old, and
I am _not_ thrilled with the prospect of eyeballing a 32 part archive,
some 800K of source code, for transit damage because the transport
mechanism has _no_ remaining sanity checks. It is exactly the delightful
habit of uuencode of denying the gateway the chance to muck about with
the internals of the data that makes it most valuable. Any fool can
write a filter to convert carriage return/line feed pairs to newlines
in 20 minutes, tops, and it only needs doing once. This is a boogeyman,
not a real issue.

>> If I were Kent GreatSoftwareHacker, I'd suggest you write the damn
>> code yourself, if you can't cope with my coding style, or tolerate
>> posting methods that will transport it.

> Au contraire. With all the source that's posted to the net every year,
> a user can stuff his disk many times over JUST with what appears in
> appropriate cleartext format.

Perhaps I show a bit too little paranoia; when a multipart posting shows
up, I get as far as the README file, and never bother to look at the
rest before packing it up in a binary compressed archive and downloading
it for later processing; I'm having a great deal of trouble imagining
the person who sits and reads a source posting of several hundred
kilobytes _while_ in the form of a news article, and from the news
reader. That has to be the least convenient possible way to peruse the
source, since it is not broken out as files, even. In fact, I doubt any
such person exists, so why are we designing our source code transmission
paradigm for this non-existent, frankly ridiculous, strawman?

> When a huge glop of encoded gibberish shows up in a "source"
> newsgroup, many people just sigh and 'K' it.

I'm afraid you're showing a bit of parochialism there; the "source"
distributions in several groups are _always_ encoded; are you suggesting
that people reading those groups bypass all the code, or that people
reading the present clear text groups would do so if all the source were
encoded instead? I think not. I save the code that looks interesting to
me, in whatever format it chances to be forwarded, and nuke the rest,
even if all the serifs are polished by the author before transmission. I
just don't believe in the vast audience of helpless news subscribers you
evoke.

> In the Darwinian order of things, the cleartext will always tend to
> prevail.

Not even close; in the Darwinian order of things, the _intact_ text will
prevail.  As noted above, no one with sense reads all or even most of it
as a news article in any case.

> As someone else has pointed out, it is Netnews whose text format
> interoperability should be improved, transparently to the user, rather
> than promulgating egregious hacks at the expense of news readability.

As pointed out above several times, the "reader" you hold up as the
model being for whom we must transmit source as clear text is unlikely
to exist. If you have no better reason than this strawman for digging in
your heels against the tide of change, why not bow out of the discussion
gracefully?

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

meissner@osf.org (Michael Meissner) (01/02/91)

In article <39510310@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:

| I don't deny for a minute that uuencoding is the only safe way to pass
| someone's 132-wide ^M-delimited C code through netnews.  What I am
| saying is that such stuff SHOULDN'T BE PASSED!  It's not generally
| useful.  The more disparate the net gets, the LESS useful
| platform-specific source formats become.

Pray tell how do you deal with the following situation (which I did
run into):

Site A posts something that has tabs in it, but is otherwise clean.

Site B (for bitnet) converts the tabs into spaces and passes it on.

Side C unpacks it, and it doesn't work at all, because make wants tabs
in front of the commands, or the next patch (which goes by way of site
D instead of B) doesn't apply because the lines don't match.

| I also think the burden of portability should be on the shoulders of the
| author.  It takes ONE session with a reformatter to render a program
| net-portable; it takes THOUSANDS of cumulative sessions, successful or
| otherwise, at thousands of user desks worldwide if we make the readers
| do the reformatting.  It also promotes variant versions.

Programs aren't the only things that are sent.  I've seen ASCII data
files which are not meant for human eyes, with lines hundreds of
characters long and no form of continuation characters.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

tneff@bfmny0.BFM.COM (Tom Neff) (01/02/91)

In article <MEISSNER.91Jan2004316@curley.osf.org> meissner@osf.org (Michael Meissner) writes:
>Pray tell how do you deal with the following situation (which I did
>run into):
>
>Site A posts something that has tabs in it, but is otherwise clean.
>Site B (for bitnet) converts the tabs into spaces and passes it on.
>Site C unpacks it, and it doesn't work at all, because make wants tabs
>in front of the commands, or the next patch (which goes by way of site
>D instead of B) doesn't apply because the lines don't match.

OK, that's actually two sub-situations with some common solutions.

First, it's a MAKE idiocy to require those leading tabs.  Hardly
sporting behavior for a dumb keyword oriented source language.  I know
that not all implementations still need it.  YOU tell ME how people on
tabless machines deal with them, eh?

But of course, it's no good preaching against prior art.  So how do
tab-hungry people survive on the other side of a tab-stripping news
link?  I'd say use a little C program that reinserts leading tabs.  It
ought to be so short you could include it right next to the Makefile.
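
Something along these lines would do even without writing C; this shell
sketch assumes the gateway turned each leading tab into exactly eight
spaces, which may not hold everywhere:

    # the sed pattern below is eight literal spaces
    tab=`echo x | tr 'x' '\011'`       # one literal tab character
    sed "s/^        /$tab/" Makefile.mangled > Makefile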

Beyond that, I would think that people living in the above mentioned
environment would learn to use the '-l' switch for patch!  For ordinary
source files, whitespace alone should never cause a patch reject.

>Programs aren't the only things that are sent.  I've seen ASCII data
>files which are not meant for human eyes, with lines hundreds of
>characters long and no form of continuation characters.

Oh yes, I agree.  The same thing goes for little bitmaps and other such
INTRINSICALLY binary data which must be included with a source
distribution in order to complete the packages.  By all means uuencode
those individual components within the shar.  But don't uuencode the
whole thing just for the sake of one or two auxiliary files.
-- 
Thank God for atheism!        8=8=8=8          Tom Neff / tneff@bfmny0.BFM.COM

rsalz@bbn.com (Rich Salz) (01/02/91)

In <1991Jan1.000829.24209@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
> I'm having a great deal of trouble imagining
>the person who sits and reads a source posting of several hundred
>kilobytes _while_ in the form of a news article, and from the news
>reader.

I subscribe to all the source groups I can find (since I have an account
on UUNET I can find a lot :-).  Using RN, I read at least the first full
article of any multi-part posting.

I don't think it's all that rare.

The only group I dropped is the damn Amiga group, which posts binary
data in a source group.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

sean@ms.uky.edu (Sean Casey) (01/03/91)

Personally, I feel that the problem is not with "encapsulating"
sources to prevent damage.

The problem is with the $$&@! inflexible operating systems and
gateways. By now, everyone should be able to do ascii. By now,
everyone's computer system should understand lines of 133 characters
or more.

Where the energy needs to be spent is getting these outdated systems
up to spec with the rest of the world. I say with all seriousness
"leave them behind". Then the vendors will get a clue and start fixing
things.

Whew, I feel better.

Sean
-- 
***  Sean Casey <sean@s.ms.uky.edu>

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (01/06/91)

Agreed with everything that Tom Neff says -- I rather prefer that
human-readable source remain human-readable when posted in Usenet
articles.

However, some sort of checksum should be included for verification.  (I
like my "brik" program, naturally, but the UNIX "sum" program also
works.)  Unfortunately most people don't include such a check code when
they post shar'd source, so de facto an archived-uuencoded posting is
better.
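
As a sketch of what that might look like with nothing fancier than the
stock "sum" command (file names here are only illustrative, and both ends
need the same flavor of sum):

    sum *.c *.h Makefile > CHECKSUMS            # the poster includes this file
    sum *.c *.h Makefile | diff - CHECKSUMS     # the recipient verifies
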
--
History never         |   Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
becomes obsolete.     |   UUCP:  oliveb!cirrusl!dhesi

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/06/91)

In article <2852@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
> Agreed with everything that Tom Neff says -- I rather prefer that
> human-readable source remain human-readable when posted in Usenet
> articles.

Yeah. If tabs and control characters and so on are a real problem, we
can easily define a text encoding that's both readable and safe. What
characters do we need? !@#$%^&*()-=_+[]{};':",./<>?\|`~, space, tab,
letters, numbers, and perhaps some other ASCII control characters. At
least letters, numbers, space, and !$%&*()-=+;:",./?\ will make it
through any translation. So we just define codes for the other
characters and be done with it. A machine without ~ can give an error if
it sees the code for ~. Binary files which inherently depend on all 256
byte values can be uuencoded. Done.

> However, some sort of checksum should be included for verification.

True. A cryptographic hash function, like Merkle's Snefru with 3 or 4
passes, would suffice.

> Unfortunately most people don't include such a check code when
> they post shar'd source,

But won't all this go into the shar program?

---Dan

oz@yunexus.yorku.ca (Ozan Yigit) (01/07/91)

In article <sean.662835867@s.ms.uky.edu> sean@ms.uky.edu (Sean Casey) writes:

>... By now, everyone should be able to do ascii.

Ascii? Or do you mean ISO Latin ... 

oz
---
Good design means less design. Design   | Internet: oz@nexus.yorku.ca 
must serve users, not try to fool them. | UUCP: utzoo/utai!yunexus!oz
-- Dieter Rams, Chief Designer, Braun.  | phonet: 1+ 416 736 5257
  

kyle@UUNET.UU.NET (Kyle Jones) (01/08/91)

Sean Casey writes:
 > ... By now, everyone should be able to do ascii.

Ozan Yigit writes:
 > Ascii? Or do you mean ISO Latin ... 

No, he probably means ASCII.  Some vendors STAY behind the times.

kris@beep.UUCP (Huh?) (01/09/91)

In article <sean.662835867@s.ms.uky.edu>, sean@ms.uky.edu (Sean Casey) writes:
 >The problem is with the $$&@! inflexible operating systems and
 >gateways. By now, everyone should be able to do ascii. By now,
 >everyone's computer system should understand lines of 133 characters
 >or more.

 >Where the energy needs to be spent is getting these outdated systems
 >up to spec with the rest of the world. I say with all seriousness
 >"leave them behind". Then the vendors will get a clue and start fixing
 >things.

     I disagree.  Part of USENET's appeal is its versatility, its ability
to "connect" different types of machines, not just the "elite" with money
to burn.  I say, 'make the software able to translate between all/most
machines and help those with "older" machines to come up with some sort
of translator'.  Some would call "beep" an "older system" (1985), but that
doesn't mean I'm not interested in what's out there, and I certainly don't
want to be cut out because the yuppies decide "beep" is a slum.

-- 
					Kris
					key!beep!kris -OR- woodowl!beep!kris

"For men without women
 are like fish without water to swim in;                       Was (Not Was)
 their eyes bugging out, they flop on the beach                "Shadow & Jimmy"
 and stare up at the girls who are just out of reach."