[comp.sources.d] Read this if you're having trouble unpacking Tcl

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/27/90)

tneff@bfmny0.BFM.COM (Tom Neff) writes:
> karl@sugar.hackercorp.com (Karl Lehenbauer) writes:

>>Enclosed is the help file for anyone who is having trouble unpacking Tcl.

>If people would just post source to the source newsgroups, instead of
>this unreadable binary crap, no help file would be necessary.

Well, "unreadable" is a bit much; Karl was very helpful in email, helping
me find the tools to unpack tcl.

The packaging was, I think, justified by the more than 50% savings in the
size of the compressed, uuencoded file over the uncompressed original;
tcl unpacks into nearly 1200 1K blocks of files.

Lots of software doesn't transit the news system well in source form, even
in shars; the extra long lines promoted by both C and awk programming
styles, embedded control characters in the clear text version, and transit
between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
in transit by the news software itself (and one must be careful in the choice
of uuencode variant to survive the third danger intact).  As the net becomes
wider and the gateways more diverse, naked or shar-ed source has less and
less chance of arriving intact, so probably more and more source files will 
transit the net in compressed encoded form as time goes on.  No sense getting
abusive about that.

I don't think complaining about the packaging is fair if the product
arrives intact because of it, but Karl's choice of cpio over tar was
unfortunate. At any rate, as he indicated in his posting, the
comp.sources.unix archive pax, in volume 17, does indeed allow
compilation of a cpio (clone?) that successfully unpacks tcl; I just
finished doing just that.
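
For anyone else who is stuck, the whole job amounts to something like the
following; the file names are only illustrative (substitute whatever the
posting actually calls its pieces), and the cpio built from pax is assumed
to be on your path.
------------------------------------------------------------
# glue the posted parts together, decode, uncompress, and extract
cat tcl.part1 tcl.part2 tcl.part3 > tcl.uu
uudecode tcl.uu          # produces the compressed archive, here tcl.cpio.Z
uncompress tcl.cpio.Z
cpio -icdv < tcl.cpio    # -d creates directories; drop -c for binary headers
------------------------------------------------------------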

Remember, almost nowhere on the net do the *.sources.* files arrive
without having been compressed somewhere along the way; seeing them
delivered to you in a compressed format merely defers the final
unpacking to you, at some cost in convenience but benefit in size and
robustness of transport. No one was going to eyeball that whole
1.2Mbytes plus packaging before deciding whether to save it off and
unpack it in any case, and Karl did provide an introduction of sorts to
the software's purpose.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

tneff@bfmny0.BFM.COM (Tom Neff) (12/27/90)

In article <1990Dec27.071632.7272@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>If people would just post source to the source newsgroups, instead of
>>this unreadable binary crap, no help file would be necessary.
>
>Well, "unreadable" is a bit much; Karl was very helpful in email, helping
>me find the tools to unpack tcl.

Karl is a great guy, and this argument isn't intended to impugn his
character in any way.  By 'unreadable' I mean 'you cannot read it,' not
'you cannot somehow decode or transform it into something readable.'

The following is readable:
------------------------------------------------------------
News is for text, and source newsgroups are for source text.
------------------------------------------------------------

The following is unreadable gibberish:
------------------------------------------------------------
begin 0 gibberish
M'YV03LK<F0,B#4$S;^2 H%,&#QT6(,*X(0-BSILZ<L:4 >%&X)PS<B["(1A&
.SD:$"BUBU+BP(1T7"@" 
 
end
------------------------------------------------------------
...even though it's the same sentence uuencoded and compressed.

>The packaging was, I think, justified by the more than 50% savings in the
>size of the compressed, uuencoded file over the uncompressed original;
>tcl unpacks into nearly 1200 1K blocks of files.

This savings is illusory for most of Usenet.  The final gibberish
articles may occupy less space by themselves in the news spool directory
than their appropriate, readable cleartext counterparts would, but
that's all.  Anyone with a compressed news feed (that's most of us)
spent MORE, not less, time receiving them, as benchmarks published here
have repeatedly shown.  Anyone with a UNIX, DOS or VMS system will spend
MORE, not less, disk space on intermediate files and such to do all the
concatenating, decoding and decompressing necessary to turn the
gibberish into real information than they would have spent simply
extracting text from a shar.

>Lots of software doesn't transit the news system well in source form, even
>in shars; the extra long lines promoted by both C and awk programming
>styles, embedded control characters in the clear text version, and transit
>between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
>in transit by the news software itself (and one must be careful in the choice
>of uuencode variant to survive the third danger intact).  

Sure, the phone book or a complete core dump of my system wouldn't
transmit well over Usenet either.  There is an issue of appropriateness
here.  Not EVERY piece of software in any form whatsoever, however
willfully neglectful of net.user convenience, ought automatically to be
considered suitable for posting to source newsgroups.  Specifically, IF
YOU CODE something you intend to post to the net, DON'T use superlong
lines!!  So what if C allows it, or even (as one might manage to
convince oneself) 'promotes' it?  The proliferation of net host
environments DISCOURAGES it, and that ought to be the overriding
consideration for software which aspires to worldwide distribution.
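
If you want to check before you post, one line of awk will name the
offenders (79 columns is an arbitrary but safe limit, and the file list is
whatever you are about to ship):
------------------------------------------------------------
# print every line longer than 79 columns, tagged with its file name
awk 'length > 79 { print FILENAME ": " $0 }' *.c *.h
------------------------------------------------------------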

>                                                          As the net becomes
>wider and the gateways more diverse, naked or shar-ed source has less and
>less chance of arriving intact, so probably more and more source files will 
>transit the net in compressed encoded form as time goes on.  No sense getting
>abusive about that.

Leave 'abusive' out of it for a minute -- I am standing up for a
principle, and owe it nothing less than my strongest advocacy.  Nobody's
a bad guy here.  Next case.

Yes, the net is more diverse, but resorting to various Captain Midnight
coder-ring techniques to try to assure an 'intact' final file is a
Pyrrhic triumph!  The transformations that news gateways perform have a
PURPOSE, remember.

Joe has a Macintosh system where his C source files all have little
binary headers up front and a bunch of ^M-delimited text lines followed
by binary \0's to fill out a sector boundary.  (Just an example, not
necessarily real.)  Janet has an IBM CMS system where all her C source
files are fixed-length-80 EBCDIC records.  The news gateway between Joe
and Janet's systems automatically transforms article text lines from one
format to the other, so that both of them can read each other's
cleartext happily.  But now Joe gets this idea that the REALLY cool way
to distribute source is as compressed uuencoded stuffit files which
carefully 'preserve' all that precious, delicate Mac text structure --
after all, that's how his Mac BBS buddies prefer to exchange stuff -- so
he publishes his new hello-world program, HELLO.C to comp.sources.foo as
gibberish.  After mucho email exchange, Janet gets hold of something to
uncompress and uudecode the posting.  What does she end up with?
Another binary file with little ^M's sprinkled through it!  The
faithfully 'intact' Mac format is garbage to her!  She now has to find
or write something else to make into a real source file.  Some
convenience.
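
(The 'something else' is admittedly small -- assuming she can get the file
onto a Unix host somewhere, one tr invocation does it, with HELLO.C standing
in for whatever Joe called his posting -- but it should never have been her
problem in the first place.)
------------------------------------------------------------
# strip the carriage returns and padding NULs the Mac format left behind
tr -d '\015\000' < HELLO.C > hello.c
------------------------------------------------------------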

The point is that an overly fussy attention to retaining the precise
bitstream that appeared on the author's computer is MISPLACED.  Material
for multi-host distribution should be text.  The precise file structure
of that text should be permitted to vary normally from host to host.

Where the material to be distributed has been written so as to make
platform-independent representation impossible, the content should be
questioned first!  Somebody posted something here recently consisting of
source code and documentation; the usual thing.  But they put EPSON
control codes in the doc file!!  Little ESC-this and ESC-that scattered
all over it for bold, underline etc.  I mean, really now.  Of course
some host along the way obligingly stripped the \033 characters, leaving
weird letters glued to the subheads.  When this was pointed out in
sources.d, what did someone suggest?  You got it -- let's compress and
uuencode the doc files, or heck, the whole thing!!  BULL.  Don't put
silly escape sequences in your doc files in the first place!  Use a
portable representation like 'nroff -man' or stick to plaintext.  Usenet
is not an Epson peripheral.
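
A thirty-second check before posting would have caught it.  Something like
the following (the doc file name is invented) prints zero for a clean file
and something embarrassing otherwise:
------------------------------------------------------------
# count control characters other than tab and newline; anything nonzero
# means the file will not travel cleanly through news
tr -d '\011\012\040-\176' < program.doc | wc -c
------------------------------------------------------------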

>Remember, almost nowhere on the net do the *.sources.* files arrive
>without having been compressed somewhere along the way; seeing them
>delivered to you in a compressed format merely defers the final
>unpacking to you, at some cost in convenience but benefit in size and
>robustness of transport. 

This would be true if the news batchers, recognizing that a particular
set of articles was already compressed, could say 'OK skip this one, no
compression needed.'  But that's not how it works.  All news articles
are compressed on their way to you.  Gibberish articles are compressed
TWICE, for a net gain in transmission size and delay.  Delivering
gibberish articles doesn't *defer* unpacking to you: it *adds* another
layer of unpacking which you must do.  Nor is this more robust: a
gibberish article in your spool directory is no likelier to be undamaged
than a plaintext article.  The difference is that when unpacking a
plaintext shar (with word or line counting), you will discover that you
have a bad "arraydef.h" and you must fix it yourself (often possible
from context) or get a short replacement file or patch from the author;
whereas a dropped or mangled line in a uuencoded compressed gibberish
article brings everything to a screeching halt while you wait for tens
or hundreds of K to be reposted before you can unpack.
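
That is exactly the virtue of the dumb shar check: it localizes the damage.
A typical shar stanza is nothing fancier than the sketch below (not any
particular shar's literal output, and the byte count is invented):
------------------------------------------------------------
sed 's/^X//' > arraydef.h << 'END_OF_FILE'
X/* arraydef.h -- the real file's lines arrive here, each prefixed with X */
END_OF_FILE
if test 1437 -ne "`wc -c < arraydef.h`"
then
    echo 'arraydef.h damaged: fix it from context or ask for just that file'
fi
------------------------------------------------------------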

>                           No one was going to eyeball that whole
>1.2Mbytes plus packaging before deciding whether to save it off and
>unpack it in any case, and Karl did provide an introduction of sorts to
>the software's purpose.

It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
whether you want it.  One can eyeball the first few components, or
search for something specific you want or dislike (socket calls, VMS
references, the regexp parser, etc).  One can just scan to get a general
idea of the code quality.  These are real considerations.  Perverting
the source groups with encoded gibberish ignores them.

It is wrong to piggyback an alien distribution scheme onto Usenet.

-- 
"It has come to my attention that there is more  !!!  Tom Neff
than one Jeffrey Miller." -- Jeffrey Miller      ! !  tneff@bfmny0.BFM.COM

tneff@bfmny0.BFM.COM (Tom Neff) (12/28/90)

In article <1990Dec28.075123.6114@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:
>Uuencoded files have nice, regular, short lines, free of control
>characters, that transit gateways and news software well. I don't want
>to tell someone with a 132 character wide screen who's trying to decide
>whether it's worth the pain and torment to publish their code for the
>benefit of the net that s/he can only write code in the left 3/5ths or
>so of the screen because the USENet news software is braindead.
>
>Allowing programmers to transport the code in a manner that will survive
>the real world net without a prior hand reformat is a must.


This seems to be where we disagree.  I claim that *starting out* with a
portable source format is far more in the interest of the net, than
imposing magic preservation techniques designed to leave non-portable
formats 'intact' for future headscratching.

I don't deny for a minute that uuencoding is the only safe way to pass
someone's 132-wide ^M-delimited C code through netnews.  What I am
saying is that such stuff SHOULDN'T BE PASSED!  It's not generally
useful.  The more disparate the net gets, the LESS useful
platform-specific source formats become.

I also think the burden of portability should be on the shoulders of the
author.  It takes ONE session with a reformatter to render a program
net-portable; it takes THOUSANDS of cumulative sessions, successful or
otherwise, at thousands of user desks worldwide if we make the readers
do the reformatting.  It also promotes variant versions.
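
And that one session need not be heroic.  On a Unix box it is on the order
of the following (file names invented; overlong lines still deserve the
author's judgment rather than a blind fold(1)):
------------------------------------------------------------
# one pass to drop carriage returns and expand tabs before posting
tr -d '\015' < blackjack.orig.c | expand > blackjack.c
------------------------------------------------------------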

>Moreover, uuencoded files of the more modern kind do a line by line
>validity check, much more robust than shar's character count.  I've
>unpacked many damaged source files from the net that had correct
>character counts, but damaged bytes in the files.  This leads to
>subtle and time consuming debugging, since you can easily get errors
>that don't cause compiler errors by trashing just a byte or two,
>especially if you get lucky and hit an operator and convert it to
>a different operator.

This is true, but again, a validation failure on a compress+uuencode
posting hits everyone like a 16 ton weight!  Nothing useful can be
salvaged until the author reposts all or most of the archive.

>The transit from ASCII to EBCDIC and back irreversibly destroys some of
>the bracket characters (I forget which ones). This is not a trivial
>problem to fix in the source code. Sending the code with a uuencode
>variant that avoids characters that don't exist in both character sets
>avoids that damage.

This is a problem with porting any C code to EBCDIC environments.
Freeze-drying the original ASCII isn't going to be of particular help to
the EBCDIC end-user.

>The savings of 600Kbytes of spool storage space for tcl as sent means
>about 300 news articles can escape the expire cleaver until that
>distribution expires. On a small system like the home hobbyist system on
>which I have an account, that is a great benefit. 

But there are other options for such home hobbyist systems, including
running an aggressive expire on the source groups, or even doing local
compression in-place on the articles (replacing the original '12345'
files with small pointers to the real 12345.Z).
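
A crude sketch of the in-place idea follows; the path, the article-name
pattern, and the three-day threshold are all just illustrative, and your
newsreader has to cooperate, for instance by piping articles through zcat.
The pointer-file refinement is left to local taste.
------------------------------------------------------------
# compress source-group articles in place once they are a few days old
find /usr/spool/news/comp/sources -type f -name '[0-9]*' ! -name '*.Z' \
        -mtime +3 -exec compress {} \;
------------------------------------------------------------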

>Your expressed concern is that the files do not meet the "USENet way" of
>distributing source code. This is probably not a surprise to you, but
>we're not just USENet any more; we have subscribers on BITNET, EUnet,
>FidoNet, and many other networks, even CompuServe. 

No no noooo.  We ARE Usenet -- by definition.  Usenet is whoever gets
news.  Don't confuse it with Internet or other specific networks.  We
will always be Usenet.  (Actually, Usenet plus Alternet plus the other
non-core hierarchies, but whatever.)  What's happening is that more and
more disparate kinds of networks are becoming part of Usenet.  They
sport many architectural weirdnesses, but they all benefit from what
should be the Usenet source style: short lines, no control characters,
short hunks, simple (e.g. shar) collection mechanisms, no overloading
lower layer functions (e.g. compression) onto the basic high level
message envelope.

>                                                     Getting source
>material intact through all the possible protocols is a non-trivial
>challenge, but the regularity and limited character set of uuencoded
>data sure helps.  Paying a penalty (around 10%) in communication time is
>at least arguably worth it to be able to tie so much bigger a world
>together.

There are two paradigms at work here.  One is someone writing
BLACKJACK.C and wanting as many different people on as many different
systems as possible all over the world to be able to get it, compile it
and run it.  The other is two PC wankers exchanging Sound Blaster
samples across ten thousand miles of interconnecting networks which
happen to know nothing about PC binary format.  Usenet started out
serving the former paradigm, but has been increasingly used to serve the
latter.  Whether it should is a policy matter.  Encoding the source
newsgroups should NOT be done unless ABSOLUTELY necessary.  My concern
is the growing frequency with which it is done UN-necessarily.

-- 
"The country couldn't run without Prohibition.       ][  Tom Neff
 That is the industrial fact." -- Henry Ford, 1929   ][  tneff@bfmny0.BFM.COM

xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/28/90)

Many of your points are good ones; however, I cannot let you so lightly
dismiss a couple of them.

Uuencoded files have nice, regular, short lines, free of control
characters, that transit gateways and news software well. I don't want
to tell someone with a 132 character wide screen who's trying to decide
whether it's worth the pain and torment to publish their code for the
benefit of the net that s/he can only write code in the left 3/5ths or
so of the screen because the USENet news software is braindead.

Allowing programmers to transport the code in a manner that will survive
the real world net without a prior hand reformat is a must.

Moreover, uuencoded files of the more modern kind do a line by line
validity check, much more robust than shar's character count.  I've
unpacked many damaged source files from the net that had correct
character counts, but damaged bytes in the files.  This leads to
subtle and time consuming debugging, since you can easily get errors
that don't cause compiler errors by trashing just a byte or two,
especially if you get lucky and hit an operator and convert it to
a different operator.

The transit from ASCII to EBCDIC and back irreversibly destroys some of
the bracket characters (I forget which ones). This is not a trivial
problem to fix in the source code. Sending the code with a uuencode
variant that avoids characters that don't exist in both character sets
avoids that damage.

The savings of 600Kbytes of spool storage space for tcl as sent means
about 300 news articles can escape the expire cleaver until that
distribution expires. On a small system like the home hobbyist system on
which I have an account, that is a great benefit. With most traffic
volume now passing through the Internet, communications is no longer the
overwhelming, all-other-considerations-dismissing bottleneck to USENet
that it was four years ago; disk space, however, is in even shorter
supply.  A posting in another newsgroup mentions that net news volume
doubles every eighteen months, which is faster than spinning storage
halves in price.

Attention to efficient storage methods in news spools is thus a valid
and ongoing issue.  (In fact, I wish news were stored compressed and
accessed with a pipe through zcat, or by a "copy and uncompress to /temp,
read, and discard the copy" strategy; I'd willingly pay the wait time to
get a longer expire time.)  Receiving _and_ _storing_ until its
expiration time a more space-efficient format, such as the compressed,
uuencoded tcl distribution, helps every site's expire times and helps
avoid news spool overflow.
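
If news were stored that way, reading an article back would be just a pipe
(the path and article number here are made up):
------------------------------------------------------------
# read a compressed article without ever storing it uncompressed
zcat /usr/spool/news/comp/sources/unix/1234.Z | ${PAGER-more}
------------------------------------------------------------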

Your expressed concern is that the files do not meet the "USENet way" of
distributing source code. This is probably not a surprise to you, but
we're not just USENet any more; we have subscribers on BITNET, EUnet,
FidoNet, and many other networks, even CompuServe. Getting source
material intact through all the possible protocols is a non-trivial
challenge, but the regularity and limited character set of uuencoded
data sure helps.  Paying a penalty (around 10%) in communication time is
at least arguably worth it to be able to tie so much bigger a world
together.

Like you, I prefer to see nice neat open ASCII shars posted, but I grow
more and more willing to tolerate ever stranger formats as my own coping
skills for them increase, especially when many of the alternatives, such
as damaged receipt or news spool space crunches, are worse.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>

sean@ms.uky.edu (Sean Casey) (12/29/90)

Why don't we vote on which method people prefer?

Sean

-- 
***  Sean Casey <sean@s.ms.uky.edu>

slamont@network.ucsd.edu (Steve Lamont) (12/29/90)

In article <sean.662422746@s.ms.uky.edu> sean@ms.uky.edu (Sean Casey) writes:
>Why don't we vote on which method people prefer?

... because that would eliminate the fun of endless posting and
counter-posting. :-)

							spl (the p stands for
							probably new to the
							net...)
-- 
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                       - Director/producer John Amiel, heard on NPR

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/29/90)

As quoted from <1990Dec27.071632.7272@zorch.SF-Bay.ORG> by xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan):
+---------------
| tneff@bfmny0.BFM.COM (Tom Neff) writes:
| > karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
| >>Enclosed is the help file for anyone who is having trouble unpacking Tcl.
| >If people would just post source to the source newsgroups, instead of
| >this unreadable binary crap, no help file would be necessary.
| 
| I don't think complaining about the packaging is fair if the product
| arrives intact because of it, but Karl's choice of cpio over tar was
| unfortunate. At any rate, as he indicated in his posting, the
| comp.sources.unix archive pax, in volume 17, does indeed allow
| compilation of a cpio (clone?) that successfully unpacks tcl; I just
| finished doing just that.
+---------------

There's also afio.  And since you're so hip on getting savings, cpio doesn't
waste space making sure everything is on a 512-byte boundary.  (In a large
archive, this *can* make a difference.)
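
Easy enough to measure on a tree of your own if you're curious (the
directory name is arbitrary):
------------------------------------------------------------
# compare raw archive sizes for the same tree: tar pads every member
# out to a 512-byte boundary, cpio does not
tar cf - tcl | wc -c
find tcl -print | cpio -oc | wc -c
------------------------------------------------------------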

+---------------
| Remember, almost nowhere on the net do the *.sources.* files arrive
| without having been compressed somewhere along the way; seeing them
| delivered to you in a compressed format merely defers the final
| unpacking to you, at some cost in convenience but benefit in size and
| robustness of transport. No one was going to eyeball that whole
+---------------

No benefit in size, I fear.  Compress in a pipeline can't do its usual sanity
checks; "compress file; uuencode file.Z file.Z | compress > file.uu.Z" actually
results in file.uu.Z being considerably larger than file.Z, because uuencode
expands the file somewhat without making it compressible again, and
recompressing already-compressed data tends to *expand* it.  Compressed
uuencodes actually waste considerable space and time.
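
You can watch it happen with any file you have lying around (exact sizes
will vary, but the ordering usually won't):
------------------------------------------------------------
# demonstrate the double-compression penalty
compress -c somefile > somefile.Z                # the real savings happen here
uuencode somefile.Z somefile.Z | compress > somefile.uu.Z
ls -l somefile somefile.Z somefile.uu.Z          # .uu.Z usually tops .Z handily
------------------------------------------------------------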

I suspect that the only real solution to "robustness" issues will be X.400 or
an equivalent.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY