xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/27/90)
tneff@bfmny0.BFM.COM (Tom Neff) writes:
> karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
>>Enclosed is the help file for anyone who is having trouble unpacking Tcl.
>If people would just post source to the source newsgroups, instead of
>this unreadable binary crap, no help file would be necessary.

Well, "unreadable" is a bit much; Karl was very helpful in email helping
me find the tools to unpack tcl.

The packaging was justified, I think, by the more than 50% savings in the
size of the compressed, uuencoded file over the uncompressed original;
tcl unpacks into nearly 1200 1K blocks of files.

Lots of software doesn't transit the news system well in source form, even
in shars; the extra long lines promoted by both C and awk programming
styles, embedded control characters in the clear text version, and transit
between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
by the news software (and one must be careful in the choice of uuencodes
to survive the third danger intact). As the net becomes wider and the
gateways more diverse, naked or shar-ed source has less and less chance of
arriving intact, so probably more and more source files will transit the
net in compressed encoded form as time goes on. No sense getting abusive
about that.

I don't think complaining about the packaging is fair if the product
arrives intact because of it, but Karl's choice of cpio over tar was
unfortunate. At any rate, as he indicated in his posting, the
comp.sources.unix archive pax, in volume 17, does indeed allow
compilation of a cpio (clone?) that successfully unpacks tcl; I just
finished doing just that.

Remember, almost nowhere on the net do the *.sources.* files arrive
without having been compressed somewhere along the way; seeing them
delivered to you in a compressed format merely defers the final
unpacking to you, at some cost in convenience but benefit in size and
robustness of transport. No one was going to eyeball that whole
1.2Mbytes plus packaging before deciding whether to save it off and
unpack it in any case, and Karl did provide an introduction of sorts to
the software's purpose.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
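For anyone facing the same unpacking job, the pipeline involved looks
roughly like the sketch below. The part and archive names here are
hypothetical (uudecode writes whatever file name the poster put on the
"begin" line), and the pax cpio clone from comp.sources.unix volume 17
can stand in for cpio, as Kent notes above.

    # Save the article bodies, strip headers/signatures, then:
    cat tcl.part01 tcl.part02 tcl.part03 > tcl.uu   # hypothetical part names
    uudecode tcl.uu                # produces, e.g., tcl.cpio.Z
    zcat tcl.cpio.Z | cpio -icd    # -i extract, -c portable headers,
                                   # -d make directories; drop -c if the
                                   # archive uses binary headers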
tneff@bfmny0.BFM.COM (Tom Neff) (12/27/90)
In article <1990Dec27.071632.7272@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG
(Kent Paul Dolan) writes:
>tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>If people would just post source to the source newsgroups, instead of
>>this unreadable binary crap, no help file would be necessary.
>
>Well, "unreadable" is a bit much; Karl was very helpful in email helping
>me find the tools to unpack tcl.

Karl is a great guy, and this argument isn't intended to impugn his
character in any way. By 'unreadable' I mean 'you cannot read it,' not
'you cannot somehow decode or transform it into something readable.'

The following is readable:

------------------------------------------------------------
News is for text, and source newsgroups are for source text.
------------------------------------------------------------

The following is unreadable gibberish:

------------------------------------------------------------
begin 0 gibberish
M'YV03LK<F0,B#4$S;^2 H%,&#QT6(,*X(0-BSILZ<L:4 >%&X)PS<B["(1A&
.SD:$"BUBU+BP(1T7"@"
end
------------------------------------------------------------

...even though it's the same sentence uuencoded and compressed.

>The packaging was justified, I think, by the more than 50% savings in the
>size of the compressed, uuencoded file over the uncompressed original;
>tcl unpacks into nearly 1200 1K blocks of files.

This savings is illusory for most of Usenet. The final gibberish articles
may occupy less space by themselves in the news spool directory than their
appropriate, readable cleartext counterparts would, but that's all. Anyone
with a compressed news feed (that's most of us) spent MORE, not less, time
receiving them, as benchmarks published here have repeatedly shown. Anyone
with a UNIX, DOS or VMS system will spend MORE, not less, disk space on
intermediate files and such to do all the concatenating, decoding and
decompressing necessary to turn the gibberish into real information than
they would have spent simply extracting text from a shar.

>Lots of software doesn't transit the news system well in source form, even
>in shars; the extra long lines promoted by both C and awk programming
>styles, embedded control characters in the clear text version, and transit
>between EBCDIC and ASCII hosts can all cause unencoded files to be damaged
>by the news software (and one must be careful in the choice of uuencodes
>to survive the third danger intact).

Sure, the phone book or a complete core dump of my system wouldn't
transmit well over Usenet either. There is an issue of appropriateness
here. Not EVERY piece of software in any form whatsoever, however
willfully neglectful of net.user convenience, ought automatically to be
considered suitable for posting to source newsgroups.

Specifically, IF YOU CODE something you intend to post to the net, DON'T
use superlong lines!! So what if C allows it, or even (as one might manage
to convince oneself) 'promotes' it? The proliferation of net host
environments DISCOURAGES it, and that ought to be the overriding
consideration for software which aspires to worldwide distribution.

> As the net becomes
>wider and the gateways more diverse, naked or shar-ed source has less and
>less chance of arriving intact, so probably more and more source files
>will transit the net in compressed encoded form as time goes on. No sense
>getting abusive about that.

Leave 'abusive' out of it for a minute -- I am standing up for a
principle, and owe it nothing less than my strongest advocacy. Nobody's a
bad guy here. Next case.
Yes, the net is more diverse, but resorting to various Captain Midnight
decoder-ring techniques to try to assure an 'intact' final file is a
Pyrrhic triumph! The transformations that news gateways perform have a
PURPOSE, remember.

Joe has a Macintosh system where his C source files all have little
binary headers up front and a bunch of ^M-delimited text lines followed
by binary \0's to fill out a sector boundary. (Just an example, not
necessarily real.) Janet has an IBM CMS system where all her C source
files are fixed-length-80 EBCDIC records. The news gateway between Joe's
and Janet's systems automatically transforms article text lines from one
format to the other, so that both of them can read each other's cleartext
happily.

But now Joe gets this idea that the REALLY cool way to distribute source
is as compressed, uuencoded StuffIt files which carefully 'preserve' all
that precious, delicate Mac text structure -- after all, that's how his
Mac BBS buddies prefer to exchange stuff -- so he publishes his new
hello-world program, HELLO.C, to comp.sources.foo as gibberish. After
mucho email exchange, Janet gets hold of something to uncompress and
uudecode the posting. What does she end up with? Another binary file with
little ^M's sprinkled through it! The faithfully 'intact' Mac format is
garbage to her! She now has to find or write something else to make it
into a real source file. Some convenience.

The point is that an overly fussy attention to retaining the precise
bitstream that appeared on the author's computer is MISPLACED. Material
for multi-host distribution should be text. The precise file structure of
that text should be permitted to vary normally from host to host. Where
the material to be distributed has been written so as to make
platform-independent representation impossible, the content should be
questioned first!

Somebody posted something here recently consisting of source code and
documentation; the usual thing. But they put EPSON control codes in the
doc file!! Little ESC-this and ESC-that scattered all over it for bold,
underline, etc. I mean, really now. Of course some host along the way
obligingly stripped the \033 characters, leaving weird letters glued to
the subheads. When this was pointed out in sources.d, what did someone
suggest? You got it -- let's compress and uuencode the doc files, or
heck, the whole thing!! BULL. Don't put silly escape sequences in your
doc files in the first place! Use a portable representation like
'nroff -man' or stick to plaintext. Usenet is not an Epson peripheral.

>Remember, almost nowhere on the net do the *.sources.* files arrive
>without having been compressed somewhere along the way; seeing them
>delivered to you in a compressed format merely defers the final
>unpacking to you, at some cost in convenience but benefit in size and
>robustness of transport.

This would be true if the news batchers, recognizing that a particular
set of articles was already compressed, could say 'OK, skip this one, no
compression needed.' But that's not how it works. All news articles are
compressed on their way to you. Gibberish articles are compressed TWICE,
for a net gain in transmission size and delay. Delivering gibberish
articles doesn't *defer* unpacking to you: it *adds* another layer of
unpacking which you must do. Nor is this more robust: a gibberish article
in your spool directory is no likelier to be undamaged than a plaintext
article.
The difference is that when unpacking a plaintext shar (with word or line
counting), you will discover that you have a bad "arraydef.h" and you
must fix it yourself (often possible from context) or get a short
replacement file or patch from the author; whereas a dropped or mangled
line in a uuencoded, compressed gibberish article brings everything to a
screeching halt while you wait for tens or hundreds of K to be reposted
before you can unpack.

> No one was going to eyeball that whole
>1.2Mbytes plus packaging before deciding whether to save it off and
>unpack it in any case, and Karl did provide an introduction of sorts to
>the software's purpose.

It isn't necessary to 'eyeball' the whole 1.2MB in order to decide
whether you want it. One can eyeball the first few components, or search
for something specific you want or dislike (socket calls, VMS references,
the regexp parser, etc.). One can just scan to get a general idea of the
code quality. These are real considerations. Perverting the source groups
with encoded gibberish ignores them.

It is wrong to piggyback an alien distribution scheme onto Usenet.
--
"It has come to my attention that there is more    !!!    Tom Neff
than one Jeffrey Miller."  -- Jeffrey Miller       ! !    tneff@bfmny0.BFM.COM
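To make the recoverability point concrete, here is roughly what the
integrity check inside a typical shar looks like. The file name and byte
count are made up and real shar generators differ in detail; the point is
that a damaged file is reported by name and can be fixed or replaced in
isolation, rather than invalidating the whole archive.

    # A shar extracts each file (usually via sed 's/^X//' on a here
    # document) and then checks its size; the name and count here are
    # hypothetical:
    if test 1234 -ne `wc -c < arraydef.h`
    then
        echo 'arraydef.h: extraction error -- character count is wrong'
    fi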
tneff@bfmny0.BFM.COM (Tom Neff) (12/28/90)
In article <1990Dec28.075123.6114@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG
(Kent Paul Dolan) writes:
>Uuencoded files have nice, regular, short lines, free of control
>characters, that transit gateways and news software well. I don't want
>to tell someone with a 132 character wide screen who's trying to decide
>whether it's worth the pain and torment to publish their code for the
>benefit of the net that s/he can only write code in the left 3/5ths or
>so of the screen because the USENet news software is braindead.
>
>Allowing programmers to transport the code in a manner that will survive
>the real world net without a prior hand reformat is a must.

This seems to be where we disagree. I claim that *starting out* with a
portable source format is far more in the interest of the net than
imposing magic preservation techniques designed to leave non-portable
formats 'intact' for future headscratching.

I don't deny for a minute that uuencoding is the only safe way to pass
someone's 132-wide, ^M-delimited C code through netnews. What I am saying
is that such stuff SHOULDN'T BE PASSED! It's not generally useful. The
more disparate the net gets, the LESS useful platform-specific source
formats become.

I also think the burden of portability should be on the shoulders of the
author. It takes ONE session with a reformatter to render a program
net-portable; it takes THOUSANDS of cumulative sessions, successful or
otherwise, at thousands of user desks worldwide if we make the readers do
the reformatting. It also promotes variant versions.

>Moreover, uuencoded files of the more modern kind do a line by line
>validity check, much more robust than shar's character count. I've
>unpacked many damaged source files from the net that had correct
>character counts, but damaged bytes in the files. This leads to
>subtle and time consuming debugging, since you can easily get errors
>that don't cause compiler errors by trashing just a byte or two,
>especially if you get lucky and hit an operator and convert it to
>a different operator.

This is true, but again, a validation failure on a compress+uuencode
posting hits everyone like a 16 ton weight! Nothing useful can be
salvaged until the author reposts all or most of the archive.

>The transit from ASCII to EBCDIC and back irreversibly destroys some of
>the bracket characters, I forget which ones. This is not a trivial
>problem to fix in the source code. Sending the code with a uuencode
>variant that avoids characters that don't exist in both character sets
>avoids that damage.

This is a problem with porting any C code to EBCDIC environments.
Freeze-drying the original ASCII isn't going to be of particular help to
the EBCDIC end user.

>The savings of 600Kbytes of spool storage space for tcl as sent means
>about 300 news articles can escape the expire cleaver until that
>distribution expires. On a small system like the home hobbyist system on
>which I have an account, that is a great benefit.

But there are other options for such home hobbyist systems, including
running an aggressive expire on the source groups, or even doing local
compression in place on the articles, replacing the original '12345'
files with small pointers to the real 12345.Z (a rough sketch of this
appears after this article).

>Your expressed concern is that the files do not meet the "USENet way" of
>distributing source code. This is probably not a surprise to you, but
>we're not just USENet any more; we have subscribers on BITNET, EUnet,
>FidoNet, and many other networks, even CompuServe.

No no noooo. We ARE Usenet -- by definition.
Usenet is whoever gets news. Don't confuse it with the Internet or other
specific networks. We will always be Usenet. (Actually, Usenet plus
Alternet plus the other non-core hierarchies, but whatever.)

What's happening is that more and more disparate kinds of networks are
becoming part of Usenet. They sport many architectural weirdnesses, but
they all benefit from what should be the Usenet source style: short
lines, no control characters, short hunks, simple (e.g. shar) collection
mechanisms, and no overloading of lower-layer functions (e.g.
compression) onto the basic high-level message envelope.

> Getting source
>material intact through all the possible protocols is a non-trivial
>challenge, but the regularity and limited character set of uuencoded
>data sure helps. Paying a penalty (around 10%) in communication time is
>at least arguably worth it to be able to tie so much bigger a world
>together.

There are two paradigms at work here. One is someone writing BLACKJACK.C
and wanting as many different people on as many different systems as
possible all over the world to be able to get it, compile it and run it.
The other is two PC wankers exchanging Sound Blaster samples across ten
thousand miles of interconnecting networks which happen to know nothing
about PC binary format. Usenet started out serving the former paradigm,
but has been increasingly used to serve the latter. Whether it should is
a policy matter. Encoding the source newsgroups should NOT be done unless
ABSOLUTELY necessary. My concern is the growing frequency with which it
is done UN-necessarily.
--
"The country couldn't run without Prohibition.      ][    Tom Neff
That is the industrial fact." -- Henry Ford, 1929   ][    tneff@bfmny0.BFM.COM
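The in-place compression Tom suggests above could look something like the
sketch below. It is only an illustration: the spool path and newsgroup
are hypothetical, no news software does this for you automatically, and
readers need to pipe the .Z files through zcat (or use a wrapper that
does) to see the articles again.

    # Compress source-group articles in place, leaving a tiny pointer
    # behind (naive sketch; hypothetical spool path).
    cd /usr/spool/news/comp/sources/misc
    for a in [0-9]*
    do
        case "$a" in *.Z) continue ;; esac    # skip ones already done
        compress "$a" &&                      # writes $a.Z, removes $a
            echo "see $a.Z" > "$a"            # pointer for casual readers
    done
    # Reading one back later:
    zcat 12345.Z | more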
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (12/28/90)
Many of your points are good ones; however, I cannot let you so lightly
dismiss a couple of them.

Uuencoded files have nice, regular, short lines, free of control
characters, that transit gateways and news software well. I don't want
to tell someone with a 132 character wide screen who's trying to decide
whether it's worth the pain and torment to publish their code for the
benefit of the net that s/he can only write code in the left 3/5ths or
so of the screen because the USENet news software is braindead.

Allowing programmers to transport the code in a manner that will survive
the real world net without a prior hand reformat is a must.

Moreover, uuencoded files of the more modern kind do a line by line
validity check, much more robust than shar's character count. I've
unpacked many damaged source files from the net that had correct
character counts, but damaged bytes in the files. This leads to subtle
and time consuming debugging, since you can easily get errors that don't
cause compiler errors by trashing just a byte or two, especially if you
get lucky and hit an operator and convert it to a different operator.

The transit from ASCII to EBCDIC and back irreversibly destroys some of
the bracket characters, I forget which ones. This is not a trivial
problem to fix in the source code. Sending the code with a uuencode
variant that avoids characters that don't exist in both character sets
avoids that damage.

The savings of 600Kbytes of spool storage space for tcl as sent means
about 300 news articles can escape the expire cleaver until that
distribution expires. On a small system like the home hobbyist system on
which I have an account, that is a great benefit. With most traffic
volume passing through the Internet, communications is no longer the
overwhelming, all-other-considerations-dismissing bottleneck for USENet
that it was four years ago; disk space, however, is in even shorter
supply. A posting in another newsgroup mentions that net news volume
doubles every eighteen months, which is faster than the rate at which
spinning storage halves in price. Attention to efficient storage methods
in news spools is thus a valid and ongoing issue (in fact, I wish news
were stored compressed and accessed with a pipe through zcat, or by a
"copy and uncompress to /temp, read, and discard the copy" strategy; I'd
willingly pay the wait time to have a longer expire time), so receiving
_and_ _storing_ until its expiration time a more space efficient format
such as the compressed, uuencoded tcl distribution helps every site's
expire times and helps avoid news spool overflow.

Your expressed concern is that the files do not meet the "USENet way" of
distributing source code. This is probably not a surprise to you, but
we're not just USENet any more; we have subscribers on BITNET, EUnet,
FidoNet, and many other networks, even CompuServe. Getting source
material intact through all the possible protocols is a non-trivial
challenge, but the regularity and limited character set of uuencoded data
sure helps. Paying a penalty (around 10%) in communication time is at
least arguably worth it to be able to tie so much bigger a world
together.

Like you, I prefer to see nice neat open ASCII shars posted, but I grow
more and more willing to tolerate ever stranger formats as my own coping
skills for them increase, especially when many of the alternatives, such
as damaged receipt or news spool space crunches, are worse.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
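Kent's first point, that uuencoded bodies are short, regular lines drawn
from a limited character set, is easy to check mechanically. The sketch
below tests only line length and character range; it is not the per-line
checksum that the "more modern" encoders he mentions add on top of this,
and the saved-article name is hypothetical.

    # Flag suspicious lines in a saved uuencoded posting (sketch only;
    # assumes the begin...end block is contained in this one file).
    sed -n '/^begin /,/^end$/p' tcl.part01 |
        awk 'NR == 1 || $0 == "end" { next }   # skip the begin/end lines
             length($0) > 62 || $0 ~ /[^ -`]/ {
                 printf "line %d looks damaged\n", NR
             }'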
sean@ms.uky.edu (Sean Casey) (12/29/90)
Why don't we vote on which method people prefer?

Sean
--
***  Sean Casey  <sean@s.ms.uky.edu>
slamont@network.ucsd.edu (Steve Lamont) (12/29/90)
In article <sean.662422746@s.ms.uky.edu> sean@ms.uky.edu (Sean Casey) writes:
>Why don't we vote on which method people prefer?

... because that would eliminate the fun of endless posting and counter
posting. :-)

spl (the p stands for probably new to the net...)
--
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                        - Director/producer John Amiel, heard on NPR
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/29/90)
As quoted from <1990Dec27.071632.7272@zorch.SF-Bay.ORG> by
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan):
+---------------
| tneff@bfmny0.BFM.COM (Tom Neff) writes:
| > karl@sugar.hackercorp.com (Karl Lehenbauer) writes:
| >>Enclosed is the help file for anyone who is having trouble unpacking Tcl.
| >If people would just post source to the source newsgroups, instead of
| >this unreadable binary crap, no help file would be necessary.
|
| I don't think complaining about the packaging is fair if the product
| arrives intact because of it, but Karl's choice of cpio over tar was
| unfortunate. At any rate, as he indicated in his posting, the
| comp.sources.unix archive pax, in volume 17, does indeed allow
| compilation of a cpio (clone?) that successfully unpacks tcl; I just
| finished doing just that.
+---------------

There's also afio. Also, since you're so hip on getting savings, cpio
doesn't waste space making sure everything is on a 512-byte boundary.
(In a large archive, this *can* make a difference.)

+---------------
| Remember, almost nowhere on the net do the *.sources.* files arrive
| without having been compressed somewhere along the way; seeing them
| delivered to you in a compressed format merely defers the final
| unpacking to you, at some cost in convenience but benefit in size and
| robustness of transport. No one was going to eyeball that whole
+---------------

No benefit in size, I fear. Compress in a pipeline can't do its usual
sanity checks, so "compress file; uuencode file.Z file.Z | compress >
file.uu.Z" actually results in file.uu.Z being considerably larger than
file.Z: an already-compressed file *expands* when compressed again, and
uuencode also expands the file somewhat (without making it compressible
again). Compressed uuencodes actually waste considerable space and time.

I suspect that the only real solution to "robustness" issues will be
X.400 or an equivalent.

++Brandon
--
Me: Brandon S. Allbery                    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG              Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR                    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery  Delphi: ALLBERY
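Brandon's size claim (and Tom's earlier "compressed TWICE" point) is easy
to measure on whatever file is handy. This is only a sketch with
hypothetical file names; the exact figures depend on the input, but the
pre-compressed, uuencoded route generally costs more once the news
batcher compresses it again for transmission.

    # Compare batch-compressed sizes of a cleartext posting versus the
    # same material posted as compressed+uuencoded "gibberish".
    compress -c tcl.shar > shar.Z                    # cleartext, batched
    compress -c tcl.shar | uuencode tcl.cpio.Z > tcl.uu
    compress -c tcl.uu > uu.Z                        # gibberish, batched
    ls -l shar.Z uu.Z                                # uu.Z typically larger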