jeff@cullsj.UUCP (Jeffrey C. Fried) (05/01/88)
Currently there is an ongoing discussion in comp.binaries.ibm.pc.d concerned with establishing a standard for the exchange of software over the net. I would like to offer a suggestion. The following tools are available in source code format: COMPRESS (Lempel-Ziv text compressor), ARITH (arithmetic compression for binary), UUencode/decode. Since all of these will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS, SYS5, BSD), why not make these the basis for communicating? I may have a public domain SHAR/UNSHAR program in C as well (i do have a text archiver in PASCAL as well that would suffice if PASCAL were acceptable to everyone). These should be enough to support everyone's needs, and because we have the source, it can be made available to everyone. I realize that some of you may have a favorite tool which you may feel surpasses the capabilities of these. I am making these suggestions to provide a starting point towards arriving at a mutually acceptable standard. For those who do not know, COMPRESS is a single-file text compressor which works faster than any of the ARC clones, and ARITH is something i constructed from a description in the ACM. ARITH compresses slower, but better, than Huffman, i.e., SQ/UNSQ. Most of all, it's in the public domain and i'll be posting source if enough people show an interest. In any case, let the discussion continue.
w8sdz@brl-smoke.ARPA (Keith B. Petersen ) (05/01/88)
Rather than discussing how to compress our files we should be discussing how to get them transferred error-free through the network. Uuencode/uudecode and compress/uncompress do no error checking. Think about this the next time you are tempted to uuencode a binary file. How do you know it will be received error-free by the recipient? At least when it is compressed by the ARC program a CRC of the original file is stored *inside* the ARC. It is checked when you extract the member file. The net spends thousands of dollars on reposts of truncated or otherwise munged files. Some of that money would be better spent on finding where the problem is and fixing it. A uuencode with CRC or checksum would go a long way towards finding the site(s) responsible for this waste. -- Keith Petersen Arpa: W8SDZ@SIMTEL20.ARPA Uucp: {bellcore,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.arpa!w8sdz GEnie: W8SDZ
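The kind of check being asked for here is cheap to add. Below is a minimal sketch in C, assuming nothing about ARC's internal format (the polynomial is the common reflected CRC-16 value 0xA001, and the output format is made up): it reads a file from standard input and prints a 16-bit CRC plus a byte count, which a poster could quote in the article and a recipient could recompute after uudecoding. A length mismatch catches truncation; a CRC mismatch catches garbled lines.

    /* crc16: print a 16-bit CRC and byte count for stdin.
     * Sketch only -- not guaranteed to match ARC's stored CRC. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int crc = 0;       /* reflected CRC-16, polynomial 0xA001 */
        unsigned long len = 0;
        int c, i;

        while ((c = getchar()) != EOF) {
            crc ^= (unsigned int)c;
            for (i = 0; i < 8; i++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : (crc >> 1);
            crc &= 0xFFFF;          /* keep it to 16 bits on larger ints */
            len++;
        }
        printf("crc16=%04x length=%lu\n", crc, len);
        return 0;
    }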
wcf@psuhcx.psu.edu (Bill Fenner) (05/01/88)
Just one thing that needs to be known -- PC's can do no more than 12-bit compression. So if you are compressing your file from a UNIX system, you need to say compress -b12 filename. Bill -- __ _ _ _____ Bill Fenner Bitnet: wcf @ psuhcx.bitnet / ) // // / ' Internet: wcf @ hcx.psu.edu /--< o // // ,-/-, _ __ __ _ __ UUCP: ihnp4!psuvax1!psuhcx!wcf /___/_<_</_</_ (_/ </_/ <_/ <_</_/ (_ Fido: Sysop at 263/42
jpn@teddy.UUCP (John P. Nelson) (05/02/88)
>Just one thing that needs to be known -- PC's can do no more than 12-bit >compression. So if you are compressing your file from a UNIX system, >you need to say comress -b12 filename . This myth has been repeated several times, so I felt it was necessary to speak up. PCs most certainly CAN do a 16 bit compress/uncompress. It takes 512K of available memory to run, and you also either need a compiler that supports HUGE model arrays, or else you have to manually break up the buffer space into multiple 64K arrays (this is what the version I have does - The port was done a couple of years ago for XENIX, but it works just fine under MSDOS as well). -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn ARPA (sort of): talcott.harvard.edu!panda!teddy!jpn
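For the curious, the manual buffer-splitting John mentions needs nothing fancier than an indexing macro. The sketch below is illustrative only (names and sizes are not taken from the XENIX port he refers to; 69001 is the hash-table size compress 4.0 uses for 16-bit codes): the big table is carved into chunks that each stay under 64K bytes, so an ordinary large-model compiler suffices and no HUGE arrays are needed. Everywhere a flat codetab[i] would have appeared, the split version says CODETAB(i), at the cost of one extra indirection per reference.

    /* Split one logical table of TABLE_ENTRIES shorts into chunks,
     * each 32K bytes, so no single array crosses a 64K segment. */
    #include <stdlib.h>

    #define TABLE_ENTRIES 69001L          /* compress 4.0's 16-bit hash size */
    #define CHUNK_BITS    14              /* 16384 entries per chunk */
    #define CHUNK_SIZE    (1L << CHUNK_BITS)
    #define CHUNK_MASK    (CHUNK_SIZE - 1)
    #define NCHUNKS       ((TABLE_ENTRIES + CHUNK_SIZE - 1) / CHUNK_SIZE)

    static unsigned short *chunk[NCHUNKS];

    /* use CODETAB(i) wherever a flat codetab[i] would have appeared */
    #define CODETAB(i)  (chunk[(i) >> CHUNK_BITS][(i) & CHUNK_MASK])

    int alloc_codetab(void)
    {
        long j;
        for (j = 0; j < NCHUNKS; j++) {
            chunk[j] = (unsigned short *)
                malloc((size_t)CHUNK_SIZE * sizeof(unsigned short));
            if (chunk[j] == NULL)
                return -1;                /* not enough memory */
        }
        return 0;
    }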
rsalz@bbn.com (Rich Salz) (05/02/88)
If you are (sigh) going to post binaries on Usenet, DO NOT compress them first. Many Usenet sites use compress to pack up their news batches. Compressing a compressed file makes it larger. -- Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
wcf@psuhcx.psu.edu (Bill Fenner) (05/02/88)
In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes: >>Just one thing that needs to be known -- PC's can do no more than 12-bit >>compression. So if you are compressing your file from a UNIX system, >This myth has been repeated several times, so I felt it was necessary to >speak up. PCs most certainly CAN do a 16 bit compress/uncompress. It >takes 512K of available memory to run, and you also either need a compiler Hard as it is to believe, a lot of people don't have 640k computers... But, I think that this utility would do well to be distributed... mind posting it on comp.binaries.ibm.pc? (Can it do 12-bit also?) Thanks -- __ _ _ _____ Bill Fenner Bitnet: wcf @ psuhcx.bitnet / ) // // / ' Internet: wcf @ hcx.psu.edu /--< o // // ,-/-, _ __ __ _ __ UUCP: ihnp4!psuvax1!psuhcx!wcf /___/_<_</_</_ (_/ </_/ <_/ <_</_/ (_ Fido: Sysop at 263/42
norman@oravax.UUCP (Norman Ramsey) (05/03/88)
Someone mentioned that uuencode/decode do no error checking. There is a program called btoa/atob that converts binary to ascii and back again. It is more efficient than uuencode/decode (4 bytes binary go to 5 bytes ascii) and has a checksum built in. The programs are quite short and I have ported them to the IBM PC no problem. At my site at least they came with compress 4.0, which itself came with TeX, so I assume they are public domain. Most frequently I use them with a script called `tarmail' which is essentially tar | compress | btoa | split -700 | mail, where we actually mail things across the net in 700-line chunks. There is an `untarmail' at the other end which strips off the headers and (if there are no errors) does the uncompress, the tar x, et cetera. I'm sure a `tarpost' could be put together with little or no difficulty. Norman Ramsey norman%oravax.uucp@cu-arpa.cs.cornell.edu
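The 4-bytes-to-5-characters figure comes from treating each group of 4 input bytes as one 32-bit number and writing it in radix 85 (85^5 is just over 2^32), using printable characters starting at '!'. The sketch below shows only that core step and is not interchangeable with real btoa, which also writes 'z' for all-zero groups, wraps lines, and appends the checksums mentioned above.

    /* base-85 expansion core: 4 binary bytes -> 5 ASCII characters.
     * Sketch only; real btoa adds framing, the 'z' shorthand and checksums. */
    #include <stdio.h>

    static void put85(unsigned long word)
    {
        char out[5];
        int i;
        for (i = 4; i >= 0; i--) {        /* fill least significant digit first */
            out[i] = (char)('!' + (int)(word % 85));
            word /= 85;
        }
        fwrite(out, 1, 5, stdout);
    }

    int main(void)
    {
        int c, n = 0;
        unsigned long word = 0;

        while ((c = getchar()) != EOF) {
            word = (word << 8) | (unsigned long)c;
            if (++n == 4) { put85(word); word = 0; n = 0; }
        }
        if (n) {                          /* zero-pad a final partial group */
            while (n++ < 4)
                word <<= 8;
            put85(word);
        }
        return 0;
    }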
jeff@cullsj.UUCP (Jeffrey C. Fried) (05/03/88)
In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes: > Just one thing that needs to be known -- PC's can do no more than 12-bit > compression. So if you are compressing your file from a UNIX system, > you need to say comress -b12 filename . I've constructed a version of COMPRESS using 13 bits and the small model by making only one array large. I've also constructed a version in BIG mode which runs at half the speed and compresses only about 10% better using the full addressing used under UNIX.
wcf@psuhcx.psu.edu (Bill Fenner) (05/03/88)
In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes: >If you are (sigh) going to post binaries on Usenet, DO NOT compress >them first. Many Usenet sites use compress to pack up their news >batches. Compressing a compressed file makes it larger. We've gone through this before, and it has never been explained to my satisfaction. I think you do save something by compressing a uuencoded compressed file over compressing the uuencoded uncompressed file. I just did a test. The file I used may not have been a good 'average binary' (I used a moria save character - the best I could find on short notice). Anyway...

   Original size (cannot send; it's binary):                  95,348 bytes
   Compressed (also cannot send; also binary):                 6,772 bytes
   Now... UUEncoded then compressed (the amount that would
     be transmitted if you simply uuencode the file):          11,531 bytes
   And the kicker... compressed, UUEncoded, then compressed
     (as if you compressed it, then uuencoded it, then
     posted it, then the news will compress it):                9,009 bytes

Like I said, this may not have been a proper 'average binary'. I am going to write a shell script to check all these things, and run it on several actual PC binaries and ARC files. I will post the results to comp.binaries.ibm.pc.d. -- __ _ _ _____ Bill Fenner Bitnet: wcf @ psuhcx.bitnet / ) // // / ' Internet: wcf @ hcx.psu.edu /--< o // // ,-/-, _ __ __ _ __ UUCP: ihnp4!psuvax1!psuhcx!wcf /___/_<_</_</_ (_/ </_/ <_/ <_</_/ (_ Fido: Sysop at 263/42
loci@csccat.UUCP (Chuck Brunow) (05/03/88)
In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes: >Just one thing that needs to be known -- PC's can do no more than 12-bit >compression. So if you are compressing your file from a UNIX system, >you need to say comress -b12 filename . > Are you quite sure about that? 13-bit compress will run on other 64k segment machines (80?86 based).
jpn@teddy.UUCP (John P. Nelson) (05/03/88)
>If you are (sigh) going to post binaries on Usenet, DO NOT compress >them first. Many Usenet sites use compress to pack up their news >batches. Compressing a compressed file makes it larger. This is incorrect. I hope I can clear this up once and for all: If you have ascii files (like source or documentation), then it is true that compressing, then uuencoding is a BAD IDEA, even though the posting appears to be smaller than the cleartext. That is because when the file is compressed again, it will be larger than the cleartext after IT is compressed. If you have a binary file that MUST be uuencoded to be posted, then compression before uuencoding IS HELPFUL! Most files that are compressed, then uuencoded, then compressed again are significantly smaller than files that are simply uuencoded, then compressed once! I think that the reason this is true is that uuencoding tends to interfere with the compression process. By the way, compressing a uuencoded file almost always results in a small reduction in size. When I say "compressed", I include archival programs such as ARC and ZOO. These conclusions were reached by experimental evidence (I didn't conduct the experiments, others did, and they posted their results). Perhaps no one bothered to read these informative articles (or else my suspicion is true: the maximum long-term memory of the average USENET reader is no more than 1 month long). -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn ARPA (sort of): talcott.harvard.edu!panda!teddy!jpn
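For reference, the size growth being argued about falls straight out of uuencode's format: every 3 input bytes become 4 printable characters (6 bits each, offset by a space), and each 45-byte input line gains a length character and a newline, i.e. roughly 35-40% expansion before counting the begin/end lines. A minimal sketch of that encoding loop (begin/end lines and file modes omitted):

    /* uuencode body only: 3 bytes -> 4 chars, 45 input bytes per line. */
    #include <stdio.h>
    #include <string.h>

    #define ENC(c) ((char)(((c) & 077) + ' '))   /* 6 bits -> printable char */

    static void outline(const unsigned char *p, int n)
    {
        int i;
        putchar(ENC(n));                         /* line starts with its length */
        for (i = 0; i < n; i += 3) {
            putchar(ENC(p[i] >> 2));
            putchar(ENC(((p[i] << 4) & 060) | ((p[i+1] >> 4) & 017)));
            putchar(ENC(((p[i+1] << 2) & 074) | ((p[i+2] >> 6) & 03)));
            putchar(ENC(p[i+2] & 077));
        }
        putchar('\n');
    }

    int main(void)
    {
        unsigned char buf[45 + 2];               /* padded so partial triples read zeros */
        size_t n;

        memset(buf, 0, sizeof buf);
        while ((n = fread(buf, 1, 45, stdin)) > 0) {
            outline(buf, (int)n);
            memset(buf, 0, sizeof buf);
        }
        putchar(ENC(0));                         /* zero-length final line */
        putchar('\n');
        return 0;
    }

Because the output alphabet has only 64 symbols and the 6-bit groups straddle the byte boundaries of the original data, a later compress pass recovers some of the expansion, but less than it would have gained on the raw bytes -- which is the interference described above.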
wtr@moss.ATT.COM (05/04/88)
In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes: >Just one thing that needs to be known -- PC's can do no more than >12-bit compression. So if you are compressing your file from a >UNIX system, you need to say comress -b12 filename. Sorry, but.... The patches posted to get compress (4.0) running on a Microport SV/AT (286) system also allow full 16-bit compression on a 16-bit PC (an AT clone, in this case). It should not prove too difficult to apply these patches to any other 16-bit system (read: MS-DOS). { No flames intended, it took me ~1 month to figure out the 12/16 bit problem and locate the patches } Personally, I use cpio & compress to move files. I don't care about execution time; rather, transmission time is my most important consideration, and so I desire the highest compression ratio I can find. I agree that for "real time" communication, compress is totally inadequate because of its processing needs. ===================================================================== Bill Rankin Bell Labs, Whippany NJ (201) 386-4154 (cornet 232) email address: ...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr ...![ ihnp4 cbosgd akgua watmath ]!clyde!wtr =====================================================================
egisin@watmath.waterloo.edu (Eric Gisin) (05/04/88)
In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes: > If you are (sigh) going to post binaries on Usenet, DO NOT compress > them first. Many Usenet sites use compress to pack up their news > batches. Compressing a compressed file makes it larger. But you would not be compressing the compressed file, you would be compressing an encoded file. Here are the results of some experiments on a 100K UNIX binary:

$ uuencode | compress
-rw-r--r-- 1 egisin 83111 May 3 16:25 uu.Z
$ compress | uuencode | compress
-rw-r--r-- 1 egisin 81241 May 3 16:30 uuz.Z

Compressing before encoding results in a 2% shorter file, but that is not really significant. You can get better results by using a simple hex encoding:

$ compress | hexencode | compress
-rw-r--r-- 1 egisin 78831 May 3 16:31 hdz.Z

None of this applies to source files; they should never be compressed and encoded.
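The "hexencode" in the third experiment is not a standard utility; a stand-in is trivial to write (the format below -- two hex digits per byte, 36 bytes per line -- is an assumption). Its output uses only 16 distinct characters and keeps each input byte aligned, which may be why a following compress pass does a little better on it than on uuencoded data.

    /* hexencode stand-in: two hex digits per input byte. */
    #include <stdio.h>

    int main(void)
    {
        int c, col = 0;

        while ((c = getchar()) != EOF) {
            printf("%02x", c);
            if (++col == 36) { putchar('\n'); col = 0; }
        }
        if (col)
            putchar('\n');
        return 0;
    }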
jeff@cullsj.UUCP (Jeffrey C. Fried) (05/04/88)
1) COMPRESS is a text only compression routine. It will not now, or ever, help in the compression of binary files.

2) ARITH is a more general compression routine using adaptive arithmetic coding. It will compress binary files where there is redundancy, but when it fails (on an extremely random file) the result increases very little (under 1% in my experience). It compresses better than HUFFMAN, but it is NOT faster than SQ/UNSQ which are written in assembler whereas ARITH is written in C. (Once again, i will post it if there is sufficient interest.)

3) The source for ZOO, PKARC, and the others is NOT available. Therefore we are at the whims of whomever is currently supporting (or not supporting) them.

4) COMPRESS works faster and better on text files than the ARC routines because they use 12 bit compression, where 13-bit (and more) are possible even on the PC for COMPRESS (i've tried it on an AT-clone).

5) On the weak side, there is as yet no CRC or checksum for any of these, but adding it would be something i am willing to take responsibility for should enough people decide they would like to take the approach which i'm currently suggesting. Also, there is no directory support provided with these tools. They work on only one file at a time. This is also correctable since the source is available.

6) LASTLY: I am not trying to criticize the ARC routines; rather i am trying to offer an alternative which i feel will reduce the time for transmission of files, as well as providing us with portability. COMPRESS, ARITH, UNSHAR and UUENCODE are all available at the source level. COMPRESS and ARITH have been tried in at least three different environments: UNIX (BSD), VMS and PC/MS-DOS. Remember, for those of us who are NOT using the NET at the expense of a university, the cost of communication, and therefore the time required to transmit a file, are VERY important.

If this sounds like a flame, then please assign my apparent bad attitude to poor methodology rather than a desire to upset people. This is provided in the spirit of adding to what i hope will become a meaningful dialog with a very practical result.
mike@ists (Mike Clarkson) (05/04/88)
In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes: > If you are (sigh) going to post binaries on Usenet, DO NOT compress > them first. Many Usenet sites use compress to pack up their news > batches. Compressing a compressed file makes it larger. How about compressing a uuencoded compressed file. Does that result in a significantly larger than original file? I would really like to see a uniform standard, with error checking, and I think it is something worth the time it takes to do it. We could probably evolve the result to take care of another pet peeve of mine: error correction in the tar format. One thing I really miss from VMS is the backup tape archiver, which has tremendous error checking and correction. In 7 years I have only ever had (touch wood) 1 tape go on me, and that was because the oxide was falling off. Having spent a good part of today dealing with yet another dead Unix tar tape, I really wish we could find a better way. -- Mike Clarkson mike@ists.UUCP Institute for Space and Terrestrial Science mike@ists.yorku.ca York University, North York, Ontario, CANADA M3J 1P3 (416) 736-5611
chasm@killer.UUCP (Charles Marslett) (05/04/88)
In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes: > Just one thing that needs to be known -- PC's can do no more than 12-bit > compression. ... Actually, I have sent several people copies of a minor mod to compress 4.0 that works fine if you have the memory (requires about 350-400 K above DOS to do 16-bit compression). The source assumes Turbo or Microsoft C for the PC but it doesn't take up an immense amount of disk space either (about 40K if I remember correctly). I have also ported it to Atari STs, so that covers some of the PC field. Anyone want to merge these changes into the more recent (4.1?) posting and perhaps make it work on Macs and Amigas? Any good rule of thumb on how many requests imply a posting choice? > Bill Charles Marslett chasm@killer.UUCP ...!ihnp4!killer!chasm
jcs@tarkus.UUCP (John C. Sucilla) (05/04/88)
In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes: >Just one thing that needs to be known -- PC's can do no more than 12-bit >compression. So if you are compressing your file from a UNIX system, >you need to say comress -b12 filename . Wrong! My 640K AT&T PC6300 has compress v4.0 running 16 bits on it right now. -V shows the options at: MSDOS, XENIX_16 and BITS=16. -- John "C" Sucilla {ihnp4}!tarkus!jcs Don't let reality stop you....
msf@amelia.nas.nasa.gov (Michael S. Fischbein) (05/04/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > 1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. This is absolutely untrue. Compress works fine on binary files; I have seen 200 to 1 reductions on some bitmaps. Tarfiles compress readily, etc. > 4) COMPRESS works faster and better on text files then the ARC routines > because they use 12 bit compression, where 13-bit (and more) are possible > under even the PC for COMPRESS (i've tried it on ans AT-clone). 16-bit compress is possible on IBM-PC's; it reportedly even runs under MSDOS. Mention was made later in the original of uuencode; the atob and btoa programs are easier to use and are also freely available in source form. mike -- Michael Fischbein msf@ames-nas.arpa ...!seismo!decuac!csmunix!icase!msf These are my opinions and not necessarily official views of any organization.
karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (05/04/88)
1) COMPRESS is a text only compression routine. It will not now, or ever, help in the compression of binary files. Nonsense.

[58] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable
34831 -rwsr-x--- 2 root staff 8192 Apr 13 09:54 enable
[59] [8:33am] tut:/dino0/karl/bin/pyr/private> file enable
enable: 90x family demand paged pure executable
[60] [8:33am] tut:/dino0/karl/bin/pyr/private> compress -v < enable > enable.Z
Compression: 72.44%
[61] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable.Z
35427 -rw-r--r-- 1 karl staff 2257 May 4 08:34 enable.Z
[62] [8:33am] tut:/dino0/karl/bin/pyr/private>

--Karl
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: | | 1) COMPRESS is a text only compression routine. It will not now, or ever, | help in the compression of binary files. [ compress gives about 30% compression on binaries, depending on content. Whoever told you that it was for text only was completely wrong. ] | | 2) ARITH is a more general compression routine using adaptive arithmetic | coding. It will compress binary files where there is redundancy, but | when it fails (on an extremely random file) the result increases very | little (under 1% in my experience). It compresses better than HUFFMAN, | but it is NOT faster than SQ/UNSQ which are written in assembler whereas | ARITH is written in C. | (Once again, i will post it if there is sufficient interest.) [ once again, do it, in source, so that others can test it themselves rather than relying on your opinion. ] | | 3) The source for ZOO, PKARC, and the others is NOT available. Therefore | we are at the whims of whomever is currently supporting (or not supporting) | them. [ the sources for zoo and arc have been posted several times to the net, and are available on a number of sites via ftp, uucp, and simple BBS download. ] | 5) On the weak side, there is as yet, no CRC or checksum for any of these, | but adding it would be someithing i am willing to take responsibility | for should enough people decide they would like to take the approach | which i'm currently suggesting. [ zoo and arc both have CRC. ] | Also, there no directory support provided with these tools. They work | on only one file at a time. This is also correctable since the source | is available. [ arc works on multiple files in multiple directories, but doesn't preserve subdirectory information. zoo preserves the information unless told not to do it (an option). ] | | 5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying | [ deleted for brevity ] | Remember, for those of us who are NOT using the NET at the expense of a | university, the cost of communication, and therefore the time required | to transmit a file, are VERY important. [ everyone would like faster transmissions, but not at the expense of using a non-standard format which people can't use. Sending info which is not useful is a *real* waste of bandwidth. ] | | If this sounds like a flame, then please assign my apparent bad attitude to | poor methodology rather than a desire to upset people. This is provided in the | spirit of adding to what i hope will become a meaningful dialog with a very | practicle result. The most charitable assumption I can make is that you are woefully misinformed about the matters on which you speak. Please post this "ARITH" routine to let others evaluate it, and read the responses to your posting, many of which will probably not be even as polite as this one. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)
I would like to add a little fuel to the fires of "which archiver" discussion. Use of the 'btoa' routine instead of uuencode would save 12% (!) on binary postings. This is a PD program, included in the compress package, and runs just fine on a PC. All the discussion of using PKARC to save 1-2% or not using it to save time for many of the people on the net seems pointless. We should use both (standard) arc and zoo formats, uuencode them, and save bandwidth by dropping this discussion. Hopefully Rahul will clarify this by edict. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
jpn@teddy.UUCP (John P. Nelson) (05/04/88)
> 1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. Whoa! Where did THIS come from!?!? It is simply not true! It IS true that compress does a better job at compressing text files, but this is because there is usually more redundancy in text files than most binary files (like executables). Compress is simply MARVELOUS for binary files like bit-mapped graphics, getting something like 90% compression for many of them. > 2) ARITH is a more general compression routine using adaptive arithmetic > coding. It will compress binary files where there is redundancy, but > when it fails (on an extremely random file) the result increases very > little (under 1% in my experience). It compresses better than HUFFMAN, > but it is NOT faster than SQ/UNSQ which are written in assembler whereas > ARITH is written in C. > (Once again, i will post it if there is sufficient interest.) Now we get some facts. ARITH is HUFFMAN encoding. Compress is Lempel-Ziv encoding. Lempel-Ziv almost ALWAYS beats HUFFMAN (when there is a redundancy). It is certainly possible that Lempel-Ziv might expand random files more than HUFFMAN, I haven't done any tests. Older versions of ARC used to try both HUFFMAN and Lempel-Ziv, and use the one that gave better compression. The HUFFMAN support was dropped (except for extracting from old archives), because Lempel-Ziv beat HUFFMAN 99% of the time! > 3) The source for ZOO, PKARC, and the others is NOT available. Therefore > we are at the whims of whomever is currently supporting (or not supporting) > them. MORE untruths. The source for both ZOO and ARC is in C, and has been distributed on USENET several times! Some versions of the ARC source included the extra code to handle the SQUASH compression algorithm added by PKARC. > 4) COMPRESS works faster and better on text files then the ARC routines > because they use 12 bit compression, where 13-bit (and more) are possible > under even the PC for COMPRESS (i've tried it on ans AT-clone). PKARC's SQUASH is 13 bit compression. Any more than this requires a working buffer larger than 64K, which is why they are generally not used very much on PCs. The amount of additional compression between 13 bit and 16 bit is no more than 2 or 3 percent! Also, there is very little difference in speed between the 12 bit and 13 bit compression algorithms. The major difference is in the memory requirements. > 5) On the weak side, there is as yet, no CRC or checksum for any of these, > but adding it would be someithing i am willing to take responsibility > for should enough people decide they would like to take the approach > which i'm currently suggesting. This is the LEAST of the problems with using compress. > Also, there no directory support provided with these tools. They work > on only one file at a time. This is also correctable since the source > is available. True, but why reinvent the wheel? The source for the EXISTING programs is ALSO available! > If this sounds like a flame, then please assign my apparent bad attitude to >poor methodology rather than a desire to upset people. This is provided in the >spirit of adding to what i hope will become a meaningful dialog with a very >practicle result. Your bad attitude appears to be due to an overdose of misinformation! -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn ARPA (sort of): talcott.harvard.edu!panda!teddy!jpn
dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/04/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > 3) The source for ZOO, PKARC, and the others is NOT available. The source for zoo 1.51 was posted to comp.sources.unix in the summer of 1987. The source for zoo 2.01 will be posted in the near future. -- Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
bobmon@iuvax.cs.indiana.edu (RAMontante) (05/04/88)
cullsj.UUCP (Jeffrey C. Fried) writes, among other things: , , 1) COMPRESS is a text only compression routine. It will not now, or ever, , help in the compression of binary files. This statement made me shell out and run the following quick experiment:

-rwxr-xr-x 1 bobmon 15360 Feb 27 01:22 pgen
-rwxr-xr-x 1 bobmon 10116 May  4 08:46 pgen.Z
-rwxr-xr-x 1 bobmon 14336 Feb 24 08:19 pom
-rwxr-xr-x 1 bobmon  9945 May  4 08:47 pom.Z

Pgen and pom are both executable files (compiled from 'c'). Granted, this is on a VAX machine, running the full-blown compress. My attempts to run compress on my 8088 box were frustrating, given its memory requirements, and I haven't seen enough '.Z' formatted files to be worth the hassle. But I would assume that if it runs at all on a smaller machine, it will produce the same results; unlike zoo and arc, it cannot choose one compression method over another. , 3) The source for ZOO, PKARC, and the others is NOT available. Therefore , we are at the whims of whomever is currently supporting (or not supporting) , them. Source for arc is, at least for some Unix boxes. Zoo source has been promised. Pkarc was originally written in 8088 assembler, not the friendliest source. , 4) COMPRESS works faster and better on text files then the ARC routines , because they use 12 bit compression, where 13-bit (and more) are possible , under even the PC for COMPRESS (i've tried it on ans AT-clone). I haven't seen source for compress, either. And the executables I've seen were enormous, and limited to 12-bit LZW on 8088's under MSDOS; just like zoo and arc (and pkarc's squash method is some sort of 13-bit LZW). I've never heard anyone claim responsibility for compress, while the authors of zoo, pkarc, and arc are named, revered, vilified, and flamed frequently. At least one of them is an active participant on the Usenet. (Plug: I think that's one strength of zoo, although Rahul might disagree :-) , 5) On the weak side, there is as yet, no CRC or checksum for any of these, Any of WHAT? Zoo and arc certainly have a CRC value. Compress is compress. Its Unix-origin philosophy says that separate functions should be done by separate routines with their outputs tied together by the operating system. I think this is at the heart of some of the debates here. The philosophy works fine on a big multitasking machine like a VAX (or a suitably equipped 680x0 or '386?), and the entire news mailer system is predicated on that principle -- the mailer just calls compress (EVERYbody has compress, right?) to pack things in for it; it doesn't worry about whether the result is correct, and neither does compress. It's up to you to aggregate your files with shar or something. This piece-at-a-time philosophy is weaker on something like my MSDOS 8088 box. There aren't multiple users all needing similar fundamental tools, there's just me. And I haven't the resources (memory or CPU cycles) to support lots of little pieces that work fine individually but need sophisticated glue to work together; MSDOS's simulation of pipes is pathetic. In such a situation an integrated package (viz., zoo or arc) makes a lot more sense. They can incorporate in a consistent manner all those little pieces that a system admin. may have put on a Unix box, but which I haven't yet found while rummaging around BBS's. By integrating everything a top-down design is possible, unlike what happens when you bend the problem to fit the tools you already have.
, but adding it would be someithing i am willing to take responsibility , for should enough people decide they would like to take the approach , which i'm currently suggesting. At which point it will become yet another uncommon non-standard (like ARITH?). I don't think adding code will make it fit any better on small machines, and the big machines can afford to calculate a CRC with an external routine. Not to mention the question of what you DO with it... Is the CRC for compress's use? Then it becomes not-quite-compress. Is it for human use? Then how do I recreate it to find out if the file is still intact? ... , 5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying , to offer an alternative which i feel will reduce the time for transmission , of files, as well as, providing us with portability. COMPRESS, ARITH, , UNSHAR and UUENCODE are all available at the source level. COMPRESS and , ARITH have been tried in at least three different environments: UNIX (BSD), , VMS and PC/MS-DOS. , Remember, for those of us who are NOT using the NET at the expense of a , university, the cost of communication, and therefore the time required , to transmit a file, are VERY important. I don't find 1200bps transmission to be a lot of fun to wait for, either... but I take it that your basic argument is that compress makes smaller archives than zoo or arc, which are therefore cheaper to transmit. I don't see that the compression improvement is as significant as you imply (and your statement about binary is completely at odds with all my experience). The other strengths of the integrated packages offer a LOT of functionality, some of which I would seek out even if there were no compression involved. The biggest problem I see is that many news mailers compress everything blindly, so that an already-compressed file gets bigger. This would also be true of a sufficiently random file, although I think most executables aren't that random. And this compress-and-be-damned behavior is not a strength of the system, it's a weakness. (Even compress will complain if its result is bigger than its original; does the mailer ignore this, or are the net.gods lying when they claim they're shipping bigger files because of the double compression?)
ralf@b.gp.cs.cmu.edu (Ralf Brown) (05/04/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: } 3) The source for ZOO, PKARC, and the others is NOT available. Therefore } we are at the whims of whomever is currently supporting (or not supporting) } them. Sources are available for ZOO and ARC. -- {harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make. FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?
wtr@moss.ATT.COM (05/04/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > >1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. > [I'm not sure if this will be construed as a flame, but, asbestos suit in hand, here goes!] WHAT ARE YOU TALKING ABOUT!?!?!?!? I have assumed that everyone has been talking about the program COMPRESS v4.0 that was posted to comp.sources.???? late last year (let's not get too picky about the dates ;-). It was based upon a "modified Lempel-Ziv algorithm" as published in IEEE Computer by Terry A. Welch. PD source was (at least in part) written by Joe Orost. (apologies to anyone unintentionally left out of the credits) With the full sixteen-bit compression, it does a great job of compressing (almost ;-) all files, binary and source. Most compression ratios are in the 50-60% range, occasionally as high as 75%. (larger files seem to compress a little better) I have no idea what program you are referring to when you are describing your 'compress' but it is certainly not the same program that I run on my AT clone at home. ===================================================================== Bill Rankin Bell Labs, Whippany NJ (201) 386-4154 (cornet 232) email address: ...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr ...![ ihnp4 cbosgd akgua watmath ]!clyde!wtr =====================================================================
jeff@cullsj.UUCP (Jeffrey C. Fried) (05/05/88)
I stand corrected. Since Lempel-Ziv was DESIGNED for text compression, and the authors do not mention its use for binaries, i never considered using it. I tried it on an executable under UNIX and obtained a good reduction, for reasons which are not apparent. I'm sure that there are cases where this does not work (like graphics files), but it does appear to work, and in this case better than the current version of ARITH. However, my point was that for TEXT, COMPRESS does a better job than the ARC programs with which i'm familiar. Also, i did not know that source for zoo was available - a consideration which i believe to be VERY important since support usually comes best from those who use a product. I would like to thank those who took the time to correct my misunderstanding concerning the use of compression on the net, but i find it just a bit difficult because of the tone used in communicating with me. For those who suggested that i "do my homework" before posting something to the net, i can only say that since the net is my ONLY contact with this problem, and that the comp...d group is for discussions, i am in essence "doing my homework". I'm sorry if my attempt to add to the discussion has caused anyone to feel that their precious time has been wasted, but i think that you're as wrong as you are rude. Humbly yours, Jeff Fried ...!ames!cullsj!jeff Cullinet Software 2860 Zanker Road, Suite 206 Reality, what a concept! San Jose, CA, 95134
cudcv@daisy.warwick.ac.uk (Rob McMahon) (05/05/88)
In article <292@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: >The following tools are >available in source code format: COMPRESS (Lem-Ziv text compressor), ARITH >(arithmetic compression for binary), UUencode/decode. Since all of these >will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS, SYS5, >BSD), why not make these the basis for communicating. I hope we're talking about binary files here, in which case I don't care because I'd never just take a binary from the net and run it on one of my machines. If you're talking about sources, I like to scan down, read the README, check out the comments in main etc., before I even save it to disk. If I get all the bits of a posting, tack them together, uudecode them, and uncompress them, only to find it's of no use to me, I'm not going to be amused. I have this feeling that people aren't going to bother to send proper introductory articles in plain text before the actual posting. Rob -- UUCP: ...!mcvax!ukc!warwick!cudcv PHONE: +44 203 523037 JANET: cudcv@uk.ac.warwick.cu ARPA: cudcv@cu.warwick.ac.uk Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England
tif@cpe.UUCP (05/05/88)
Written 10:45 am May 2, 1988 by bbn.com!rsalz in cpe:comp.sources.d >If you are (sigh) going to post binaries on Usenet, DO NOT compress >them first. Many Usenet sites use compress to pack up their news >batches. Compressing a compressed file makes it larger. << I don't post often and this is about as close to a flame as I've come >> But what you're compressing is text. Text can always be compressed with a significant advantage even if the text is a uuencoded compress file. In the end, the uuencode offsets most of the gains of the extra compress. If I were implementing a compressing file transfer utility which had the possibility of transferring *binary* files, I would make sure that the compress was actually profitable. Since nowadays most "news" transfers are compressed, I'll give in that *if* "news" could transfer binary files, the compressing should be left to the news transfer stuff rather than be done by the poster. Since "news" can't handle binary files (at least nobody assumes that it can), the file has to be encoded in some way. I'll use my kernel as a sample binary input file and uuencode for the encoding technique. I've included the results for my /usr/dict/words file as well since postings sometimes intermix binary and ASCII files. (I couldn't figure out what order these should be in)

(12 bit compresses only)
                                              Net change of transfer size
  transfer          posted                      binary        ASCII
  -------------------------------------------------------------------
  uncompressed      normal *                    no change     no change
  compressed        normal *                    -35%          -46%
  uncompressed      uuencoded                   +40%          +40%
  uncompressed      uuencoded compress          -9%           -25%
  compressed        uuencoded                   -15%          -23%
  compressed        uuencoded compress          -19%          -29%

(if you believe in 16 bit compresses)
                                              Net change of transfer size
  transfer          posted                      binary        ASCII
  -------------------------------------------------------------------
  compressed        normal *                    -43%          -49%
  uncompressed      uuencoded compress          -20%          -28%
  compressed        uuencoded                   -28%          -30%
  compressed        uuencoded compress          -31%          -35%

  * These can't be posted but are provided for reference

CONCLUSIONS: To transfer binary files using news software the best method in all cases is to post a uuencoded compress file. When transferring ASCII files, if you compress and uuencode, not only are you robbing a 15-20% savings from the sites that use compressed transfers, but you should be shot for making it unreadable. For the skeptics, here are the file sizes I used to build the tables:

  16 bit compress   12 bit compress
         234157            234157   xenix
         133183            152458   xenix.Z
         327860            327860   xenix.u
         167759            198043   xenix.u.Z
         186498            213482   xenix.Z.u
         161860            190137   xenix.Z.u.Z
         194192            194192   words
          99725            104093   words.Z
         271908            271908   words.u
         135065            150047   words.u.Z
         139656            145772   words.Z.u
         126459            137701   words.Z.u.Z

Paul Chamberlain Computer Product Engineering, Tandy Corp. ihnp4!sys1!cpe!tif
geoff@utstat.uucp (Geoff Collyer) (05/05/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > 1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. This is absolutely dead wrong. compress compresses any kind of file, and has been used to compress (and correctly uncompress!), for example, graphics bit maps, sendmail configuration files :-), and tar archives containing binaries. -- Geoff Collyer utzoo!utstat!geoff, utstat.toronto.{edu,cdn}!geoff
jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > > 1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. Most emphatically wrong. compress works just fine on many types of binary files. It can give 90% or more compression on bitmap data, and usually > 50% compression on Unix executable files. About the only type of file I know of that compress fails on consistently is floating point data in binary format. As long some strings of bytes occur much more frequently than others (whether they represent characters, opcodes, or grey levels) compress kicks ass. -- - Joe Buck {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net Argue for your limitations and you get to keep them. -- Richard Bach
campbell@maynard.BSW.COM (Larry Campbell) (05/05/88)
In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
<>>Just one thing that needs to be known -- PC's can do no more than 12-bit
<>>compression. So if you are compressing your file from a UNIX system,
<>>you need to say comress -b12 filename .
<>
<>This myth has been repeated several times, so I felt it was necessary to
<>speak up. PCs most certainly CAN do a 16 bit compress/uncompress. ...
Only a subset of PCs can do 16-bit compress/uncompress. Mine can't.
I'm running VENIX/86 2.0, which is basically V7; the PCC-derived
C compiler has only the tiny and small memory models (exactly
corresponding to non-split and split PDP-11s, which also cannot
handle 16-bit compress).
So it is true that PCs with a C compiler that supports multiple data
segments can handle 16-bit compress, but that hardly encompasses all
PCs in the world.
--
Larry Campbell The Boston Software Works, Inc.
Internet: campbell@maynard.bsw.com 120 Fulton Street, Boston MA 02109
uucp: {husc6,mirror,think}!maynard!campbell +1 617 367 6846
greg@vertical.oz (Greg Bond) (05/05/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > 1) COMPRESS is a text only compression routine. It will not now, or ever, > help in the compression of binary files. No doubt others will point this out, but: using compress V4.0, nethack 2.3 binary on a sun:

Script started on Thu May 5 11:20:17 1988
vertical% ls -l nethack
-rwxr-xr-x 1 greg 761856 May 5 11:20 nethack*
vertical% compress -v nethack
nethack: Compression: 48.25% -- replaced with nethack.Z
vertical% ls -l nethack.Z
-rwxr-xr-x 1 greg 395121 May 5 11:20 nethack.Z*
vertical% exit
vertical%
script done on Thu May 5 11:21:37 1988

I would call that working. (Compress works on arbitrary streams of 8-bit bytes. It would be possible to write a version that only compressed 7-bit text, and perhaps got better compression (say, 5%), but that is NOT the version in the public domain). Greg. -- Gregory Bond, Vertical Software, Melbourne, Australia Internet: greg@vertical.oz.au (or greg%vertical.oz.au@uunet.uu.net) UUCP: {uunet,pyramid,mnetor,ukc,ucb-vision}!munnari!vertical.oz!greg ACSnet: greg@vertical.oz
loci@csccat.UUCP (Chuck Brunow) (05/05/88)
Let me point out one simple fact: source code is VERY MUCH SMALLER than binaries.
les@chinet.UUCP (Leslie Mikesell) (05/05/88)
In article <25816@clyde.ATT.COM> wtr@moss.UUCP (Bill Rankin) writes: > >Personally, I use cpio & compress to move files. I don't care >about execution time, rather transmission time is my most important I like this also, but if an entire cpio archive is compressed, it is impossible to (a) list the directory without a decompression pass or (b) recover any part beyond a bit error in transmission. Has anyone considered a program which would leave the cpio headers uncompressed but store the data as though each file had been individually compressed (including adding the .Z to the name so extraction would be possible with a normal cpio followed by uncompress)? This would be a nice thing to use for normal backups, especially if it followed the normal compress rules of not trying to compress something that already had the .Z extension. That still leaves the problem of compress needing 2 extra characters in the filename and DOS needing some other name convention entirely... Les Mikesell
paul@devon.UUCP (Paul Sutcliffe Jr.) (05/05/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: +--------- | 1) COMPRESS is a text only compression routine. It will not now, or ever, | help in the compression of binary files. +--------- This is absolute and complete Bull-Ka-Ka.

# cp /bin/sh /tmp
# cd /tmp
# ls -l sh
-rwx--x--t 1 root root 37762 May 5 09:23 sh
# compress -V -v sh
$Header: compress.c,v 4.0 85/07/30 12:50:00 joe Release $
Options: BITS = 16
sh: Compression: 34.90% -- replaced with sh.Z
# ls -l sh.Z
-rwx--x--t 1 root root 24582 May 5 09:23 sh.Z

Looks like you can compress binaries to me! Granted, the compression factor isn't as good as can be had with text files (I've seen as much as 90% in text files with plenty of repeating characters), but it *does* work on binaries. - paul -- Paul Sutcliffe, Jr. +------------------------+ | Know what I hate most? | UUCP (smart): paul@devon.UUCP | Rhetorical questions. | UUCP (dumb): ...rutgers!bpa!vu-vlsi!devon!paul +------<Henry Camp>------+
feg@clyde.ATT.COM (Forrest Gehrke) (05/05/88)
In article <10712@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes: > > I would like to add a little fuel to the fires of "which archiver" > discussion. Use of the 'btoa' routine instead of uuencode would save > 12% (!) on binary postings. This is a PD program, included in the > compress package, and runs just fine on a PC. Having tried this sometime back, I have often wondered why this approach is not used by USENET. It would save a lot of transmission time. > All the discussion of using PKARC to save 1-2% or not using it to save > time for many of the people on the net seems pointless. We should use > both (standard) arc and zoo formats, uuencode them, and save bandwidth > by dropping this discussion. Hopefully Rahul will clarify this by edict. Also an excellent suggestion. We could quickly find out from experience which archiver works out best through use. BTW what is holding up Rahul from taking over as moderator? Forrest Gehrke k2bt
jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)
In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > > I stand corrected. Since Lem-Ziv was DESIGNED for text compression, and >the authors do not mention its use for binaries, i never considered using it. >I tried it on an executable under UNIX and obtained a good reduction, for >reasons which are not apparent. I'm sure that there are cases where this does >not work (like graphics files), but it does appear to work , and in this case >better than the current version of ARITH. Jeff, Jeff, Jeff. You're STILL putting your foot in your mouth. :-) A Unix file is just a stream of bytes, and so is an MS-DOS file except that it has extra attributes as well. Compress replaces byte strings with codes whose lengths are between 9 and 16 bits. It will work well on any file in which some byte sequences are more common than others. An executable file consists of instructions, which, for almost all processors are integral numbers of bytes, and some are much more common than others. So compress works fine, and will give good compression for just about any executable file. There are several types of graphics files: bitmaps are HIGHLY compressible; other types of files act like a program for an imaginary computer and consist of byte codes, some much more common than others. These compress well also. There are only three types of files I've ever given to compress that haven't been reduced in size as a result: random binary data, floating point binary data, and files that have already been compressed. -- - Joe Buck {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net Argue for your limitations and you get to keep them. -- Richard Bach
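For anyone who has never looked inside compress, the mechanism is small. The toy below is not compress itself -- it emits fixed 16-bit codes instead of the adaptive 9-to-16-bit codes, never resets its table, and uses a slow linear search -- but it shows why any input with recurring byte strings (text, opcodes, pixels) shrinks while random or already-compressed data does not: repeats keep extending existing table entries instead of forcing new codes out.

    /* Toy LZW compressor: fixed 16-bit output codes, no table reset.
     * Illustrative only; the output is not compress(1) format. */
    #include <stdio.h>

    #define MAXCODES 65536L

    int main(void)
    {
        static long prefix[MAXCODES];     /* code for the string's prefix */
        static int  suffix[MAXCODES];     /* final byte of the string */
        long next_code = 256;             /* codes 0..255 are single bytes */
        long cur;
        int ch;

        if ((ch = getchar()) == EOF)
            return 0;
        cur = ch;

        while ((ch = getchar()) != EOF) {
            long i, found = -1;
            for (i = 256; i < next_code; i++)      /* is (cur + ch) known? */
                if (prefix[i] == cur && suffix[i] == ch) { found = i; break; }
            if (found >= 0) {
                cur = found;              /* keep growing the current string */
            } else {
                putchar((int)(cur >> 8)); /* emit 16-bit code, high byte first */
                putchar((int)(cur & 0xFF));
                if (next_code < MAXCODES) {
                    prefix[next_code] = cur;       /* learn the new string */
                    suffix[next_code] = ch;
                    next_code++;
                }
                cur = ch;
            }
        }
        putchar((int)(cur >> 8));         /* flush the final code */
        putchar((int)(cur & 0xFF));
        return 0;
    }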
tneff@dasys1.UUCP (Tom Neff) (05/06/88)
In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > ... Also, i did not know that source for >zoo was available - a consideration which i believe to be VERY important >since support usually comes best from those who use a product. The source for ARC is available too, and it's running on (for instance) this Stride. Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC. Due to an assiduous sales job most PC sysops have the Katz thing, but it ain't the original. The "C" language real McCoy is slower on PC's but more portable. >For those who suggested that i "do my homework" before posting something >to the net, i can only say that since the net is my ONLY contact with this >problem, and that the comp...d group is for discussions, i am in essence >"doing my homework". There is a school of thought, notably expressed in the cat.announce.newusers material, that the Net is a place for authoritative answers and requests for same, not for "homework" owing to the expense of carrying it all. I try to keep an open mind. :-) Not that your posting was anything to apologize for anyway... -- Tom Neff UUCP: ...!cmcl2!phri!dasys1!tneff "None of your toys CIS: 76556,2536 MCI: TNEFF will function..." GEnie: TOMNEFF BIX: are you kidding?
brianc@cognos.uucp (Brian Campbell) (05/06/88)
In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes: > If you are (sigh) going to post binaries on Usenet, DO NOT compress > them first. Many Usenet sites use compress to pack up their news > batches. Compressing a compressed file makes it larger. Maybe those Usenet sites should not use the -f (force) flag with compress. Every version I've used (Sun, XENIX and DOS) will not replace the original if the compressed version would be larger. Try compressing a file twice using the -v (verbose) option and see what happens. -- Brian Campbell uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc Cognos Incorporated mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4 (613) 738-1440 fido: (613) 731-2945 300/1200, sysop@1:163/8
laba-5ac@web7f.berkeley.edu (Erik Talvola) (05/06/88)
In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes: <>In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes: <><>>Just one thing that needs to be known -- PC's can do no more than 12-bit <><>>compression. So if you are compressing your file from a UNIX system, <><>>you need to say comress -b12 filename . <><> <><>This myth has been repeated several times, so I felt it was necessary to <><>speak up. PCs most certainly CAN do a 16 bit compress/uncompress. ... <> <>Only a subset of PCs can do 16-bit compress/uncompress. Mine can't. <>I'm running VENIX/86 2.0, which is basically V7; the PCC-derived <>C compiler has only the tiny and small memory models (exactly <>corresponding to non-split and split PDP-11s, which also cannot <>handle 16-bit compress). <> <>So it is true that PCs with a C compiler that supports multiple data <>segments can handle 16-bit compress, but that hardly encompasses all <>PCs in the world. <>-- What's wrong with getting a 16-bit Compress executable file for the PC which was compiled with a proper C compiler? Then, you can run a 16-bit compress on any PC. You are right in that you may not be able to compile it with all C compilers, but you can run the executable on any PC (as long as you have ~500K free). >Larry Campbell The Boston Software Works, Inc. >Internet: campbell@maynard.bsw.com 120 Fulton Street, Boston MA 02109 >uucp: {husc6,mirror,think}!maynard!campbell +1 617 367 6846 --------------------------------------------------- Erik Talvola erikt@zen.berkeley.edu "...death is an acquired trait." -- Woody Allen ---------------------------------------------------
caf@omen.UUCP (Chuck Forsberg WA7KGX) (05/06/88)
In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: : : 1) COMPRESS is a text only compression routine. It will not now, or ever, : help in the compression of binary files. The 13 bit compression in zoo gets about 29% compressing YAM.EXE. : 2) ARITH is a more general compression routine using adaptive arithmetic : coding. It will compress binary files where there is redundancy, but Please post it! : 3) The source for ZOO, PKARC, and the others is NOT available. Therefore : we are at the whims of whomever is currently supporting (or not supporting) : them. The sources to ZOO *are* available, in fact it was a copy of ZOO I compiled for 386 Xenix that I used in the above micro-benchmark. : 4) COMPRESS works faster and better on text files then the ARC routines : because they use 12 bit compression, where 13-bit (and more) are possible : under even the PC for COMPRESS (i've tried it on ans AT-clone). Compress, ARC, PKARC, and ZOO all use forms of LZW compression, derived from the original Unix compress program. : 5) On the weak side, there is as yet, no CRC or checksum for any of these, : but adding it would be someithing i am willing to take responsibility : for should enough people decide they would like to take the approach : which i'm currently suggesting. The lack of a CRC in compress is a serious weakness. ARC and ZOO include CRC. : Also, there no directory support provided with these tools. They work : on only one file at a time. This is also correctable since the source : is available. ZOO has excellent directory support - full Unix pathnames are supported. Again, please post the ARITH program. It would be most interesting if the memory requirements are small - like Huffman encoding instead of LZW.
mark@adec23.UUCP (Mark Salyzyn) (05/06/88)
I'm sorry, I don't care if the IBM-PC can handle better than 12 bit compress! I run UNIX on a PDP 11/23 *NON SPLIT I/D MACHINE* and that allows me to use 12 bit compress (however I have a 13 bit LZW pack routine that was posted in 1983 that works fine). In order to read stuff that is packed more than 12 bit LZW I had to rewrite compress to use disk rather than memory. BOY IS IT SLOW. In the interest of compatibility with ALL types of machines I suggest that we use 12 bit compress. This is the most available compression bit selection. If not, then I am going to extend my disk version to handle 17 bit compress, post something useful and watch you all squirm. G'day -- Mark Salyzyn, mad at the world for advancing and leaving me behind
ephram@violet.berkeley.edu (05/06/88)
In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes: >In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes: ><>>Just one thing that needs to be known -- PC's can do no more than 12-bit ><>>compression. So if you are compressing your file from a UNIX system, ><>>you need to say comress -b12 filename . ><> ><>This myth has been repeated several times, so I felt it was necessary to ><>speak up. PCs most certainly CAN do a 16 bit compress/uncompress. ... > >Only a subset of PCs can do 16-bit compress/uncompress. Hasn't anyone ever heard of a disk drive?!? Multiple segments as a limitation? How about writing temporary results to a disk file (random access)? RAM disk? Now I must admit I have never cracked open the code to compress/uncompress, but it seems to me that using a disk drive as an intermediate result area is a very viable workaround. I would rather sit and watch my disk spin for an extra minute than watch the RD light on my modem work 10% more time. I admit it is not elegant but when someone says "can not do" I must speak up. Ephram Cohen ephram@violet.berkeley.edu
gnu@hoptoad.uucp (John Gilmore) (05/06/88)
Has the first virus been transmitted by Usenet yet? Just think, 8100 readers of comp.binaries.ibm.pc will all be infected at once! I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe I don't get to vote; but I think that rather than figuring out fancy ways to pass binaries around, we should just remove them from the Usenet. People who want binaries can start their own alternative network (bin.xxx?) and waste their own bandwidth and eyewidth. People who want to display their ignorance about compress should do so to a bartender somewhere, not to the net. -- John Gilmore {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu gnu@toad.com "Use the Source, Luke...."
wtr@moss.ATT.COM (05/06/88)
In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
> Let me point out one simple fact: source code is VERY MUCH
> SMALLER than binaries.
Well, here we go again!
1290 -rw-r--r-- 1 xxx xxxxx 654848 Apr 24 20:29 ksh.cpio
 225 -rwxr-xr-x 1 xxx xxxxx 113607 Apr 25 17:33 ksh*
Well, folks, it seems that the distribution source (uncompressed) is MUCH
LARGER than the corresponding binary. [I'm going to make some crude,
probably wrong in xx% of all cases, observations. Any attempt to find
factual evidence below this line, and I'll sue for 'look and feel' ;-]
It *SEEMS* (to me) that in smaller programs the binaries tend to be larger
than the original source, I believe because of the overhead of the system
code in proportion to the base source. On larger programs, this ratio is
much lower, and thus the source tends to be larger than the corresponding
executable. [ ---> insert flames here <--- ] [ and don't forget CYA! ]
=====================================================================
Bill Rankin    Bell Labs, Whippany NJ    (201) 386-4154 (cornet 232)
email address: ...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr
               ...![ ihnp4 cbosgd akgua watmath ]!clyde!wtr
=====================================================================
jejones@mcrware.UUCP (James Jones) (05/06/88)
In article <8430@iuvax.cs.indiana.edu>, bobmon@iuvax.cs.indiana.edu (RAMontante) writes: > [Compress's] Unix-origin philosophy says that separate functions should be > done by separate routines with their outputs tied together by the operating > system....The philosophy works fine on a big multitasking machine like a VAX > (or a suitably equipped 680x0 or '386?)... > This piece-at-a-time philosophy is weaker on something like my MSDOS 8088 box. > ...In such a situation an integrated package (viz., zoo or arc) makes a lot > more sense. Oddly enough, I would make just the opposite argument, though it is, like Mr. Montante's, influenced by the particulars of my home system, which has memory limitations too. A monolithic glob like ARC, which includes every compression method known to man (well, arithmetic compression evidently hasn't made it in yet, and PKARC has LZ with a knob tweaked), won't fit on my 6809, which limits me to 64K (code + data) per process under OS-9/6809 Level Two. If those compression programs were written separately, they could individually fit and could be invoked by a process that knows the surrounding header bilge and calls up the appropriate compress/decompress program. (The software tools philosophy has advantages for programs as well as users.) James Jones
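The dispatcher idea James describes can be shown in a few lines: keep each compression method as its own program and let a small front end pick the right one from a method byte in the member header. A Unix-flavored sketch; the method numbers and the data-file naming are invented for illustration, and the whole thing is an outline rather than a real extractor:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical dispatcher: read a one-byte compression-method tag from an
 * archive member header and hand the data stream to a separate, external
 * decompressor, instead of linking every algorithm into one big program. */
int extract_member(int method, const char *member)
{
    char cmd[512];
    const char *prog;

    switch (method) {
    case 0: prog = "cat";        break;   /* stored  */
    case 1: prog = "unpack";     break;   /* Huffman */
    case 2: prog = "uncompress"; break;   /* LZW     */
    default: return -1;
    }
    /* Each tool fits in memory by itself, even on a 64K machine. */
    sprintf(cmd, "%s < %s.z > %s", prog, member, member);
    return system(cmd);
}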
jpn@teddy.UUCP (John P. Nelson) (05/06/88)
> Let me point out one simple fact: source code is VERY MUCH > SMALLER than binaries. This is not clear. For small programs in a high-level compiled language (like C), this is true: the small program pulls in the language run-time library, so the source is much smaller than the resulting executable. However, I would bet that the object file (before linking) would be about the same size as the source (even WITH the symbol table and relocation information). Assembly language source usually runs about 10 times larger than the resulting executable. Large C program (64k+) source usually runs two to three times larger than the resulting executable. Of course, I find source code more valuable: I can make changes to suit my environment, or I can port the program to a different machine entirely. And of course, with an operating system like UNIX which runs on a plethora of machines, source code is the only acceptable distribution mechanism. Other languages have different source/binary size ratios. Some languages can generate a lot of code with a very small amount of source. However, most of the source code posted to USENET is C. -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn ARPA (sort of): talcott.harvard.edu!panda!teddy!jpn
dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/06/88)
In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes: >I think that rather than figuring out fancy ways >to pass binaries around, we should just remove them from the Usenet. Look at the alternative: To be able to use sources on most microcomputers, you would probably have to have about five different C compilers, two or three assemblers, a Pascal compiler or two, and at least 10 megabytes of hard disk space for the big ones. Realize that no current microcomputer operating system on the market costing less than $300 comes bundled with a decent language translator. -- Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
rroot@edm.UUCP (uucp) (05/07/88)
From article <3980@killer.UUCP>, by chasm@killer.UUCP (Charles Marslett): > In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes: >> Just one thing that needs to be known -- PC's can do no more than 12-bit >> compression. ... > Actually, I have sent several people copies of a minor mod to compress 4.0 > that works fine if you have the memory (requires about 350-400 K above DOS There are still, however, people running on systems whose compilers don't know how to work with >64K. These systems exist and have to be dealt with. -- ------------- Stephen Samuel Disclaimer: You betcha! {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve BITNET: USERZXCV@UQV-MTS
loci@csccat.UUCP (Chuck Brunow) (05/07/88)
In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>> I stand corrected. Since Lem-Ziv was DESIGNED for text compression, and
>>the authors do not mention its use for binaries, i never considered using it.
>>I tried it on an executable under UNIX and obtained a good reduction, for
>>reasons which are not apparent. I'm sure that there are cases where this does
This is actually partially true. The first "compress" to appear on the net
(several years ago) only worked on text files and dumped core on binary files.
The reason you get good compression on binary files is probably that they
haven't been stripped of the relocation info. Strip them first and I doubt
that the compression will be so good (otherwise, throw your optimizer into
the bit bucket). Typical (large) text compression is about 67%, whereas
binaries are closer to 20%. (I use 16-bit compress.)
>A Unix file is just a stream of bytes, and so is an MS-DOS file
>except that it has extra attributes as well. Compress replaces byte
>strings with codes whose lengths are between 9 and 16 bits. It will
>work well on any file in which some byte sequences are more common
>than others. An executable file consists of instructions, which, for
>almost all processors are integral numbers of bytes, and some are
>much more common than others. So compress works fine, and will give
>good compression for just about any executable file. There are
This is doubtful. There's a good description of the workings of LZW in the
GIF docs (recently posted). Bytes aren't the key feature here, but rather
sequences of repeated bytes, which should be rare in an optimized executable
(on Unix at least).
>several types of graphics files: bitmaps are HIGHLY compressible;
If they have lots of blank space, or other repeated sequences. Otherwise,
they can be very similar to executables: 10-20%.
>other types of files act like a program for an imaginary computer and
>consist of byte codes, some much more common than others. These
>compress well also.
You must mean Huffman coding. These comments are true in that case, not LZW.
>There are only three types of files I've ever given to compress that
>haven't been reduced in size as a result: random binary data,
>floating point binary data, and files that have already been
>compressed.
The point being that there is little redundancy.
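Rather than arguing from first principles about how compressible stripped executables are, it is easy to measure a floor. An order-0 byte-frequency entropy count gives a quick estimate of what any byte-oriented coder can achieve; it ignores the repeated multi-byte strings that LZW exploits, so real compress results are usually better. A throwaway sketch:

#include <stdio.h>
#include <math.h>

/* Rough order-0 entropy of a file, in bits per byte.  8.0 means the byte
 * values look uniformly random (little to gain); text is typically well
 * under 5; stripped executables usually land somewhere in between. */
int main(int argc, char **argv)
{
    long count[256] = {0}, total = 0;
    double h = 0.0, p;
    int c, i;
    FILE *fp = (argc > 1) ? fopen(argv[1], "rb") : stdin;

    if (fp == NULL)
        return 1;
    while ((c = getc(fp)) != EOF) {
        count[c]++;
        total++;
    }
    for (i = 0; i < 256; i++) {
        if (count[i] == 0)
            continue;
        p = (double)count[i] / (double)total;
        h -= p * (log(p) / log(2.0));
    }
    printf("%.2f bits/byte over %ld bytes\n", h, total);
    return 0;
}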
wyle@solaris.UUCP (Mitchell Wyle) (05/07/88)
This discussion will bear fruit only if r$ or the backbone gurus implement one of these schemes as a usenet standard, and distribute sources or binaries packaged with tarmail or whichever scheme wins this debate. I vote for tarmail. Let's get a standard accepted! -- -Mitchell F. Wyle wyle@ethz.uucp Institut fuer Informatik wyle%ifi.ethz.ch@relay.cs.net ETH Zentrum 8092 Zuerich, Switzerland +41 1 256-5237
mike@ists (Mike Clarkson) (05/08/88)
In article <4521@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes: > I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe > I don't get to vote; but I think that rather than figuring out fancy ways > to pass binaries around, we should just remove them from the Usenet. I know that many in the amiga, mac and PC worlds would scream, but it might force a really positive change in those worlds: good C compilers. Maybe it would help get good ANSI C compilers written and sold. > "Use the Source, Luke...." hysterical! -- Mike Clarkson mike@ists.UUCP Institute for Space and Terrestrial Science mike@ists.yorku.ca York University, North York, Ontario, uunet!mnetor!yunexus!ists!mike CANADA M3J 1P3 +1 (416) 736-5611
ford@elgar.UUCP (Ford Prefect ) (05/08/88)
In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes: >What's wrong with getting a 16-bit Compress executable file for the PC >which was compiled with a proper C compiler? Then, you can run a 16-bit >compress on any PC. You are right in that you may not be able to compile >it with all C compilers, but you can run the executable on any PC (as long >as you have ~500K free). There are a few problems with this approach: 1) Such a compiler has to exist for the operating system you are running. Obviously, the author had his brain in MS-DOS mode, which, since the article was cross-posted to comp.binaries.ibm-pc, is forgivable in this case. But one of the articles that was being followed up to mentioned an O.S. that only supported 64k segments. Compress just won't work in such an environment without major redesign (like keeping the arrays in a disk file :-). 2) The executable you get must be for your CPU! This is obvious, of course, but I keep detecting a definite IBM-PC-chauvinist state of mind in this discussion. Don't forget that there are people who are still running unix on PDP-11's and proud of it! The PDP-11 is very similar to the 8086 except that nobody does anything as kludgey as geferkin with the segment registers! So the best you can get is 64k code, 64k data. In other words, discussion of a standardized compression format must take into account the existence of small machines. And "PC" != "Intel CPU". Personally, I use 16-bit compress since I don't need to talk to such small machines. But if I need to post a binary to the net, I will probably use 12-bit compress, because I've never heard of a machine or compiler that couldn't run it. -=] Ford [=- "Once there were parking lots, (In Real Life: Mike Ditto) now it's a peaceful oasis. ford%kenobi@crash.CTS.COM This was a Pizza Hut, ...!sdcsvax!crash!kenobi!ford now it's all covered with daisies." -- Talking Heads
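The memory figures quoted in this thread follow from the table sizes compress uses. A small program to do the arithmetic; the 69001-slot figure is from memory of the compress 4.0 sources, so treat the exact constants as approximate rather than authoritative:

#include <stdio.h>

/* Why 16-bit compress wants roughly 400K of data space.  The compressor
 * keeps a hash table of longs and a code table of shorts, both sized for
 * about 69001 slots when BITS is 16 (constant recalled from compress 4.0;
 * check the sources before relying on it). */
#define HSIZE 69001L

int main(void)
{
    long hash_bytes = HSIZE * (long)sizeof(long);   /* prefix hash table */
    long code_bytes = HSIZE * (long)sizeof(short);  /* code table        */

    printf("compress side:   %ld + %ld = %ld bytes\n",
           hash_bytes, code_bytes, hash_bytes + code_bytes);
    printf("uncompress side: %ld bytes\n",          /* prefix + suffix per code */
           65536L * ((long)sizeof(short) + 1L));
    return 0;
}

The decompress side needs only about three bytes per code, or roughly 200K for 16 bits, which is presumably why the rumored 16-bit "uncompress-only" for small machines keeps coming up later in the thread.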
wnp@dcs.UUCP (Wolf N. Paul) (05/08/88)
In article <5098@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes: >In article <25816@clyde.ATT.COM> wtr@moss.UUCP (Bill Rankin) writes: >>Personally, I use cpio & compress to move files. I don't care >>about execution time, rather transmission time is my most important >I like this also, but if an entire cpio archive is compressed, it >is impossible to (a) list the directory without a decompression pass >or (b) recover any part beyond a bit error in transmission. Has >anyone considered a program which would leave the cpio headers >uncompressed but store the data as though each file had been individually >compressed (including adding the .Z to the name so extraction would be >possible with a normal cpio followed by uncompress)? This would be >a nice thing to use for normal backups, especially if it followed the >normal compress rules of not trying to compress something that already >had the .Z extension. That still leaves the problem of compress needing >2 extra characters in the filename and DOS needing some other name convention >entirely... Well, the sources for a cpio-compatible archiver are available from sites which archive comp.sources.unix. This archiver is called AFIO. Someone out there volunteering to add the code to do compression as suggested by Leslie? I don't think I'm qualified or I'd attempt it. -- Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101 UUCP: ihnp4!killer!dcs!wnp ESL: 62832882 INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP TLX: 910-280-0585 EES PLANO UD
leonard@bucket.UUCP (Leonard Erickson) (05/08/88)
In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
<Only a subset of PCs can do 16-bit compress/uncompress. Mine can't.
<I'm running VENIX/86 2.0, which is basically V7; the PCC-derived
<C compiler has only the tiny and small memory models (exactly
<corresponding to non-split and split PDP-11s, which also cannot
<handle 16-bit compress).
<
<So it is true that PCs with a C compiler that supports multiple data
<segments can handle 16-bit compress, but that hardly encompasses all
<PCs in the world.
Larry, you are confusing being able to *compile* a program and being able
to *use* it! I don't have *any* kind of C compiler. But I can uncompress
stuff that was compressed on a Unix system on my PC.
Some kind soul posted an msdos *binary* for compress a while back. All
you need is DOS and more than 512k of ram...
True, this places two limits on the people who are using the program:
1. they've got to be using MS-DOS. (since we are talking about comp.-
binaries.ibm.pc any arguments that this is a serious restriction
should be routed to /dev/null)
2. they have to have 640k (576 will probably work, but I haven't
tried it). This *is* a problem, but even at current memory prices
it isn't *too* serious. (Unless you have an AT whose memory is mapped
as 512 dos/512 extended)
--
Leonard Erickson ...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."
jw@pan.UUCP (Jamie Watson) (05/09/88)
In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes: >Has the first virus been transmitted by Usenet yet? Just think, 8100 >readers of comp.binaries.ibm.pc will all be infected at once! And every one of them deserves PRECISELY what they get... >I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe >I don't get to vote; Me either, but I don't appreciate having to clutter my sys file, and that of my news feed, just to explicitly exclude them. > but I think that rather than figuring out fancy ways >to pass binaries around, we should just remove them from the Usenet. This is an excellent idea. I second the motion. >People who want binaries can start their own alternative network (bin.xxx?) >and waste their own bandwidth and eyewidth. AND MONEY, in some cases. jw
wayneck@tekig5.TEK.COM (Wayne Knapp) (05/09/88)
In article <552@csccat.UUCP>, loci@csccat.UUCP (Chuck Brunow) writes: > > Let me point out one simple fact: source code is VERY MUCH > SMALLER than binaries. Have you ever done any real programming? Sure, if your source code is only 20 lines long the binary is smaller. However I'm used to seeing ratios like 500k of source code to 160k of binaries. Or something I have at home: 7 880k disks full of source code to 1 disk of binaries. From what I've seen, the longer the program the greater the ratio of source code to binaries. Wayne Knapp
dg@lakart.UUCP (David Goodenough) (05/09/88)
From article <4521@hoptoad.uucp>, by gnu@hoptoad.uucp (John Gilmore): > Has the first virus been transmitted by Usenet yet? Just think, 8100 > readers of comp.binaries.ibm.pc will all be infected at once! This is a very valid objection to the transmission of binaries on the net. I once figured out a virus to infect CP/M, and I know it can be ported to MS-DOS: the real beauty of it was that it would not only eat hard disks, but floppies as well (My plan was to install it in a software package that I was thinking of selling, to discourage illegal copies). As it resided in a little under 1/2 K of binary, it was very innocuous, until it showed. But when it did .... ALL data, directory and system tracks on a disk just vanished, and the way I did it, not even the Norton utilities (or the CP/M equivalent) could bring back the files. Mercifully I have never put this out, but as John Gilmore says, the notion of such a beast running round on usenet gives me the screaming horrors. I hear objections to use of C for posting source, but I have never found it a problem: I regularly take small & medium sized C sources from UNIX, and port them to my CP/M machine at home. It's not that difficult to do: of course it's going to object if I try to port hack :-) :-), but ONLY for size reasons: I can get each of the separate source files to compile, I just can't link them. So let's see more source, and leave the binaries for those poor trusting souls that don't know about the real world. Call me a cynic, but after some of the warnings I've seen on a local BBS, sooner or later the axe is going to fall. -- dg@lakart.UUCP - David Goodenough +---+ | +-+-+ ....... !harvard!adelie!cfisun!lakart!dg +-+-+ | +---+
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)
Since the discussion is on, I'm posting atob and btoa to binaries. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)
In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes: > > Let me point out one simple fact: source code is VERY MUCH > SMALLER than binaries. And another: not everybody has all compilers. There have been postings in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on this group. That's why we have a binary group. Besides I wouldn't give out source to some things which I can distribute as binary. -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
jeff@cullsj.UUCP (Jeffrey C. Fried) (05/10/88)
I recently acquired the ZOO executables from the net and found them to be incompatible with ARC. The UNIX ARC I received over the net is compatible with ARC5.2.1 under DOS. Has anyone else experienced this incompatibility?
jtara@m2-net.UUCP (Jon Tara) (05/10/88)
In article <3980@killer.UUCP>, chasm@killer.UUCP (Charles Marslett) writes: > In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes: > > Just one thing that needs to be known -- PC's can do no more than 12-bit > > compression. ... > > Actually, I have sent several people copies of a minor mod to compress 4.0 Funny, I ran compress 4.0 through the Microsoft 4.0 compiler using large model, and I've been happily compressing and de-compressing with 16 bits ever since. Far as I can tell, it doesn't need any changes, at least under MS/PC-DOS and Microsoft C. It does need a good chunk of memory, which most people should have, unless you're a real TSR nut. -- jtara%m-net@umix.cc.umich.edu ihnp4!dwon!m-net!jtara "You don't have to take this crap. You don't have to sit back and relax." _Walls Come Tumbling Down_, The Style Council
rick@pcrat.UUCP (Rick Richardson) (05/10/88)
In article <679@omen.UUCP> caf@omen.UUCP (Chuck Forsberg WA7KGX) writes: >Again, please post the ARITH program. It would be most interesting >if the memory requirements are small - like Huffman encoding instead >of LZW. In case ARITH never gets posted: the complete article and program appeared in ACM last year, in C. I typed it in myself (and lost it later). The program, as published, runs a lot slower than compress and does not do quite as good a job as compress. It was better than "pack". It is very small, and uses little memory. If you dig into the article (this from memory, I seem to have misplaced the issue of ACM as well), the program separates the encoding algorithm from the model. Two models are presented, one that just uses a static letter frequency table (for text), and an adaptive model (for binaries). As I recall, the author pointed out that more sophisticated adaptive algorithms could be used for better results. After monkeying around with the program for an evening, and even trying my own hand at a more sophisticated model, I shelved the program, with nary a backup. Since it was slower and less efficient than compress, I think its usefulness is limited to those applications which are sensitive to both program and data size, such as in a modem. BTW, I heard some rumor that a 16 bit "uncompress"-only is available for limited memory systems. If this is true, then why all the fuss about 16 bit compression? -- Rick Richardson, President, PC Research, Inc. (201) 542-3734 (voice, nights) OR (201) 834-1378 (voice, days) uunet!pcrat!rick (UUCP) rick%pcrat.uucp@uunet.uu.net (INTERNET)
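The article Rick is describing is almost certainly Witten, Neal and Cleary, "Arithmetic Coding for Data Compression", Communications of the ACM, June 1987. Its key idea is exactly the split he remembers: the arithmetic coder sees only cumulative frequency counts, and the model (static or adaptive) is a separate pair of routines that can be swapped without touching the coder. A sketch of the adaptive, order-0 half of that interface, with illustrative names rather than the article's:

/* Adaptive order-0 model in the style described above: 256 byte values
 * plus a stop symbol (indexed 1..NSYM), frequencies updated after every
 * symbol so encoder and decoder stay in step.  The arithmetic coder
 * itself (not shown) needs only cum_freq[] to narrow its interval. */
#define NSYM  257          /* 256 byte values + end-of-stream            */
#define MAXF  16383        /* keep totals small enough for the coder     */

long freq[NSYM + 1];
long cum_freq[NSYM + 1];   /* cum_freq[i] = sum of freq[j] for j > i     */

void start_model(void)
{
    int i;
    for (i = 0; i <= NSYM; i++) {
        freq[i] = 1;
        cum_freq[i] = NSYM - i;
    }
    freq[0] = 0;
}

void update_model(int symbol)           /* symbol in 1..NSYM */
{
    int i;
    if (cum_freq[0] >= MAXF) {          /* rescale: halve all counts     */
        long cum = 0;
        for (i = NSYM; i >= 0; i--) {
            freq[i] = (freq[i] + 1) / 2;
            cum_freq[i] = cum;
            cum += freq[i];
        }
    }
    freq[symbol]++;                     /* learn from this symbol        */
    for (i = symbol - 1; i >= 0; i--)   /* fix the cumulative counts     */
        cum_freq[i]++;
}

Because the counts adapt as the data goes by, the same two routines handle text and binaries; that is the "more general" property claimed for ARITH earlier in the thread.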
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)
In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: | I recently acquired the ZOO executables from the net and found them to be | incompatible with ARC. Correct. zoo is not "another arc file program," it is a totally separate file structure, containing information which neither arc nor pkarc includes. | The UNIX ARC I received over the net is compatible | with ARC5.2.1 under DOS. Has anyone else experienced this incompatibility? Alas, there is no "the" UNIX arc; there are a number of slightly different versions. If you have the one I suspect, it needs the "-i" option to be compatible with the DOS arc. I highly commend switching the meaning of that flag for default DOS compatibility. Actually I highly commend using zoo... -- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs | seismo}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
tneff@dasys1.UUCP (Tom Neff) (05/10/88)
In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: > I recently acquired the ZOO executables from the net and found them to be >incompatible with ARC. The UNIX ARC i received over the net is compatible >with ARC5.2.1 under DOS. Has anyone else experienced this incompatibility? Yes, everyone has experienced this incompatibility Jeffrey, because they are not SUPPOSED to be compatible! :-) ARC is one archiving standard, ZOO is a completely different standard. You need one set of programs to create, list and extract ARC files, and a different set to manipulate ZOO archives. You can't use one with the other. Now, if your next question was going to be why there are two incompatible archiving standards for the MSDOS/UNIX/VMS environment, you'll have to ask our very own moderator Rahul, because there was only one (ARC) until he decided to invent his own. I told him at the time that user confusion would result, but the argument is moot at this point. -- Tom Neff UUCP: ...!cmcl2!phri!dasys1!tneff "None of your toys CIS: 76556,2536 MCI: TNEFF will function..." GEnie: TOMNEFF BIX: are you kidding?
campbell@maynard.BSW.COM (Larry Campbell) (05/11/88)
In article <2894@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes: <>In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes: <>>I think that rather than figuring out fancy ways <>>to pass binaries around, we should just remove them from the Usenet. Hear, hear! <>Look at the alternative: To be able to use sources on most <>microcomputers, you would probably have to have about five different C <>compilers, two or three assemblers, a Pascal compiler or two, and at <>least 10 megabytes of hard disk space for the big ones. Realize that <>no current microcomputer operating system on the market costing less <>than $300 comes bundled with a decent language translator. <>-- <>Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi Nope. All you need is Turbo-C (about $60 retail) and Turbo Pascal (a bit less, I think). Probably about $100 total. Almost no one writes or posts in assembler these days, but I think there are inexpensive assemblers floating around. I can't understand someone spending thousands of dollars on PC hardware, hundreds of dollars on modems and telephone charges, and then balking at shelling out 60 bucks for an _excellent_ C compiler! -- Larry Campbell The Boston Software Works, Inc. Internet: campbell@maynard.bsw.com 120 Fulton Street, Boston MA 02109 uucp: {husc6,mirror,think}!maynard!campbell +1 617 367 6846
pjh@mccc.UUCP (Pete Holsberg) (05/11/88)
In article <10770@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
... Alas, there is no "the" UNIX arc, there are a number of slightly
...different versions. If you have the one I suspect, it needs the "-i"
...option to be compatible with the DOS arc. I highly commend switching the
^^^^^^^^^^^^^
...meaning of that flag for default DOS compatibility.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How do you do that? Thanks.
mitch@stride1.UUCP (Thomas P. Mitchell) (05/12/88)
In article <10758@steinmetz.ge.com> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes: >In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes: >And another: not everybody has all compilers. There have been postings >in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on >this group. That's why we have a binary group. Besides I wouldn't give >out source to some things which I can distribute as binary. To send binary or not to send binary, this is a tough question. But my general thought on this is that binaries are the way to distribute a product you sell and support. Source text is the way to make available something you wish to share and are willing to see improved and expanded. Oh yes, criticized as well. The argument that not everybody has all compilers is real. Yet I dislike it. To me compilers are like a keyboard: a computer is worth little without one. Back to the topic. Not all binary files are code, so how do we transfer binaries, or anything else for that matter? Some things like bit maps (face server, fonts) and other data need to be transferred from machine to machine. And at times code binaries as well (blush, I did say this). My thought on this is that the link should know how to best send the data. In other words uucp should be expanded to exchange abilities. Consider an initial uucp connection in which the programs exchange information like "have compress16|compress12, have btoa/atob, have kermit, have xmodem, have link_is_100%, have TeleBit, have never_talked, have exchanged_compression_tables". Given this type of information the program can then select the best tool to get the best effective transfer rate for the next conversation. Well, what say you all? Thanks for the soap. mitch@stride1.Stride.COM Thomas P. Mitchell (mitch@stride1.Stride.COM) Phone: (702)322-6868 TWX: 910-395-6073 FAX: (702)322-7975 MicroSage Computer Systems Inc. Opinions expressed are probably mine.
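Mitch's "exchange abilities" suggestion boils down to a short negotiation step before the transfer proper: each site sends the encodings it supports and both pick the best common one from a fixed preference order. A sketch of just the selection logic; the capability strings follow his examples and nothing here is part of any real uucp:

#include <stdio.h>
#include <string.h>

/* Pick the best transfer encoding two sites have in common.  Each side's
 * capability line looks like "compress16 compress12 btoa kermit"; the
 * preference table is ordered best-first.  Purely illustrative. */
static const char *pref[] = { "compress16", "compress13", "compress12", "btoa", 0 };

const char *pick(const char *ours, const char *theirs)
{
    int i;
    for (i = 0; pref[i] != 0; i++)
        if (strstr(ours, pref[i]) && strstr(theirs, pref[i]))
            return pref[i];
    return "none";          /* fall back to sending the file as-is */
}

int main(void)
{
    printf("%s\n", pick("compress12 btoa kermit", "compress16 compress12 btoa"));
    return 0;               /* prints "compress12" */
}

The same idea covers his "exchanged_compression_tables" and modem-type entries: they are just more capability strings, and the preference table is the only policy that has to be agreed on netwide.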
dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/12/88)
In article <1083@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes: [justifying the claim that source postings are an adequate substitute for binaries] >Nope. All you need is Turbo-C (about $60 retail) and Turbo Pascal (a bit >less, I think). ... >I can't understand someone spending thousands of dollars on PC hardware, >hundreds of dollars on modems and telephone charges, and then balking at >shelling out 60 bucks for an _excellent_ C compiler! This misses the point. If somebody posts source that is compilable only by the Datalight C compiler, or by MIX C, or by Microsoft Pascal, or by Utah Pascal, or by CHASM, or by the Microsoft Macro assembler, or by any of dozens of other language translators, having Turbo C and Turbo Pascal would likely mean an investment of a few days or weeks (or months) making that source work. As I said before, microcomputer operating systems costing less than $300 do not come bundled with any decent language translators. Users buy their own, and they are seldom compatible with each other. ANSI C and cheap, conforming C compilers may change this to an extent. But there will always be many things that will not be efficiently doable in portable C. High-performance graphics are one glaring example. Finally consider that not all users are, or want to be, programmers. -- Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
Dion_L_Johnson@cup.portal.com (05/12/88)
I can well understand that there are good reasons for us (most of us) to want to have source code, but there are also sometimes good reasons to distribute binaries. Hasn't this topic been talked to death in other fora, at other times? Perhaps someone will post a summary about the good/bad points of each distribution scheme? Also, why is it that those who decry binaries do so with such vehemence? Finally, will the (someday) coming of widespread binary compatibility among at least some classes of systems affect the acceptance of binary distribution? Comments? And should this be discussed somewhere else? Thanks, - Dion
danno@microsoft.UUCP (Dan Norton) (05/13/88)
In article <145@elgar.UUCP>, ford@elgar.UUCP (Ford Prefect ) writes: > In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes: > >What's wrong with getting a 16-bit Compress executable file for the PC... > > There are a few problems with this approach: > > 1) Such a compiler has to exist for the operating system you are > running... > ... But one of > the articles that was being followed up to mentioned an O.S. > that only supported 64k segments. Compress just won't work > in such an environment without major redisign (like keeping > the arrays in a disk file :-). You are wrong. In fact, such a compress exists, using memory only. Several people, including myself, have been able to modify the standard compress with little trouble, and it works just fine on IBM PC's. . . . . . . .
linhart@topaz.rutgers.edu (Mike Threepoint) (05/13/88)
tneff@dasys1.UUCP (Tom Neff) writes: -=> The source for ARC is available too, and it's running on (for instance) -=> this Stride. >sigh< But the only squash source I can find is in Pascal. Speaking of which... -=> Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC. -=> Due to an assiduous sales job most PC sysops have the Katz thing, but it -=> ain't the original. The "C" language real McCoy is slower on PC's but -=> more portable. "Accept no imitations" should be reserved to sales jobs. PKARC is faster and compresses smaller; why wouldn't they use the Katz thing? ARCE, NARC, and NSWEEP also support squashing, so it's not even forcing PK(X)ARC on the users. My bottom line is the archive size; speed is gravy unless it operates as slow as... oh, I dunno... ARC? :-) On my BBS, my own experience is that PKARC creates smaller archives than ZOO, so I use PKARC when I don't need to store a directory subtree. Squashing has saved over a meg of space on my board. Sometimes PKARC is stupid about compression and squashes when it should crunch or crunches at a 0% compression rate instead of storing, but most of the time it's smaller. If ZOO crunched as well (>sigh<), I would use that. [Selfish mode: Maybe Rahul could find out what hashing algorithm PK or DWC is using to get better compression rates. Would simplify things for me considerably.] -- "...billions and billions..." | Mike Threepoint (D-ro 3) -- not Carl Sagan | linhart@topaz.rutgers.edu "...hundreds if not thousands..." | FidoNet 1:107/513 -- Pnews | AT&T +1 (201)878-0937
gnu@hoptoad.uucp (John Gilmore) (05/13/88)
dhesi@bsu-cs.UUCP (Rahul Dhesi) wrote: > If somebody posts source that is compilable > only by the Datalight C compiler, or by MIX C, or by Microsoft Pascal... > ...having Turbo C and > Turbo Pascal would likely mean an investment of a few days or weeks (or > months) making that source work. This is curious. If someone posts a binary, getting it to work on another compilation system would likely mean an investment of weeks or months (or years) -- it's called "rewriting from scratch". This does not seem to bother Rahul; it only bothers him that IBM PC users might have to think rather than just uudecoding and executing. If somebody posts sources that are only compilable on one system (or, bog help us, on one C compiler on one system) then they do not know how to write a portable program. Should we not take their sources, port them to our systems if we want to run them, and send back or post the fixes? This is how I learned to write portable C code -- from seeing how portability problems had come about (in my code and in others' code, on the net) and noticing how talented people had fixed them. I asked a bunch of friends who work on IBM PC's why there are so many programs in the micro world that only compile on certain C compilers. This problem was faced and solved 10 years ago in the mini world, and the techniques are well known (#ifdef's, declaring "short" or "long" rather than "int", relying on standard library routines rather than system calls, passing the program to a few friends who have different compilers/systems for testing before you post it, etc). Nobody had a decent answer. I'm forced to assume that most of these authors do not know how to write or manage software. The solution to this problem is not to distribute their programs in binary. The solution is to distribute in source, fix it, and thereby teach people a little bit more than they knew about building reliable software. > Users > buy their own [compilers], and they are seldom compatible with each other. Sun's C compiler is not completely compatible with Amdahl's, DEC's, and GNU's. But we know ways to write code that runs under all of them. While the compiler writers could help a bit more, the real problem is the users. > But there will always be many things that will not be efficiently > doable in portable C. High-performance graphics are one glaring > example. Funny, the entire Sun graphics library is written in portable C. All the parts I've seen are in C, and it runs on 680x0, Sparc, and 386. A few months ago we ported it (the parts used in NeWS) to the Mac-II with few problems. Again, while there is always a bit to be squeezed out with assembler, mostly the problem is people who don't know how to write fast code. Let's see their sources, speed them up, and send 'em back. Also give them a copy of Jon Bentley's "Writing Efficient Programs" book. > Finally consider that not all users are, or want to be, programmers. But they come running to the programmers when their binary fails. They had better have sources around if they expect their guru to be able to help them! -- John Gilmore {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu gnu@toad.com "Use the Source, Luke...."
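The portability techniques being listed are small enough to show in a dozen lines. This fragment is illustrative only, not drawn from any posted program:

#include <stdio.h>

/* Don't assume int is 32 bits: say what you mean. */
long  total_bytes;          /* can exceed 64K even on a PDP-11 or 8086       */
short code;                 /* known to fit in 16 bits everywhere            */

#ifdef MSDOS
#  define PATHSEP '\\'      /* isolate system differences behind one #ifdef  */
#else
#  define PATHSEP '/'
#endif

int main(void)
{
    /* Stick to the standard library (fopen/getc/putc), not system calls,
     * and the same source compiles under PCC, Turbo C, MSC, or gcc. */
    printf("path separator: %c\n", PATHSEP);
    return 0;
}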
loverso@encore.UUCP (John Robert LoVerso) (05/13/88)
In article <2932@cognos.UUCP> brianc@cognos.UUCP (Brian Campbell) writes: > In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes: > > If you are (sigh) going to post binaries on Usenet, DO NOT compress > > them first. Many Usenet sites use compress to pack up their news > > batches. Compressing a compressed file makes it larger. > > Maybe those Usenet sites should not use the -f (force) flag with compress. That's not how (typical) news batching works. Compress is used as a stage of a pipe: batch | compress | uux, and because compress doesn't know the size of its input when it starts up, it will *always* produce compressed output. The point is that an article which contains binary that's compressed and then uuencode/btoa/your_favorite'd will lower the compression ratio for the batch that contains it. The overall size of the batch will be smaller if the included binary was just uuencoded, etc. I no longer carry comp.binaries.* as I am using its disk space to store more *useful* news. It would be nice to see such things split out into bin.*. As gnu says: "Use the Source, Luke..." John Robert LoVerso, Encore Computer Corp encore!loverso, loverso@multimax.arpa
phil@amdcad.AMD.COM (Phil Ngai) (05/14/88)
In article <786@stride.Stride.COM> mitch@stride1.UUCP (Thomas P. Mitchell) writes: >The argument that not evrybody has all compilers is real. Yet I >dislike it. To me compilers are like a key board, a computer is >worth little without one. Well, I strongly disagree. I don't have source for most of the things I run on this PC, nor do I want it. I don't have time to tinker with source code, compiling it, fixing it. I want to get the program and start using it. Of course, I use programs like PC-NFS, SCHEMA, ORCAD, PSPICE, and other CAD type tools. You probably don't know what this stuff is, so you can't appreciate that some people want to do useful work *with* their computers instead working *on* the computer. -- Make Japan the 51st state! I speak for myself, not the company. Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or phil@amd.com
allbery@ncoast.UUCP (Brandon S. Allbery) (05/15/88)
As quoted from <8430@iuvax.cs.indiana.edu> by bobmon@iuvax.cs.indiana.edu (RAMontante): +--------------- | The biggest problem I see is that many news mailers compress everything | blindly, so that an already-compressed file gets bigger. This would also be | true of a sufficiently random file, although I think most executables aren't | that random. And this compress-and-be-damned behavior is not a strength of | the system, it's a weakness. (Even compress will complain if its result is | bigger than its original; does the mailer ignore this, or are the net.gods | lying when they claim they're shipping bigger files because of the double | compression?) +--------------- When compress is invoked as compress (file) it complains. When it's invoked as: sendbatch | compress | uux -r - oopsvax!rnews it can't do so without compressing to a temp file while saving its input in a second temp file, then comparing sizes and copying the smaller of the two: wasteful of space and time. (You can't, of course, seek backwards on a pipe.) -- Brandon S. Allbery, moderator of comp.sources.misc {well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery Delphi: ALLBERY MCI Mail: BALLBERY
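For completeness, here is roughly what "compress to a temp file, compare, ship the smaller" would take for a pipeline stage, and why it is unattractive: every batch hits the disk twice before anything is sent. The temp-file names are invented and the uux invocation is borrowed from the example above; this is a sketch of the idea, not a proposed change to sendbatch:

#include <stdio.h>
#include <stdlib.h>

/* Sketch of "compress only if it helps" for a batching pipeline: spool
 * stdin to a temp file, compress that to a second temp file, and hand
 * whichever is smaller to uux.  Wasteful of space and time, as noted. */
int main(void)
{
    FILE *raw = fopen("/tmp/batch.raw", "w");
    FILE *z;
    long rawsize = 0, zsize;
    int c;

    if (raw == NULL)
        return 1;
    while ((c = getchar()) != EOF) {
        putc(c, raw);
        rawsize++;
    }
    fclose(raw);

    system("compress < /tmp/batch.raw > /tmp/batch.Z");
    if ((z = fopen("/tmp/batch.Z", "r")) == NULL)
        return 1;
    fseek(z, 0L, SEEK_END);
    zsize = ftell(z);
    fclose(z);

    return system(zsize < rawsize
                  ? "uux -r - oopsvax!rnews < /tmp/batch.Z"
                  : "uux -r - oopsvax!rnews < /tmp/batch.raw");
}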
allbery@ncoast.UUCP (Brandon S. Allbery) (05/16/88)
As quoted from <563@csccat.UUCP> by loci@csccat.UUCP (Chuck Brunow): +--------------- | In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes: | >In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes: | >> I stand corrected. Since Lem-Ziv was DESIGNED for text compression, and | >>the authors do not mention its use for binaries, i never considered using it. | >>I tried it on an executable under UNIX and obtained a good reduction, for | >>reasons which are not apparent. I'm sure that there are cases where this does | | This is actually partially true. The first "compress" to appear | on the net (several years ago) only worked on text files and | dumped core on binary files. The reason you get good compression | on binary files is probably that they haven't been stripped of | the relocation info. Strip them first and I doubt that the | compression will be so good (otherwise, throw your optimizer | into the bit bucket). Typical (large) text compression is about | 67%, whereas binaries are closer to 20%. (I use 16-bit compress). +--------------- Wrong. Consider that, for example, every call to putchar() contains some fixed code (such as a call to _flsbuf()); this, on a 32-bit address space machine, will always be the same byte sequence (on a 680x0, it's 6 bytes). Other things will also be common: printf("format", non-double-value); (which is by far the *most* common use of printf(), from what I've seen; perhaps others have seen other more common calls) has the constant assembler code on a 680x0: jsr _printf 6 bytes addql #8,a6 2 bytes (and "printf("constant")", also common, is a slightly different 8-byte value). These kinds of extremely common operations can't be optimized out and are quite amenable to compression. RISC executables are likely to be even more amenable to compression, since many operations will assemble into lengthy byte sequences -- many of which will be partially or totally identical. Ergo: compression of executables generally works pretty well. (I regularly see 50%-60% on stripped, optimized executables on ncoast.) -- Brandon S. Allbery, moderator of comp.sources.misc {well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery Delphi: ALLBERY MCI Mail: BALLBERY
jpn@teddy.UUCP (John P. Nelson) (05/16/88)
>The point is that an article which contains binary thats compressed and then >uuencode/btoa/your_favorite'd will lower the compression ration for the batch >that contains it. The overall size of the batch will be smaller if the >included binary was just uuencoded, etc. If this was TRUE, it would be a good argument. It is NOT true. Most binary files that are compressed, uuencoded, then compressed again are SMALLER than binary files that are simply uuencoded, then compressed. I have yet to see anyone post results that refute this. A few people have pointed out counter-examples: These usually involve compressing an ARC file (or other binary file with very little compressability in the first place). The few cases I have seen where using ARC (which will NOT try to compress a file that is uncompressable), followed by uuencode, followed by compress generates a larger file than uuencode/compress alone, the file lengths were within 1% of each other. If someone has seen different results, I would be interested in seeing them. I already KNOW that compressing ASCII files (source or text) then uuencoding is a bad idea: I am interested in results from BINARY FILES only! I think we should SETTLE this issue once and for all! -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn ARPA (sort of): talcott.harvard.edu!panda!teddy!jpn
dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/17/88)
To avoid ambiguity, I suggest the following terminology. B = binary T = text U = uuencoding C16 = 16-bit LZW ("compress" default) C12 = 12-bit LZW (arc) C13 = 13-bit LZW (zoo, squashing) So, instead of claiming that "uuencoded binary files compressed are larger than not uuencoding" it is better to say that "BC12UC16 is worse than BC16", or "BUC16U is worse than BC16" etc. BC12UC16 means: (B) take a binary file (C12) compress using arc or 12-bit "compress" (U) uuencode it (C16) compress using 16-bit "compress" Also, since binary files differ, it's good to use some standard binary file in benchmarks, e.g. your UNIX kernel stripped of symbols, so there is some degree of consistency. -- Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
dg@lakart.UUCP (David Goodenough) (05/18/88)
From article <4776@teddy.UUCP>, by jpn@teddy.UUCP (John P. Nelson): >>The point is that an article which contains binary thats compressed and then >>uuencode/btoa/your_favorite'd will lower the compression ration for the batch >>that contains it. The overall size of the batch will be smaller if the >>included binary was just uuencoded, etc. > > If this was TRUE, it would be a good argument. It is NOT true. > > Most binary files that are compressed, uuencoded, then compressed again > are SMALLER than binary files that are simply uuencoded, then > compressed. I have yet to see anyone post results that refute this. >>>>> CORRECT <<<<< Note this: Script started on Tue May 17 13:01:21 1988 lakart!dg(bin/junk)[61]-> ds -rwxr-x--- 1 dg 40960 May 17 12:55 arc* -rwxr-x--- 1 dg 32768 May 17 12:55 atob* -rwxr-x--- 1 dg 16384 May 17 12:55 btoa* -rwxr-x--- 1 dg 36864 May 17 12:55 cg* -rwxr-x--- 1 dg 24576 May 17 12:55 clock* -rwxr-x--- 1 dg 16384 May 17 12:55 ddr* lakart!dg(bin/junk)[62]-> file * arc: demand paged pure executable atob: demand paged pure executable btoa: demand paged pure executable cg: demand paged pure executable clock: demand paged pure executable ddr: demand paged pure executable lakart!dg(bin/junk)[63]-> foreach i (*) ? compress < $i | uuencode $i.Z | compress > $i.u.Z ? uuencode $i < $i | compress > $i.u.z ? end lakart!dg(bin/junk)[64]-> ds -rwxr-x--- 1 dg 40960 May 17 12:55 arc* -rw-r----- 1 dg 29285 May 17 13:03 arc.u.Z -rw-r----- 1 dg 32027 May 17 13:03 arc.u.z -rwxr-x--- 1 dg 32768 May 17 12:55 atob* -rw-r----- 1 dg 18418 May 17 13:04 atob.u.Z -rw-r----- 1 dg 19330 May 17 13:04 atob.u.z -rwxr-x--- 1 dg 16384 May 17 12:55 btoa* -rw-r----- 1 dg 7896 May 17 13:04 btoa.u.Z -rw-r----- 1 dg 8384 May 17 13:04 btoa.u.z -rwxr-x--- 1 dg 36864 May 17 12:55 cg* -rw-r----- 1 dg 28412 May 17 13:04 cg.u.Z -rw-r----- 1 dg 30864 May 17 13:04 cg.u.z -rwxr-x--- 1 dg 24576 May 17 12:55 clock* -rw-r----- 1 dg 14299 May 17 13:04 clock.u.Z -rw-r----- 1 dg 15116 May 17 13:04 clock.u.z -rwxr-x--- 1 dg 16384 May 17 12:55 ddr* -rw-r----- 1 dg 7123 May 17 13:04 ddr.u.Z -rw-r----- 1 dg 7682 May 17 13:04 ddr.u.z lakart!dg(bin/junk)[65]-> foreach i (*[a-y]) ? echo $i ? echo -n `wc -c <$i.u.Z` '* 100 /' `wc -c <$i` '== ' ? z `wc -c $ <$i.u.Z` '* 100 /' `wc -c <$i` ? echo -n `wc -c <$i.u.z` '* 100 /' `wc -c <$i` '== ' ? z `wc -c <$i.u.z` '* 100 /' `wc -c <$i` ? end arc 29285 * 100 / 40960 == 71 == 0x47 32027 * 100 / 40960 == 78 == 0x4e atob 18418 * 100 / 32768 == 56 == 0x38 19330 * 100 / 32768 == 58 == 0x3a btoa 7896 * 100 / 16384 == 48 == 0x30 8384 * 100 / 16384 == 51 == 0x33 cg 28412 * 100 / 36864 == 77 == 0x4d 30864 * 100 / 36864 == 83 == 0x53 clock 14299 * 100 / 24576 == 58 == 0x3a 15116 * 100 / 24576 == 61 == 0x3d ddr 7123 * 100 / 16384 == 43 == 0x2b 7682 * 100 / 16384 == 46 == 0x2e lakart!dg(bin/junk)[66]-> ^D script done on Tue May 17 13:12:09 1988 In all cases (I actually looked at over 20 stripped executables) compress | uuencode | compress is fractionally smaller than: uuencode | compress, Both of which are heaps smaller than the raw file. Since the difference between compress | uuencode | compress and just plain uuencode | compress is so small (between 2 and 10 %) I can't see the point in continuing this discussion. >>>>> THEREFORE 'COMPRESS, UUENCODE AND POST' GETS THE BEST RESULTS. <<<<< [1] Now can we let this subject rest in peace. [1] Not counting ARC & ZOO. I Don't know where they stand, so I am saying nothing about them. -- dg@lakart.UUCP - David Goodenough +---+ | +-+-+ ....... 
!harvard!adelie!cfisun!lakart!dg +-+-+ | +---+
allbery@ncoast.UUCP (Rich Garrett) (05/24/88)
As quoted from <4776@teddy.UUCP> by jpn@teddy.UUCP (John P. Nelson): +--------------- | >The point is that an article which contains binary thats compressed and then | >uuencode/btoa/your_favorite'd will lower the compression ration for the batch | >that contains it. The overall size of the batch will be smaller if the | >included binary was just uuencoded, etc. | | If this was TRUE, it would be a good argument. It is NOT true. | | Most binary files that are compressed, uuencoded, then compressed again | are SMALLER than binary files that are simply uuencoded, then | compressed. I have yet to see anyone post results that refute this. +--------------- Single files, yes. But the quoted message above specifically says BATCHES. Batches include messages of all kinds from multiple newsgroups; to verify whether batch compression is reduced, we have to modify sendbatch to print the compression ratio and then run sendbatch with both compressed and uncompressed uuencodes to see which results in smaller batches. (We also need a non-destructive "test" mode for sendbatch to (a) insure that the batches are otherwise identical and (b) not screw up news transmission.) This would have to be done with a number of batches and the results averaged in order to give us a reasonably accurate result. -- Brandon S. Allbery, moderator of comp.sources.misc {well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery Delphi: ALLBERY MCI Mail: BALLBERY
jpn@teddy.UUCP (John P. Nelson) (05/24/88)
>| Most binary files that are compressed, uuencoded, then compressed again >| are SMALLER than binary files that are simply uuencoded, then >| compressed. I have yet to see anyone post results that refute this. > >Single files, yes. But the quoted message above specifically says BATCHES. >... >to verify whether batch compression is reduced, we have to modify sendbatch... Well, anyone is welcome to MAKE this experiment, but it is totally unnecessary in my opinion. Of COURSE pre-compressing is going to reduce the compression ratio of a batch. This is irrelevant, because less data needs to be batched. Remember: "compress" uses an ADAPTIVE Lempel-Ziv method: If the old string table isn't working, Compress will RESET the table and start over, right in the middle of the file being compressed. Neither a uuencoded file nor a compressed uuencoded file will look much like an ordinary ascii file: the string table used in either "uuencode" case will not look much like the string table generated for normal text batches. Either type of "uuencode" will cause "compress" to reset the string table. Besides, I think we have gotten off the track here. Even if pre-compressing DOES adversely affect the overall size of data transmitted slightly: I think we have shown that it doesn't increase the size of the data SIGNIFICANTLY: in the simplest case, it REDUCES the size. The most common use of pre-compressing is the use of "ARC" (or "zoo") to bundle multiple files together: There is no other convenient way to "bundle" binary files at the moment. I think we have shown that the use of ARC to build bundles of binary files is NOT detrimental! I think we should now focus our collective energies on a more productive topic. The real unresolved issue, of course, is whether to allow binaries on USENET at all. -- john nelson UUCP: {decvax,mit-eddie}!genrad!teddy!jpn smail: jpn@genrad.com
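The adaptive reset John describes is compress's block mode. Once the string table fills, the compressor watches its running compression ratio; when the ratio stops improving it emits a reserved CLEAR code so both ends throw the table away and start learning again on whatever the data now looks like. Schematically (simplified, and not the actual compress source; the real program checks the ratio only every few thousand input bytes):

/* Schematic of the adaptive reset in LZW "block mode".  in_count/out_count
 * are running byte totals; CLEAR is a reserved code both sides understand. */
#define CLEAR 256                 /* first code after the 256 literal bytes */

static long in_count, out_count, best_ratio;

void maybe_reset(int table_full, void (*output)(int), void (*table_init)(void))
{
    long ratio;

    if (!table_full || out_count == 0)
        return;
    ratio = (in_count * 100L) / out_count;   /* bigger is better            */
    if (ratio > best_ratio) {
        best_ratio = ratio;                  /* still gaining on this table */
    } else {
        output(CLEAR);                       /* tell the decoder to forget  */
        table_init();                        /* ...and start learning again */
        best_ratio = 0;
    }
}

This is why a uuencoded (or compressed-then-uuencoded) section in the middle of a text batch costs at most one stale table's worth of output before compress adapts, which is the substance of the argument above.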