[comp.sources.d] Standard for file transmission

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/01/88)

   Currently there is an ongoing discussion in comp.binaries.ibm.pc.d 
concerned with establishing a standard for the exchange of software over 
the net.  I would like to offer a suggestion.  The following tools are 
available in source code form:  COMPRESS (a Lempel-Ziv text compressor), ARITH
(arithmetic compression for binary), and UUencode/decode.  Since all of these
will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS, SYS5,
BSD), why not make them the basis for communicating?  I may have a public
domain SHAR/UNSHAR program in C as well (I also have a text archiver in PASCAL
that would suffice if PASCAL were acceptable to everyone).
   These should be enough to support everyone's needs, and because we have
the source, it can be made available to everyone.  I realize that some of
you may have a favorite tool which you may feel surpasses the capabilities
of these.  I am making these suggestions to provide a starting point towards
arriving at a mutually acceptable standard.
   For those who do not know, COMPRESS is a single-file text compressor
which works faster than any of the ARC clones, and ARITH is something I
constructed from a description in the ACM.  ARITH compresses slower, but
better, than Huffman coding (i.e., SQ/UNSQ).  Most of all, it's in the public
domain, and I'll be posting source if enough people show an interest.
   In any case, let the discussion continue.

w8sdz@brl-smoke.ARPA (Keith B. Petersen ) (05/01/88)

Rather than discussing how to compress our files we should be discussing
how to get them transferred error-free through the network.
Uuencode/uudecode and compress/uncompress do no error checking.

Think about this the next time you are tempted to uuencode a binary
file.  How do you know it will be received error-free by the recipient?
At least when it is compressed by the ARC program a CRC of the original
file is stored *inside* the ARC.  It is checked when you extract the
member file.

The net spends thousands of dollars on reposts of truncated or otherwise
munged files.  Some of that money would be better spent on finding where
the problem is and fixing it.  A uuencode with CRC or checksum would go
a long way towards finding the site(s) responsible for this waste.
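A uuencode-with-CRC trailer of the sort suggested here could be quite small.  Below is a minimal sketch in Python (a modern stand-in for the C tools of the day); the `crc32 <hex> <length>` trailer format and both function names are hypothetical, not any existing standard:

```python
import zlib

def add_crc_trailer(encoded_text: str, original: bytes) -> str:
    """Append a CRC-32 of the *original* binary so the recipient
    can verify the decoded file, not just the transmitted text."""
    crc = zlib.crc32(original) & 0xFFFFFFFF
    return encoded_text + "crc32 %08X %d\n" % (crc, len(original))

def check_crc_trailer(trailer: str, decoded: bytes) -> bool:
    """Verify a 'crc32 XXXXXXXX length' trailer line."""
    _, hexcrc, length = trailer.split()
    return (int(length) == len(decoded) and
            int(hexcrc, 16) == (zlib.crc32(decoded) & 0xFFFFFFFF))
```

Because the CRC is computed over the decoded binary, a truncated or munged article fails the check even when every transmitted line still looks like valid encoded text.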
-- 
Keith Petersen
Arpa: W8SDZ@SIMTEL20.ARPA
Uucp: {bellcore,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.arpa!w8sdz
GEnie: W8SDZ

wcf@psuhcx.psu.edu (Bill Fenner) (05/01/88)

Just one thing that needs to be known -- PCs can do no more than 12-bit
compression.  So if you are compressing your file from a UNIX system,
you need to say compress -b12 filename .

  Bill
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

jpn@teddy.UUCP (John P. Nelson) (05/02/88)

>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say comress -b12 filename .

This myth has been repeated several times, so I felt it was necessary to
speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  It
takes 512K of available memory to run, and you also either need a compiler
that supports HUGE model arrays, or else you have to manually break up the
buffer space into multiple 64K arrays (this is what the version I have does -
The port was done a couple of years ago for XENIX, but it works just fine
under MSDOS as well).
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

rsalz@bbn.com (Rich Salz) (05/02/88)

If you are (sigh) going to post binaries on Usenet, DO NOT compress
them first.  Many Usenet sites use compress to pack up their news
batches.  Compressing a compressed file makes it larger.
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

wcf@psuhcx.psu.edu (Bill Fenner) (05/02/88)

In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>>Just one thing that needs to be known -- PC's can do no more than 12-bit
>>compression.  So if you are compressing your file from a UNIX system,
>This myth has been repeated several times, so I felt it was necessary to
>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  It
>takes 512K of available memory to run, and you also either need a compiler

Hard as it is to believe, a lot of people don't have 640k computers...
But, I think that this utility would do well to be distributed... mind
posting it on comp.binaries.ibm.pc?  (Can it do 12-bit also?)

         Thanks
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

norman@oravax.UUCP (Norman Ramsey) (05/03/88)

Someone mentioned that uuencode/decode do no error checking.  There is
a program called btoa/atob that converts binary to ascii and back
again.  It is more efficient than uuencode/decode (4 bytes of binary go
to 5 bytes of ascii, versus 3 to 4 for uuencode) and has a checksum built
in.  The programs are quite short and I have ported them to the IBM PC
with no problem.  At my site at least they came with compress 4.0, which
itself came with TeX, so I assume they are public domain.
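The 4-to-5 expansion described above (25% overhead, versus uuencode's 3-bytes-to-4, i.e. 33%) can be illustrated with Python's base85 codec, which works on the same principle as btoa; this is a sketch of the idea, not the historical btoa file format, and the function names are mine:

```python
import base64

def encode45(data: bytes) -> bytes:
    # Each 4-byte group maps to 5 ASCII characters: 85**5 exceeds
    # 2**32, so five base-85 digits can hold any 32-bit word.
    return base64.b85encode(data)

def decode45(text: bytes) -> bytes:
    return base64.b85decode(text)
```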

Most frequently I use them with a script called `tarmail' which is
essentially  tar | compress | btoa | split -700 | mail, where we
actually mail things across the net in 700-line chunks.  There is an
`untarmail' at the other end which strips off the headers and (if
there are no errors) does the uncompress, the tar x, et cetera.  I'm
sure a `tarpost' could be put together with little or no difficulty.
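The tarmail pipeline above can be sketched with stdlib stand-ins (tarfile for tar, zlib for compress, base85 for btoa); the function names and the 78-column line wrap are my own assumptions, not the historical script:

```python
import base64, io, tarfile, zlib

def tarmail_chunks(paths, lines_per_chunk=700):
    """tar | compress | btoa | split -700, sketched with stdlib
    stand-ins (tarfile, zlib, base85)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in paths:
            tar.add(p)
    text = base64.b85encode(zlib.compress(buf.getvalue())).decode("ascii")
    # Wrap to 78-column lines, then split into mailable chunks.
    lines = [text[i:i + 78] for i in range(0, len(text), 78)]
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def untarmail(chunks) -> bytes:
    """Reassemble the mailed chunks and undo the encoding and
    compression; the result is a tar stream ready for extraction."""
    text = "".join("".join(chunk.split("\n")) for chunk in chunks)
    return zlib.decompress(base64.b85decode(text))
```

A real untarmail would also strip mail headers from each chunk before reassembly; that step is omitted here.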

Norman Ramsey
norman%oravax.uucp@cu-arpa.cs.cornell.edu

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/03/88)

In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> Just one thing that needs to be known -- PC's can do no more than 12-bit
> compression.  So if you are compressing your file from a UNIX system,
> you need to say comress -b12 filename .
   I've constructed a version of COMPRESS using 13 bits and the small
model by making only one array large.  I've also constructed a version in
the big model which runs at half the speed and compresses only about 10%
better using the full code width used under UNIX.  

wcf@psuhcx.psu.edu (Bill Fenner) (05/03/88)

In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>If you are (sigh) going to post binaries on Usenet, DO NOT compress
>them first.  Many Usenet sites use compress to pack up their news
>batches.  Compressing a compressed file makes it larger.

We've gone through this before, and it has never been explained to my
satisfaction.  I think you do save something by compressing a uuencoded
compressed file over compressing the uuencoded uncompressed file.

I just did a test.  The file I used may not have been a good 'average
binary' (I used a moria save character - the best I could find on short
notice).  Anyway...

Original size: (cannot send; it's binary): 95,348 bytes
Compressed (also cannot send; also binary):  6,772 bytes

Now... UUEncoded then compressed (the amount that would be transmitted
if you simply uuencode the file) :  11,531 bytes

And the kicker... compressed, UUEncoded, then compressed (as if you
compressed it, then uuencoded it, then posted it, then the news will
compress it) : 9009 bytes.

Like I said, this may not have been a proper 'average binary'.
I am going to write a shell script to check all these things, and
run it on several actual PC binaries and ARC files. I will post the
results to comp.binaries.ibm.pc.d.
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

loci@csccat.UUCP (Chuck Brunow) (05/03/88)

In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes:
>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say comress -b12 filename .
>

	Are you quite sure about that? 13-bit compress will run on other
	64k segment machines (80?86 based).

jpn@teddy.UUCP (John P. Nelson) (05/03/88)

>If you are (sigh) going to post binaries on Usenet, DO NOT compress
>them first.  Many Usenet sites use compress to pack up their news
>batches.  Compressing a compressed file makes it larger.

This is incorrect.

I hope I can clear this up once and for all:

If you have ascii files (like source or documentation), then it is true
that compressing, then uuencoding is a BAD IDEA, even though the posting
appears to be smaller than the cleartext.  That is because when the file is
compressed again, it will be larger than the cleartext after IT is
compressed.

If you have a binary file that MUST be uuencoded to be posted, then
compression before uuencoding IS HELPFUL!  Most files that are
compressed, then uuencoded, then compressed again are significantly
smaller than files that are simply uuencoded, then compressed once!  I
think that the reason this is true is that uuencoding tends to
interfere with the compression process.  By the way, compressing a
uuencoded file almost always results in a small reduction in size.
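The claim is easy to test empirically.  Here is a sketch using Python's zlib and base64 as stand-ins for compress and uuencode (the exact byte counts differ from the 1988 tools, and `posted_size` is an illustrative name of my own, but the shape of the result is the same):

```python
import os, zlib, base64

def posted_size(data: bytes, precompress: bool) -> int:
    """Bytes on the wire after (optional) compress -> encode ->
    news-batch compress, using zlib/base64 as stand-ins."""
    payload = zlib.compress(data) if precompress else data
    encoded = base64.b64encode(payload)   # stand-in for uuencode
    return len(zlib.compress(encoded))    # stand-in for news batching

# A compressible "binary": a random 1 KB block repeated 50 times.
blob = os.urandom(1000) * 50
```

On data like this, compressing before encoding yields the smaller posting, because the encoding step smears out the redundancy that the later batch compression would otherwise exploit.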

When I say "compressed", I include archival programs such as ARC and ZOO.

These conclusions were reached by experimental evidence (I didn't conduct
the experiments, others did, and they posted their results).  Perhaps
no one bothered to read these informative articles, (or else my suspicion
is true:  the maximum long-term memory of the average USENET reader is
no more than 1 month long).

-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

wtr@moss.ATT.COM (05/04/88)

In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes:
>Just one thing that needs to be known -- PC's can do no more than
>12-bit compression.  So if you are compressing your file from a
>UNIX system, you need to say comress -b12 filename. 

Sorry, but....

The posted patches that get compress(4.0) running on a 
Microport SV/AT (286) system also allow full 16-bit
compression on a 16-bit PC (an AT clone, in this case).  It should not
prove too difficult to apply these patches to any other 16-bit
system (read: MS-DOS).

{ No flames intended, it took me ~1 month to figure out the 12/16
bit problem and locate the patches } 

Personally, I use cpio & compress to move files.  I don't care 
about execution time; transmission time is my most important
consideration, and so I want the highest compression ratio I can
find.  I agree that for "real time" communication, compress is
totally inadequate because of its processing needs.

=====================================================================
Bill Rankin
Bell Labs, Whippany NJ
(201) 386-4154 (cornet 232)

email address:		...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr
			...![ ihnp4 cbosgd akgua watmath  ]!clyde!wtr
=====================================================================

egisin@watmath.waterloo.edu (Eric Gisin) (05/04/88)

In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

But you would not be compressing the compressed file,
you would be compressing an encoded file.

Here are the results of some experiments on a 100K UNIX binary:
$ uuencode | compress
-rw-r--r--  1 egisin           83111 May  3 16:25 uu.Z
$ compress | uuencode | compress
-rw-r--r--  1 egisin           81241 May  3 16:30 uuz.Z
Compressing before encoding results in a file about 2% shorter,
but that is not really significant.

You can get better results by using a simple hex encoding:
$ compress | hexencode | compress
-rw-r--r--  1 egisin           78831 May  3 16:31 hdz.Z

None of this applies to source files,
which should never be compressed and encoded.

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/04/88)

  1) COMPRESS is a text only compression routine.  It will not now, or ever,
     help in the compression of binary files.

  2) ARITH is a more general compression routine using adaptive arithmetic 
     coding.  It will compress binary files where there is redundancy, but
     when it fails (on an extremely random file) the result increases very
     little (under 1% in my experience).  It compresses better than HUFFMAN,
     but it is NOT faster than SQ/UNSQ which are written in assembler whereas
     ARITH is written in C.
     (Once again, I will post it if there is sufficient interest.)

  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
     we are at the whims of whomever is currently supporting (or not supporting)
     them.

  4) COMPRESS works faster and better on text files than the ARC routines
     because they use 12-bit compression, whereas 13 bits (and more) are
     possible even on the PC for COMPRESS (I've tried it on an AT clone).

  5) On the weak side, there is as yet no CRC or checksum for any of these,
     but adding one is something I am willing to take responsibility
     for, should enough people decide they would like to take the approach
     I'm currently suggesting.
     Also, there is no directory support provided with these tools.  They work
     on only one file at a time.  This too is correctable, since the source
     is available.

  6) LASTLY: I am not trying to criticize the ARC routines; rather, I am
     trying to offer an alternative which I feel will reduce the time for
     transmission of files, as well as provide portability.  COMPRESS, ARITH,
     UNSHAR and UUENCODE are all available at the source level.  COMPRESS and
     ARITH have been tried in at least three different environments: UNIX (BSD),
     VMS and PC/MS-DOS.
     Remember, for those of us who are NOT using the NET at the expense of a
     university, the cost of communication, and therefore the time required
     to transmit a file, are VERY important.

   If this sounds like a flame, then please assign my apparent bad attitude to
poor methodology rather than a desire to upset people.  This is offered in the
spirit of adding to what I hope will become a meaningful dialog with a very
practical result.  

mike@ists (Mike Clarkson) (05/04/88)

In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

How about compressing a uuencoded compressed file?  Does that result in
a file significantly larger than the original?

I would really like to see a uniform standard, with error checking, and
I think it is something worth the time it takes to do it.  We could
probably evolve the result to take care of another pet peeve of mine:
error correction in the tar format.  One thing I really miss from VMS is
the backup tape archiver, which has tremendous error checking and
correction.  In 7 years I have only ever had (touch wood) 1 tape go on
me, and that was because the oxide was falling off.  Having spent a good
part of today dealing with yet another dead Unix tar tape, I really wish
we could find a better way.




-- 
Mike Clarkson						mike@ists.UUCP
Institute for Space and Terrestrial Science		mike@ists.yorku.ca
York University, North York, Ontario,
CANADA M3J 1P3						(416) 736-5611

chasm@killer.UUCP (Charles Marslett) (05/04/88)

In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> Just one thing that needs to be known -- PC's can do no more than 12-bit
> compression.  ...

Actually, I have sent several people copies of a minor mod to compress 4.0
that works fine if you have the memory (requires about 350-400 K above DOS
to do 16-bit compression).  The source assumes Turbo or Microsoft C for the
PC but it doesn't take up an immense amount of disk space either (about 40K
if I remember correctly).  I have also ported it to Atari STs, so that covers
some of the PC field.  Anyone want to merge these changes into the more
recent (4.1?) posting and perhaps make it work on Macs and Amigas?

Any good rule of thumb on how many requests imply a posting choice?

>   Bill

Charles Marslett
chasm@killer.UUCP
...!ihnp4!killer!chasm

jcs@tarkus.UUCP (John C. Sucilla) (05/04/88)

In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes:
>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say comress -b12 filename .

Wrong! My 640K AT&T PC6300 has compress v4.0 running 16 bits on it
right now. -V shows the options at: MSDOS, XENIX_16 and BITS=16.


-- 
John "C" Sucilla
{ihnp4}!tarkus!jcs

Don't let reality stop you.... 

msf@amelia.nas.nasa.gov (Michael S. Fischbein) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

This is absolutely untrue.  Compress works fine on binary files; I have seen
200 to 1 reductions on some bitmaps. Tarfiles compress readily, etc.

>  4) COMPRESS works faster and better on text files then the ARC routines
>     because they use 12 bit compression, where 13-bit (and more) are possible
>     under even the PC for COMPRESS (i've tried it on ans AT-clone).

16-bit compress is possible on IBM-PC's; it reportedly even runs under MSDOS.

Mention was made later in the original of uuencode; the atob and btoa programs
are easier to use and are also freely available in source form.

	mike

-- 
Michael Fischbein                 msf@ames-nas.arpa
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (05/04/88)

     1) COMPRESS is a text only compression routine.  It will not now, or ever,
        help in the compression of binary files.

Nonsense.

[58] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable
34831 -rwsr-x---  2 root     staff        8192 Apr 13 09:54 enable
[59] [8:33am] tut:/dino0/karl/bin/pyr/private> file enable
enable: 90x family demand paged pure executable
[60] [8:33am] tut:/dino0/karl/bin/pyr/private> compress -v < enable > enable.Z
Compression: 72.44%
[61] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable.Z
35427 -rw-r--r--  1 karl     staff        2257 May  4 08:34 enable.Z
[62] [8:33am] tut:/dino0/karl/bin/pyr/private> 

--Karl

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
| 
|   1) COMPRESS is a text only compression routine.  It will not now, or ever,
|      help in the compression of binary files.

[ compress gives about 30% compression on binaries, depending on
  content. Whoever told you that it was for text only was completely
  wrong. ]
| 
|   2) ARITH is a more general compression routine using adaptive arithmetic 
|      coding.  It will compress binary files where there is redundancy, but
|      when it fails (on an extremely random file) the result increases very
|      little (under 1% in my experience).  It compresses better than HUFFMAN,
|      but it is NOT faster than SQ/UNSQ which are written in assembler whereas
|      ARITH is written in C.
|      (Once again, i will post it if there is sufficient interest.)

[ once again, do it, in source, so that others can test it themselves
  rather than relying on your opinion. ]
| 
|   3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
|      we are at the whims of whomever is currently supporting (or not supporting)
|      them.

[ the sources for zoo and arc have been posted several times to the net,
  and are available on a number of sites via ftp, uucp, and simple BBS
  download. ]

|   5) On the weak side, there is as yet, no CRC or checksum for any of these,
|      but adding it would be someithing i am willing to take responsibility
|      for should enough people decide they would like to take the approach
|      which i'm currently suggesting.

[ zoo and arc both have CRC. ]

|      Also, there no directory support provided with these tools.  They work
|      on only one file at a time.  This is also correctable since the source
|      is available.

[ arc works on multiple files in multiple directories, but doesn't
  preserve subdirectory information. zoo preserves the information unless
  told not to do it (an option). ]
| 
|   5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying
|      [ deleted for brevity ]
|      Remember, for those of us who are NOT using the NET at the expense of a
|      university, the cost of communication, and therefore the time required
|      to transmit a file, are VERY important.

[ everyone would like faster transmissions, but not at the expense of
  using a non-standard format which people can't use. Sending info which
  is not useful is a *real* waste of bandwidth. ]
| 
|    If this sounds like a flame, then please assign my apparent bad attitude to
| poor methodology rather than a desire to upset people.  This is provided in the
| spirit of adding to what i hope will become a meaningful dialog with a very
| practicle result.  

The most charitable assumption I can make is that you are woefully
misinformed about the matters on which you speak.  Please post this
"ARITH" routine to let others evaluate it, and read the responses to
your posting, many of which will probably not be even as polite as this
one.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)

  I would like to add a little fuel to the fires of "which archiver"
discussion.  Use of the 'btoa' routine instead of uuencode would save
12% (!) on binary postings. This is a PD program, included in the
compress package, and runs just fine on a PC.

  All the discussion of using PKARC to save 1-2% or not using it to save
time for many of the people on the net seems pointless. We should use
both (standard) arc and zoo formats, uuencode them, and save bandwidth
by dropping this discussion. Hopefully Rahul will clarify this by edict.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

jpn@teddy.UUCP (John P. Nelson) (05/04/88)

>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

Whoa!  Where did THIS come from!?!?  It is simply not true!

It IS true that compress does a better job at compressing text files,
but this is because there is usually more redundancy in text files than in most
binary files (like executables).  Compress is simply MARVELOUS for
binary files like bit-mapped graphics, getting something like 90%
compression for many of them.
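That bitmap figure is easy to reproduce with any LZ-family coder.  Here is a sketch using Python's zlib as a stand-in for compress, run on a synthetic, mostly-blank bitmap (both the data and the helper name are illustrative):

```python
import zlib

def fraction_saved(data: bytes) -> float:
    """Fraction of the original size removed by LZ compression."""
    return 1.0 - len(zlib.compress(data)) / len(data)

# A mostly-blank scanline pattern standing in for a real bitmap.
bitmap = (b"\x00" * 120 + b"\xff" * 8) * 1000
```

Executables, by contrast, typically save far less, which matches the 30% and 72% figures quoted elsewhere in this thread.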

>  2) ARITH is a more general compression routine using adaptive arithmetic 
>     coding.  It will compress binary files where there is redundancy, but
>     when it fails (on an extremely random file) the result increases very
>     little (under 1% in my experience).  It compresses better than HUFFMAN,
>     but it is NOT faster than SQ/UNSQ which are written in assembler whereas
>     ARITH is written in C.
>     (Once again, i will post it if there is sufficient interest.)

Now we get some facts.  ARITH is HUFFMAN encoding.  Compress is Lempel-Ziv
encoding.  Lempel-Ziv almost ALWAYS beats HUFFMAN (when there is redundancy).
It is certainly possible that Lempel-Ziv might expand random files more than
HUFFMAN; I haven't done any tests.

Older versions of ARC used to try both HUFFMAN and Lempel-Ziv, and use
the one that gave better compression.  The HUFFMAN support was dropped
(except for extracting from old archives), because Lempel-Ziv beat HUFFMAN
99% of the time!

>  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
>     we are at the whims of whomever is currently supporting (or not supporting)
>     them.

MORE untruths.  The source for both ZOO and ARC are in C, and have been
distributed on USENET several times!  Some versions of the ARC source
included the extra code to handle the SQUASH compression algorithm
added by PKARC.

>  4) COMPRESS works faster and better on text files then the ARC routines
>     because they use 12 bit compression, where 13-bit (and more) are possible
>     under even the PC for COMPRESS (i've tried it on ans AT-clone).

PKARC's SQUASH is 13 bit compression.  Any more than this requires a
working buffer larger than 64K, which is why they are generally not used
very much on PCs.  The amount of additional compression between 13 bit
and 16 bit is no more than 2 or 3 percent!

Also, there is very little difference in speed between the 12 bit and
13 bit compression algorithms.  The major difference is in the memory
requirements.
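The memory point is exactly the dictionary size: an n-bit LZW coder keeps a string table of 2^n entries, so 12 bits means 4,096 strings, 13 bits 8,192, and 16 bits 65,536, which is what pushes the working buffers past a single 64K segment.  A minimal LZW sketch in Python (illustrative only: the real compress also packs codes into a variable-width bitstream and can reset a full table, both of which this omits):

```python
def lzw_compress(data: bytes, bits: int = 12) -> list:
    """Plain LZW: emit integer codes, growing the string table
    until it holds 2**bits entries (4096 at the 12-bit limit)."""
    max_codes = 1 << bits
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if len(table) < max_codes:
                table[wc] = len(table)
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes, bits: int = 12) -> bytes:
    """Inverse of lzw_compress; rebuilds the same table as it reads."""
    max_codes = 1 << bits
    table = {i: bytes([i]) for i in range(256)}
    it = iter(codes)
    first = next(it, None)
    if first is None:
        return b""
    w = table[first]
    out = [w]
    for code in it:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]          # the KwKwK corner case
        else:
            raise ValueError("bad LZW code")
        out.append(entry)
        if len(table) < max_codes:
            table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```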


>  5) On the weak side, there is as yet, no CRC or checksum for any of these,
>     but adding it would be someithing i am willing to take responsibility
>     for should enough people decide they would like to take the approach
>     which i'm currently suggesting.

This is the LEAST of the problems with using compress.

>     Also, there no directory support provided with these tools.  They work
>     on only one file at a time.  This is also correctable since the source
>     is available.

True, but why reinvent the wheel?  The source for the EXISTING programs is
ALSO available!

>   If this sounds like a flame, then please assign my apparent bad attitude to
>poor methodology rather than a desire to upset people.  This is provided in the
>spirit of adding to what i hope will become a meaningful dialog with a very
>practicle result.  

Your bad attitude appears to be due to an overdose of misinformation!
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  3) The source for ZOO, PKARC, and the others is NOT available.

The source for zoo 1.51 was posted to comp.sources.unix in the summer
of 1987.  The source for zoo 2.01 will be posted in the near future.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

bobmon@iuvax.cs.indiana.edu (RAMontante) (05/04/88)

cullsj.UUCP (Jeffrey C. Fried) writes, among other things:
,
, 1) COMPRESS is a text only compression routine.  It will not now, or ever,
,    help in the compression of binary files.

This statement made me shell out and run the following quick experiment:

	-rwxr-xr-x  1 bobmon      15360 Feb 27 01:22 pgen
	-rwxr-xr-x  1 bobmon      10116 May  4 08:46 pgen.Z

	-rwxr-xr-x  1 bobmon      14336 Feb 24 08:19 pom
	-rwxr-xr-x  1 bobmon       9945 May  4 08:47 pom.Z

Pgen and pom are both executable files (compiled from 'c').  Granted, this is
on a VAX machine, running the full-blown compress.  My attempts to run
compress on my 8088 box were frustrating, given its memory requirements, and
I haven't seen enough '.Z' formatted files to be worth the hassle.  But I
would assume that if it runs at all on a smaller machine, it will produce
the same results; unlike zoo and arc, it cannot choose one compression method
over another.

, 3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
,    we are at the whims of whomever is currently supporting (or not supporting)
,    them.

Source for arc is, at least for some Unix boxes.  Zoo source has been promised.
Pkarc was originally written in 8088 assembler, not the friendliest source.

, 4) COMPRESS works faster and better on text files then the ARC routines
,    because they use 12 bit compression, where 13-bit (and more) are possible
,    under even the PC for COMPRESS (i've tried it on ans AT-clone).

I haven't seen source for compress, either.  And the executables I've seen
were enormous, and limited to 12-bit LZW on 8088's under MSDOS; just like zoo
and arc (and pkarc's squash method is some sort of 13-bit LZW).  I've never
heard anyone claim responsibility for compress, while the authors of zoo,
pkarc, and arc are named, revered, vilified, and flamed frequently.  At least
one of them is an active participant on the Usenet.  (Plug:  I think that's
one strength of zoo, although Rahul might disagree :-)

, 5) On the weak side, there is as yet, no CRC or checksum for any of these,

Any of WHAT?  Zoo and arc certainly have a CRC value.  Compress is compress.
Its Unix-origin philosophy says that separate functions should be done by
separate routines with their outputs tied together by the operating system.
I think this is at the heart of some of the debates here.  The philosophy works
fine on a big multitasking machine like a VAX (or a suitably equipped 680x0
or '386?), and the entire news mailer system is predicated on that principle --
the mailer just calls compress (EVERYbody has compress, right?) to pack things
in for it; it doesn't worry about whether the result is correct, and neither
does compress.  It's up to you to aggregate your files with shar or something.

This piece-at-a-time philosophy is weaker on something like my MSDOS 8088 box.
There aren't multiple users all needing similar fundamental tools, there's
just me.  And I haven't the resources (memory or CPU cycles) to support lots
of little pieces that work fine individually but need sophisticated glue to
work together;  MSDOS's simulation of pipes is pathetic.  In such a situation
an integrated package (viz., zoo or arc) makes a lot more sense.  They can
incorporate in a consistent manner all those little pieces that a system admin.
may have put on a Unix box, but which I haven't yet found while rummaging
around BBS's.  By integrating everything a top-down design is possible,
unlike what happens when you bend the problem to fit the tools you already
have.

,    but adding it would be someithing i am willing to take responsibility
,    for should enough people decide they would like to take the approach
,    which i'm currently suggesting.

At which point it will become yet another uncommon non-standard (like ARITH?).
I don't think adding code will make it fit any better on small machines, and
the big machines can afford to calculate a CRC with an external routine.  Not
to mention the question of what you DO with it... Is the CRC for compress's
use?  Then it becomes not-quite-compress.  Is it for human use?  Then how do
I recreate it to find out if the file is still intact? ...

, 5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying
,    to offer an alternative which i feel will reduce the time for transmission
,    of files, as well as, providing us with portability.  COMPRESS, ARITH, 
,    UNSHAR and UUENCODE are all available at the source level.  COMPRESS and
,    ARITH have been tried in at least three different environments: UNIX (BSD),
,    VMS and PC/MS-DOS.
,    Remember, for those of us who are NOT using the NET at the expense of a
,    university, the cost of communication, and therefore the time required
,    to transmit a file, are VERY important.

I don't find 1200bps transmission to be a lot of fun to wait for, either...
but I take it that your basic argument is that compress makes smaller archives
than zoo or arc, which are therefore cheaper to transmit.

I don't see that the compression improvement is as significant as you imply
(and your statement about binary is completely at odds with all my experience).
The other strengths of the integrated packages offer a LOT of functionality,
some of which I would seek out even if there were no compression involved.

The biggest problem I see is that many news mailers compress everything
blindly, so that an already-compressed file gets bigger.  This would also be
true of a sufficiently random file, although I think most executables aren't
that random.  And this compress-and-be-damned behavior is not a strength of
the system, it's a weakness.  (Even compress will complain if its result is
bigger than its original; does the mailer ignore this, or are the net.gods
lying when they claim they're shipping bigger files because of the double
compression?)

ralf@b.gp.cs.cmu.edu (Ralf Brown) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
}  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
}     we are at the whims of whomever is currently supporting (or not supporting)
}     them.
Sources are available for ZOO and ARC.



-- 
{harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) 
ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make.
FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler
BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?

wtr@moss.ATT.COM (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>1) COMPRESS is a text only compression routine.  It will not now, or ever,
>   help in the compression of binary files.
>

[I'm not sure if this will be construed as a flame,
 but, asbestos suit in hand, here goes!]

WHAT ARE YOU TALKING ABOUT!?!?!?!?

I have assumed that everyone has been talking about the program
COMPRESS v4.0 that was posted to comp.sources.???? late last year
(let's not get too picky about the dates ;-).

It was based upon a "modified Lempel-Ziv algorithm" as 
published in IEEE Computer by Terry A. Welch.  PD source
was (at least in part) written by Joe Orost.
(apologies to anyone unintentionally left out of the credits)

With the full sixteen-bit compression, it does a great job of
compressing (almost ;-) all files, binary and source.  Most
compression ratios are in the 50-60% range, occasionally
as high as 75%. (larger files seem to compress a little better)

I have no idea what program you are referring to when you are 
describing your 'compress' but it is certainly not the same program
that I run on my AT clone at home.

=====================================================================
Bill Rankin
Bell Labs, Whippany NJ
(201) 386-4154 (cornet 232)

email address:		...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr
			...![ ihnp4 cbosgd akgua watmath  ]!clyde!wtr
=====================================================================

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/05/88)

   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
the authors do not mention its use for binaries, i never considered using it.
I tried it on an executable under UNIX and obtained a good reduction, for 
reasons which are not apparent.  I'm sure that there are cases where this does
not work (like graphics files), but it does appear to work, and in this case
better than the current version of ARITH.
   However, my point was that for TEXT, COMPRESS does a better job than the
ARC programs with which i'm familiar.  Also, i did not know that source for
zoo was available - a consideration which i believe to be VERY important
since support usually comes best from those who use a product.
   I would like to thank those who took the time to correct my 
misunderstanding concerning the use of compression on the net, but i find
it just a bit difficult because of the tone used in communicating with me.
For those who suggested that i "do my homework" before posting something
to the net, i can only say that since the net is my ONLY contact with this
problem, and that the comp...d group is for discussions, i am in essence
"doing my homework".  I'm sorry if my attempt to add to the discussion has
caused anyone to feel that their precious time has been wasted, but i 
think that you're as wrong as you are rude.
  
   Humbly yours,

Jeff Fried                                 ...!ames!cullsj!jeff
Cullinet Software
2860 Zanker Road, Suite 206                Reality, what a concept!
San Jose, CA, 95134

cudcv@daisy.warwick.ac.uk (Rob McMahon) (05/05/88)

In article <292@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>The following tools are 
>available in source code format:  COMPRESS (Lem-Ziv text compressor), ARITH
>(arithmetic compression for binary), UUencode/decode.  Since all of these
>will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS, SYS5,
>BSD), why not make these the basis for communicating.

I hope we're talking about binary files here, in which case I don't care
because I'd never just take a binary from the net and run it on one of my
machines.  If you're talking about sources, I like to scan down, read the
README, check out the comments in main etc., before I even save it to disk.
If I get all the bits of a posting, tack them together, uudecode them, and
uncompress them, only to find it's of no use to me, I'm not going to be
amused.  I have this feeling that people aren't going to bother to send proper
introductory articles in plain text before the actual posting.

Rob
-- 
UUCP:   ...!mcvax!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick.cu          ARPA:   cudcv@cu.warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England

tif@cpe.UUCP (05/05/88)

Written 10:45 am  May  2, 1988 by bbn.com!rsalz in cpe:comp.sources.d
>If you are (sigh) going to post binaries on Usenet, DO NOT compress
>them first.  Many Usenet sites use compress to pack up their news
>batches.  Compressing a compressed file makes it larger.

<< I don't post often and this about as close to a flame as I've come >>

But what you're compressing is text.  Text can always be compressed
with a significant advantage even if the text is a uuencoded compress
file.  In the end, the uuencode offsets most of the gains of the extra
compress.

If I were implementing a compressing file transfer utility which had
the possibility of transferring *binary* files, I would make sure that
the compression was actually profitable.

Since nowadays most "news" transfers are compressed, I'll concede
that *if* "news" could transfer binary files, the compressing should
be left to the news transfer software rather than be done by the poster.

Since "news" can't handle binary files (at least nobody assumes
that it can), the file has to be encoded in some way.  I'll use
my kernel as a sample binary input file and uuencode for the
encoding technique.  I've included the results for my /usr/dict/words
file as well since postings sometimes intermix binary and ASCII files.

(I couldn't figure out what order these should be in)

(12 bit compresses only)		Net change of transfer size
transfer	posted			binary		ASCII
-------------------------------------------------------------------
uncompressed	normal			* no change	no change
compressed	normal			* -35%		-46%
uncompressed	uuencoded		+40%		+40%
uncompressed	uuencoded compress	-9%		-25%
compressed	uuencoded		-15%		-23%
compressed	uuencoded compress	-19%		-29%

(if you believe in 16 bit compresses)	Net change of transfer size
transfer	posted			binary		ASCII
-------------------------------------------------------------------
compressed	normal			* -43%		-49%
uncompressed	uuencoded compress	-20%		-28%
compressed	uuencoded		-28%		-30%
compressed	uuencoded compress	-31%		-35%

	* These can't be posted but are provided for reference

CONCLUSIONS:	To transfer binary files using news software the best
		method in all cases is to post a uuencoded compress file.

		When transferring ASCII files, if you compress and uuencode,
		not only are you robbing a 15-20% savings from the sites
		that use compressed transfers, but you should be shot for
		making it unreadable.

For the skeptics, here are the file sizes I used to build the tables:

16 bit compress		12 bit compress
234157			234157			xenix	
133183			152458			xenix.Z	
327860			327860			xenix.u	
167759			198043			xenix.u.Z
186498			213482			xenix.Z.u
161860			190137			xenix.Z.u.Z
194192			194192			words
99725			104093			words.Z
271908			271908			words.u
135065			150047			words.u.Z
139656			145772			words.Z.u
126459			137701			words.Z.u.Z
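The percentage columns can be re-derived from these sizes. A quick sketch using the 12-bit numbers for the binary file (the helper name is mine):

```python
# Sizes as posted above (12-bit compress column, binary file "xenix").
sizes = {
    "xenix":       234157,   # original binary
    "xenix.u":     327860,   # uuencoded
    "xenix.Z.u":   213482,   # compressed, then uuencoded
    "xenix.Z.u.Z": 190137,   # ...then compressed again in transit
}

def net_change(posted: int, base: int = 234157) -> int:
    """Percent change in transmitted bytes relative to the raw binary."""
    return round(100 * (posted - base) / base)

print(net_change(sizes["xenix.u"]))      # +40, the "uuencoded" row
print(net_change(sizes["xenix.Z.u"]))    #  -9, "uuencoded compress" row
print(net_change(sizes["xenix.Z.u.Z"]))  # -19, same but compressed in transit
```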

			Paul Chamberlain
			Computer Product Engineering, Tandy Corp.
			ihnp4!sys1!cpe!tif

geoff@utstat.uucp (Geoff Collyer) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

This is absolutely dead wrong.  compress compresses any kind of file,
and has been used to compress (and correctly uncompress!), for example,
graphics bit maps, sendmail configuration files :-), and tar archives
containing binaries.
-- 
Geoff Collyer	utzoo!utstat!geoff, utstat.toronto.{edu,cdn}!geoff

jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

Most emphatically wrong.  compress works just fine on many types of
binary files.  It can give 90% or more compression on bitmap data,
and usually > 50% compression on Unix executable files.  About the
only type of file I know of that compress fails on consistently is
floating point data in binary format.  As long as some strings of bytes
occur much more frequently than others (whether they represent
characters, opcodes, or grey levels) compress kicks ass.


-- 
- Joe Buck  {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck
	    Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net

Argue for your limitations and you get to keep them.  -- Richard Bach

campbell@maynard.BSW.COM (Larry Campbell) (05/05/88)

In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
<>>Just one thing that needs to be known -- PC's can do no more than 12-bit
<>>compression.  So if you are compressing your file from a UNIX system,
<>>you need to say comress -b12 filename .
<>
<>This myth has been repeated several times, so I felt it was necessary to
<>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...

Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
C compiler has only the tiny and small memory models (exactly
corresponding to non-split and split PDP-11s, which also cannot
handle 16-bit compress).

So it is true that PCs with a C compiler that supports multiple data
segments can handle 16-bit compress, but that hardly encompasses all
PCs in the world.
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.bsw.com          120 Fulton Street, Boston MA 02109
uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846

greg@vertical.oz (Greg Bond) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

No doubt others will point this out, but:
using compress V4.0, nethack 2.3 binary on a sun:

	Script started on Thu May  5 11:20:17 1988
	vertical% ls -l nethack
	-rwxr-xr-x  1 greg       761856 May  5 11:20 nethack*
	vertical% compress -v nethack
	nethack: Compression: 48.25% -- replaced with nethack.Z
	vertical% ls -l nethack.Z
	-rwxr-xr-x  1 greg       395121 May  5 11:20 nethack.Z*
	vertical% exit
	vertical% 
	script done on Thu May  5 11:21:37 1988

I would call that working. (Compress works on arbitrary streams of 8-bit
bytes. It would be possible to write a version that only compressed 7-bit
text, and perhaps got better compression (say, 5%), but that is NOT the
version in the public domain).

Greg.
-- 
Gregory Bond,  Vertical Software, Melbourne, Australia
Internet: greg@vertical.oz.au	(or greg%vertical.oz.au@uunet.uu.net)
UUCP: {uunet,pyramid,mnetor,ukc,ucb-vision}!munnari!vertical.oz!greg
ACSnet: greg@vertical.oz

loci@csccat.UUCP (Chuck Brunow) (05/05/88)

	Let me point out one simple fact: source code is VERY MUCH 
	SMALLER than binaries.

les@chinet.UUCP (Leslie Mikesell) (05/05/88)

In article <25816@clyde.ATT.COM> wtr@moss.UUCP (Bill Rankin) writes:
>
>Personally, I use cpio & compress to move files.  I don't care 
>about execution time, rather transmission time is my most important

I like this also, but if an entire cpio archive is compressed, it
is impossible to (a) list the directory without a decompression pass
or (b) recover any part beyond a bit error in transmission.  Has
anyone considered a program which would leave the cpio headers

uncompressed but store the data as though each file had been individually
compressed (including adding the .Z to the name so extraction would be
possible with a normal cpio followed by uncompress)?  This would be
a nice thing to use for normal backups, especially if it followed the
normal compress rules of not trying to compress something that already
had the .Z extension. That still leaves the problem of compress needing
2 extra characters in the filename and DOS needing some other name convention
entirely...
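As a sketch of the idea, here is the same scheme with Python's tarfile and zlib standing in for cpio and compress (an analogy only: deflate is not compress's LZW, and the ".Z" naming is borrowed from the suggestion above). The headers stay uncompressed, so the member list is readable without a decompression pass, and a bit error in one member's data leaves the others extractable:

```python
import io
import tarfile
import zlib

def pack(files: dict) -> bytes:
    """Archive with plain headers but per-member compressed data."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            z = zlib.compress(data)
            if len(z) < len(data):            # compress's rule: keep the
                name, data = name + ".Z", z   # original if no gain
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = pack({"readme": b"hello " * 200})
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())                     # listing needs no decompression
    member = tar.extractfile("readme.Z").read()
    print(zlib.decompress(member) == b"hello " * 200)
```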

  Les Mikesell

paul@devon.UUCP (Paul Sutcliffe Jr.) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
+---------
|   1) COMPRESS is a text only compression routine.  It will not now, or ever,
|      help in the compression of binary files.
+---------

This is absolute and complete Bull-Ka-Ka.

    # cp /bin/sh /tmp
    # cd /tmp
    # ls -l sh
    -rwx--x--t   1 root     root       37762 May  5 09:23 sh
    # compress -V -v sh
    $Header: compress.c,v 4.0 85/07/30 12:50:00 joe Release $
    Options: BITS = 16
    sh: Compression: 34.90% -- replaced with sh.Z
    # ls -l sh.Z
    -rwx--x--t   1 root     root       24582 May  5 09:23 sh.Z

Looks like you can compress binaries to me!  Granted, the compression
factor isn't as good as can be had with text files (I've seen as much
as 90% in text files with plenty of repeating characters), but it
*does* work on binaries.

- paul

-- 
Paul Sutcliffe, Jr.				  +------------------------+
						  | Know what I hate most? |
UUCP (smart): paul@devon.UUCP			  |  Rhetorical questions. |
UUCP (dumb):  ...rutgers!bpa!vu-vlsi!devon!paul   +------<Henry Camp>------+

feg@clyde.ATT.COM (Forrest Gehrke) (05/05/88)

In article <10712@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
> 
>   I would like to add a little fuel to the fires of "which archiver"
> discussion.  Use of the 'btoa' routine instead of uuencode would save
> 12% (!) on binary postings. This is a PD program, included in the
> compress package, and runs just fine on a PC.

Having tried this sometime back, I have often wondered why this approach
is not used by USENET.  It would save a lot of transmission time.
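The arithmetic behind the saving: uuencode expands 3 bytes into 4 printable characters (about +33%), while btoa packs 4 bytes into 5 characters (+25%); uuencode's per-line count byte and newline add further overhead on top. A sketch using Python's base64/base85 encoders as stand-ins (the exact uuencode/btoa framing, line lengths, and checksums are not reproduced here):

```python
import base64

data = bytes(range(256)) * 16          # 4096 bytes of sample "binary"

b64 = base64.b64encode(data)           # 3 bytes -> 4 chars, like uuencode
b85 = base64.b85encode(data)           # 4 bytes -> 5 chars, like btoa

print(round(100 * (len(b64) - len(data)) / len(data)))  # ~33% expansion
print(round(100 * (len(b85) - len(data)) / len(data)))  # 25% expansion
```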

>   All the discussion of using PKARC to save 1-2% or not using it to save
> time for many of the people on the net seems pointless. We should use
> both (standard) arc and zoo formats, uuencode them, and save bandwidth
> by dropping this discussion. Hopefully Rahul will clarify this by edict.

Also an excellent suggestion.  We could quickly find out from experience
which archiver works out best through use.

BTW what is holding up Rahul from taking over as moderator?

Forrest Gehrke k2bt

jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)

In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>   I stand corrected.  Since Lem-Ziv was DESIGNED for text compression, and
>the authors do not mention its use for binaries, i never considered using it.
>I tried it on an executable under UNIX and obtained a good reduction, for 
>reasons which are not apparent.  I'm sure that there are cases where this does
>not work (like graphics files), but it does appear to work , and in this case
>better than the current version of ARITH.

Jeff, Jeff, Jeff.  You're STILL putting your foot in your mouth.  :-)

A Unix file is just a stream of bytes, and so is an MS-DOS file
except that it has extra attributes as well.  Compress replaces byte
strings with codes whose lengths are between 9 and 16 bits.  It will
work well on any file in which some byte sequences are more common
than others.  An executable file consists of instructions, which, for
almost all processors are integral numbers of bytes, and some are
much more common than others.  So compress works fine, and will give
good compression for just about any executable file.  There are
several types of graphics files: bitmaps are HIGHLY compressible;
other types of files act like a program for an imaginary computer and
consist of byte codes, some much more common than others.  These
compress well also.

There are only three types of files I've ever given to compress that
haven't been reduced in size as a result: random binary data,
floating point binary data, and files that have already been
compressed.
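That byte-string-to-code substitution is easy to see in miniature. A toy LZW coder in Python (codes kept as a plain list rather than packed into 9- to 16-bit fields, and with no table reset; a sketch of the core idea, not compress itself):

```python
def lzw_codes(data: bytes, max_bits: int = 16) -> list:
    """Emit one code per longest-already-seen byte string, growing the
    string table as we go -- the heart of LZW, minus the bit packing."""
    table = {bytes([i]): i for i in range(256)}
    next_code, limit = 256, 1 << max_bits
    out, w = [], b""
    for b in data:
        wb = w + bytes([b])
        if wb in table:
            w = wb                    # extend the current match
        else:
            out.append(table[w])      # emit the longest match seen so far
            if next_code < limit:     # learn the new string
                table[wb] = next_code
                next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

print(lzw_codes(b"ababab"))  # [97, 98, 256, 256]
print(len(lzw_codes(b"ab" * 500)))  # far fewer codes than 1000 input bytes
```

Any input where some byte sequences recur (opcodes, words, grey levels) feeds the table and shrinks; uniformly random bytes do not, which matches the three failure cases listed above.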

-- 
- Joe Buck  {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck
	    Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net

Argue for your limitations and you get to keep them.  -- Richard Bach

tneff@dasys1.UUCP (Tom Neff) (05/06/88)

In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
> ... Also, i did not know that source for
>zoo was available - a consideration which i believe to be VERY important
>since support usually comes best from those who use a product.

The source for ARC is available too, and it's running on (for instance)
this Stride.  

Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC.
Due to an assiduous sales job most PC sysops have the Katz thing, but it
ain't the original.  The "C" language real McCoy is slower on PC's but
more portable.

>For those who suggested that i "do my homework" before posting something
>to the net, i can only say that since the net is my ONLY contact with this
>problem, and that the comp...d group is for discussions, i am in essence
>"doing my homework".  

There is a school of thought, notably expressed in the news.announce.newusers
material, that the Net is a place for authoritative answers and requests for
same, not for "homework" owing to the expense of carrying it all.  I try to
keep an open mind.  :-)

Not that your posting was anything to apologize for anyway...


-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536		MCI: TNEFF
	 will function..."	GEnie: TOMNEFF		BIX: are you kidding?

brianc@cognos.uucp (Brian Campbell) (05/06/88)

In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

Maybe those Usenet sites should not use the -f (force) flag with
compress.  Every version I've used (Sun, XENIX and DOS) will not
replace the original if the compressed version would be larger.  Try
compressing a file twice using the -v (verbose) option and see what
happens.
-- 
Brian Campbell        uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated   mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4
(613) 738-1440        fido: (613) 731-2945 300/1200, sysop@1:163/8

laba-5ac@web7f.berkeley.edu (Erik Talvola) (05/06/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
<>In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
<><>>Just one thing that needs to be known -- PC's can do no more than 12-bit
<><>>compression.  So if you are compressing your file from a UNIX system,
<><>>you need to say comress -b12 filename .
<><>
<><>This myth has been repeated several times, so I felt it was necessary to
<><>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...
<>
<>Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
<>I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
<>C compiler has only the tiny and small memory models (exactly
<>corresponding to non-split and split PDP-11s, which also cannot
<>handle 16-bit compress).
<>
<>So it is true that PCs with a C compiler that supports multiple data
<>segments can handle 16-bit compress, but that hardly encompasses all
<>PCs in the world.
<>-- 

What's wrong with getting a 16-bit Compress executable file for the PC
which was compiled with a proper C compiler?  Then, you can run a 16-bit
compress on any PC.  You are right in that you may not be able to compile
it with all C compilers, but you can run the executable on any PC (as long
as you have ~500K free).


>Larry Campbell                                The Boston Software Works, Inc.
>Internet: campbell@maynard.bsw.com          120 Fulton Street, Boston MA 02109
>uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846


---------------------------------------------------
Erik Talvola                 erikt@zen.berkeley.edu

"...death is an acquired trait." -- Woody Allen
---------------------------------------------------

caf@omen.UUCP (Chuck Forsberg WA7KGX) (05/06/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
:
:  1) COMPRESS is a text only compression routine.  It will not now, or ever,
:     help in the compression of binary files.
The 13 bit compression in zoo gets about 29% compressing YAM.EXE.
:  2) ARITH is a more general compression routine using adaptive arithmetic 
:     coding.  It will compress binary files where there is redundancy, but
Please post it!
:  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
:     we are at the whims of whomever is currently supporting (or not supporting)
:     them.
The sources to ZOO *are* available, in fact it was a copy of ZOO I compiled
for 386 Xenix that I used in the above micro-benchmark.
:  4) COMPRESS works faster and better on text files then the ARC routines
:     because they use 12 bit compression, where 13-bit (and more) are possible
:     under even the PC for COMPRESS (i've tried it on ans AT-clone).
Compress, ARC, PKARC, and ZOO all use forms of LZW compression, derived
from the original Unix compress program.
:  5) On the weak side, there is as yet, no CRC or checksum for any of these,
:     but adding it would be someithing i am willing to take responsibility
:     for should enough people decide they would like to take the approach
:     which i'm currently suggesting.
The lack of a CRC in compress is a serious weakness.  ARC and ZOO include CRCs.
:     Also, there no directory support provided with these tools.  They work
:     on only one file at a time.  This is also correctable since the source
:     is available.
ZOO has excellent directory support - full Unix pathnames are supported.

Again, please post the ARITH program.  It would be most interesting
if the memory requirements are small - like Huffman encoding instead
of LZW.

mark@adec23.UUCP (Mark Salyzyn) (05/06/88)

I'm sorry, I don't care if the IBM-PC can handle better than 12 bit
compress! I run UNIX on a PDP 11/23 *NON SPLIT I/D MACHINE* and that allows
me to use 12 bit compress (however I have a 13 bit LZW pack routine
that was posted in 1983 that works fine). In order to read stuff that
is packed more than 12 bit LZW I had to rewrite compress to use disk
rather than memory. BOY IS IT SLOW. In the interest of compatibility
with ALL types of machines I suggest that we use 12 bit compress. This
is the most available compression bit selection. If not, then I am
going to extend my disk version to handle 17 bit compress, post something
useful and watch you all squirm.

			G'day
-- Mark Salyzyn, mad at the world for advancing and leaving me behind

ephram@violet.berkeley.edu (05/06/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
>In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
><>>Just one thing that needs to be known -- PC's can do no more than 12-bit
><>>compression.  So if you are compressing your file from a UNIX system,
><>>you need to say comress -b12 filename .
><>
><>This myth has been repeated several times, so I felt it was necessary to
><>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...
>
>Only a subset of PCs can do 16-bit compress/uncompress.  

Hasn't anyone ever heard of a disk drive?!?  multiple segments as a limitation?
How about writing temporary results to a disk file (random access)? RAM disk?

Now I must admit I have never cracked open the code to compress/uncompress,
but it seems to me that using a disk drive as an intermediate result area is
a very viable workaround.  I would rather sit and watch my disk spin for an
extra minute than watch the RD light on my modem work 10% more time.

I admit it is not elegant, but when someone says "can not do" I must speak
up.




Ephram Cohen
ephram@violet.berkeley.edu

gnu@hoptoad.uucp (John Gilmore) (05/06/88)

Has the first virus been transmitted by Usenet yet?  Just think, 8100
readers of comp.binaries.ibm.pc will all be infected at once!

I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe
I don't get to vote; but I think that rather than figuring out fancy ways
to pass binaries around, we should just remove them from the Usenet.

People who want binaries can start their own alternative network (bin.xxx?)
and waste their own bandwidth and eyewidth.

People who want to display their ignorance about compress should do so
to a bartender somewhere, not to the net.
-- 
John Gilmore  {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu        gnu@toad.com
"Use the Source, Luke...."

wtr@moss.ATT.COM (05/06/88)

In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>
>	Let me point out one simple fact: source code is VERY MUCH 
>	SMALLER than binaries.

Well, here we go again!

1290 -rw-r--r--   1 xxx      xxxxx     654848 Apr 24 20:29 ksh.cpio
 225 -rwxr-xr-x   1 xxx      xxxxx     113607 Apr 25 17:33 ksh*

well, folks, it seems that the distribution source (uncompressed)
is MUCH LARGER than the corresponding binary.

[I'm going to make some crude, probably wrong in xx% of all
cases, observations.  Any attempt to find factual evidence below
this line, and i'll sue for 'look and feel' ;-]

It *SEEMS* (to me) that for smaller programs the binaries tend to be
larger than the original source, i believe because of the overhead
of the system code in proportion to the base source.  On larger
programs this ratio is much lower, and thus the source tends to
be larger than the corresponding executable.

[ ---> insert flames here <--- ]
[ and don't forget CYA! ]

=====================================================================
Bill Rankin
Bell Labs, Whippany NJ
(201) 386-4154 (cornet 232)

email address:		...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr
			...![ ihnp4 cbosgd akgua watmath  ]!clyde!wtr
=====================================================================

jejones@mcrware.UUCP (James Jones) (05/06/88)

In article <8430@iuvax.cs.indiana.edu>, bobmon@iuvax.cs.indiana.edu (RAMontante) writes:
> [Compress's] Unix-origin philosophy says that separate functions should be
> done by separate routines with their outputs tied together by the operating
> system....The philosophy works fine on a big multitasking machine like a VAX
> (or a suitably equipped 680x0 or '386?)...
> This piece-at-a-time philosophy is weaker on something like my MSDOS 8088 box.
> ...In such a situation an integrated package (viz., zoo or arc) makes a lot
> more sense.

Oddly enough, I would make just the opposite argument, though it is, like Mr.
Montante's, influenced by the particulars of my home system, which has memory
limitations too.  A monolithic glob like ARC, which includes every
compression method known to man (well, arithmetic compression evidently
hasn't made it in yet, and PKARC has LZ with a knob tweaked), won't fit on
my 6809, which
limits me to 64K (code + data) per process under OS-9/6809 Level Two.  If those
compression programs were written separately, they could individually fit and
could be invoked by a process that knows the surrounding header bilge and
calls up the appropriate compress/decompress program.  (The software tools
philosophy has advantages for programs as well as users.)

		James Jones

jpn@teddy.UUCP (John P. Nelson) (05/06/88)

>	Let me point out one simple fact: source code is VERY MUCH 
>	SMALLER than binaries.

This is not clear.

For small programs in a high-level compiled language (like C), this is
true:  This is because the small program pulls in the language run-time
library.  The source is much smaller than the resulting executable:
However, I would bet that the object file (before linking) would be
about the same size as the source (even WITH the symbol table and
relocation information).

Assembly language source usually runs about 10 times larger than the
resulting executable.

Source for a large C program (64k+) usually runs two to three times larger
than the resulting executable.  Of course, I find source code more
valuable:  I can make changes to suit my environment, or I can port the
program to a different machine entirely.  And of course, with an
operating system like UNIX which runs on a plethora of machines,
source code is the only acceptable distribution mechanism.

Other languages have different source/binary size ratios.  Some
languages can generate a lot of code with a very small amount of
source.  However, most of the source code posted to USENET is C.
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/06/88)

In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>I think that rather than figuring out fancy ways
>to pass binaries around, we should just remove them from the Usenet.

Look at the alternative:  To be able to use sources on most
microcomputers, you would probably have to have about five different C
compilers, two or three assemblers, a Pascal compiler or two, and at
least 10 megabytes of hard disk space for the big ones.  Realize that
no current microcomputer operating system on the market costing less
than $300 comes bundled with a decent language translator.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

rroot@edm.UUCP (uucp) (05/07/88)

From article <3980@killer.UUCP>, by chasm@killer.UUCP (Charles Marslett):
> In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
>> Just one thing that needs to be known -- PC's can do no more than 12-bit
>> compression.  ...
> Actually, I have sent several people copies of a minor mod to compress 4.0
> that works fine if you have the memory (requires about 350-400 K above DOS
There are still, however, people running on systems whose compilers don't
know how to work with >64K.
These systems exist and have to be dealt with.

-- 
-------------
 Stephen Samuel 			Disclaimer: You betcha!
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
  BITNET: USERZXCV@UQV-MTS

loci@csccat.UUCP (Chuck Brunow) (05/07/88)

In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>>
>>   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression,
>>and the authors do not mention its use for binaries, i never considered using it.
>>I tried it on an executable under UNIX and obtained a good reduction, for 
>>reasons which are not apparent.  I'm sure that there are cases where this does

	This is actually partially true. The first "compress" to appear
	on the net (several years ago) only worked on text files and
	dumped core on binary files. The reason you get good compression
	on binary files is probably that they haven't been stripped of
	the relocation info. Strip them first and I doubt that the
	compression will be so good (otherwise, throw your optimizer
	into the bit bucket). Typical (large) text compression is about
	67%, whereas binaries are closer to 20%. (I use 16-bit compress).

>
>A Unix file is just a stream of bytes, and so is an MS-DOS file
>except that it has extra attributes as well.  Compress replaces byte
>strings with codes whose lengths are between 9 and 16 bits.  It will
>work well on any file in which some byte sequences are more common
>than others.  An executable file consists of instructions, which, for
>almost all processors are integral numbers of bytes, and some are
>much more common than others.  So compress works fine, and will give
>good compression for just about any executable file.  There are

	This is doubtful. There's a good description of the workings
	of LZW in the GIF docs (recently posted). Bytes aren't the
	key feature here, but rather sequences of repeated bytes
	which should be rare in an optimized executable (on Unix
	at least).

>several types of graphics files: bitmaps are HIGHLY compressible;

	If they have lots of blank space, or other repeated sequences.
	Otherwise, they can be very similar to executables: 10-20%.

>other types of files act like a program for an imaginary computer and
>consist of byte codes, some much more common than others.  These
>compress well also.

	You must mean Huffman coding. These comments are true in that
	case, not LZW.
>
>There are only three types of files I've ever given to compress that
>haven't been reduced in size as a result: random binary data,
>floating point binary data, and files that have already been
>compressed.
>
	The point being that there is little redundancy.
>-- 

wyle@solaris.UUCP (Mitchell Wyle) (05/07/88)

This discussion will bear fruit only if r$ or the backbone gurus
implement one of these schemes as a usenet standard, and distribute
sources or binaries packaged with tarmail or whichever scheme wins
this debate.   I vote for tarmail.  Let's get a standard accepted!
-- 
-Mitchell F. Wyle            wyle@ethz.uucp
Institut fuer Informatik     wyle%ifi.ethz.ch@relay.cs.net
ETH Zentrum                  
8092 Zuerich, Switzerland    +41 1 256-5237

mike@ists (Mike Clarkson) (05/08/88)

In article <4521@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe
> I don't get to vote; but I think that rather than figuring out fancy ways
> to pass binaries around, we should just remove them from the Usenet.

I know that many in the amiga, mac and PC worlds would scream, but it
might force a really positive change in those worlds: good C compilers.
Maybe it would help get good ANSI C compilers written and sold.

> "Use the Source, Luke...."

Hysterical!
-- 
Mike Clarkson					mike@ists.UUCP
Institute for Space and Terrestrial Science	mike@ists.yorku.ca
York University, North York, Ontario,		uunet!mnetor!yunexus!ists!mike
CANADA M3J 1P3					+1 (416) 736-5611

ford@elgar.UUCP (Ford Prefect ) (05/08/88)

In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes:
>What's wrong with getting a 16-bit Compress executable file for the PC
>which was compiled with a proper C compiler?  Then, you can run a 16-bit
>compress on any PC.  You are right in that you may not be able to compile
>it with all C compilers, but you can run the executable on any PC (as long
>as you have ~500K free).

There are a few problems with this approach:

1)	Such a compiler has to exist for the operating system you are
	running.  Obviously, the author had his brain in Ms.Dos mode,
	which, since the article was cross-posted to
	comp.binaries.ibm-pc, is forgivable in this case.  But one of
	the articles that was being followed up to mentioned an O.S.
	that only supported 64k segments.  Compress just won't work
	in such an environment without major redesign (like keeping
	the arrays in a disk file :-).

2)	The executable you get must be for your CPU!  This is obvious,
	of course, but I keep detecting a definite ibm-pc-chauvinist
	state of mind in this discussion.  Don't forget that there
	are people who are still running unix on PDP-11's and proud
	of it!  The PDP-11 is very similar to the 8086 except that
	nobody does anything as kludgey as geferkin with the segment
	registers!  So the best you can get is 64k code, 64k data.

In other words, discussion of a standardized compression format must
take into account the existence of small machines.  And "PC" !=
"Intel Cpu".

Personally, I use 16-bit compress since I don't need to talk to such
small machines.  But if I need to post a binary to the net, I will
probably use 12-bit compress, because I've never heard of a machine
or compiler that couldn't run it.

					-=] Ford [=-

"Once there were parking lots,		(In Real Life:  Mike Ditto)
now it's a peaceful oasis.		ford%kenobi@crash.CTS.COM
This was a Pizza Hut,			...!sdcsvax!crash!kenobi!ford
now it's all covered with daisies." -- Talking Heads

wnp@dcs.UUCP (Wolf N. Paul) (05/08/88)

In article <5098@chinet.UUCP> les@chinet.UUCP (Leslie Mikesell) writes:
 >In article <25816@clyde.ATT.COM> wtr@moss.UUCP (Bill Rankin) writes:
 >>Personally, I use cpio & compress to move files.  I don't care 
 >>about execution time, rather transmission time is my most important
 >I like this also, but if an entire cpio archive is compressed, it
 >is impossible to (a) list the directory without a decompression pass
 >or (b) recover any part beyond a bit error in transmission.  Has
 >anyone considered a program which would leave the cpio headers
 >uncompressed but store the data as though each file had been individually
 >compressed (including adding the .Z to the name so extraction would be
 >possible with a normal cpio followed by uncompress)?  This would be
 >a nice thing to use for normal backups, especially if it followed the
 >normal compress rules of not trying to compress something that already
 >had the .Z extension. That still leaves the problem of compress needing
 >2 extra characters in the filename and DOS needing some other name convention
 >entirely...


 Well, the sources for a cpio-compatible archiver are available from sites
 which archive comp.sources.unix. This archiver is called AFIO.

 Someone out there volunteering to add the code to do compression as suggested
 by Leslie? I don't think I'm qualified or I'd attempt it.
-- 
Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101
UUCP:     ihnp4!killer!dcs!wnp                 ESL: 62832882
INTERNET: wnp@DESEES.DAS.NET or wnp@dcs.UUCP   TLX: 910-280-0585 EES PLANO UD

leonard@bucket.UUCP (Leonard Erickson) (05/08/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
<Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
<I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
<C compiler has only the tiny and small memory models (exactly
<corresponding to non-split and split PDP-11s, which also cannot
<handle 16-bit compress).
<
<So it is true that PCs with a C compiler that supports multiple data
<segments can handle 16-bit compress, but that hardly encompasses all
<PCs in the world.

Larry, you are confusing being able to *compile* a program and being able
to *use* it! I don't have *any* kind of C compiler. But I can uncompress
stuff that was compressed on a Unix system on my PC. 

Some kind soul posted an msdos *binary* for compress a while back. All
you need is DOS and more than 512k of ram...

True, this places two limits on the people who are using the program:
1. they've got to be using MS-DOS. (since we are talking about comp.-
   binaries.ibm.pc any arguments that this is a serious restriction
   should be routed to /dev/null)
2. they have to have 640k (576 will probably work, but I haven't
   tried it). This *is* a problem, but even at current memory prices
   it isn't *too* serious. (Unless you have an AT whose memory is mapped
   as 512 dos/512 extended)

-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."

jw@pan.UUCP (Jamie Watson) (05/09/88)

In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>Has the first virus been transmitted by Usenet yet?  Just think, 8100
>readers of comp.binaries.ibm.pc will all be infected at once!

And every one of them deserves PRECISELY what they get...

>I don't accept or transmit the binaries newsgroups (on hoptoad), so maybe
>I don't get to vote;

Me either, but I don't appreciate having to clutter my sys file, and that
of my news feed, just to explicitly exclude them.

> but I think that rather than figuring out fancy ways
>to pass binaries around, we should just remove them from the Usenet.

This is an excellent idea.  I second the motion.

>People who want binaries can start their own alternative network (bin.xxx?)
>and waste their own bandwidth and eyewidth.

AND MONEY, in some cases.

jw

wayneck@tekig5.TEK.COM (Wayne Knapp) (05/09/88)

In article <552@csccat.UUCP>, loci@csccat.UUCP (Chuck Brunow) writes:
> 
> 	Let me point out one simple fact: source code is VERY MUCH 
> 	SMALLER than binaries.

Have you ever done any real programming?  Sure, if your source code is only 20
lines long the binary is smaller.  However, I'm used to seeing ratios like
500k of source code to 160k of binaries.  Or something I have at home:
7 880k disks full of source code to 1 disk of binaries.  From what I've seen,
the longer the program the greater the ratio of source code to binaries.


                                             Wayne Knapp

dg@lakart.UUCP (David Goodenough) (05/09/88)

From article <4521@hoptoad.uucp>, by gnu@hoptoad.uucp (John Gilmore):
> Has the first virus been transmitted by Usenet yet?  Just think, 8100
> readers of comp.binaries.ibm.pc will all be infected at once!

This is a very valid objection to the transmission of binaries on the
net. I once figured out a virus to infect CP/M, and I know it can be
ported to MS-DOS: the real beauty of it was that it would not only
eat hard disks, but floppies as well (My plan was to install it in
a software package that I was thinking of selling, to discourage illegal
copies).

As it resided in a little under 1/2 K of binary, it was very innocuous,
until it showed. But when it did .... ALL data, directory and system tracks
on a disk just vanished, and the way I did it, not even the Norton utilities
(or the CP/M equivalent) could bring back the files.

Mercifully I have never put this out, but as John Gilmore says, the notion of
such a beast running round on usenet gives me the screaming horrors.

I hear objections to use of C for posting source, but I have never found it
a problem: I regularly take small & medium sized C sources from UNIX, and
port them to my CP/M machine at home. It's not that difficult to do: of course
it's going to object if I try to port hack :-) :-), but ONLY for size reasons:
I can get each of the separate source files to compile, I just can't link them.

So let's see more source, and leave the binaries for those poor trusting souls
that don't know about the real world. Call me a cynic, but after some of
the warnings I've seen on a local BBS, sooner or later the axe is going
to fall.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

Since the discussion is on I'm posting atob and btoa to binaries.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>
>	Let me point out one simple fact: source code is VERY MUCH 
>	SMALLER than binaries.

And another: not everybody has all compilers.  There have been postings
in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on
this group.  That's why we have a binary group.  Besides I wouldn't give
out source to some things which I can distribute as binary. 

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/10/88)

   I recently acquired the ZOO executables from the net and found them to be
incompatible with ARC.  The UNIX ARC i received over the net is compatible
with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

jtara@m2-net.UUCP (Jon Tara) (05/10/88)

In article <3980@killer.UUCP>, chasm@killer.UUCP (Charles Marslett) writes:
> In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> > Just one thing that needs to be known -- PC's can do no more than 12-bit
> > compression.  ...
> 
> Actually, I have sent several people copies of a minor mod to compress 4.0

Funny, I ran compress 4.0 through the Microsoft 4.0 compiler using
large model, and I've been happily compressing and de-compressing
with 16 bits ever since.  Far as I can tell, it doesn't need any
changes, at least under MS/PC-DOS and Microsoft C.  It does need
a good chunk of memory, which most people should have, unless you're
a real TSR nut.


-- 
  jtara%m-net@umix.cc.umich.edu          ihnp4!dwon!m-net!jtara

 "You don't have to take this crap.  You don't have to sit back
  and relax." _Walls Come Tumbling Down_, The Style Council

rick@pcrat.UUCP (Rick Richardson) (05/10/88)

In article <679@omen.UUCP> caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
>Again, please post the ARITH program.  It would be most interesting
>if the memory requirements are small - like Huffman encoding instead
>of LZW.

In case ARITH never gets posted:  the complete article and program appeared
in ACM last year, in C.  I typed it in myself (and lost it later).  The
program, as published, runs a lot slower than compress and does not do
quite as good a job as compress.  It was better than "pack".  It is
very small, and uses little memory.

If you dig into the article (this from memory, I seem to have misplaced
the issue of ACM as well), the program separates the coding algorithm
from the model.  Two models are presented: one that just uses a static
letter-frequency table (for text), and an adaptive model (for binaries).
As I recall, the author pointed out that more sophisticated adaptive
models could be used for better results.

After monkeying around with the program for an evening, and even trying
my own hand at a more sophisticated model, I shelved the program, with
nary a backup.  Since it was slower and less efficient than compress,
I think its usefulness is limited to those applications which are
sensitive to both program and data size, such as in a modem.

BTW, I heard some rumor that a 16 bit "uncompress"-only is available
for limited memory systems.  If this is true, then why all the fuss
about 16 bit compression?
-- 
		Rick Richardson, President, PC Research, Inc.

(201) 542-3734 (voice, nights)   OR     (201) 834-1378 (voice, days)
uunet!pcrat!rick (UUCP)			rick%pcrat.uucp@uunet.uu.net (INTERNET)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:

|    I recently acquired the ZOO executables from the net and found them to be
| incompatible with ARC.

  Correct. zoo is not "another arc file program"; it is a totally
separate file structure, containing information which neither arc nor
pkarc includes.

|               The UNIX ARC i received over the net is compatible
| with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

  Alas, there is no "the" UNIX arc; there are a number of slightly
different versions. If you have the one I suspect, it needs the "-i"
option to be compatible with the DOS arc. I highly commend switching the
meaning of that flag for default DOS compatibility.

  Actually I highly commend using zoo...
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

tneff@dasys1.UUCP (Tom Neff) (05/10/88)

In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>   I recently acquired the ZOO executables from the net and found them to be
>incompatible with ARC.  The UNIX ARC i received over the net is compatible
>with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

Yes, everyone has experienced this incompatibility, Jeffrey, because
they are not SUPPOSED to be compatible!  :-)

ARC is one archiving standard, ZOO is a completely different standard.
You need one set of programs to create, list and extract ARC files, and
a different set to manipulate ZOO archives.  You can't use one with the
other.

Now, if your next question was going to be why there are two incompatible
archiving standards for the MSDOS/UNIX/VMS environment, you'll have to
ask our very own moderator Rahul, because there was only one (ARC) until
he decided to invent his own.  I told him at the time that user confusion
would result, but the argument is moot at this point.


-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536		MCI: TNEFF
	 will function..."	GEnie: TOMNEFF		BIX: are you kidding?

campbell@maynard.BSW.COM (Larry Campbell) (05/11/88)

In article <2894@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
<>In article <4521@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
<>>I think that rather than figuring out fancy ways
<>>to pass binaries around, we should just remove them from the Usenet.

Hear, hear!

<>Look at the alternative:  To be able to use sources on most
<>microcomputers, you would probably have to have about five different C
<>compilers, two or three assemblers, a Pascal compiler or two, and at
<>least 10 megabytes of hard disk space for the big ones.  Realize that
<>no current microcomputer operating system on the market costing less
<>than $300 comes bundled with a decent language translator.
<>-- 
<>Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

Nope.  All you need is Turbo-C (about $60 retail) and Turbo Pascal (a bit
less, I think).  Probably about $100 total.  Almost no one writes or posts
in assembler these days, but I think there are inexpensive assemblers
floating around.

I can't understand someone spending thousands of dollars on PC hardware,
hundreds of dollars on modems and telephone charges, and then balking at
shelling out 60 bucks for an _excellent_ C compiler!
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.bsw.com          120 Fulton Street, Boston MA 02109
uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846

pjh@mccc.UUCP (Pete Holsberg) (05/11/88)

In article <10770@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
...  Alas, there is no "the" UNIX arc, there are a number of slightly
...diferent versions. If you have the one I suspect, it needs the "-i"
...option to be compatible with the DOS arc. I highly commend switching the
                                                              ^^^^^^^^^^^^^
...meaning of that flag for default DOS compatibility.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   
How do you do that?  Thanks.

mitch@stride1.UUCP (Thomas P. Mitchell) (05/12/88)

In article <10758@steinmetz.ge.com> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>>
>
>And another: not everybody has all compilers.  There have been postings
>in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on
>this group.  That's why we have a binary group.  Besides I wouldn't give
>out source to some things which I can distribute as binary. 

To send binary or not to send binary, that is a tough question.
My general thought is that binaries are the way to distribute a
product you sell and support.  Source text is the way to make
available something you wish to share and are willing to see
improved and expanded.  Oh yes, and criticized as well.

The argument that not everybody has all compilers is real.  Yet I
dislike it.  To me a compiler is like a keyboard: a computer is
worth little without one.

Back to the topic.  Not all binary files are code, so how do we
transfer binaries, or anything else for that matter?  Some things, like
bit maps (face server, fonts) and other data, need to be transferred from
machine to machine.  And at times code binaries as well (blush, I
did say this).  My thought is that the link should know
how to best send the data.  In other words, uucp should be
expanded to exchange abilities.  Consider an initial uucp
connection in which the programs exchange information like "have
compress16|compress12, have btoa/atob, have kermit, have xmodem,
have link_is_100%, have TeleBit, have never_talked, have
exchanged_compression_tables".  Given this type of information
the program can then select the best tool to get the best
effective transfer rate for the next conversation.

Well what say you all?


Thanks for the soap.
mitch@stride1.Stride.COM

Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone: (702)322-6868	TWX: 910-395-6073	FAX: (702)322-7975
MicroSage Computer Systems Inc.
Opinions expressed are probably mine. 

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/12/88)

In article <1083@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
[justifying the claim that source postings are an adequate substitute for
binaries]
>Nope.  All you need is Turbo-C (about $60 retail) and Turbo Pascal (a bit
>less, I think).
...
>I can't understand someone spending thousands of dollars on PC hardware,
>hundreds of dollars on modems and telephone charges, and then balking at
>shelling out 60 bucks for an _excellent_ C compiler!

This misses the point.  If somebody posts source that is compilable
only by the Datalight C compiler, or by MIX C, or by Microsoft Pascal,
or by Utah Pascal, or by CHASM, or by the Microsoft Macro assembler, or
by any of dozens of other language translators, having Turbo C and
Turbo Pascal would likely mean an investment of a few days or weeks (or
months) making that source work.

As I said before, microcomputer operating systems costing less than
$300 do not come bundled with any decent language translators.  Users
buy their own, and they are seldom compatible with each other.

ANSI C and cheap, conforming C compilers may change this to an extent.
But there will always be many things that will not be efficiently
doable in portable C.  High-performance graphics are one glaring
example.

Finally consider that not all users are, or want to be, programmers.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

Dion_L_Johnson@cup.portal.com (05/12/88)

I can well understand that there are good reasons for us (most of us) to
want to have source code, but there are also sometimes good reasons
to distribute binaries.  Hasn't this topic been talked to death in
other fora, at other times?  Perhaps someone will post a summary
about the good/bad points of each distribution scheme?  Also, why is
it that those who decry binaries do so with such vehemence?  Finally,
will the (someday) coming of widespread binary compatibility among
at least some classes of systems affect the acceptance of binary
distribution? comments?  And should this be discussed somewhere else?

thanks,  - Dion

danno@microsoft.UUCP (Dan Norton) (05/13/88)

In article <145@elgar.UUCP>, ford@elgar.UUCP (Ford Prefect ) writes:
> In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes:
> >What's wrong with getting a 16-bit Compress executable file for the PC...
> 
> There are a few problems with this approach:
> 
> 1)	Such a compiler has to exist for the operating system you are
> 	running...
> 	                                               ... But one of
> 	the articles that was being followed up to mentioned an O.S.
> 	that only supported 64k segments.  Compress just won't work
> 	in such an environment without major redisign (like keeping
> 	the arrays in a disk file :-).

You are wrong.  In fact, such a compress exists, using memory only.
Several people, including myself, have been able to modify the standard
compress with little trouble, and it works just fine on IBM PC's.

linhart@topaz.rutgers.edu (Mike Threepoint) (05/13/88)

tneff@dasys1.UUCP (Tom Neff) writes:
-=> The source for ARC is available too, and it's running on (for instance)
-=> this Stride.  

>sigh<  But the only squash source I can find is in Pascal.  Speaking
of which...

-=> Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC.
-=> Due to an assiduous sales job most PC sysops have the Katz thing, but it
-=> ain't the original.  The "C" language real McCoy is slower on PC's but
-=> more portable.

"Accept no imitations" should be reserved for sales jobs.  PKARC is
faster and compresses smaller, so why wouldn't they use the Katz thing?
ARCE, NARC, and NSWEEP also support squashing, so it's not even
forcing PK(X)ARC on the users.

My bottom line is the archive size, speed is gravy unless it operates
as slow as... oh, I dunno... ARC?  :-)  On my BBS, my own experience
is that PKARC creates smaller archives than ZOO, so I use PKARC when I
don't need to store a directory subtree.  Squashing has saved over a
meg of space on my board.  Sometimes PKARC is stupid about compression
and squashes when it should crunch or crunches at a 0% compression
rate instead of storing, but most of the time it's smaller.

If ZOO crunched as well (>sigh<), I would use that.  [Selfish mode:
Maybe Rahul could find out what hashing algorithm PK or DWC is using
to get better compression rates.  Would simplify things for me
considerably.]
-- 
"...billions and billions..."			| Mike Threepoint (D-ro 3)
			-- not Carl Sagan	| linhart@topaz.rutgers.edu
"...hundreds if not thousands..."		| FidoNet 1:107/513
			-- Pnews		| AT&T +1 (201)878-0937

gnu@hoptoad.uucp (John Gilmore) (05/13/88)

dhesi@bsu-cs.UUCP (Rahul Dhesi) wrote:
>                         If somebody posts source that is compilable
> only by the Datalight C compiler, or by MIX C, or by Microsoft Pascal...
>                                              ...having Turbo C and
> Turbo Pascal would likely mean an investment of a few days or weeks (or
> months) making that source work.

This is curious.  If someone posts a binary, getting it to work on
another compilation system would likely mean an investment of weeks or
months (or years) -- it's called "rewriting from scratch".  This does
not seem to bother Rahul; it only bothers him that IBM PC users might
have to think rather than just uudecoding and executing.

If somebody posts sources that are only compilable on one system (or,
bog help us, on one C compiler on one system) then they do not know how
to write a portable program.  Should we not take their sources, port
them to our systems if we want to run them, and send back or post
the fixes?  This is how I learned to write portable C code -- from 
seeing how portability problems had come about (in my code and in others'
code, on the net) and noticing how talented people had fixed them.

I asked a bunch of friends who work on IBM PC's why there are so many
programs in the micro world that only compile on certain C compilers.
This problem has been faced and solved 10 years ago in the mini world,
and the techniques are well known (#ifdef's, declaring "short" or
"long" rather than "int", relying on standard library routines rather
than system calls, passing the program to a few friends who have
different compilers/systems for testing before you post it, etc).
Nobody had a decent answer.  I'm forced to assume that most of these
authors do not know how to write or manage software.  The solution to
this problem is not to distribute their programs in binary.  The
solution is to distribute in source, fix it, and thereby teach
people a little bit more than they knew about building reliable software.

>                                                                 Users
> buy their own [compilers], and they are seldom compatible with each other.

Sun's C compiler is not completely compatible with Amdahl's, DEC's, and
GNU's.  But we know ways to write code that runs under all of them.
While the compiler writers could help a bit more, the real problem is
the users.

> But there will always be many things that will not be efficiently
> doable in portable C.  High-performance graphics are one glaring
> example.

Funny, the entire Sun graphics library is written in portable C.  All
the parts I've seen are in C, and it runs on 680x0, Sparc, and 386.  A
few months ago we ported it (the parts used in NeWS) to the Mac-II with
few problems.  Again, while there is always a bit to be squeezed out
with assembler, mostly the problem is people who don't know how to
write fast code.  Let's see their sources, speed them up, and send 'em
back.  Also give them a copy of Jon Bentley's "Writing Efficient
Programs" book.

> Finally consider that not all users are, or want to be, programmers.

But they come running to the programmers when their binary fails.
They had better have sources around if they expect their guru to be able
to help them!
-- 
John Gilmore  {sun,pacbell,uunet,pyramid,ihnp4}!hoptoad!gnu        gnu@toad.com
"Use the Source, Luke...."

loverso@encore.UUCP (John Robert LoVerso) (05/13/88)

In article <2932@cognos.UUCP> brianc@cognos.UUCP (Brian Campbell) writes:
> In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
> > If you are (sigh) going to post binaries on Usenet, DO NOT compress
> > them first.  Many Usenet sites use compress to pack up their news
> > batches.  Compressing a compressed file makes it larger.
> 
> Maybe those Usenet sites should not use the -f (force) flag with compress.

That's not how (typical) news batching works.  Compress is used as one stage
of a pipe:
	batch | compress | uux
and because compress doesn't know the size of its input when it starts
up, it will *always* produce compressed output.

The point is that an article which contains a binary that's compressed and
then uuencode/btoa/your_favorite'd will lower the compression ratio for the
batch that contains it.  The overall size of the batch will be smaller if the
included binary was just uuencoded, etc.
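
The mechanism is easy to sketch with modern stand-ins (an assumption on my
part: Python's zlib playing the role of the batcher's LZW "compress", and
base64 standing in for uuencode):

```python
import base64
import random
import zlib

random.seed(7)

# Ordinary batch text compresses extremely well.
articles = b"Path: encore!loverso\nSubject: Re: file transmission\nbody...\n" * 600

# An already-compressed binary is close to incompressible, and encoding it
# for transmission only adds bulk that the batcher cannot squeeze back out.
binary = zlib.compress(bytes(random.randrange(256) for _ in range(20000)))
encoded = base64.encodebytes(binary)

def ratio(data: bytes) -> float:
    """Compressed size over original; higher means the batcher saves less."""
    return len(zlib.compress(data)) / len(data)

print(f"text-only batch:           {ratio(articles):.2f}")
print(f"batch with encoded binary: {ratio(articles + encoded):.2f}")
```

(In the thread's terms, the encoded binary "lowers the compression ratio" of
the batch: the fraction printed here goes up, i.e. the batcher saves far less
on that portion.)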

I no longer carry comp.binaries.* as I am using its disk space to store
more *useful* news.  It would be nice to see such things split out
into bin.*.

As gnu says: "Use the Source, Luke..."

John Robert LoVerso, Encore Computer Corp
encore!loverso, loverso@multimax.arpa

phil@amdcad.AMD.COM (Phil Ngai) (05/14/88)

In article <786@stride.Stride.COM> mitch@stride1.UUCP (Thomas P. Mitchell) writes:
>The argument that not everybody has all compilers is real.  Yet I
>dislike it.  To me compilers are like a keyboard; a computer is
>worth little without one.

Well, I strongly disagree. I don't have source for most of the things
I run on this PC, nor do I want it. I don't have time to tinker with
source code, compiling it, fixing it. I want to get the program and
start using it. Of course, I use programs like PC-NFS, SCHEMA, ORCAD,
PSPICE, and other CAD type tools. You probably don't know what this
stuff is, so you can't appreciate that some people want to do useful
work *with* their computers instead of working *on* the computer. 

-- 
Make Japan the 51st state!

I speak for myself, not the company.
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or phil@amd.com

allbery@ncoast.UUCP (Brandon S. Allbery) (05/15/88)

As quoted from <8430@iuvax.cs.indiana.edu> by bobmon@iuvax.cs.indiana.edu (RAMontante):
+---------------
| The biggest problem I see is that many news mailers compress everything
| blindly, so that an already-compressed file gets bigger.  This would also be
| true of a sufficiently random file, although I think most executables aren't
| that random.  And this compress-and-be-damned behavior is not a strength of
| the system, it's a weakness.  (Even compress will complain if its result is
| bigger than its original; does the mailer ignore this, or are the net.gods
| lying when they claim they're shipping bigger files because of the double
| compression?)
+---------------

When compress is invoked as

		compress (file)

it complains.  When it's invoked as:

		sendbatch | compress | uux -r - oopsvax!rnews

it can't do so without compressing to a temp file while saving its input in
a second temp file, then comparing sizes and copying the smaller of the two:
wasteful of space and time.  (You can't, of course, seek backwards on a
pipe.)
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

allbery@ncoast.UUCP (Brandon S. Allbery) (05/16/88)

As quoted from <563@csccat.UUCP> by loci@csccat.UUCP (Chuck Brunow):
+---------------
| In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
| >In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
| >>   I stand corrected.  Since Lem-Ziv was DESIGNED for text compression, and
| >>the authors do not mention its use for binaries, i never considered using it.
| >>I tried it on an executable under UNIX and obtained a good reduction, for 
| >>reasons which are not apparent.  I'm sure that there are cases where this does
| 
| 	This is actually partially true. The first "compress" to appear
| 	on the net (several years ago) only worked on text files and
| 	dumped core on binary files. The reason you get good compressions
| 	on binary files is probably that they haven't been stripped of
| 	the relocation info. Strip them first and I doubt that the
| 	compression will be so good (otherwise, throw your optimizer
| 	into the bit bucket). Typical (large) text compression is about
| 	67%, whereas binaries are closer to 20%. (I use 16-bit compress).
+---------------

Wrong.  Consider that, for example, every call to putchar() contains some
fixed code (such as a call to _flsbuf()); this, on a 32-bit address space
machine, will always be the same byte sequence (on a 680x0, it's 6 bytes).
Other things will also be common:

	printf("format", non-double-value);

(which is by far the *most* common use of printf(), from what I've seen;
perhaps others have seen other more common calls) has the constant assembler
code on a 680x0:

		jsr	_printf			6 bytes
		addql	#8,a6			2 bytes

(and "printf("constant")", also common, is a slightly different 8-byte value).
These kinds of extremely common operations can't be optimized out and are
quite amenable to compression.

RISC executables are likely to be even more amenable to compression, since
many operations will assemble into lengthy byte sequences -- many of which
will be partially or totally identical.

Ergo:  compression of executables generally works pretty well.  (I regularly
see 50%-60% on stripped, optimized executables on ncoast.)
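
The effect is easy to reproduce with a sketch (zlib standing in for 16-bit
LZW "compress" -- an assumption), using fabricated "machine code" in which a
couple of call-and-cleanup idioms recur, versus pure noise:

```python
import random
import zlib

random.seed(42)

# Two fabricated 8-byte "idioms", loosely in the spirit of the recurring
# jsr/addql pairs above; real executables repeat many such sequences.
idioms = [b"\x4e\xb9\x00\x00\x12\x34\x50\x8f",
          b"\x4e\xb9\x00\x00\x56\x78\x50\x8f"]

code = b"".join(random.choice(idioms) + bytes([random.randrange(256)])
                for _ in range(4000))             # idiom-laden "text segment"
noise = bytes(random.randrange(256) for _ in range(len(code)))

print(f"idiom-laden code ratio: {len(zlib.compress(code)) / len(code):.2f}")
print(f"pure noise ratio:       {len(zlib.compress(noise)) / len(noise):.2f}")
```

Stripped real executables land somewhere between these two extremes -- hence
the 50%-60% figure above.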
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

jpn@teddy.UUCP (John P. Nelson) (05/16/88)

>The point is that an article which contains a binary that's compressed and
>then uuencode/btoa/your_favorite'd will lower the compression ratio for the
>batch that contains it.  The overall size of the batch will be smaller if the
>included binary was just uuencoded, etc.

If this was TRUE, it would be a good argument.  It is NOT true.

Most binary files that are compressed, uuencoded, then compressed again
are SMALLER than binary files that are simply uuencoded, then
compressed.  I have yet to see anyone post results that refute this.

A few people have pointed out counter-examples.  These usually involve
compressing an ARC file (or another binary file with very little
compressibility in the first place).  In the few cases I have seen
where using ARC (which will NOT try to compress an uncompressable
file), followed by uuencode, followed by compress generated a larger
file than uuencode/compress alone, the file lengths were within 1% of
each other.

If someone has seen different results, I would be interested in seeing
them.  I already KNOW that compressing ASCII files (source or text)
then uuencoding is a bad idea:  I am interested in results from BINARY
FILES only!  I think we should SETTLE this issue once and for all!
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/17/88)

To avoid ambiguity, I suggest the following terminology.

     B = binary
     T = text
     U = uuencoding
     C16 = 16-bit LZW ("compress" default)
     C12 = 12-bit LZW (arc)
     C13 = 13-bit LZW (zoo, squashing)

So, instead of claiming that "uuencoded binary files compressed are
larger than not uuencoding" it is better to say that "BC12UC16 is worse
than BC16", or "BUC16U is worse than BC16" etc.

     BC12UC16 means:
       (B)   take a binary file
       (C12) compress using arc or 12-bit "compress"
       (U)   uuencode it
       (C16) compress using 16-bit "compress"

Also, since binary files differ, it's good to use some standard binary
file in benchmarks, e.g. your UNIX kernel stripped of symbols, so there
is some degree of consistency.
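
The notation is easy to make executable.  A sketch using stand-ins (an
assumption: zlib for C16 and base64 for U -- DEFLATE is not LZW, but the
pipeline algebra is the same):

```python
import base64
import random
import zlib

STEPS = {
    "U":   base64.encodebytes,   # stand-in for uuencode
    "C16": zlib.compress,        # stand-in for 16-bit LZW "compress"
}

def run(data: bytes, *names: str) -> bytes:
    """Apply steps left to right: run(b, "U", "C16") realizes BUC16."""
    for name in names:
        data = STEPS[name](data)
    return data

random.seed(88)
# A fake "binary" with executable-like redundancy: short sequences that recur.
chunks = [bytes(random.randrange(256) for _ in range(16)) for _ in range(64)]
binary = b"".join(random.choice(chunks) for _ in range(2000))

buc16 = run(binary, "U", "C16")             # uuencode, then compress
bc16uc16 = run(binary, "C16", "U", "C16")   # compress, uuencode, compress

print("BUC16   :", len(buc16), "bytes")
print("BC16UC16:", len(bc16uc16), "bytes")

# The pipelines are lossless; BUC16 inverts as uudecode(uncompress(...)).
assert base64.decodebytes(zlib.decompress(buc16)) == binary
```

Substituting a real stripped executable for the synthetic binary, per the
benchmarking suggestion above, makes runs comparable across posters.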
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

dg@lakart.UUCP (David Goodenough) (05/18/88)

From article <4776@teddy.UUCP>, by jpn@teddy.UUCP (John P. Nelson):
>>The point is that an article which contains a binary that's compressed and
>>then uuencode/btoa/your_favorite'd will lower the compression ratio for the
>>batch that contains it.  The overall size of the batch will be smaller if the
>>included binary was just uuencoded, etc.
> 
> If this was TRUE, it would be a good argument.  It is NOT true.
> 
> Most binary files that are compressed, uuencoded, then compressed again
> are SMALLER than binary files that are simply uuencoded, then
> compressed.  I have yet to see anyone post results that refute this.

>>>>> CORRECT <<<<<

Note this:

Script started on Tue May 17 13:01:21 1988
lakart!dg(bin/junk)[61]-> ds
-rwxr-x---  1 dg          40960 May 17 12:55 arc*
-rwxr-x---  1 dg          32768 May 17 12:55 atob*
-rwxr-x---  1 dg          16384 May 17 12:55 btoa*
-rwxr-x---  1 dg          36864 May 17 12:55 cg*
-rwxr-x---  1 dg          24576 May 17 12:55 clock*
-rwxr-x---  1 dg          16384 May 17 12:55 ddr*
lakart!dg(bin/junk)[62]-> file *
arc:	demand paged pure executable
atob:	demand paged pure executable
btoa:	demand paged pure executable
cg:	demand paged pure executable
clock:	demand paged pure executable
ddr:	demand paged pure executable
lakart!dg(bin/junk)[63]-> foreach i (*)
? compress < $i | uuencode $i.Z | compress > $i.u.Z
? uuencode $i < $i | compress > $i.u.z
? end
lakart!dg(bin/junk)[64]-> ds
-rwxr-x---  1 dg          40960 May 17 12:55 arc*
-rw-r-----  1 dg          29285 May 17 13:03 arc.u.Z
-rw-r-----  1 dg          32027 May 17 13:03 arc.u.z
-rwxr-x---  1 dg          32768 May 17 12:55 atob*
-rw-r-----  1 dg          18418 May 17 13:04 atob.u.Z
-rw-r-----  1 dg          19330 May 17 13:04 atob.u.z
-rwxr-x---  1 dg          16384 May 17 12:55 btoa*
-rw-r-----  1 dg           7896 May 17 13:04 btoa.u.Z
-rw-r-----  1 dg           8384 May 17 13:04 btoa.u.z
-rwxr-x---  1 dg          36864 May 17 12:55 cg*
-rw-r-----  1 dg          28412 May 17 13:04 cg.u.Z
-rw-r-----  1 dg          30864 May 17 13:04 cg.u.z
-rwxr-x---  1 dg          24576 May 17 12:55 clock*
-rw-r-----  1 dg          14299 May 17 13:04 clock.u.Z
-rw-r-----  1 dg          15116 May 17 13:04 clock.u.z
-rwxr-x---  1 dg          16384 May 17 12:55 ddr*
-rw-r-----  1 dg           7123 May 17 13:04 ddr.u.Z
-rw-r-----  1 dg           7682 May 17 13:04 ddr.u.z
lakart!dg(bin/junk)[65]-> foreach i (*[a-y])
? echo $i
? echo -n `wc -c <$i.u.Z` '* 100 /' `wc -c <$i` '== '
? z `wc -c $ <$i.u.Z` '* 100 /' `wc -c <$i`
? echo -n `wc -c <$i.u.z` '* 100 /' `wc -c <$i` '== '
? z `wc -c <$i.u.z` '* 100 /' `wc -c <$i`
? end
arc
29285 * 100 / 40960 == 71 == 0x47
32027 * 100 / 40960 == 78 == 0x4e
atob
18418 * 100 / 32768 == 56 == 0x38
19330 * 100 / 32768 == 58 == 0x3a
btoa
7896 * 100 / 16384 == 48 == 0x30
8384 * 100 / 16384 == 51 == 0x33
cg
28412 * 100 / 36864 == 77 == 0x4d
30864 * 100 / 36864 == 83 == 0x53
clock
14299 * 100 / 24576 == 58 == 0x3a
15116 * 100 / 24576 == 61 == 0x3d
ddr
7123 * 100 / 16384 == 43 == 0x2b
7682 * 100 / 16384 == 46 == 0x2e
lakart!dg(bin/junk)[66]-> ^D
script done on Tue May 17 13:12:09 1988

In all cases (I actually looked at over 20 stripped executables)

    compress | uuencode | compress

is fractionally smaller than:

    uuencode | compress

both of which are heaps smaller than the raw file.

Since the difference between compress | uuencode | compress and just plain
uuencode | compress is so small (between 2 and 10%), I can't see the point
in continuing this discussion.

>>>>> THEREFORE 'COMPRESS, UUENCODE AND POST' GETS THE BEST RESULTS. <<<<<
[1]

Now can we let this subject rest in peace?

[1] Not counting ARC & ZOO.  I don't know where they stand, so I am saying
nothing about them.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

allbery@ncoast.UUCP (Rich Garrett) (05/24/88)

As quoted from <4776@teddy.UUCP> by jpn@teddy.UUCP (John P. Nelson):
+---------------
| >The point is that an article which contains a binary that's compressed and
| >then uuencode/btoa/your_favorite'd will lower the compression ratio for the
| >batch that contains it.  The overall size of the batch will be smaller if the
| >included binary was just uuencoded, etc.
| 
| If this was TRUE, it would be a good argument.  It is NOT true.
| 
| Most binary files that are compressed, uuencoded, then compressed again
| are SMALLER than binary files that are simply uuencoded, then
| compressed.  I have yet to see anyone post results that refute this.
+---------------

Single files, yes.  But the quoted message above specifically says BATCHES.
Batches include messages of all kinds from multiple newsgroups; to verify
whether batch compression is reduced, we have to modify sendbatch to print
the compression ratio and then run sendbatch with both compressed and
uncompressed uuencodes to see which results in smaller batches.  (We also
need a non-destructive "test" mode for sendbatch to (a) ensure that the
batches are otherwise identical and (b) not screw up news transmission.)
This would have to be done with a number of batches and the results averaged
in order to give us a reasonably accurate result.
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

jpn@teddy.UUCP (John P. Nelson) (05/24/88)

>| Most binary files that are compressed, uuencoded, then compressed again
>| are SMALLER than binary files that are simply uuencoded, then
>| compressed.  I have yet to see anyone post results that refute this.
>
>Single files, yes.  But the quoted message above specifically says BATCHES.
>...
>to verify whether batch compression is reduced, we have to modify sendbatch...

Well, anyone is welcome to PERFORM this experiment, but it is totally
unnecessary in my opinion.

Of COURSE pre-compressing is going to reduce the compression ratio
of a batch.  This is irrelevant, because less data needs to be batched.

Remember:  "compress" uses an ADAPTIVE Lempel-Ziv method.  If the old
string table isn't working, compress will RESET the table and start
over, right in the middle of the file being compressed.  Neither a
uuencoded file nor a compressed-then-uuencoded file looks much like
ordinary ASCII text: the string table built for either "uuencode" case
will not resemble the one generated for normal text batches, so either
type of "uuencode" will cause "compress" to reset the string table.
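
The reset behavior can be sketched with a toy 12-bit LZW coder (a
simplification on my part -- real compress in block mode emits its CLEAR
code when the ratio degrades, not merely when the table fills):

```python
import base64
import random

def lzw_codes(data: bytes, max_bits: int = 12) -> int:
    """Toy LZW encoder that only counts output codes.  The string table is
    reset (as compress's CLEAR code allows) whenever it fills."""
    def fresh():
        return {bytes([i]): i for i in range(256)}
    table, next_code = fresh(), 257      # code 256 reserved for CLEAR
    w, codes = b"", 0
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
            continue
        codes += 1                       # emit the code for w
        if next_code < (1 << max_bits):
            table[wc] = next_code
            next_code += 1
        else:
            codes += 1                   # emit CLEAR, rebuild the table
            table, next_code = fresh(), 257
        w = bytes([byte])
    return codes + (1 if w else 0)

random.seed(3)
text = b"the quick brown fox jumps over the lazy dog\n" * 800
junk = base64.encodebytes(bytes(random.randrange(256) for _ in range(12000)))

for label, data in (("plain text batch     ", text),
                    ("text + uuencoded junk", text + junk)):
    out = lzw_codes(data) * 12 // 8      # 12-bit codes, bit-packing ignored
    print(f"{label}: {len(data):6d} bytes -> ~{out:6d}")
```

The uuencoded section builds strings the text-trained table never uses, so
the coder keeps clearing and re-learning across the boundary.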

Besides, I think we have gotten off the track here.  Even if
pre-compressing DOES slightly increase the overall size of the data
transmitted, I think we have shown that it doesn't increase the size
SIGNIFICANTLY; in the simplest case, it REDUCES the size.  The most
common use of pre-compressing is the use of "ARC" (or "zoo") to bundle
multiple files together: there is no other convenient way to "bundle"
binary files at the moment.  I think we have shown that the use of ARC
to build bundles of binary files is NOT detrimental!  I think we should
now focus our collective energies on a more productive topic.

The real unresolved issue, of course, is whether to allow binaries on
USENET at all.

-- 
     john nelson

UUCP:	{decvax,mit-eddie}!genrad!teddy!jpn
smail:	jpn@genrad.com