[comp.binaries.ibm.pc.d] Standard for file transmission

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/01/88)

   Currently there is an ongoing discussion in comp.binaries.ibm.pc.d
concerned with establishing a standard for the exchange of software over
the net.  I would like to offer a suggestion.  The following tools are
available in source code format:  COMPRESS (a Lempel-Ziv text compressor),
ARITH (arithmetic compression for binary), and UUencode/decode.  Since all of
these will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS,
SYS5, BSD), why not make them the basis for communicating?  I may have a
public domain SHAR/UNSHAR program in C as well (I also have a text archiver
in PASCAL that would suffice if PASCAL were acceptable to everyone).
   These should be enough to support everyone's needs, and because we have
the source, it can be made available to everyone.  I realize that some of
you may have a favorite tool which you may feel surpasses the capabilities
of these.  I am making these suggestions to provide a starting point towards
arriving at a mutually acceptable standard.
   For those who do not know, COMPRESS is a single-file text compressor
which works faster than any of the ARC clones, and ARITH is something I
constructed from a description in the ACM.  ARITH compresses more slowly, but
better, than Huffman (i.e., SQ/UNSQ).  Most of all, it's in the public domain,
and I'll be posting source if enough people show an interest.
   In any case, let the discussion continue.

w8sdz@brl-smoke.ARPA (Keith B. Petersen ) (05/01/88)

Rather than discussing how to compress our files we should be discussing
how to get them transferred error-free through the network.
Uuencode/uudecode and compress/uncompress do no error checking.

Think about this the next time you are tempted to uuencode a binary
file.  How do you know it will be received error-free by the recipient?
At least when it is compressed by the ARC program a CRC of the original
file is stored *inside* the ARC.  It is checked when you extract the
member file.

The net spends thousands of dollars on reposts of truncated or otherwise
munged files.  Some of that money would be better spent on finding where
the problem is and fixing it.  A uuencode with CRC or checksum would go
a long way towards finding the site(s) responsible for this waste.
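For reference, the check that ARC stores is a 16-bit CRC (the reflected
0xA001 polynomial).  A minimal sketch of how such a value could be computed
over a file before uuencoding and again after uudecoding -- this is only an
illustration, not part of any existing uuencode:

#include <stdio.h>

/* CRC-16 as used inside ARC archives (reflected polynomial 0xA001,
 * initial value 0).  Sketch only: run it over a file before uuencoding
 * and again after uudecoding, then compare the two values.  It is not
 * part of the standard uuencode/uudecode programs. */
static unsigned int crc16(FILE *fp)
{
    unsigned int crc = 0;
    int c, i;

    while ((c = getc(fp)) != EOF) {
        crc ^= (unsigned int)c;
        for (i = 0; i < 8; i++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    }
    return crc & 0xFFFF;
}

int main(int argc, char **argv)
{
    FILE *fp = (argc > 1) ? fopen(argv[1], "rb") : stdin;

    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }
    printf("CRC-16: %04X\n", crc16(fp));
    return 0;
}
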
-- 
Keith Petersen
Arpa: W8SDZ@SIMTEL20.ARPA
Uucp: {bellcore,decwrl,harvard,lll-crg,ucbvax,uw-beaver}!simtel20.arpa!w8sdz
GEnie: W8SDZ

wcf@psuhcx.psu.edu (Bill Fenner) (05/01/88)

Just one thing that needs to be known -- PC's can do no more than 12-bit
compression.  So if you are compressing your file from a UNIX system,
you need to say compress -b12 filename .

  Bill
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

jpn@teddy.UUCP (John P. Nelson) (05/02/88)

>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say compress -b12 filename .

This myth has been repeated several times, so I felt it was necessary to
speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  It
takes 512K of available memory to run, and you also either need a compiler
that supports HUGE model arrays, or else you have to manually break up the
buffer space into multiple 64K arrays (this is what the version I have does -
The port was done a couple of years ago for XENIX, but it works just fine
under MSDOS as well).
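
For anyone wondering what "breaking up the buffer space into multiple 64K
arrays" looks like, here is a hedged sketch (illustrative only, not the
actual XENIX port): the big table becomes an array of separately allocated
blocks, indexed through a macro.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch only: emulate a table too large for one 64K
 * segment by splitting it into 32K-byte blocks allocated separately.
 * A real 16-bit compress port does something similar for its hash
 * and code tables. */
#define BLOCKSHIFT 15                   /* 32K bytes per block      */
#define BLOCKSIZE  (1L << BLOCKSHIFT)
#define BLOCKMASK  (BLOCKSIZE - 1)

static unsigned char *block[16];        /* up to 16 * 32K = 512K    */

#define BIGTAB(i)  block[(long)(i) >> BLOCKSHIFT][(long)(i) & BLOCKMASK]

int bigtab_init(long nbytes)
{
    int i, nblocks = (int)((nbytes + BLOCKSIZE - 1) >> BLOCKSHIFT);

    for (i = 0; i < nblocks; i++)
        if ((block[i] = malloc((size_t)BLOCKSIZE)) == NULL)
            return -1;                  /* not enough memory        */
    return 0;
}

int main(void)
{
    long i, n = 5L * 65536L;            /* a 320K "array"           */

    if (bigtab_init(n) != 0) {
        fputs("not enough memory\n", stderr);
        return 1;
    }
    for (i = 0; i < n; i++)             /* touch every element      */
        BIGTAB(i) = (unsigned char)(i & 0xFF);
    printf("BIGTAB(%ld) = %d\n", n - 1, BIGTAB(n - 1));
    return 0;
}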
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

rsalz@bbn.com (Rich Salz) (05/02/88)

If you are (sigh) going to post binaries on Usenet, DO NOT compress
them first.  Many Usenet sites use compress to pack up their news
batches.  Compressing a compressed file makes it larger.
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.

wcf@psuhcx.psu.edu (Bill Fenner) (05/02/88)

In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>>Just one thing that needs to be known -- PC's can do no more than 12-bit
>>compression.  So if you are compressing your file from a UNIX system,
>This myth has been repeated several times, so I felt it was necessary to
>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  It
>takes 512K of available memory to run, and you also either need a compiler

Hard as it is to believe, a lot of people don't have 640K computers...
But I think this utility would be well worth distributing... mind
posting it on comp.binaries.ibm.pc?  (Can it do 12-bit also?)

         Thanks
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/03/88)

In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> Just one thing that needs to be known -- PC's can do no more than 12-bit
> compression.  So if you are compressing your file from a UNIX system,
> you need to say compress -b12 filename .
   I've constructed a version of COMPRESS using 13 bits and the small
model by making only one array large.  I've also constructed a version in
BIG mode, which runs at half the speed and compresses only about 10% better
using the full addressing available under UNIX.

wcf@psuhcx.psu.edu (Bill Fenner) (05/03/88)

In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>If you are (sigh) going to post binaries on Usenet, DO NOT compress
>them first.  Many Usenet sites use compress to pack up their news
>batches.  Compressing a compressed file makes it larger.

We've gone through this before, and it has never been explained to my
satisfaction.  I think you do save something by compressing a uuencoded
compressed file over compressing the uuencoded uncompressed file.

I just did a test.  The file I used may not have been a good 'average
binary' (I used a moria save character - the best I could find on short
notice).  Anyway...

Original size (cannot send; it's binary): 95,348 bytes
Compressed (also cannot send; also binary):  6,772 bytes

Now... UUEncoded then compressed (the amount that would be transmitted
if you simply uuencode the file) :  11,531 bytes

And the kicker... compressed, UUEncoded, then compressed (as if you
compressed it, uuencoded it, and posted it, and then the news software
compressed it):  9,009 bytes.

Like I said, this may not have been a proper 'average binary'.
I am going to write a shell script to check all these things, and
run it on several actual PC binaries and ARC files. I will post the
results to comp.binaries.ibm.pc.d.
-- 
   __      _  _      _____   Bill Fenner     Bitnet: wcf @ psuhcx.bitnet
  /  )    // //       /  '                   Internet: wcf @ hcx.psu.edu
 /--<  o // //     ,-/-, _  __  __  _  __    UUCP: ihnp4!psuvax1!psuhcx!wcf
/___/_<_</_</_    (_/   </_/ <_/ <_</_/ (_   Fido: Sysop at 263/42

loci@csccat.UUCP (Chuck Brunow) (05/03/88)

In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes:
>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say compress -b12 filename .
>

	Are you quite sure about that? 13-bit compress will run on other
	64k segment machines (80?86 based).

jpn@teddy.UUCP (John P. Nelson) (05/03/88)

>If you are (sigh) going to post binaries on Usenet, DO NOT compress
>them first.  Many Usenet sites use compress to pack up their news
>batches.  Compressing a compressed file makes it larger.

This is incorrect.

I hope I can clear this up once and for all:

If you have ascii files (like source or documentation), then it is true
that compressing, then uuencoding is a BAD IDEA, even though the posting
appears to be smaller than the cleartext.  That is because when the file is
compressed again, it will be larger than the cleartext after IT is
compressed.

If you have a binary file that MUST be uuencoded to be posted, then
compression before uuencoding IS HELPFUL!  Most files that are
compressed, then uuencoded, then compressed again are significantly
smaller than files that are simply uuencoded, then compressed once!  I
think that the reason this is true is that uuencoding tends to
interfere with the compression process.  By the way, compressing a
uuencoded file almost always results in a small reduction in size.

When I say "compressed", I include archival programs such as ARC and ZOO.

These conclusions were reached by experimental evidence (I didn't conduct
the experiments, others did, and they posted their results).  Perhaps
no one bothered to read these informative articles (or else my suspicion
is true:  the maximum long-term memory of the average USENET reader is
no more than 1 month long).

-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

egisin@watmath.waterloo.edu (Eric Gisin) (05/04/88)

In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

But you would not be compressing the compressed file;
you would be compressing an encoded file.

Here are the results of some experiments on a 100K UNIX binary:
$ uuencode | compress
-rw-r--r--  1 egisin           83111 May  3 16:25 uu.Z
$ compress | uuencode | compress
-rw-r--r--  1 egisin           81241 May  3 16:30 uuz.Z
Compressing before encoding results in a 2% shorter file,
but that is not really significant.

You can get better results by using a simple hex encoding:
$ compress | hexencode | compress
-rw-r--r--  1 egisin           78831 May  3 16:31 hdz.Z
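
(hexencode here is just a trivial filter; a sketch of the idea follows --
the real filter may differ in details.  Two hex digits per byte means the
raw output is larger than uuencode's, but the 16-character alphabet
recompresses well.)

#include <stdio.h>

/* Minimal hex encoder, roughly what a "hexencode" filter would do:
 * write each input byte as two hex digits, 30 bytes per line.  The
 * output is 2x the input size, but its small alphabet squeezes down
 * well when run through compress afterwards. */
int main(void)
{
    int c, col = 0;

    while ((c = getchar()) != EOF) {
        printf("%02x", c);
        if (++col == 30) {
            putchar('\n');
            col = 0;
        }
    }
    if (col > 0)
        putchar('\n');
    return 0;
}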

None of this applies to source files;
they should never be compressed and encoded.

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/04/88)

  1) COMPRESS is a text only compression routine.  It will not now, or ever,
     help in the compression of binary files.

  2) ARITH is a more general compression routine using adaptive arithmetic 
     coding.  It will compress binary files where there is redundancy, but
     when it fails (on an extremely random file) the result increases very
     little (under 1% in my experience).  It compresses better than HUFFMAN,
     but it is NOT faster than SQ/UNSQ, which are written in assembler, whereas
     ARITH is written in C.
     (Once again, I will post it if there is sufficient interest.)

  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
     we are at the whims of whomever is currently supporting (or not supporting)
     them.

  4) COMPRESS works faster and better on text files than the ARC routines
     because they use 12-bit compression, where 13-bit (and more) is possible
     under even the PC for COMPRESS (I've tried it on an AT clone).

  5) On the weak side, there is as yet no CRC or checksum for any of these,
     but adding one is something I am willing to take responsibility
     for, should enough people decide they would like to take the approach
     which I'm currently suggesting.
     Also, there is no directory support provided with these tools.  They work
     on only one file at a time.  This is also correctable since the source
     is available.

  6) LASTLY: I am not trying to criticize the ARC routines; rather, I am trying
     to offer an alternative which I feel will reduce the time for transmission
     of files, as well as providing us with portability.  COMPRESS, ARITH,
     UNSHAR and UUENCODE are all available at the source level.  COMPRESS and
     ARITH have been tried in at least three different environments: UNIX (BSD),
     VMS and PC/MS-DOS.
     Remember, for those of us who are NOT using the NET at the expense of a
     university, the cost of communication, and therefore the time required
     to transmit a file, are VERY important.

   If this sounds like a flame, then please assign my apparent bad attitude to
poor methodology rather than a desire to upset people.  This is offered in the
spirit of adding to what I hope will become a meaningful dialog with a very
practical result.

mike@ists (Mike Clarkson) (05/04/88)

In article <696@fig.bbn.com>, rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

How about compressing a uuencoded compressed file?  Does that result in
a file significantly larger than the original?

I would really like to see a uniform standard, with error checking, and
I think it is something worth the time it takes to do it.  We could
probably evolve the result to take care of another pet peeve of mine:
error correction in the tar format.  One thing I really miss from VMS is
the backup tape archiver, which has tremendous error checking and
correction.  In 7 years I have only ever had (touch wood) 1 tape go on
me, and that was because the oxide was falling off.  Having spent a good
part of today dealing with yet another dead Unix tar tape, I really wish
we could find a better way.




-- 
Mike Clarkson						mike@ists.UUCP
Institute for Space and Terrestrial Science		mike@ists.yorku.ca
York University, North York, Ontario,
CANADA M3J 1P3						(416) 736-5611

chasm@killer.UUCP (Charles Marslett) (05/04/88)

In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> Just one thing that needs to be known -- PC's can do no more than 12-bit
> compression.  ...

Actually, I have sent several people copies of a minor mod to compress 4.0
that works fine if you have the memory (requires about 350-400 K above DOS
to do 16-bit compression).  The source assumes Turbo or Microsoft C for the
PC but it doesn't take up an immense amount of disk space either (about 40K
if I remember correctly).  I have also ported it to Atari STs, so that covers
some of the PC field.  Anyone want to merge these changes into the more
recent (4.1?) posting and perhaps make it work on Macs and Amigas?

Any good rule of thumb on how many requests justify a posting?

>   Bill

Charles Marslett
chasm@killer.UUCP
...!ihnp4!killer!chasm

jcs@tarkus.UUCP (John C. Sucilla) (05/04/88)

In article <55@psuhcx.psu.edu> wcf@psuhcx (Bill Fenner) writes:
>Just one thing that needs to be known -- PC's can do no more than 12-bit
>compression.  So if you are compressing your file from a UNIX system,
>you need to say compress -b12 filename .

Wrong! My 640K AT&T PC6300 has compress v4.0 running 16 bits on it
right now.  -V shows the options as: MSDOS, XENIX_16 and BITS=16.


-- 
John "C" Sucilla
{ihnp4}!tarkus!jcs

Don't let reality stop you.... 

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (05/04/88)

     1) COMPRESS is a text only compression routine.  It will not now, or ever,
        help in the compression of binary files.

Nonsense.

[58] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable
34831 -rwsr-x---  2 root     staff        8192 Apr 13 09:54 enable
[59] [8:33am] tut:/dino0/karl/bin/pyr/private> file enable
enable: 90x family demand paged pure executable
[60] [8:33am] tut:/dino0/karl/bin/pyr/private> compress -v < enable > enable.Z
Compression: 72.44%
[61] [8:33am] tut:/dino0/karl/bin/pyr/private> list enable.Z
35427 -rw-r--r--  1 karl     staff        2257 May  4 08:34 enable.Z
[62] [8:33am] tut:/dino0/karl/bin/pyr/private> 

--Karl

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
| 
|   1) COMPRESS is a text only compression routine.  It will not now, or ever,
|      help in the compression of binary files.

[ compress gives about 30% compression on binaries, depending on
  content. Whoever told you that it was for text only was completely
  wrong. ]
| 
|   2) ARITH is a more general compression routine using adaptive arithmetic 
|      coding.  It will compress binary files where there is redundancy, but
|      when it fails (on an extremely random file) the result increases very
|      little (under 1% in my experience).  It compresses better than HUFFMAN,
|      but it is NOT faster than SQ/UNSQ which are written in assembler whereas
|      ARITH is written in C.
|      (Once again, i will post it if there is sufficient interest.)

[ once again, do it, in source, so that others can test it themselves
  rather than relying on your opinion. ]
| 
|   3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
|      we are at the whims of whomever is currently supporting (or not supporting)
|      them.

[ the sources for zoo and arc have been posted several times to the net,
  and are available on a number of sites via ftp, uucp, and simple BBS
  download. ]

|   5) On the weak side, there is as yet, no CRC or checksum for any of these,
|      but adding it would be someithing i am willing to take responsibility
|      for should enough people decide they would like to take the approach
|      which i'm currently suggesting.

[ zoo and arc both have CRC. ]

|      Also, there no directory support provided with these tools.  They work
|      on only one file at a time.  This is also correctable since the source
|      is available.

[ arc works on multiple files in multiple directories, but doesn't
  preserve subdirectory information. zoo preserves the information unless
  told not to do it (an option). ]
| 
|   5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying
|      [ deleted for brevity ]
|      Remember, for those of us who are NOT using the NET at the expense of a
|      university, the cost of communication, and therefore the time required
|      to transmit a file, are VERY important.

[ everyone would like faster transmissions, but not at the expense of
  using a non-standard format which people can't use. Sending info which
  is not useful is a *real* waste of bandwidth. ]
| 
|    If this sounds like a flame, then please assign my apparent bad attitude to
| poor methodology rather than a desire to upset people.  This is provided in the
| spirit of adding to what i hope will become a meaningful dialog with a very
| practicle result.  

The most charitable assumption I can make is that you are woefully
misinformed about the matters on which you speak. Please post this
"ARITH" routine to let others evaluate it, and read the responses to
your posting, many of which will probably not be even as polite as this
one.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/04/88)

  I would like to add a little fuel to the fires of "which archiver"
discussion.  Use of the 'btoa' routine instead of uuencode would save
12% (!) on binary postings. This is a PD program, included in the
compress package, and runs just fine on a PC.
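
The saving comes from the encoding ratio: uuencode expands every 3 bytes
into 4 printable characters plus per-line overhead, while btoa packs each 4
bytes into 5 base-85 characters.  A hedged sketch of btoa's central step
(omitting its 'z' shorthand for all-zero groups, its line breaking, and its
trailing check lines):

#include <stdio.h>

/* Core of a btoa-style encoder, for illustration only: pack each group
 * of 4 input bytes into a 32-bit value and emit it as 5 digits in base
 * 85, using the printable characters '!' (33) through 'u'.  The real
 * btoa also writes "z" for an all-zero group, breaks lines, and appends
 * checksum trailer lines, all omitted here. */
static void put_group(unsigned long word)
{
    char digit[5];
    int i;

    for (i = 4; i >= 0; i--) {
        digit[i] = (char)('!' + (int)(word % 85));
        word /= 85;
    }
    fwrite(digit, 1, 5, stdout);
}

int main(void)
{
    int c, n = 0;
    unsigned long word = 0;

    while ((c = getchar()) != EOF) {
        word = (word << 8) | (unsigned long)c;
        if (++n == 4) {
            put_group(word);
            word = 0;
            n = 0;
        }
    }
    if (n > 0)                          /* pad the last partial group */
        put_group(word << (8 * (4 - n)));
    putchar('\n');
    return 0;
}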

  All the discussion of using PKARC to save 1-2% or not using it to save
time for many of the people on the net seems pointless. We should use
both (standard) arc and zoo formats, uuencode them, and save bandwidth
by dropping this discussion. Hopefully Rahul will clarify this by edict.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

jpn@teddy.UUCP (John P. Nelson) (05/04/88)

>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

Whoa!  Where did THIS come from!?!?  It is simply not true!

It IS true that compress does a better job at compressing text files,
but this is because there is usually more redundancy in text files than in most
binary files (like executables).  Compress is simply MARVELOUS for
binary files like bit-mapped graphics, getting something like 90%
compression for many of them.

>  2) ARITH is a more general compression routine using adaptive arithmetic 
>     coding.  It will compress binary files where there is redundancy, but
>     when it fails (on an extremely random file) the result increases very
>     little (under 1% in my experience).  It compresses better than HUFFMAN,
>     but it is NOT faster than SQ/UNSQ which are written in assembler whereas
>     ARITH is written in C.
>     (Once again, i will post it if there is sufficient interest.)

Now we get some facts.  ARITH is HUFFMAN encoding. Compress is Lempel-Ziv
encoding.  Lempel-Ziv almost ALWAYS beats HUFFMAN (when there is redundancy).
It is certainly possible that Lempel-Ziv might expand random files more than
HUFFMAN; I haven't done any tests.

Older versions of ARC used to try both HUFFMAN and Lempel-Ziv, and use
the one that gave better compression.  The HUFFMAN support was dropped
(except for extracting from old archives), because Lempel-Ziv beat HUFFMAN
99% of the time!

>  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
>     we are at the whims of whomever is currently supporting (or not supporting)
>     them.

MORE untruths.  The source for both ZOO and ARC are in C, and have been
distributed on USENET several times!  Some versions of the ARC source
included the extra code to handle the SQUASH compression algorithm
added by PKARC.

>  4) COMPRESS works faster and better on text files then the ARC routines
>     because they use 12 bit compression, where 13-bit (and more) are possible
>     under even the PC for COMPRESS (i've tried it on ans AT-clone).

PKARC's SQUASH is 13 bit compression.  Any more than this requires a
working buffer larger than 64K, which is why they are generally not used
very much on PCs.  The amount of additional compression between 13 bit
and 16 bit is no more than 2 or 3 percent!

Also, there is very little difference in speed between the 12 bit and
13 bit compression algorithms.  The major difference is in the memory
requirements.
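
The memory figures quoted in this thread follow from the size of the tables
compress keeps.  Assuming the usual compress 4.0 layout (a hash table of
4-byte entries plus a code table of 2-byte entries, both HSIZE slots -- the
HSIZE constants below are quoted from memory and should be checked against
compress.c), the working storage comes out roughly as follows:

#include <stdio.h>

/* Rough working-storage estimate for compress-style LZW, assuming the
 * usual compress 4.0 layout: a hash table of 4-byte entries plus a
 * code table of 2-byte entries, both HSIZE slots.  The HSIZE values
 * are quoted from memory and should be checked against compress.c. */
int main(void)
{
    static const struct { int bits; long hsize; } tab[] = {
        { 12,  5003L }, { 13,  9001L }, { 14, 18013L },
        { 15, 35023L }, { 16, 69001L }
    };
    int i;

    for (i = 0; i < 5; i++) {
        long bytes = tab[i].hsize * (4L + 2L);   /* htab + codetab */
        printf("%2d bits: HSIZE %6ld, about %3ldK of table space\n",
               tab[i].bits, tab[i].hsize, bytes / 1024L);
    }
    return 0;
}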


>  5) On the weak side, there is as yet, no CRC or checksum for any of these,
>     but adding it would be someithing i am willing to take responsibility
>     for should enough people decide they would like to take the approach
>     which i'm currently suggesting.

This is the LEAST of the problems with using compress.

>     Also, there no directory support provided with these tools.  They work
>     on only one file at a time.  This is also correctable since the source
>     is available.

True, but why reinvent the wheel?  The source for the EXISTING programs is
ALSO available!

>   If this sounds like a flame, then please assign my apparent bad attitude to
>poor methodology rather than a desire to upset people.  This is provided in the
>spirit of adding to what i hope will become a meaningful dialog with a very
>practicle result.  

Your bad attitude appears to be due to an overdose of misinformation!
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  3) The source for ZOO, PKARC, and the others is NOT available.

The source for zoo 1.51 was posted to comp.sources.unix in the summer
of 1987.  The source for zoo 2.01 will be posted in the near future.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

bobmon@iuvax.cs.indiana.edu (RAMontante) (05/04/88)

cullsj.UUCP (Jeffrey C. Fried) writes, among other things:
,
, 1) COMPRESS is a text only compression routine.  It will not now, or ever,
,    help in the compression of binary files.

This statement made me shell out and run the following quick experiment:

	-rwxr-xr-x  1 bobmon      15360 Feb 27 01:22 pgen
	-rwxr-xr-x  1 bobmon      10116 May  4 08:46 pgen.Z

	-rwxr-xr-x  1 bobmon      14336 Feb 24 08:19 pom
	-rwxr-xr-x  1 bobmon       9945 May  4 08:47 pom.Z

Pgen and pom are both executable files (compiled from 'c').  Granted, this is
on a VAX machine, running the full-blown compress.  My attempts to run
compress on my 8088 box were frustrating, given its memory requirements, and
I haven't seen enough '.Z' formatted files to be worth the hassle.  But I
would assume that if it runs at all on a smaller machine, it will produce
the same results; unlike zoo and arc, it cannot choose one compression method
over another.

, 3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
,    we are at the whims of whomever is currently supporting (or not supporting)
,    them.

Source for arc is, at least for some Unix boxes.  Zoo source has been promised.
Pkarc was originally written in 8088 assembler, not the friendliest source.

, 4) COMPRESS works faster and better on text files then the ARC routines
,    because they use 12 bit compression, where 13-bit (and more) are possible
,    under even the PC for COMPRESS (i've tried it on ans AT-clone).

I haven't seen source for compress, either.  And the executables I've seen
were enormous, and limited to 12-bit LZW on 8088's under MSDOS; just like zoo
and arc (and pkarc's squash method is some sort of 13-bit LZW).  I've never
heard anyone claim responsibility for compress, while the authors of zoo,
pkarc, and arc are named, revered, vilified, and flamed frequently.  At least
one of them is an active participant on the Usenet.  (Plug:  I think that's
one strength of zoo, although Rahul might disagree :-)

, 5) On the weak side, there is as yet, no CRC or checksum for any of these,

Any of WHAT?  Zoo and arc certainly have a CRC value.  Compress is compress.
Its Unix-origin philosophy says that separate functions should be done by
separate routines with their outputs tied together by the operating system.
I think this is at the heart of some of the debates here.  The philosophy works
fine on a big multitasking machine like a VAX (or a suitably equipped 680x0
or '386?), and the entire news mailer system is predicated on that principle --
the mailer just calls compress (EVERYbody has compress, right?) to pack things
in for it; it doesn't worry about whether the result is correct, and neither
does compress.  It's up to you to aggregate your files with shar or something.

This piece-at-a-time philosophy is weaker on something like my MSDOS 8088 box.
There aren't multiple users all needing similar fundamental tools, there's
just me.  And I haven't the resources (memory or CPU cycles) to support lots
of little pieces that work fine individually but need sophisticated glue to
work together;  MSDOS's simulation of pipes is pathetic.  In such a situation
an integrated package (viz., zoo or arc) makes a lot more sense.  They can
incorporate in a consistent manner all those little pieces that a system admin.
may have put on a Unix box, but which I haven't yet found while rummaging
around BBS's.  By integrating everything a top-down design is possible,
unlike what happens when you bend the problem to fit the tools you already
have.

,    but adding it would be someithing i am willing to take responsibility
,    for should enough people decide they would like to take the approach
,    which i'm currently suggesting.

At which point it will become yet another uncommon non-standard (like ARITH?).
I don't think adding code will make it fit any better on small machines, and
the big machines can afford to calculate a CRC with an external routine.  Not
to mention the question of what you DO with it... Is the CRC for compress's
use?  Then it becomes not-quite-compress.  Is it for human use?  Then how do
I recreate it to find out if the file is still intact? ...

, 5) LASTLY: I am not trying to criticize the ARC routines, rather i am trying
,    to offer an alternative which i feel will reduce the time for transmission
,    of files, as well as, providing us with portability.  COMPRESS, ARITH, 
,    UNSHAR and UUENCODE are all available at the source level.  COMPRESS and
,    ARITH have been tried in at least three different environments: UNIX (BSD),
,    VMS and PC/MS-DOS.
,    Remember, for those of us who are NOT using the NET at the expense of a
,    university, the cost of communication, and therefore the time required
,    to transmit a file, are VERY important.

I don't find 1200bps transmission to be a lot of fun to wait for, either...
but I take it that your basic argument is that compress makes smaller archives
than zoo or arc, which are therefore cheaper to transmit.

I don't see that the compression improvement is as significant as you imply
(and your statement about binary is completely at odds with all my experience).
The other strengths of the integrated packages offer a LOT of functionality,
some of which I would seek out even if there were no compression involved.

The biggest problem I see is that many news mailers compress everything
blindly, so that an already-compressed file gets bigger.  This would also be
true of a sufficiently random file, although I think most executables aren't
that random.  And this compress-and-be-damned behavior is not a strength of
the system, it's a weakness.  (Even compress will complain if its result is
bigger than its original; does the mailer ignore this, or are the net.gods
lying when they claim they're shipping bigger files because of the double
compression?)

ralf@b.gp.cs.cmu.edu (Ralf Brown) (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
}  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
}     we are at the whims of whomever is currently supporting (or not supporting)
}     them.
Sources are available for ZOO and ARC.



-- 
{harvard,uunet,ucbvax}!b.gp.cs.cmu.edu!ralf -=-=- AT&T: (412)268-3053 (school) 
ARPA: RALF@B.GP.CS.CMU.EDU |"Tolerance means excusing the mistakes others make.
FIDO: Ralf Brown at 129/31 | Tact means not noticing them." --Arthur Schnitzler
BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA -=-=- DISCLAIMER? I claimed something?

wtr@moss.ATT.COM (05/04/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>1) COMPRESS is a text only compression routine.  It will not now, or ever,
>   help in the compression of binary files.
>

[I'm not sure if this will be construed as a flame,
 but, asbestos suit in hand, here goes!]

WHAT ARE YOU TALKING ABOUT!?!?!?!?

I have assumed that everyone has been talking about the program
COMPRESS v4.0 that was posted to comp.sources.???? late last year
(let's not get too picky about the dates ;-).

It was based upon a "modified Lempel-Ziv algorithm" as 
published in IEEE Computer by Terry A. Welch.  PD source
was (at least in part) written by Joe Orost.
(apologies to anyone unintentionally left out of the credits)

With the full sixteen-bit compression, it does a great job of
compressing (almost ;-) all files, binary and source.  Most
compression ratios are in the 50-60% range, occasionally
as high as 75%. (larger files seem to compress a little better)

I have no idea what program you are referring to when you are 
describing your 'compress' but it is certainly not the same program
that I run on my AT clone at home.

=====================================================================
Bill Rankin
Bell Labs, Whippany NJ
(201) 386-4154 (cornet 232)

email address:		...![ ihnp4 ulysses cbosgd allegra ]!moss!wtr
			...![ ihnp4 cbosgd akgua watmath  ]!clyde!wtr
=====================================================================

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/05/88)

   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
the authors do not mention its use for binaries, I never considered using it.
I tried it on an executable under UNIX and obtained a good reduction, for
reasons which are not apparent.  I'm sure that there are cases where this does
not work (like graphics files), but it does appear to work, and in this case
better than the current version of ARITH.
   However, my point was that for TEXT, COMPRESS does a better job than the
ARC programs with which I'm familiar.  Also, I did not know that source for
zoo was available - a consideration which I believe to be VERY important
since support usually comes best from those who use a product.
   I would like to thank those who took the time to correct my
misunderstanding concerning the use of compression on the net, but I find
it just a bit difficult because of the tone used in communicating with me.
For those who suggested that I "do my homework" before posting something
to the net, I can only say that since the net is my ONLY contact with this
problem, and the comp...d group is for discussions, I am in essence
"doing my homework".  I'm sorry if my attempt to add to the discussion has
caused anyone to feel that their precious time has been wasted, but I
think that you're as wrong as you are rude.
  
   Humbly yours,

Jeff Fried                                 ...!ames!cullsj!jeff
Cullinet Software
2860 Zanker Road, Suite 206                Reality, what a concept!
San Jose, CA, 95134

cudcv@daisy.warwick.ac.uk (Rob McMahon) (05/05/88)

In article <292@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>The following tools are 
>available in source code format:  COMPRESS (Lempel-Ziv text compressor), ARITH
>(arithmetic compression for binary), UUencode/decode.  Since all of these
>will run under a variety of environments (IBM-PC, AMIGA, ATARI, VMS, SYS5,
>BSD), why not make these the basis for communicating.

I hope we're talking about binary files here, in which case I don't care
because I'd never just take a binary from the net and run it on one of my
machines.  If you're talking about sources, I like to scan down, read the
README, check out the comments in main etc., before I even save it to disk.
If I get all the bits of a posting, tack them together, uudecode them, and
uncompress them, only to find it's of no use to me, I'm not going to be
amused.  I have this feeling that people aren't going to bother to send proper
introductory articles in plain text before the actual posting.

Rob
-- 
UUCP:   ...!mcvax!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick.cu          ARPA:   cudcv@cu.warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England

geoff@utstat.uucp (Geoff Collyer) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

This is absolutely dead wrong.  compress compresses any kind of file,
and has been used to compress (and correctly uncompress!), for example,
graphics bit maps, sendmail configuration files :-), and tar archives
containing binaries.
-- 
Geoff Collyer	utzoo!utstat!geoff, utstat.toronto.{edu,cdn}!geoff

jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>  1) COMPRESS is a text only compression routine.  It will not now, or ever,
>     help in the compression of binary files.

Most emphatically wrong.  compress works just fine on many types of
binary files.  It can give 90% or more compression on bitmap data,
and usually > 50% compression on Unix executable files.  About the
only type of file I know of that compress fails on consistently is
floating point data in binary format.  As long as some strings of bytes
occur much more frequently than others (whether they represent
characters, opcodes, or grey levels) compress kicks ass.
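
A rough way to see whether a particular file has that kind of skew is to
measure its order-0 byte entropy; a figure well below 8 bits per byte means
compress has something to work with.  (This only looks at single-byte
statistics -- compress also exploits repeated strings -- so treat it as an
indicator, not a prediction.)  A small sketch:

#include <stdio.h>
#include <math.h>

/* Order-0 entropy estimate: count byte frequencies and report bits per
 * byte.  Well under 8 suggests compress will find something to squeeze;
 * close to 8 (random or already-compressed data) suggests it will not.
 * Link with -lm. */
int main(int argc, char **argv)
{
    FILE *fp = (argc > 1) ? fopen(argv[1], "rb") : stdin;
    long count[256] = {0}, total = 0;
    double bits = 0.0;
    int c, i;

    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }
    while ((c = getc(fp)) != EOF) {
        count[c]++;
        total++;
    }
    if (total == 0) {
        printf("empty file\n");
        return 0;
    }
    for (i = 0; i < 256; i++)
        if (count[i] > 0) {
            double p = (double)count[i] / (double)total;
            bits -= p * log(p) / log(2.0);
        }
    printf("%ld bytes, %.2f bits per byte of order-0 entropy\n", total, bits);
    return 0;
}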


-- 
- Joe Buck  {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck
	    Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net

Argue for your limitations and you get to keep them.  -- Richard Bach

campbell@maynard.BSW.COM (Larry Campbell) (05/05/88)

In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
<>>Just one thing that needs to be known -- PC's can do no more than 12-bit
<>>compression.  So if you are compressing your file from a UNIX system,
<>>you need to say compress -b12 filename .
<>
<>This myth has been repeated several times, so I felt it was necessary to
<>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...

Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
C compiler has only the tiny and small memory models (exactly
corresponding to non-split and split PDP-11s, which also cannot
handle 16-bit compress).

So it is true that PCs with a C compiler that supports multiple data
segments can handle 16-bit compress, but that hardly encompasses all
PCs in the world.
-- 
Larry Campbell                                The Boston Software Works, Inc.
Internet: campbell@maynard.bsw.com          120 Fulton Street, Boston MA 02109
uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846

loci@csccat.UUCP (Chuck Brunow) (05/05/88)

	Let me point out one simple fact: source code is VERY MUCH 
	SMALLER than binaries.

paul@devon.UUCP (Paul Sutcliffe Jr.) (05/05/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
+---------
|   1) COMPRESS is a text only compression routine.  It will not now, or ever,
|      help in the compression of binary files.
+---------

This is absolute and complete Bull-Ka-Ka.

    # cp /bin/sh /tmp
    # cd /tmp
    # ls -l sh
    -rwx--x--t   1 root     root       37762 May  5 09:23 sh
    # compress -V -v sh
    $Header: compress.c,v 4.0 85/07/30 12:50:00 joe Release $
    Options: BITS = 16
    sh: Compression: 34.90% -- replaced with sh.Z
    # ls -l sh.Z
    -rwx--x--t   1 root     root       24582 May  5 09:23 sh.Z

Looks like you can compress binaries to me!  Granted, the compression
factor isn't as good as can be had with text files (I've seen as much
as 90% in text files with plenty of repeating characters), but it
*does* work on binaries.

- paul

-- 
Paul Sutcliffe, Jr.				  +------------------------+
						  | Know what I hate most? |
UUCP (smart): paul@devon.UUCP			  |  Rhetorical questions. |
UUCP (dumb):  ...rutgers!bpa!vu-vlsi!devon!paul   +------<Henry Camp>------+

feg@clyde.ATT.COM (Forrest Gehrke) (05/05/88)

In article <10712@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
> 
>   I would like to add a little fuel to the fires of "which archiver"
> discussion.  Use of the 'btoa' routine instead of uuencode would save
> 12% (!) on binary postings. This is a PD program, included in the
> compress package, and runs just fine on a PC.

Having tried this sometime back, I have often wondered why this approach
is not used by USENET.  It would save a lot of transmission time.

>   All the discussion of using PKARC to save 1-2% or not using it to save
> time for many of the people on the net seems pointless. We should use
> both (standard) arc and zoo formats, uuencode them, and save bandwidth
> by dropping this discussion. Hopefully Rahul will clarify this by edict.

Also an excellent suggestion.  We could quickly find out from experience
which archiver works best.

BTW what is holding up Rahul from taking over as moderator?

Forrest Gehrke k2bt

jbuck@epimass.EPI.COM (Joe Buck) (05/05/88)

In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>
>   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
>the authors do not mention its use for binaries, i never considered using it.
>I tried it on an executable under UNIX and obtained a good reduction, for 
>reasons which are not apparent.  I'm sure that there are cases where this does
>not work (like graphics files), but it does appear to work , and in this case
>better than the current version of ARITH.

Jeff, Jeff, Jeff.  You're STILL putting your foot in your mouth.  :-)

A Unix file is just a stream of bytes, and so is an MS-DOS file
except that it has extra attributes as well.  Compress replaces byte
strings with codes whose lengths are between 9 and 16 bits.  It will
work well on any file in which some byte sequences are more common
than others.  An executable file consists of instructions, which for
almost all processors are an integral number of bytes, and some are
much more common than others.  So compress works fine, and will give
good compression for just about any executable file.  There are
several types of graphics files: bitmaps are HIGHLY compressible;
other types of files act like a program for an imaginary computer and
consist of byte codes, some much more common than others.  These
compress well also.

There are only three types of files I've ever given to compress that
haven't been reduced in size as a result: random binary data,
floating point binary data, and files that have already been
compressed.

-- 
- Joe Buck  {uunet,ucbvax,sun,<smart-site>}!epimass.epi.com!jbuck
	    Old Internet mailers: jbuck%epimass.epi.com@uunet.uu.net

Argue for your limitations and you get to keep them.  -- Richard Bach

tneff@dasys1.UUCP (Tom Neff) (05/06/88)

In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
> ... Also, i did not know that source for
>zoo was available - a consideration which i believe to be VERY important
>since support usually comes best from those who use a product.

The source for ARC is available too, and it's running on (for instance)
this Stride.  

Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC.
Due to an assiduous sales job most PC sysops have the Katz thing, but it
ain't the original.  The "C" language real McCoy is slower on PC's but
more portable.

>For those who suggested that i "do my homework" before posting something
>to the net, i can only say that since the net is my ONLY contact with this
>problem, and that the comp...d group is for discussions, i am in essence
>"doing my homework".  

There is a school of thought, notably expressed in the news.announce.newusers
material, that the Net is a place for authoritative answers and requests for
same, not for "homework", owing to the expense of carrying it all.  I try to
keep an open mind.  :-)

Not that your posting was anything to apologize for anyway...


-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536		MCI: TNEFF
	 will function..."	GEnie: TOMNEFF		BIX: are you kidding?

brianc@cognos.uucp (Brian Campbell) (05/06/88)

In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
> If you are (sigh) going to post binaries on Usenet, DO NOT compress
> them first.  Many Usenet sites use compress to pack up their news
> batches.  Compressing a compressed file makes it larger.

Maybe those Usenet sites should not use the -f (force) flag with
compress.  Every version I've used (Sun, XENIX and DOS) will not
replace the original if the compressed version would be larger.  Try
compressing a file twice using the -v (verbose) option and see what
happens.
-- 
Brian Campbell        uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated   mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4
(613) 738-1440        fido: (613) 731-2945 300/1200, sysop@1:163/8

laba-5ac@web7f.berkeley.edu (Erik Talvola) (05/06/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
<>In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
<><>>Just one thing that needs to be known -- PC's can do no more than 12-bit
<><>>compression.  So if you are compressing your file from a UNIX system,
<><>>you need to say compress -b12 filename .
<><>
<><>This myth has been repeated several times, so I felt it was necessary to
<><>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...
<>
<>Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
<>I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
<>C compiler has only the tiny and small memory models (exactly
<>corresponding to non-split and split PDP-11s, which also cannot
<>handle 16-bit compress).
<>
<>So it is true that PCs with a C compiler that supports multiple data
<>segments can handle 16-bit compress, but that hardly encompasses all
<>PCs in the world.
<>-- 

What's wrong with getting a 16-bit Compress executable file for the PC
which was compiled with a proper C compiler?  Then, you can run a 16-bit
compress on any PC.  You are right in that you may not be able to compile
it with all C compilers, but you can run the executable on any PC (as long
as you have ~500K free).


>Larry Campbell                                The Boston Software Works, Inc.
>Internet: campbell@maynard.bsw.com          120 Fulton Street, Boston MA 02109
>uucp: {husc6,mirror,think}!maynard!campbell         +1 617 367 6846


---------------------------------------------------
Erik Talvola                 erikt@zen.berkeley.edu

"...death is an acquired trait." -- Woody Allen
---------------------------------------------------

caf@omen.UUCP (Chuck Forsberg WA7KGX) (05/06/88)

In article <296@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
:
:  1) COMPRESS is a text only compression routine.  It will not now, or ever,
:     help in the compression of binary files.
The 13 bit compression in zoo gets about 29% compressing YAM.EXE.
:  2) ARITH is a more general compression routine using adaptive arithmetic 
:     coding.  It will compress binary files where there is redundancy, but
Please post it!
:  3) The source for ZOO, PKARC, and the others is NOT available.  Therefore
:     we are at the whims of whomever is currently supporting (or not supporting)
:     them.
The sources to ZOO *are* available; in fact it was a copy of ZOO I compiled
for 386 Xenix that I used in the above micro-benchmark.
:  4) COMPRESS works faster and better on text files then the ARC routines
:     because they use 12 bit compression, where 13-bit (and more) are possible
:     under even the PC for COMPRESS (i've tried it on ans AT-clone).
Compress, ARC, PKARC, and ZOO all use forms of LZW compression, derived
from the original Unix compress program.
:  5) On the weak side, there is as yet, no CRC or checksum for any of these,
:     but adding it would be someithing i am willing to take responsibility
:     for should enough people decide they would like to take the approach
:     which i'm currently suggesting.
The lack of a CRC in compress is a serious weakness.  ARC and ZOO include CRCs.
:     Also, there no directory support provided with these tools.  They work
:     on only one file at a time.  This is also correctable since the source
:     is available.
ZOO has excellent directory support - full Unix pathnames are supported.

Again, please post the ARITH program.  It would be most interesting
if the memory requirements are small - like Huffman encoding instead
of LZW.

mark@adec23.UUCP (Mark Salyzyn) (05/06/88)

I'm sorry, I don't care if the IBM-PC can handle better than 12-bit
compress!  I run UNIX on a PDP-11/23, a *NON SPLIT I/D MACHINE*, and that limits
me to 12-bit compress (however, I have a 13-bit LZW pack routine
that was posted in 1983 that works fine).  In order to read stuff that
is packed with more than 12-bit LZW I had to rewrite compress to use disk
rather than memory.  BOY IS IT SLOW.  In the interest of compatibility
with ALL types of machines I suggest that we use 12-bit compress.  This
is the most widely available bit selection.  If not, then I am
going to extend my disk version to handle 17-bit compress, post something
useful, and watch you all squirm.

			G'day
-- Mark Salyzyn, mad at the world for advancing and leaving me behind

ephram@violet.berkeley.edu (05/06/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
>In article <4740@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
><>>Just one thing that needs to be known -- PC's can do no more than 12-bit
><>>compression.  So if you are compressing your file from a UNIX system,
><>>you need to say compress -b12 filename .
><>
><>This myth has been repeated several times, so I felt it was necessary to
><>speak up.  PCs most certainly CAN do a 16 bit compress/uncompress.  ...
>
>Only a subset of PCs can do 16-bit compress/uncompress.  

Hasn't anyone ever heard of a disk drive?!?  Multiple segments as a limitation?
How about writing temporary results to a disk file (random access)?  A RAM disk?

Now I must admit I have never cracked open the code to compress/uncompress,
but it seems to me that using a disk drive as an intermediate result area is
a very viable workaround.  I would rather sit and watch my disk spin for an
extra minute than watch the RD light on my modem work 10% longer.

I admit it is not elegant, but when someone says "can not do" I must speak
up.




Ephram Cohen
ephram@violet.berkeley.edu

jpn@teddy.UUCP (John P. Nelson) (05/06/88)

>	Let me point out one simple fact: source code is VERY MUCH 
>	SMALLER than binaries.

This is not clear.

For small programs in a high-level compiled language (like C), this is
true, because the small program pulls in the language run-time
library.  The source is much smaller than the resulting executable.
However, I would bet that the object file (before linking) would be
about the same size as the source (even WITH the symbol table and
relocation information).

Assembly language source usually runs about 10 times larger than the
resulting executable.

Large C program (64k+) source usually runs two to three times larger
than the resulting executable.  Of course, I find source code more
valuable:  I can make changes to suit my environment, or I can port the
program to a different machine entirely.  And of course, with an
operating system like UNIX which runs on a plethora of machines,
source code is the only acceptable distribution mechanism.

Other languages have different source/binary size ratios.  Some
languages can generate a lot of code with a very small amount of
source.  However, most of the source code posted to USENET is C.
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

rroot@edm.UUCP (uucp) (05/07/88)

From article <3980@killer.UUCP>, by chasm@killer.UUCP (Charles Marslett):
> In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
>> Just one thing that needs to be known -- PC's can do no more than 12-bit
>> compression.  ...
> Actually, I have sent several people copies of a minor mod to compress 4.0
> that works fine if you have the memory (requires about 350-400 K above DOS
There are still, however, people running on systems whose compilers don't know
how to work with >64K.
These systems exist and have to be dealt with.

-- 
-------------
 Stephen Samuel 			Disclaimer: You betcha!
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
  BITNET: USERZXCV@UQV-MTS

loci@csccat.UUCP (Chuck Brunow) (05/07/88)

In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>>
>>   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
>>the authors do not mention its use for binaries, i never considered using it.
>>I tried it on an executable under UNIX and obtained a good reduction, for 
>>reasons which are not apparent.  I'm sure that there are cases where this does

	This is actually partially true. The first "compress" to appear
	on the net (several years ago) only worked on text files and
	dumped core on binary files. The reason you get good compressions
	on binary files is probably that they haven't been stripped of
	the relocation info. Strip them first and I doubt that the
	compression will be so good (otherwise, throw your optimizer
	into the bit bucket). Typical (large) text compression is about
	67%, whereas binaries are closer to 20%. (I use 16-bit compress).

>
>A Unix file is just a stream of bytes, and so is an MS-DOS file
>except that it has extra attributes as well.  Compress replaces byte
>strings with codes whose lengths are between 9 and 16 bits.  It will
>work well on any file in which some byte sequences are more common
>than others.  An executable file consists of instructions, which, for
>almost all processors are integral numbers of bytes, and some are
>much more common than others.  So compress works fine, and will give
>good compression for just about any executable file.  There are

	This is doubtful. There's a good description of the workings
	of LZW in the GIF docs (recently posted). Bytes aren't the
	key feature here, but rather sequences of repeated bytes
	which should be rare in an optimized executable (on Unix
	at least).

>several types of graphics files: bitmaps are HIGHLY compressible;

	If they have lots of blank space, or other repeated sequences.
	Otherwise, they can be very similar to executables: 10-20%.

>other types of files act like a program for an imaginary computer and
>consist of byte codes, some much more common than others.  These
>compress well also.

	You must mean Huffman coding. These comments are true in that
	case, not LZW.
>
>There are only three types of files I've ever given to compress that
>haven't been reduced in size as a result: random binary data,
>floating point binary data, and files that have already been
>compressed.
>
	The point being that there is little redundancy.
>-- 

wyle@solaris.UUCP (Mitchell Wyle) (05/07/88)

This discussion will bear fruit only if r$ or the backbone gurus
implement one of these schemes as a usenet standard, and distribute
sources or binaries packaged with tarmail or whichever scheme wins
this debate.   I vote for tarmail.  Let's get a standard accepted!
-- 
-Mitchell F. Wyle            wyle@ethz.uucp
Institut fuer Informatik     wyle%ifi.ethz.ch@relay.cs.net
ETH Zentrum                  
8092 Zuerich, Switzerland    +41 1 256-5237

NETOPRWA@NCSUVM.BITNET (Wayne Aiken) (05/08/88)

As far as packing methods go (ARC, ZOO, compress, etc.), the only one that
I've ever had any problems with has been compress (640K IBM AT).
Perhaps I've never gotten the right executable; does anyone have a
12- or 13-bit compress guaranteed to run in 512K?

That aside, the other problem, especially with multi-part postings, is
that not all parts consistently make it to all sites, and when they do,
one or more parts have been truncated or otherwise mangled.  It would be
of great help if each uuencoded part had a trailing cut line and signature,
so I can tell if a file has been truncated.  The new uuencode for the PC by
Richard Marks (great job, Richard!) correctly skips the cut lines, and can
also extract directly from shar files.

One last thing....one of the great advantages of using ARC and ZOO files
is that they maintain an internal CRC value for each file.  Recently,
someone posted a uuencoded EXE file which I had to download twice before I
got it to work, and I'm still not 100% positive that there is not some garbled
spot hidden somewhere in that binary.  If the packing method doesn't
include a CRC, then it really should be calculated and included as part
of the header or documentation, so I can verify that the file is OK.
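
In the meantime, anyone can compute a CRC by hand and post it alongside
the file.  Here's a minimal sketch; I believe it matches the 16-bit CRC
that ARC stores (reflected polynomial 0xA001, initial value zero), but
check it against a known-good archive before relying on it.

	/* Compute a 16-bit CRC over a file, bit by bit.  A sketch:
	 * believed to match the CRC-16 that ARC stores for each member
	 * (reflected polynomial 0xA001, initial value 0), but verify
	 * against a known archive before trusting it.
	 */
	#include <stdio.h>

	unsigned int crc16(fp)
	FILE *fp;
	{
	    unsigned int crc = 0;
	    int c, i;

	    while ((c = getc(fp)) != EOF) {
	        crc ^= (unsigned int)c & 0xff;
	        for (i = 0; i < 8; i++)
	            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
	    }
	    return crc & 0xffff;
	}

	int main(argc, argv)
	int argc;
	char **argv;
	{
	    FILE *fp;

	    if (argc < 2) {
	        fprintf(stderr, "usage: crc16 file\n");
	        return 1;
	    }
	    if ((fp = fopen(argv[1], "rb")) == NULL) {
	        perror(argv[1]);
	        return 1;
	    }
	    printf("CRC-16: %04x\n", crc16(fp));
	    fclose(fp);
	    return 0;
	}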


+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+           Announcing the all-weather, 100% bio-degradable    *    +
+               .                                                   +
+   *     .      .     S t a r F l e e t  BBS      *      .         +
+                                                         .         +
+       *         (919) 782-3095   PC-Pursuitable     *             +
+  .         *    24 hrs/day,  300/1200/2400 baud    .          *   +
+                                                                   +
+    WITH: Utilities, hints, and games for IBM, Apple // & MAC      +
+          USENET favorites, on-line games, message bases           +
+          BITNET access, largest joke database in NC region        +
+                                                                 * +
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

ford@elgar.UUCP (Ford Prefect ) (05/08/88)

In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes:
>What's wrong with getting a 16-bit Compress executable file for the PC
>which was compiled with a proper C compiler?  Then, you can run a 16-bit
>compress on any PC.  You are right in that you may not be able to compile
>it with all C compilers, but you can run the executable on any PC (as long
>as you have ~500K free).

There are a few problems with this approach:

1)	Such a compiler has to exist for the operating system you are
	running.  Obviously, the author had his brain in Ms.Dos mode,
	which, since the article was cross-posted to
	comp.binaries.ibm-pc, is forgivable in this case.  But one of
	the articles that was being followed up to mentioned an O.S.
	that only supported 64k segments.  Compress just won't work
	in such an environment without major redesign (like keeping
	the arrays in a disk file :-).

2)	The executable you get must be for your CPU!  This is obvious,
	of course, but I keep detecting a definite ibm-pc-chauvinist
	state of mind in this discussion.  Don't forget that there
	are people who are still running unix on PDP-11's and proud
	of it!  The PDP-11 is very similar to the 8086 except that
	nobody does anything as kludgey as geferkin with the segment
	registers!  So the best you can get is 64k code, 64k data.

In other words, discussion of a standardized compression format must
take into account the existence of small machines.  And "PC" !=
"Intel Cpu".

Personally, I use 16-bit compress since I don't need to talk to such
small machines.  But if I need to post a binary to the net, I will
probably use 12-bit compress, because I've never heard of a machine
or compiler that couldn't run it.

					-=] Ford [=-

"Once there were parking lots,		(In Real Life:  Mike Ditto)
now it's a peaceful oasis.		ford%kenobi@crash.CTS.COM
This was a Pizza Hut,			...!sdcsvax!crash!kenobi!ford
now it's all covered with daisies." -- Talking Heads

leonard@bucket.UUCP (Leonard Erickson) (05/08/88)

In article <1082@maynard.BSW.COM> campbell@maynard.UUCP (Larry Campbell) writes:
<Only a subset of PCs can do 16-bit compress/uncompress.  Mine can't.
<I'm running VENIX/86 2.0, which is basically V7;  the PCC-derived
<C compiler has only the tiny and small memory models (exactly
<corresponding to non-split and split PDP-11s, which also cannot
<handle 16-bit compress).
<
<So it is true that PCs with a C compiler that supports multiple data
<segments can handle 16-bit compress, but that hardly encompasses all
<PCs in the world.

Larry, you are confusing being able to *compile* a program and being able
to *use* it! I don't have *any* kind of C compiler. But I can uncompress
stuff that was compressed on a Unix system on my PC. 

Some kind soul posted an msdos *binary* for compress a while back. All
you need is DOS and more than 512k of ram...

True, this places two limits on the people who are using the program:
1. they've got to be using MS-DOS. (since we are talking about
   comp.binaries.ibm.pc, any arguments that this is a serious restriction
   should be routed to /dev/null)
2. they have to have 640k (576 will probably work, but I haven't
   tried it). This *is* a problem, but even at current memory prices
   it isn't *too* serious. (Unless you have an AT whose memory is mapped
   as 512 dos/512 extended)

-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

Since the discussion is on I'm posting atob and btoa to binaries.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>
>	Let me point out one simple fact: source code is VERY MUCH 
>	SMALLER than binaries.

And another: not everybody has all compilers.  There have been postings
in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on
this group.  That's why we have a binary group.  Besides I wouldn't give
out source to some things which I can distribute as binary. 

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/10/88)

   I recently acquired the ZOO executables from the net and found them to be
incompatible with ARC.  The UNIX ARC i received over the net is compatible
with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

jtara@m2-net.UUCP (Jon Tara) (05/10/88)

In article <3980@killer.UUCP>, chasm@killer.UUCP (Charles Marslett) writes:
> In article <55@psuhcx.psu.edu>, wcf@psuhcx.psu.edu (Bill Fenner) writes:
> > Just one thing that needs to be known -- PC's can do no more than 12-bit
> > compression.  ...
> 
> Actually, I have sent several people copies of a minor mod to compress 4.0

Funny, I ran compress 4.0 through the Microsoft 4.0 compiler using
large model, and I've been happily compressing and de-compressing
with 16 bits ever since.  Far as I can tell, it doesn't need any
changes, at least under MS/PC-DOS and Microsoft C.  It does need
a good chunk of memory, which most people should have, unless you're
a real TSR nut.


-- 
  jtara%m-net@umix.cc.umich.edu          ihnp4!dwon!m-net!jtara

 "You don't have to take this crap.  You don't have to sit back
  and relax." _Walls Come Tumbling Down_, The Style Council

rick@pcrat.UUCP (Rick Richardson) (05/10/88)

In article <679@omen.UUCP> caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
>Again, please post the ARITH program.  It would be most interesting
>if the memory requirements are small - like Huffman encoding instead
>of LZW.

In case ARITH never gets posted:  the complete article and program appeared
in ACM last year, in C.  I typed it in myself (and lost it later).  The
program, as published, runs a lot slower than compress and does not do
quite as good a job as compress.  It was better than "pack".  It is
very small, and uses little memory.

If you dig into the article (this from memory, I seem to have misplaced
the issue of ACM as well), the program separates the encoding
algorithm from the model.  Two models are presented: one that just
uses a static letter frequency table (for text), and an adaptive model (for
binaries).  As I recall, the author pointed out that more sophisticated
adaptive algorithms could be used for better results.
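
To give a flavor of that model/coder split, here is a minimal sketch of
an order-0 adaptive model of the kind I remember the article describing;
the names and the rescaling constant are mine, not the author's.

	/* Sketch of an order-0 adaptive model for an arithmetic coder:
	 * frequency counts that encoder and decoder both update after
	 * each symbol, so they stay in step without sending a table.
	 * Names and the rescaling threshold are invented here, not
	 * taken from the published program.
	 */
	#define NSYM    256
	#define MAXTOT  16383   /* rescale before totals hurt precision */

	static long freq[NSYM];
	static long total;

	void model_init()
	{
	    int i;

	    for (i = 0; i < NSYM; i++)
	        freq[i] = 1;        /* every symbol starts possible */
	    total = NSYM;
	}

	/* The coder maps the symbol's share of 'total' onto its current
	 * interval; here we only maintain the counts it would use.
	 */
	void model_update(sym)
	int sym;
	{
	    int i;

	    freq[sym] += 32;        /* arbitrary increment */
	    total += 32;
	    if (total > MAXTOT) {   /* halve counts, keep them nonzero */
	        total = 0;
	        for (i = 0; i < NSYM; i++)
	            total += (freq[i] = (freq[i] + 1) / 2);
	    }
	}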

After monkeying around with the program for an evening, and even trying
my own hand at a more sophisticated model, I shelved the program, with
nary a backup.  Since it was slower and less efficient than compress,
I think its usefulness is limited to those applications which are
sensitive to both program and data size, such as in a modem.

BTW, I heard some rumor that a 16 bit "uncompress"-only is available
for limited memory systems.  If this is true, then why all the fuss
about 16 bit compression?
-- 
		Rick Richardson, President, PC Research, Inc.

(201) 542-3734 (voice, nights)   OR     (201) 834-1378 (voice, days)
uunet!pcrat!rick (UUCP)			rick%pcrat.uucp@uunet.uu.net (INTERNET)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/10/88)

In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:

|    I recently acquired the ZOO executables from the net and found them to be
| incompatible with ARC.

  Correct. zoo is not "another arc file program," it is a totally
separate file structure, containing information which neither arc nor
pkarc includes.

|               The UNIX ARC i received over the net is compatible
| with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

  Alas, there is no "the" UNIX arc, there are a number of slightly
diferent versions. If you have the one I suspect, it needs the "-i"
option to be compatible with the DOS arc. I highly commend switching the
meaning of that flag for default DOS compatibility.

  Actually I highly commend using zoo...
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

tneff@dasys1.UUCP (Tom Neff) (05/10/88)

In article <307@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>   I recently acquired the ZOO executables from the net and found them to be
>incompatible with ARC.  The UNIX ARC i received over the net is compatible
>with ARC5.2.1 under DOS.  Has anyone else experienced this incompatibility?  

Yes, everyone has experienced this incompatibility, Jeffrey, because
they are not SUPPOSED to be compatible!  :-)

ARC is one archiving standard, ZOO is a completely different standard.
You need one set of programs to create, list and extract ARC files, and
a different set to manipulate ZOO archives.  You can't use one with the
other.

Now, if your next question was going to be why there are two incompatible
archiving standards for the MSDOS/UNIX/VMS environment, you'll have to
ask our very own moderator Rahul, because there was only one (ARC) until
he decided to invent his own.  I told him at the time that user confusion
would result, but the argument is moot at this point.


-- 
Tom Neff			UUCP: ...!cmcl2!phri!dasys1!tneff
	"None of your toys	CIS: 76556,2536		MCI: TNEFF
	 will function..."	GEnie: TOMNEFF		BIX: are you kidding?

jeff@cullsj.UUCP (Jeffrey C. Fried) (05/11/88)

   Someone asked for the source to COMPRESS, and since i cannot reach
them except through posting to this group let me simply say that I have
the source, which was posted on the net several months ago.  I made a
small change which allows it to run in SMALL model addressing using MSC-5.0
to do 13 (yes 13, not 12) bit compression.  (Only one large array was
required.)
   If the person who wanted it will contact me with a UUCP address, i'll
send it out.  If there is a sufficient number of requests, i'll send it to
the moderator for posting to comp.binaries.ibm.pc.  

pjh@mccc.UUCP (Pete Holsberg) (05/11/88)

In article <10770@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
...  Alas, there is no "the" UNIX arc, there are a number of slightly
...different versions. If you have the one I suspect, it needs the "-i"
...option to be compatible with the DOS arc. I highly commend switching the
                                                              ^^^^^^^^^^^^^
...meaning of that flag for default DOS compatibility.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   
How do you do that?  Thanks.

mitch@stride1.UUCP (Thomas P. Mitchell) (05/12/88)

In article <10758@steinmetz.ge.com> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>In article <552@csccat.UUCP> loci@csccat.UUCP (Chuck Brunow) writes:
>>
>
>And another: not everybody has all compilers.  There have been postings
>in MSC, Turbo C, Turbo Pascal, MASM, Fortran and COBOL (yecch) so far on
>this group.  That's why we have a binary group.  Besides I wouldn't give
>out source to some things which I can distribute as binary. 

To send binary or not to send binary: that is a tough question.
But my general thought on this is that binaries are the way to
distribute a product you sell and support.  Source text is the
way to make available something you wish to share and are willing
to see improved and expanded.  Oh yes, and criticized as well.

The argument that not everybody has all compilers is real.  Yet I
dislike it.  To me compilers are like a keyboard: a computer is
worth little without one.

Back to the topic.  Not all binary files are code, so how do we
transfer binaries, or anything else for that matter?  Some things, like
bit maps (face server, fonts) and other data, need to be transferred from
machine to machine.  And at times code binaries as well (blush, I
did say that).  My thought on this is that the link should know
how best to send the data.  In other words, uucp should be
expanded to exchange abilities.  Consider an initial uucp
connection in which the programs exchange information like "have
compress16|compress12, have btoa/atob, have kermit, have xmodem,
have link_is_100%, have TeleBit, have never_talked, have
exchanged_compression_tables".  Given this kind of information,
the program can then select the tool that gives the best
effective transfer rate for the next conversation.
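
Something along these lines could ride on top of the existing handshake.
A sketch, with the capability strings and the preference order invented
purely for illustration:

	/* Sketch of picking a transfer tool from two capability strings
	 * of the sort described above.  The strings and the preference
	 * order are made up; a real uucp extension would need an
	 * agreed-on vocabulary.
	 */
	#include <stdio.h>
	#include <string.h>

	static char *prefer[] = {   /* best first */
	    "compress16", "compress13", "compress12", "btoa", "none"
	};
	#define NPREF   (sizeof(prefer) / sizeof(prefer[0]))

	char *pick_tool(mine, theirs)
	char *mine, *theirs;
	{
	    int i;

	    for (i = 0; i < NPREF; i++)
	        if (strstr(mine, prefer[i]) && strstr(theirs, prefer[i]))
	            return prefer[i];
	    return "none";
	}

	int main()
	{
	    char *us   = "compress16 compress12 btoa kermit";
	    char *them = "compress12 btoa xmodem";

	    printf("agreed on: %s\n", pick_tool(us, them));
	    return 0;
	}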

Well what say you all?


Thanks for the soap.
mitch@stride1.Stride.COM

Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone: (702)322-6868	TWX: 910-395-6073	FAX: (702)322-7975
MicroSage Computer Systems Inc.
Opinions expressed are probably mine. 

danno@microsoft.UUCP (Dan Norton) (05/13/88)

In article <145@elgar.UUCP>, ford@elgar.UUCP (Ford Prefect ) writes:
> In article <9644@agate.BERKELEY.EDU> laba-5ac@web7f.berkeley.edu.UUCP (Erik Talvola) writes:
> >What's wrong with getting a 16-bit Compress executable file for the PC...
> 
> There are a few problems with this approach:
> 
> 1)	Such a compiler has to exist for the operating system you are
> 	running...
> 	                                               ... But one of
> 	the articles that was being followed up to mentioned an O.S.
> 	that only supported 64k segments.  Compress just won't work
> 	in such an environment without major redesign (like keeping
> 	the arrays in a disk file :-).

You are wrong.  In fact, such a compress exists, using memory only.
Several people, including myself, have been able to modify the standard
compress with little trouble, and it works just fine on IBM PC's.

linhart@topaz.rutgers.edu (Mike Threepoint) (05/13/88)

tneff@dasys1.UUCP (Tom Neff) writes:
-=> The source for ARC is available too, and it's running on (for instance)
-=> this Stride.  

>sigh<  But the only squash source I can find is in Pascal.  Speaking
of which...

-=> Don't confuse the ARC standard with Phil Katz's PC-optimized clone PKARC.
-=> Due to an assiduous sales job most PC sysops have the Katz thing, but it
-=> ain't the original.  The "C" language real McCoy is slower on PC's but
-=> more portable.

"Accept no imitations" should be reserved to sales jobs.  PKARC is
faster and compresses smaller, why wouldn't they use the Katz thing?
ARCE, NARC, and NSWEEP also support squashing, so it's not even
forcing PK(X)ARC on the users.

My bottom line is the archive size; speed is gravy unless it operates
as slow as... oh, I dunno... ARC?  :-)  On my BBS, my own experience
is that PKARC creates smaller archives than ZOO, so I use PKARC when I
don't need to store a directory subtree.  Squashing has saved over a
meg of space on my board.  Sometimes PKARC is stupid about compression
and squashes when it should crunch or crunches at a 0% compression
rate instead of storing, but most of the time it's smaller.

If ZOO crunched as well (>sigh<), I would use that.  [Selfish mode:
Maybe Rahul could find out what hashing algorithm PK or DWC is using
to get better compression rates.  Would simplify things for me
considerably.]
-- 
"...billions and billions..."			| Mike Threepoint (D-ro 3)
			-- not Carl Sagan	| linhart@topaz.rutgers.edu
"...hundreds if not thousands..."		| FidoNet 1:107/513
			-- Pnews		| AT&T +1 (201)878-0937

loverso@encore.UUCP (John Robert LoVerso) (05/13/88)

In article <2932@cognos.UUCP> brianc@cognos.UUCP (Brian Campbell) writes:
> In article <696@fig.bbn.com> rsalz@bbn.com (Rich Salz) writes:
> > If you are (sigh) going to post binaries on Usenet, DO NOT compress
> > them first.  Many Usenet sites use compress to pack up their news
> > batches.  Compressing a compressed file makes it larger.
> 
> Maybe those Usenet sites should not use the -f (force) flag with compress.

That's not how (typical) news batching works.  Compress is used as a stage
of a pipe:
	batch | compress | uux
and because compress doesn't know the size of its input when it starts
up, it will *always* produce compressed output.

The point is that an article which contains binary that's compressed and then
uuencode/btoa/your_favorite'd will lower the compression ratio for the batch
that contains it.  The overall size of the batch will be smaller if the
included binary was just uuencoded, etc.

I no longer carry comp.binaries.* as I am using its disk space to store
more *useful* news.  It would be nice to see such things split out
into bin.*.

As gnu says: "Use the Source, Luke..."

John Robert LoVerso, Encore Computer Corp
encore!loverso, loverso@multimax.arpa

phil@amdcad.AMD.COM (Phil Ngai) (05/14/88)

In article <786@stride.Stride.COM> mitch@stride1.UUCP (Thomas P. Mitchell) writes:
>The argument that not everybody has all compilers is real.  Yet I
>dislike it.  To me compilers are like a keyboard: a computer is
>worth little without one.

Well, I strongly disagree. I don't have source for most of the things
I run on this PC, nor do I want it. I don't have time to tinker with
source code, compiling it, fixing it. I want to get the program and
start using it. Of course, I use programs like PC-NFS, SCHEMA, ORCAD,
PSPICE, and other CAD type tools. You probably don't know what this
stuff is, so you can't appreciate that some people want to do useful
work *with* their computers instead of working *on* the computer. 

-- 
Make Japan the 51st state!

I speak for myself, not the company.
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or phil@amd.com

allbery@ncoast.UUCP (Brandon S. Allbery) (05/15/88)

As quoted from <8430@iuvax.cs.indiana.edu> by bobmon@iuvax.cs.indiana.edu (RAMontante):
+---------------
| The biggest problem I see is that many news mailers compress everything
| blindly, so that an already-compressed file gets bigger.  This would also be
| true of a sufficiently random file, although I think most executables aren't
| that random.  And this compress-and-be-damned behavior is not a strength of
| the system, it's a weakness.  (Even compress will complain if its result is
| bigger than its original; does the mailer ignore this, or are the net.gods
| lying when they claim they're shipping bigger files because of the double
| compression?)
+---------------

When compress is invoked as

		compress (file)

it complains.  When it's invoked as:

		sendbatch | compress | uux -r - oopsvax!rnews

it can't do so without compressing to a temp file while saving its input in
a second temp file, then comparing sizes and copying the smaller of the two:
wasteful of space and time.  (You can't, of course, seek backwards on a
pipe.)
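
For the curious, the temp-file version would look roughly like the sketch
below: it shells out to compress and keeps whichever result is smaller.
The file names are invented, and the point stands that this is exactly
the extra copying and disk traffic a batcher would rather avoid.

	/* Sketch of the "compress to a temp file, keep the smaller"
	 * idea described above.  Shells out to compress and uses
	 * stat() to compare sizes; paths are invented.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/types.h>
	#include <sys/stat.h>

	long fsize(name)
	char *name;
	{
	    struct stat st;

	    return stat(name, &st) == 0 ? (long)st.st_size : -1L;
	}

	int main()
	{
	    char *batch = "/tmp/batch000";   /* already-spooled batch */

	    system("compress < /tmp/batch000 > /tmp/batch000.Z");
	    if (fsize("/tmp/batch000.Z") < fsize(batch))
	        printf("ship /tmp/batch000.Z\n");
	    else
	        printf("ship /tmp/batch000 uncompressed\n");
	    return 0;
	}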
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

allbery@ncoast.UUCP (Brandon S. Allbery) (05/16/88)

As quoted from <563@csccat.UUCP> by loci@csccat.UUCP (Chuck Brunow):
+---------------
| In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
| >In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
| >>   I stand corrected.  Since Lem-Ziv was DESIGNED for text compression, and
| >>the authors do not mention its use for binaries, i never considered using it.
| >>I tried it on an executable under UNIX and obtained a good reduction, for 
| >>reasons which are not apparent.  I'm sure that there are cases where this does
| 
| 	This is actually partially true. The first "compress" to appear
| 	on the net (several years ago) only worked on text files and
| 	dumped core on binary files. The reason you get good compressions
| 	on binary files is probably that they haven't been stripped of
| 	the relocation info. Strip them first and I doubt that the
| 	compression will be so good (otherwise, throw your optimizer
| 	into the bit bucket). Typical (large) text compression is about
| 	67%, whereas binaries are closer to 20%. (I use 16-bit compress).
+---------------

Wrong.  Consider that, for example, every call to putchar() contains some
fixed code (such as a call to _flsbuf()); this, on a 32-bit address space
machine, will always be the same byte sequence (on a 680x0, it's 6 bytes).
Other things will also be common:

	printf("format", non-double-value);

(which is by far the *most* common use of printf(), from what I've seen;
perhaps others have seen other more common calls) has the constant assembler
code on a 680x0:

		jsr	_printf			6 bytes
		addql	#8,a6			2 bytes

(and "printf("constant")", also common, is a slightly different 8-byte value).
These kinds of extremely common operations can't be optimized out and are
quite amenable to compression.
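
This is easy to check for yourself.  The sketch below just tallies
two-byte pairs in a file and reports the most common one; pair counting
is only a crude stand-in for what compress really does, but it shows how
lopsided an executable's statistics are.  (It needs 256K of array space,
so, ironically, not one for the small machines in this thread.)

	/* Count two-byte pairs in a file and report the most common one.
	 * A quick way to see how non-uniform executables really are.
	 */
	#include <stdio.h>

	static long count[256][256];    /* 256K of counters */

	int main(argc, argv)
	int argc;
	char **argv;
	{
	    FILE *fp;
	    int prev, c, bi = 0, bj = 0;
	    long total = 0;

	    if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL) {
	        fprintf(stderr, "usage: pairs file\n");
	        return 1;
	    }
	    if ((prev = getc(fp)) != EOF) {
	        while ((c = getc(fp)) != EOF) {
	            if (++count[prev][c] > count[bi][bj]) {
	                bi = prev;
	                bj = c;
	            }
	            total++;
	            prev = c;
	        }
	    }
	    printf("%ld pairs; most common is %02x %02x (%ld times)\n",
	        total, bi, bj, count[bi][bj]);
	    fclose(fp);
	    return 0;
	}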

RISC executables are likely to be even more amenable to compression, since
many operations will assemble into lengthy byte sequences -- many of which
will be partially or totally identical.

Ergo:  compression of executables generally works pretty well.  (I regularly
see 50%-60% on stripped, optimized executables on ncoast.)
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

jpn@teddy.UUCP (John P. Nelson) (05/16/88)

>The point is that an article which contains binary that's compressed and then
>uuencode/btoa/your_favorite'd will lower the compression ratio for the batch
>that contains it.  The overall size of the batch will be smaller if the
>included binary was just uuencoded, etc.

If this were TRUE, it would be a good argument.  It is NOT true.

Most binary files that are compressed, uuencoded, then compressed again
are SMALLER than binary files that are simply uuencoded, then
compressed.  I have yet to see anyone post results that refute this.

A few people have pointed out counter-examples:  These usually involve
compressing an ARC file (or other binary file with very little
compressibility in the first place).  In the few cases I have seen where
using ARC (which will NOT try to compress a file that is
incompressible), followed by uuencode, followed by compress, generated a
larger file than uuencode/compress alone, the file lengths were within
1% of each other.

If someone has seen different results, I would be interested in seeing
them.  I already KNOW that compressing ASCII files (source or text)
then uuencoding is a bad idea:  I am interested in results from BINARY
FILES only!  I think we should SETTLE this issue once and for all!
-- 
     john nelson

UUCP:            {decvax,mit-eddie}!genrad!teddy!jpn
ARPA (sort of):  talcott.harvard.edu!panda!teddy!jpn

dhesi@bsu-cs.UUCP (Rahul Dhesi) (05/17/88)

To avoid ambiguity, I suggest the following terminology.

     B = binary
     T = text
     U = uuencoding
     C16 = 16-bit LZW ("compress" default)
     C12 = 12-bit LZW (arc)
     C13 = 13-bit LZW (zoo, squashing)

So, instead of claiming that "uuencoded binary files compressed are
larger than not uuencoding" it is better to say that "BC12UC16 is worse
than BC16", or "BUC16U is worse than BC16" etc.

     BC12UC16 means:
       (B)   take a binary file
       (C12) compress using arc or 12-bit "compress"
       (U)   uuencode it
       (C16) compress using 16-bit "compress"

Also, since binary files differ, it's good to use some standard binary
file in benchmarks, e.g. your UNIX kernel stripped of symbols, so there
is some degree of consistency.
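
A trivial driver along these lines would let people settle the
BUC16-versus-BC16UC16 arguments with numbers instead of anecdotes.  A
sketch only: it shells out to the standard tools, the temp names are
invented, and error checking is left out.

	/* Sketch of a benchmark driver for the notation above: run a
	 * test file through BUC16 and BC16UC16 and print the sizes.
	 * Paths and temp names are invented.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/types.h>
	#include <sys/stat.h>

	long fsize(name)
	char *name;
	{
	    struct stat st;

	    return stat(name, &st) == 0 ? (long)st.st_size : -1L;
	}

	int main(argc, argv)
	int argc;
	char **argv;
	{
	    char cmd[256];
	    char *bin;

	    if (argc < 2) {
	        fprintf(stderr, "usage: bench binaryfile\n");
	        return 1;
	    }
	    bin = argv[1];      /* e.g. a stripped kernel or .EXE */

	    /* BUC16: uuencode, then 16-bit compress */
	    sprintf(cmd, "uuencode %s %s | compress > /tmp/buc16.Z",
	        bin, bin);
	    system(cmd);

	    /* BC16UC16: compress, uuencode, compress again */
	    sprintf(cmd,
	        "compress < %s | uuencode %s.Z | compress > /tmp/bc16uc16.Z",
	        bin, bin);
	    system(cmd);

	    printf("%-10s %8ld bytes\n", "BUC16",    fsize("/tmp/buc16.Z"));
	    printf("%-10s %8ld bytes\n", "BC16UC16", fsize("/tmp/bc16uc16.Z"));
	    return 0;
	}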
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

loci@csccat.UUCP (Chuck Brunow) (05/18/88)

In article <3075@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>To avoid ambiguity, I suggest the following terminology.
>
>file in benchmarks, e.g. your UNIX kernel stripped of symbols, so there
>is some degree of consistency.
>-- 

	Using UNIX files as test cases misses the point. Unix sites
	rarely use binary as a mode of transfer. It would be reasonable
	to use MS-DOS files, as they represent the real thing.

	It is nearly universal that Unix sites use UUCP, in some form
	which will allow un-uue'd (pure binary) files. This is one
	reason why the question of compressing compressed files
	came up in the first place: text files can be batched
	and compressed to speed transmission.

	Binary files are the fly in the ointment. Add to that the
	problem of handling assorted mailers for non-Unix sites
	and chaos begins. Embedded compression (arc, gif, et al.)
	complicates the situation even further.

	Numerous people have suggested FIDONET as a viable solution.
	Why not drop binaries from Unix sites and route them through
	FIDO? Then there is no problem of compressing compressed
	files.

mark@mamab.UUCP (Mark Woodruff) (05/21/88)

 > From: loci@csccat.UUCP (Chuck Brunow)
 > Date: 18 May 88 06:36:52 GMT
 > Message-ID: <629@csccat.UUCP>
 > Numerous people have suggested FIDONET as a viable solution.
 > Why not drop binaries from Unix sites and route them through FIDO?
 
Because FidoNet has no way of automatically redistributing, or even 
forwarding, files.
 
mark
 
Fido:  sysop@363/9
UUCP:  codas!rtmvax!mamab!mark
 


---
 * Origin: MaMaB--the Machine in Mark's Bedroom (Opus 1:363/9)
SEEN-BY: 363/9

--  

Reply via UUCP:  codas!tarpit!mamab!<user>
                 codas!rtmvax!mamab!<user>
Reply via FidoNet:  <user> at 1:363/9.0

allbery@ncoast.UUCP (Rich Garrett) (05/24/88)

As quoted from <4776@teddy.UUCP> by jpn@teddy.UUCP (John P. Nelson):
+---------------
| >The point is that an article which contains binary that's compressed and then
| >uuencode/btoa/your_favorite'd will lower the compression ratio for the batch
| >that contains it.  The overall size of the batch will be smaller if the
| >included binary was just uuencoded, etc.
| 
| If this was TRUE, it would be a good argument.  It is NOT true.
| 
| Most binary files that are compressed, uuencoded, then compressed again
| are SMALLER than binary files that are simply uuencoded, then
| compressed.  I have yet to see anyone post results that refute this.
+---------------

Single files, yes.  But the quoted message above specifically says BATCHES.
Batches include messages of all kinds from multiple newsgroups; to verify
whether batch compression is reduced, we have to modify sendbatch to print
the compression ratio and then run sendbatch with both compressed and
uncompressed uuencodes to see which results in smaller batches.  (We also
need a non-destructive "test" mode for sendbatch to (a) ensure that the
batches are otherwise identical and (b) not screw up news transmission.)
This would have to be done with a number of batches and the results averaged
in order to give us a reasonably accurate result.
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY