ken@turtlevax.UUCP (Ken Turkowski) (06/19/85)
In article <1861@ukma.UUCP> sean@ukma.UUCP (Sean Casey) writes:
>In article <784@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>>I think you should consider changing to Lempel-Ziv Compression (posted
>>to the net as "compress", version 3.0), which normally gives 70%
>>compression (30% of original size) to text.  The program is fast, and
>>adapts to whatever type of data you give it, unlike static Huffman
>>coding.  It usually produces 90% (!) compression on binary images.
>
>WHOA BUDDY!
>
>Lempel-Ziv doesn't do NEARLY that well.  We've been using it for
>months, and we've found that text and program sources usually get about
>55-65% compression, while binaries get about 45-55% compression.  This
>is encountered in the optimal case of compressing a large archive of
>files.  As files get smaller, especially as they drop below about 8k in
>size, compression worsens.  I seriously doubt that most binaries contain
>only 10% of unambiguous information, much less being compressible to
>that size.

I can see that we have a semantic problem here.  By "image", I mean a
picture, or two-dimensional signal.  By "binary", I mean ones and zeros,
black and white, no grey-scale, no color.  A binary image is then a
coarsely quantized picture, with lots of runs of zeros and ones.  L-Z
does exceptionally well on this type of data, and I will reiterate my
claim of 90% average compression.  As for program source code and
executable machine code, I get the same kinds of compression ratios as
you.

I'm curious; what is the etymology of the word "binary" as it is
sometimes used to refer to executable machine code?  And why does it
imply program rather than data?
--
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,hplabs,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA
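[A small, hedged sketch, not from any of the original posts.  It uses
Python's zlib module (DEFLATE, an LZ77-family compressor) as a stand-in
for the LZW-based Unix "compress", and a synthetic 512x512 bilevel
raster invented for illustration, to show why an image dominated by long
runs of identical bits compresses far more than data whose bytes are
already dense.]

# Hedged illustration: zlib/DEFLATE stands in for the LZW-based Unix
# "compress"; the bilevel test image is synthetic and chosen only to
# have long runs of 0s and 1s, like a scanned black-and-white page.
import os
import zlib

def compression_percent(data: bytes) -> float:
    """Return 100 * (1 - compressed_size / original_size)."""
    return 100.0 * (1.0 - len(zlib.compress(data, 9)) / len(data))

# Synthetic 512x512 bilevel raster, packed 8 pixels per byte:
# a mostly-white page with two solid black bands -> long runs.
scanline_bytes = 512 // 8
rows = []
for y in range(512):
    if 100 <= y < 120 or 300 <= y < 340:
        rows.append(b"\xff" * scanline_bytes)   # black scanline
    else:
        rows.append(b"\x00" * scanline_bytes)   # white scanline
image = b"".join(rows)

# Incompressible stand-in for data with little redundancy.
dense = os.urandom(len(image))

print(f"bilevel image: {compression_percent(image):5.1f}% compression")
print(f"random bytes : {compression_percent(dense):5.1f}% compression")

[On such an image the compressed output is a tiny fraction of the
original, consistent with the 90%-or-better figure claimed above, while
the random bytes barely compress at all.]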
zben@umd5.UUCP (06/21/85)
I added net.nlang to the group header, because we are getting into that
area, and because I thought the nlang people might be interested in this
discussion.

In article <789@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>In article <1861@ukma.UUCP> sean@ukma.UUCP (Sean Casey) writes:
>>In article <784@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>>>I think you should consider changing to Lempel-Ziv Compression (posted
>>>to the net as "compress", version 3.0), which normally gives 70%
>>>compression (30% of original size) to text.  The program is fast, and
>>>adapts to whatever type of data you give it, unlike static Huffman
>>>coding.  It usually produces 90% (!) compression on binary images.
>>
>>Lempel-Ziv doesn't do NEARLY that well.  We've been using it for
>>months, and we've found that text and program sources usually get about
>>55-65% compression, while binaries get about 45-55% compression.
>
>I can see that we have a semantic problem here.  By "image", I mean a
>picture, or two-dimensional signal.  By "binary", I mean ones and
>zeros, black and white, no grey-scale, no color.
>
>I'm curious; what is the etymology of the word "binary" as it is
>sometimes used to refer to executable machine code?  And why does it
>imply program rather than data?

I remember way back when IBM was the only game in town, they called the
output decks produced by compilers "relocatable binaries".  The Univac
system I grew up on had both "relocatable elements" and "absolute
elements", the latter being something like the "load modules" on current
IBM systems: programs linked and ready to run, but incapable of further
modification.  So Univac dropped the "binary" part.  It seems another
branch in the etymology of these beasts dropped the "relocatable" and
just ended up with "binary"; on many systems there is no "absolute"
form, so the distinction was not needed.

Now, "image", to my mind, implies something else entirely.  It implies a
strict one-for-one correspondence between words in core and words in the
file.  By this definition, neither the Univac (very tightly packed
format) nor the usual Unix (because of BSS) implementations qualify.  I
understand that on the old TOPS-10 system a running program could write
a copy of itself out to the file system, and that copy could later be
executed and pick up where it had left off.  THIS qualifies as an
"image".  Any takers?
--
Ben Cranston
...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben
zben@umd2.ARPA
jeff@rtech.UUCP (Jeff Lichtman) (06/28/85)
>
> >I'm curious; what is the etymology of the word "binary" as it is
> >sometimes used to refer to executable machine code?  And why does it
> >imply program rather than data?
>

Here's my guess.  There are many ways to represent numbers (as we all
should know).  Binary is a format that people have trouble reading,
while strings of the characters 0-9 are easy to read.  I believe that,
by extension, the word "binary" is applied to any non-human-readable
data, especially when it is stored in files.  The human-readable and
non-human-readable forms of programs (source and object or executable
code) parallel the human-readable and non-human-readable forms of
numbers, so it's easy to draw an analogy.
--
Jeff Lichtman at rtech (Relational Technology, Inc.)
aka Swazoo Koolak
{amdahl, sun}!rtech!jeff
{ucbvax, decvax}!mtxinu!rtech!jeff
msp@ukc.UUCP (M.S.Parsons) (06/28/85)
In article <789@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>>..
>>Lempel-Ziv doesn't do NEARLY that well.  We've been using it for
>>months, and we've found that text and program sources usually get about
>>55-65% compression, while binaries get about 45-55% compression.
>>..
>I can see that we have a semantic problem here.  By "image", I mean a
>picture, or two-dimensional signal.  By "binary", I mean ones and
>zeros, black and white, no grey-scale, no color.
>..
>I reiterate my claim of 90% average compression.
>..

I agree with Ken: compress works brilliantly with binary IMAGES.  It is
certainly better than UCB compact.  What's interesting is that it works
well whether the image is stored as a raster, a run-length encoding, or
a quadtree: the underlying structure seems to make little difference.

--Mike.
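[A hedged follow-on to the earlier sketch, again not from the original
posts and again using zlib/DEFLATE as a stand-in for the LZW-based
"compress".  It feeds the same synthetic bilevel image to the compressor
twice, once as a raw packed raster and once as a toy run-length encoding
of the bit stream (a format invented purely for this example), to
illustrate the observation that the choice of representation changes the
compressed size much less than one might expect.]

# Hedged illustration: the same synthetic image, compressed as a raw
# raster and as a toy run-length encoding; the RLE format is made up
# for this sketch and is not any standard image format.
import zlib

def runs_of_bits(raster: bytes):
    """Yield lengths of consecutive identical bits, MSB-first per byte."""
    current, length = None, 0
    for byte in raster:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            if bit == current:
                length += 1
            else:
                if current is not None:
                    yield length
                current, length = bit, 1
    if length:
        yield length

def rle_encode(raster: bytes) -> bytes:
    """Encode each run length as a 4-byte big-endian count (toy format)."""
    return b"".join(n.to_bytes(4, "big") for n in runs_of_bits(raster))

# Rebuild the same 512x512 bilevel raster as in the earlier sketch.
scanline_bytes = 512 // 8
rows = []
for y in range(512):
    if 100 <= y < 120 or 300 <= y < 340:
        rows.append(b"\xff" * scanline_bytes)   # black scanline
    else:
        rows.append(b"\x00" * scanline_bytes)   # white scanline
raster = b"".join(rows)
rle = rle_encode(raster)

for name, data in (("raw raster", raster), ("run-length", rle)):
    compressed = zlib.compress(data, 9)
    print(f"{name:10s}: {len(data):6d} bytes -> {len(compressed):5d} bytes")

[Both representations end up tiny after compression: the compressor
finds the run structure in the raw raster on its own, so pre-encoding
the runs buys little, which is consistent with Mike's observation.]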
mac@uvacs.UUCP (Alex Colvin) (06/30/85)
>
>I'm curious; what is the etymology of the word "binary" as it is
>sometimes used to refer to executable machine code?  And why does it
>imply program rather than data?

Probably because everything was stored on Hollerith cards (1
character/column), except for code, which was stored on binary decks
(12 bits/column).