[net.nlang] Squeezing files.

zben@umd5.UUCP (06/21/85)

I added net.nlang to the group header, because we are getting into that area,
and because I thought the nlang people might be interested in this discussion.

In article <789@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>In article <1861@ukma.UUCP> sean@ukma.UUCP (Sean Casey) writes:
>>In article <784@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>>>I think you should consider changing to Lempel-Ziv Compression (posted
>>>to the net as "compress", version 3.0), which normally gives 70%
>>>compression (30% of original size) to text.  The program is fast, and
>>>adapts to whatever type of data you give it, unlike static Huffman
>>>coding.  It usually produces 90% (!) compression on binary images.
>>
>>Lempel-Ziv doesn't do NEARLY that well.  We've been using it for
>>months, and we've found that text and program sources usually get about
>>55-65% compression, while binaries get about 45-55% compression.  
>
>I can see that we have a semantic problem here.  By "image", I mean a
>picture, or two-dimensional signal.  By "binary", I mean ones and
>zeros, black and white, no grey-scale, no color.  
>
>I'm curious; what is the etymology of the word "binary" as it is
>sometimes used to refer to executable machine code?  And why does it
>imply program rather than data?
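
For reference, both sets of figures quoted above use the same convention:
"70% compression" means the output is 30% of the original size.  Here is a
minimal Python sketch of that arithmetic.  The function name and the byte
counts are made up for illustration; they are not measurements from compress.

```python
def compression_percent(original_size, compressed_size):
    """Percentage of the original size removed by compression.

    Under this convention, 70% compression leaves a file that is
    30% of its original size.
    """
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


if __name__ == "__main__":
    # Hypothetical sizes in bytes, chosen only to match the quoted figures.
    print(compression_percent(1000, 300))  # text at "70% compression"
    print(compression_percent(1000, 400))  # text at "60% compression"
```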

I remember way back when IBM was the only game in town, they called the 
output decks produced by compilers "relocatable binaries".  The Univac  
system I grew up on has both "relocatable elements" and "absolute elements",
the latter sort of like "load modules" on current IBM systems, programs
linked and ready-to-run, but incapable of further modification.

So, Univac dropped the "binary" part.  It seems another branch in the
etymology of these beasts dropped the "relocatable" and just ended up with
"binary".  On many systems there is no "absolute" form, so the distinction
was not needed.

Now, "image", to my mind, implies something else entirely.  It implies a
strict one-for-one correspondence between words in-core and words in the file.
By this definition, neither the Univac (very tightly-packed format) nor the
usual Unix (because of BSS) implementations apply.  I understand on the old
TOPS-10 system a running program could write a copy of itself out to the file
system, which could then later be executed and pick up where it had left off.
THIS qualifies as an "image".

Any takers?

-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

jeff@rtech.UUCP (Jeff Lichtman) (06/28/85)

> >
> >I'm curious; what is the etymology of the word "binary" as it is
> >sometimes used to refer to executable machine code?  And why does it
> >imply program rather than data?
> 

Here's my guess.  There are many ways to represent numbers (as we all should
know).  Binary is a format that people have trouble reading, and strings of
the characters 0-9 are easy to read.  I believe that, by extension, the word
"binary" is applied to any non-human-readable data, especially when it is
stored in files.  The human-readable and non-human-readable forms of programs
(source and object or executable code) parallel the human-readable and
non-human-readable forms of numbers, so it's easy to draw an analogy.
-- 
Jeff Lichtman at rtech (Relational Technology, Inc.)
aka Swazoo Koolak

{amdahl, sun}!rtech!jeff
{ucbvax, decvax}!mtxinu!rtech!jeff

mac@uvacs.UUCP (Alex Colvin) (06/30/85)

 >
 >I'm curious; what is the etymology of the word "binary" as it is
 >sometimes used to refer to executable machine code?  And why does it
 >imply program rather than data?

Probably because everything was stored on Hollerith cards (1 character/column),
except for code, which was stored on binary decks (12 bits/column).
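
To put rough numbers on that, here is a small Python sketch.  It assumes a
standard 80-column, 12-row card with every column used for data; actual
binary deck formats may have reserved some columns for other purposes, so
treat the results as upper bounds.  The names are illustrative only.

```python
COLUMNS_PER_CARD = 80   # assumption: standard 80-column card
ROWS_PER_COLUMN = 12    # 12 punch positions per column


def hollerith_capacity_chars(columns=COLUMNS_PER_CARD):
    """Character-coded card: one character per column."""
    return columns


def binary_capacity_bits(columns=COLUMNS_PER_CARD, rows=ROWS_PER_COLUMN):
    """Binary card: every punch position in a column is one bit."""
    return columns * rows


if __name__ == "__main__":
    print("characters per card:", hollerith_capacity_chars())
    print("bits per binary card:", binary_capacity_bits())
```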