[comp.sys.mac] compression thoughts

merlyn@ernie.Rosemount.COM (Brian Westley) (03/16/89)

btoa/atob is better than binhex, and btoa can be improved upon
slightly (about 1% smaller).  btoa encodes 4 bytes into 5 base-85
digits ('!' to 'u') plus 'x' for end of data, and 'z' for 4 bytes of zero.
It also adds a newline every 78 chars to keep mailers happy.
About 80% of these newlines can be eliminated if, for each line, the
rightmost '!' is turned into a newline (unless this is the first character
in the line, or the second character and the first is '.'; otherwise
the mailers may get confused).  When decoding, any newline which comes
before the 79th character is turned back into '!'.  "newline" would be any
sequence of newlines/carriage returns, in case the file has gotten
double-spaced, translated, gaps inserted, etc.  '!' is chosen because
it is zero in base-85, and occurs most frequently.  It can be made to
appear even more frequently by using base-93 ('!' to '}'), with '~' for
4 bytes of zero, and ' ' at the beginning of a line for end of data
(mailers may clip trailing spaces, but this is not a trailing space;
checksum data follows).  Also, put the ASCII unpacking into the
compression program so it can do both at once.  Which is needed for...
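
The newline trick above can be sketched in Python.  This is only a rough
illustration, not code from btoa: the names fold_lines, unfold_lines and
LINE_WIDTH are made up here, the input is assumed to be an already-encoded
character stream with no newlines in it, and the last line is assumed to be
left alone (its newline is never read back as '!').

```python
import re

LINE_WIDTH = 78  # btoa breaks its output after this many characters


def fold_lines(stream: str) -> str:
    """Break the stream into lines, letting a newline stand in for '!'."""
    out = []
    pos = 0
    # The final line is emitted untouched (an assumption of this sketch).
    while len(stream) - pos > LINE_WIDTH:
        window = stream[pos:pos + LINE_WIDTH]
        cut = window.rfind("!")  # rightmost '!' in this line, or -1
        # Never produce an empty line or a line holding only "."
        if cut == 0 or (cut == 1 and window[0] == "."):
            cut = -1
        if cut > 0:
            out.append(window[:cut])  # the newline replaces the '!'
            pos += cut + 1
        else:
            out.append(window)        # full line, ordinary newline added
            pos += LINE_WIDTH
    out.append(stream[pos:])
    return "\n".join(out) + "\n"


def unfold_lines(text: str) -> str:
    """Undo fold_lines; any run of CR/LF characters counts as one newline."""
    lines = re.split(r"[\r\n]+", text.strip("\r\n"))
    out = []
    for line in lines[:-1]:
        out.append(line)
        if len(line) < LINE_WIDTH:    # newline came before the 79th character
            out.append("!")
    out.append(lines[-1])
    return "".join(out)


sample = "ab!cd" * 40
assert unfold_lines(fold_lines(sample)) == sample
```

Because a short line always means "a '!' was here", the decoder needs no
extra marker, and it still works if a gateway has double-spaced the file or
changed the line endings.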

An auto-unpacking init; it patches the open file routine, and when a
file is created which looks like a compressed or compressed & ASCIIfied file,
it "monitors" the data written, and starts unpacking any data that looks
valid.  Files are unpacked automagically as they are downloaded.  Neat, huh?

If I have time, I'll do it, but there's a good chance I won't have time.
----
Merlyn LeRoy
PS: make sure the auto-unpacking init doesn't do its thing when
a file is being compressed (vs. downloaded).

jurjen@cwi.nl (Jurjen N.E. Bos) (03/17/89)

In article <7390@rosevax.Rosemount.COM> merlyn@ernie.rosemount.com writes:
>btoa/atob is better than binhex, and btoa can be improved upon
>slightly (about 1% smaller).  btoa encodes 4 bytes into 5 base-85
>digits ('!' to 'u') plus 'x' for end of data, and 'z' for 4 bytes of zero.
>It also adds a newline every 78 chars to keep mailers happy.
>About 80% of these newlines can be eliminated if, for each line, the
Ok, you asked for it. btoa makes a file 25% longer, right?
An easy computation shows that, using the 94 ASCII characters from "!" to
"~", one can ultimately get 100*(log 256)/(log 94)-100=22.052% loss.
Now, if you encode 9 bytes into 11 printable characters, you have only
22.222% loss, which is quite close to the theoretical limit.
Who is going to write a program doing this? :-) :-)
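
A rough Python sketch of the 9-into-11 idea, for what it is worth.  The
names ALPHABET, encode_block and decode_block are invented for this
example, the digits are assumed to run from "!" (zero) to "~" (93) with the
most significant digit first, and only whole 9-byte blocks are handled
(padding a short final block is left out).

```python
import math

# The 94 printable ASCII characters from "!" to "~"
ALPHABET = [chr(c) for c in range(ord("!"), ord("~") + 1)]
BASE = len(ALPHABET)

# Expansion at the theoretical limit, and for 9 bytes in 11 characters
ideal_loss = 100 * math.log(256) / math.log(BASE) - 100
block_loss = 100 * 11 / 9 - 100
print(f"{ideal_loss:.3f}% {block_loss:.3f}%")  # 22.052% 22.222%

# 11 base-94 digits are enough to hold 9 bytes (72 bits)
assert BASE ** 11 >= 256 ** 9


def encode_block(block: bytes) -> str:
    """Encode exactly 9 bytes as 11 base-94 characters."""
    if len(block) != 9:
        raise ValueError("expected a 9-byte block")
    value = int.from_bytes(block, "big")
    digits = []
    for _ in range(11):
        value, digit = divmod(value, BASE)
        digits.append(ALPHABET[digit])
    return "".join(reversed(digits))  # most significant digit first


def decode_block(text: str) -> bytes:
    """Decode 11 base-94 characters back into 9 bytes."""
    if len(text) != 11:
        raise ValueError("expected an 11-character block")
    value = 0
    for ch in text:
        value = value * BASE + (ord(ch) - ord("!"))
    return value.to_bytes(9, "big")


data = bytes(range(9))
assert decode_block(encode_block(data)) == data
```

Python's arbitrary-size integers hide the awkward part: 72 bits does not
fit a 32-bit or 64-bit word, so a C version would need multi-word division.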
-- 
  -- Jurjen N.E. Bos (jurjen@cwi.nl)