merlyn@ernie.Rosemount.COM (Brian Westley) (03/16/89)
btoa/atob is better than binhex, and btoa can be improved upon slightly (about 1% smaller). btoa encodes 4 bytes into 5 base-85 digits ('!' to 'u') plus 'x' for end of data, and 'z' for 4 bytes of zero. It also adds a newline every 78 chars to keep mailers happy.

About 80% of these newlines can be eliminated if, for each line, the rightmost '!' is turned into a newline (unless this is the first character in the line, or the second character and the first is '.'; otherwise the mailers may get confused). When decoding, any newline which comes before the 79th character is turned into '!'. "newline" would be any sequence of newlines/carriage returns, in case the file has gotten double-spaced, translated, gaps inserted, etc.

'!' is chosen because it is zero in base-85, and occurs most frequently. It can be made to appear even more frequently using base-94 ('!' to '}') and use '~' for 4 bytes of zero, and ' ' at the beginning of a line for end of data (mailers may clip trailing spaces, but this is not a trailing space; checksum data follows).

Also, put the ascii-unpacking into the compression so it can do both at once. Which is needed for...

An auto-unpacking init; it patches the open file routine, and when a file is created which looks like a compressed or compressed & ASCIIfied file, it "monitors" the data written, and starts unpacking any data that looks valid. Files are unpacked automagically as they are downloaded. Neat, huh.

If I have time, I'll do it, but there's a good chance I won't have time.

----
Merlyn LeRoy

PS: make sure the auto-unpacking init doesn't do its thing when a file is being compressed (vs. downloaded).
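For anyone who wants to play with the idea, here is a rough Python sketch of the two pieces described above: the 4-bytes-to-5-digits base-85 step (with 'z' for four zero bytes) and the trick of turning the rightmost '!' of each line into the newline. This is not the real btoa source. The function names (`encode_group`, `decode_group`, `wrap`, `unwrap`), the big-endian byte order, and the handling of the final short line are all assumptions made for illustration, and the "any sequence of newlines/carriage returns" tolerance, the 'x' end marker, and the checksum are left out.

```python
WIDTH = 78  # line length quoted in the post


def encode_group(chunk: bytes) -> str:
    """Encode exactly 4 bytes as 5 base-85 digits ('!'..'u'), or 'z' if all zero."""
    if len(chunk) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(chunk, "big")  # byte order is an assumption
    if value == 0:
        return "z"
    digits = []
    for _ in range(5):
        value, rem = divmod(value, 85)
        digits.append(chr(ord("!") + rem))  # '!' is digit zero
    return "".join(reversed(digits))


def decode_group(text: str) -> bytes:
    """Inverse of encode_group for a single 'z' or 5-digit group."""
    if text == "z":
        return bytes(4)
    if len(text) != 5:
        raise ValueError("expected 'z' or 5 digits")
    value = 0
    for ch in text:
        digit = ord(ch) - ord("!")
        if not 0 <= digit < 85:
            raise ValueError("digit out of range")
        value = value * 85 + digit
    return value.to_bytes(4, "big")  # raises OverflowError above 2**32 - 1


def wrap(stream: str) -> str:
    """Break a digit stream into lines, reusing the rightmost '!' as the newline."""
    out = []
    pos = 0
    while len(stream) - pos > WIDTH:
        line = stream[pos:pos + WIDTH]
        cut = line.rfind("!")
        # Skip the trick if it would leave an empty line or a line holding only '.'.
        if cut > 1 or (cut == 1 and line[0] != "."):
            out.append(line[:cut] + "\n")  # the newline stands in for the '!'
            pos += cut + 1
        else:
            out.append(line + "\n")  # ordinary inserted newline after 78 chars
            pos += WIDTH
    out.append(stream[pos:])  # final line, no trailing newline (assumption)
    return "".join(out)


def unwrap(text: str) -> str:
    """Undo wrap: a newline before the 79th character becomes '!', others vanish."""
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if i < len(lines) - 1 and len(line) < WIDTH:
            out.append("!")
    return "".join(out)
```

With these definitions `encode_group(b"\x00\x00\x00\x01")` gives `'!!!!"'`, and `unwrap(wrap(s))` returns `s` for any digit stream `s`. Whenever a line contains a usable '!', the wrapped text is no longer than the unwrapped one, which is where the saving comes from.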
jurjen@cwi.nl (Jurjen N.E. Bos) (03/17/89)
In article <7390@rosevax.Rosemount.COM> merlyn@ernie.rosemount.com writes:
>btoa/atob is better than binhex, and btoa can be improved upon
>slightly (about 1% smaller). btoa encodes 4 bytes into 5 base-85
>digits ('!' to 'u') plus 'x' for end of data, and 'z' for 4 bytes of zero.
>It also add a newline every 78 chars to keep mailers happy.
>About 80% of these newlines can be eliminated if, for each line, the

Ok, you asked for it. btoa makes a file 25% longer, right? An easy computation shows that, using the 94 ASCII characters from "!" to "~", one can ultimately get 100*(log 256)/(log 94)-100=22.052% loss. Now, if you encode 9 bytes into 11 printable characters, you have only 22.222% loss, which is quite close to the theoretical limit.

Who is going to write a program doing this? :-) :-)

--
-- Jurjen N.E. Bos (jurjen@cwi.nl)
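The arithmetic above is easy to check, and the 9-into-11 scheme is short enough to sketch in Python. What follows is only an illustration, not an existing program: `encode9` and `decode11` are made-up names, the big-endian byte order is an assumption, and padding of a final partial block, line breaks, and checksums are ignored. It works because 94^11 is just larger than 256^9, while 94^10 is not.

```python
import math

BASE = 94          # the 94 printable ASCII characters '!'..'~'
FIRST = ord("!")   # digit zero


def overhead_percent(chars_out: int, bytes_in: int) -> float:
    """Size increase, in percent, of a scheme mapping bytes_in bytes to chars_out chars."""
    return 100.0 * chars_out / bytes_in - 100.0


# Best possible overhead with 94 symbols, as computed in the post.
limit = 100.0 * math.log(256) / math.log(94) - 100.0


def encode9(block: bytes) -> str:
    """Encode exactly 9 bytes as 11 base-94 digits (hypothetical helper)."""
    if len(block) != 9:
        raise ValueError("expected exactly 9 bytes")
    value = int.from_bytes(block, "big")  # byte order is an assumption
    digits = []
    for _ in range(11):
        value, rem = divmod(value, BASE)
        digits.append(chr(FIRST + rem))
    return "".join(reversed(digits))


def decode11(text: str) -> bytes:
    """Inverse of encode9 (hypothetical helper)."""
    if len(text) != 11:
        raise ValueError("expected exactly 11 characters")
    value = 0
    for ch in text:
        digit = ord(ch) - FIRST
        if not 0 <= digit < BASE:
            raise ValueError("character outside '!'..'~'")
        value = value * BASE + digit
    return value.to_bytes(9, "big")  # raises OverflowError if not a valid block


print(f"btoa, 4 -> 5:   {overhead_percent(5, 4):.3f}%")
print(f"9 -> 11:        {overhead_percent(11, 9):.3f}%")
print(f"limit, base 94: {limit:.3f}%")
```

Python's arbitrary-precision integers make the 72-bit intermediate value painless; in C one would need multi-word arithmetic for the same block size.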