force@aix01.aix.rpi.edu (Christopher Jon Pe) (03/25/91)
shaw@pegasus.com (Sandy Shaw) writes: >I am trying to get the maximum compression possible of the following >32 byte hex file(in ASC): >f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec >The best results I get are using the compact utility(adaptive Huffman). This >gives 21.88% compression or 7 bytes saved. Does anyone out there have a >scheme that can improve on this? Well, if it is a HEX file, why don't you try to compress it by taking two bytes and putting them into one byte (range 0-255)? -- Gently, he stood there, above it all. Only when he turned his steely-blue eyes upwards to the stars did he see the subtle purity that lay within the cosmos. Finally, he was at peace, for he realized that, although he was parted from his love by 300 miles, they both saw the same stars.
callahan@cs.jhu.edu (Paul Callahan) (03/25/91)
In article <1991Mar24.152106.6333@pegasus.com> shaw@pegasus.com (Sandy Shaw) writes: >I am trying to get the maximum compression possible of the following >32 byte hex file(in ASC): > >f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec I suspect that this isn't what you're looking for, but I figure SOMEBODY's got to say it. How about the following compression scheme: The bit string "0" represents the 32 byte sequence: f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec 0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec Any bit string beginning with a "1" represents the sequence of bits following the 1. Note: This is a compression scheme to the extent that it allows all bit strings to be represented. It also gives maximum compression for the given hex file, unless one is allowed to encode it as the empty string. It may seem that I'm just making a wisecrack, but I think there is a valid point here. That is, it's not immediately clear how one may ask this sort of compression question about a single string (as opposed to infinite families of strings) without leading to the trivial solution I have outlined above. Perhaps you are looking for the notion of Kolmogorov complexity, roughly the length of the shortest program that can produce a given string. For a single string, even this question can only be posed in an interesting manner for a given computer or Turing machine (i.e. what's to stop me from assuming my programming language has a primitive "!" that prints the above string?). -- Paul Callahan callahan@cs.jhu.edu
brad@looking.on.ca (Brad Templeton) (03/25/91)
I don't understand the problem of trying to get maximum compression of a specific 32 byte file. The problem of compressing a specific file of 32 bytes makes no sense. You either want to compress a general class of 32 byte files, because you have lots of them (in which case the answer may be to compress them in groups) or a very large specific file. Since any compression code will be more than 32 bytes long, the far simpler answer is just to store the file rather than the decompressor. Compression involves removing redundant information from a file. A 32 byte file by definition contains no more than 32 bytes of redundant information! If you want to phrase your question, "how do I compress 32 byte files that have the following properties?" then people can answer it. Of course the properties are needed, since if the file is random bits, it can't be compressed in the general case. -- Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
vd09+@andrew.cmu.edu (Vincent M. Del Vecchio) (03/25/91)
Is there any point to this exercise? I suggest the following compression algorithm: If input is f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec 0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec, output a "1" bit. Otherwise, output a "0" bit and append the results of compression with your favorite compression algorithm. This will, of course, result in a theoretical compression ratio of 256:1, but there are a bunch of problems, too... -Vincent Del Vecchio vd09@andrew.cmu.edu
raymond@cosc.canterbury.ac.nz (cantva) (03/25/91)
From article <obvFxEe00UgK01dIQG@andrew.cmu.edu>, by vd09+@andrew.cmu.edu (Vincent M. Del Vecchio): > If input is f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec 0cbb 8bec 5cdb ecdb 5c9c > bbec 8bdb 9cec, output a "1" bit. Otherwise, output a "0" bit and append the > results of compression with your favorite compression algorithm. This will, of > course, result in a theoretical compression ratio of 256:1, but there are a > bunch of problems, too... Like having to store the string (in some place or other) anyway. This is fine if you are just transmitting the string somewhere and you wish to minimise what you are sending but if you want to archive it this wont work. -- Raymond Wilson. email: raymond@cosc.canterbury.ac.nz snail: c/- Computer Science Department, University of Canterbury, New Zealand.
dik@cwi.nl (Dik T. Winter) (03/25/91)
> shaw@pegasus.com (Sandy Shaw) writes: > > >I am trying to get the maximum compression possible of the following > >32 byte hex file(in ASC): > > >f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec > >0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec > > >The best results I get are using the compact utility(adaptive Huffman). This > >gives 21.88% compression or 7 bytes saved. Does anyone out there have a > >scheme that can improve on this? > Well, if it is a HEX file, why don't you try to compress it by taking two > bytes and putting them into one byte (range 0-255)? But even if those are just 32 bytes, you can compress it to zero bytes with a general compression/decompression program (specialized for this case of course). (E.g. if the compressed file is empty the original were those 32 bytes; if the (only) byte is 0, the original was empty, otherwise it is output from compress. This expands slightly on empty files of course.) But I do not think that was the intention. So what is the real problem? -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
dvldbg@cs.umu.se (Daniel Brahneborg) (03/25/91)
In article <a+5fs#b@rpi.edu> force@aix01.aix.rpi.edu (Christopher Jon Pe) writes: >shaw@pegasus.com (Sandy Shaw) writes: > >>I am trying to get the maximum compression possible of the following >>32 byte hex file(in ASC): > >>f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >>0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec > > Well, if it is a HEX file, why don't you try to compress it by taking two >bytes and putting them into one byte (range 0-255)? > Get serious! He said it was a 32-byte HEX file, not a 80-byte ASCII file. Where were you during the lessons about bits and bytes? /Basic
anthony@convex.csd.uwm.edu (Anthony J Stieber) (03/25/91)
In article <1991Mar24.152106.6333@pegasus.com> shaw@pegasus.com (Sandy Shaw) writes: >I am trying to get the maximum compression possible of the following >32 byte hex file(in ASC): > >f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec > First do some simple analysis of the data. There are 32 bytes, of these there are 11 unique bytes. Replace each data byte with a 4 bit pointer to one of the 11 bytes. Now instead of 256 (32*8) bits uncompressed data, or 200 (25) Hufmann compressed bits, there are 216 (11*8+32*4) bits, which isn't as good. Now take the table and compute the differences between the 11 bytes, starting with zero. There are 11 differences, each difference is less than 64 so they can be stored as 6 bit values. Thus the table can be compressed from 88 (11*8) bits down to 66 (11*6). Total bits is 194 (128+66). This compression is of course totally dependant on the distribution of the data, and is probably worthless in real life situations. Interestingly enough there are 11 unique bytes and 11 unique nybbles. As it happens even if there were only 4 unique nybbles that would still take 128 (64*2) bits to encode. Thus a nybble table would not generally be more efficient. However, what's the point? The code to do any sort of uncompression like this is going to be much bigger than the data. What are you actually trying to do? -- <-:(= Anthony Stieber anthony@csd4.csd.uwm.edu uwm!uwmcsd4!anthony
warwick@cs.uq.oz.au (Warwick Allison) (03/25/91)
In <1991Mar25.002730.11027@cs.umu.se> dvldbg@cs.umu.se (Daniel Brahneborg) writes: >> Well, if it is a HEX file, why don't you try to compress it by taking two >>bytes and putting them into one byte (range 0-255)? >> >Get serious! He said it was a 32-byte HEX file, not a 80-byte ASCII file. >Where were you during the lessons about bits and bytes? click. click. click. WOOOOOOOOMPH. Flame on. Gee, this group is off to a good start. What make the term "hex file" mean "binary file" ? A "hex file" isn't anything `defined in the literature'. I think the respondant was more likely to be right in interpretting the stupid question in a way which is at least slightly worth answering. Personally, I just saw the question and brushed it off. Whoever posted the question did not understand what they were asking, so it's no use responding via news. e-mail them and explain that the question is a pile of poop. As it turned out, we have prompted quite some discussion, and I guess we have established ground level information. Anyway, I'm off to ask a real question... FFFFFFFFFFfffffffffizzzzzzzssss s ss s... POP! Flame off. Warwick. -- _--_|\ warwick@cs.uq.oz.au / * <-- Computer Science Department, \_.--._/ University of Queensland, v AUSTRALIA.
gwyn@smoke.brl.mil (Doug Gwyn) (03/25/91)
In article <1991Mar24.152106.6333@pegasus.com> shaw@pegasus.com (Sandy Shaw) writes: >I am trying to get the maximum compression possible of the following >32 byte hex file(in ASC): Something I don't understand here -- surely it takes ZERO bits to represent a sequence that you already know in advance?
philip@beeblebrox.dle.dg.com (Philip Gladstone) (03/26/91)
>>>>> On 24 Mar 91 15:21:06 GMT, shaw@pegasus.com (Sandy Shaw) said:
Sandy> I am trying to get the maximum compression possible of the following
Sandy> 32 byte hex file(in ASC):
Sandy> f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec
Sandy> 0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec
Sandy> The best results I get are using the compact utility(adaptive Huffman). This
Sandy> gives 21.88% compression or 7 bytes saved. Does anyone out there have a
Sandy> scheme that can improve on this?
Yes:
I can compress this file down to 1 byte (one bit really) with the following
algorithm:
If file data is equal to "f3e9 .... 9cec" then output 0
else output 1 followed by file data.
The expansion function is obvious.
Another algorithm that I could use for the above data is:
if next byte is 'ec' then output 1 else output (0 followed by byte).
This results in 32 control bits + 20 bytes of data. This is 24 bytes
-- a saving of 8 bytes.
I suspect that this is not the sort of answer that you wanted. You
need to be more precise about the other data you want to compress. For
example, compression of the following data:
1 78 123 511 985 ...
or
3 45 67 999 ...
can be done in a number of ways. The information that the format of
the data is a sequence of non-decreasing integers seperated by
whitespace allows you to use a better compression algorithm. In many
applications for compression, a great deal is known about the data
before you start. This information should be used in designing the
algorithm. The various programs like compress, pack, arc, etc all make
assumptions about the data they are working on. When the assumptions
are incorrect, they act as expansion programs!
Philip
--
Philip Gladstone Dev Lab Europe, Data General, Cambridge, UK
Listen three eyes, don't you try and outweird me, I get
stranger things than you free with my breakfast cereal.
davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (03/27/91)
I got the same 25 byte size with a compressor I'm playing with, which gives very good results on small files. What params did you feed compact to get the results? I tried to duplicate it and couldn't. Or have you haved compact? -- bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen) sysop *IX BBS and Public Access UNIX moderator of comp.binaries.ibm.pc and 80386 mailing list "Stupidity, like virtue, is its own reward" -me
kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (03/27/91)
In article <PHILIP.91Mar25135706@beeblebrox.dle.dg.com> philip@beeblebrox.dle.dg.com (Philip Gladstone) writes: >>>>>> On 24 Mar 91 15:21:06 GMT, shaw@pegasus.com (Sandy Shaw) said: >Sandy> I am trying to get the maximum compression possible of the following >Sandy> 32 byte hex file(in ASC): >Sandy> f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >Sandy> 0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec >Sandy> The best results I get are using the compact utility(adaptive Huffman). This >Sandy> gives 21.88% compression or 7 bytes saved. Does anyone out there have a >Sandy> scheme that can improve on this? >Yes: >I can compress this file down to 1 byte (one bit really) with the following >algorithm ... While I must have reservations similar to Philip regarding coming up with an encoding of a _single example_ rather than a _class_ of strings, it _is_ a fun intellectual exercise. To get the ball rolling (a bit faster), I've managed a compression of 30% using the appended program (provided you don't count the program as part of the encoding! :-). That is, almost 10 bytes are saved (9.4 to be exact). Tony Warnock at lanl has pointed out to me there are algorithms that turn bitstrings into shift-register machines. Essentially such machines require a very small number of parameters and might prove useful. -kym BTW, LZ can't compress the output file further, although this would seem simple to do. ====================program==================== #include <assert.h> #include <stdio.h> int bytes[32]; int nbr; int nbw; int pos; int lastpos; outnum(x) { putchar('0'+(x&1)), nbw++; x>>=1; putchar('0'+(x&1)), nbw++; x>>=1; if(x) putchar('0'+(x&1)), nbw++; } outbit(b){ if(b&1) outnum(pos-lastpos),lastpos=pos+1; ++pos; } main(){ int i,j; nbr=0; nbw=0; pos=0; lastpos=0; for(i=0;i<32;i++)scanf("%2x",&bytes[i]),nbr+=8; for(i=0;i<32;i++)printf("%x ",bytes[i]); putchar('\n'); for(i=0;i<32;i++){ unsigned diff = bytes[i]^0xec; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); diff>>=1; outbit(diff); } putchar('\n'); printf("%d bits read; %d bits written; compression=%g\n", nbr,nbw,(double)nbw/nbr ); exit(0); } ====================log file==================== f3 e9 ec 5c 8b ec ec db ec e9 ec 12 ec 3f ec ec c bb 8b ec 5c db ec db 5c 9c bb ec 8b db 9c ec 000000000011101000010000000010010000001000010100110000000000000000001100010100000000001010100000010010100100000001000010000010000110010001000010000010101000000010010000010000110000 256 bits read; 180 bits written; compression=0.703125 ====================end end end====================
kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (03/27/91)
In article <1991Mar27.132216.8132@bingvaxu.cc.binghamton.edu> kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes: >I've managed a compression of 30% using the appended program >(provided you don't count the program as part of the encoding! :-). >That is, almost 10 bytes are saved (9.4 to be exact). BTW, my little formula indicates a compression of 56% (i.e. save almost 18 bytes) may be possible since p=.33 approximately. -kym
doctor@vlsi.polymtl.ca (doctor zero) (03/29/91)
In article <1991Mar24.152106.6333@pegasus.com> shaw@pegasus.com (Sandy Shaw) writes: >I am trying to get the maximum compression possible of the following >32 byte hex file(in ASC): > >f3e9 ec5c 8bec ecdb ece9 ec12 ec3f ecec >0cbb 8bec 5cdb ecdb 5c9c bbec 8bdb 9cec > >The best results I get are using the compact utility(adaptive Huffman). This >gives 21.88% compression or 7 bytes saved. Does anyone out there have a >scheme that can improve on this? > >Sandy Shaw >shaw@pegasus.com one way that *could* increase your compression is making your data 'more compressible'. Try xoring your bytes in a daisy chain way (first byte xor second byte -> first output byte) and then compress THAT output. this kind of pre-processing is at its best when you are trying to compress a checkerboard image. By XORing successive lines, you ultimately get a white picture plus something at the last line. I think so.. correct me if I'm wrong. paul -- | | Everybody, everybody | doctor@info.polymtl.ca | I don't know anobody else | "De toute facon, on est les meilleurs!" | Ride on time | | Strike it up - Black Box !