byron@archone.tamu.edu (Byron Rakitzis) (04/08/91)
In article <1991Apr8.160133.22070@kcbbs.gen.nz> Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on
> A few days ago I finally managed to get hold of the Feb'91 DDJ. There were
>two things about the compression competition that disturbed me:
>
>1. The test data is known in advance. It is thus possible to write a
>compression program which will compress it arbitrarily well. Since the output
>of the program is known in advance, no information is being transmitted, thus
>the data can be compressed to 0 bytes. In theory I could submit a program
>which, for compression, creates a 0-byte output file; for decompression, it
>copies data from a static internal buffer to the output file as fast as it can

Just a comment here: last year, the Princeton CS department sponsored just
such a competition, i.e., the assignment was to compress and decompress a
given dictionary. The competition was graded on the sum of (1) the size of
the compressed dictionary file and (2) the size of the decompressing
executable. It was quite possible, and some entrants even found it desirable,
to set (1) to zero and simply submit a huge program which produced the given
output.

Not that this is a valid way of producing general-purpose compression
algorithms (I think that is the point of the DDJ competition? I can't really
infer it from your article, and I have not read the journal myself), but it
does have its place.
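For illustration, the degenerate entry described above really is trivial to
write. A rough sketch might look like this (the embedded data here is
invented; a real entry would bake in the entire dictionary):

/* Hypothetical sketch of the degenerate "decompressor": the dictionary is
 * baked into the executable as a static table, so the "compressed" file can
 * be zero bytes long.  The data below is invented for illustration. */
#include <stdio.h>

static const char dictionary[] =
    "aardvark\naback\nabacus\nabandon\n";   /* ...megabytes more in reality */

int main(void)
{
    /* The matching "compressor" would simply write an empty file; this
     * program ignores its input entirely and dumps the built-in copy. */
    fwrite(dictionary, 1, sizeof(dictionary) - 1, stdout);
    return 0;
}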
Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) (04/08/91)
#flame on

A few days ago I finally managed to get hold of the Feb'91 DDJ. There were
two things about the compression competition that disturbed me:

1. The test data is known in advance. It is thus possible to write a
compression program which will compress it arbitrarily well. Since the output
of the program is known in advance, no information is being transmitted, thus
the data can be compressed to 0 bytes. In theory I could submit a program
which, for compression, creates a 0-byte output file; for decompression, it
copies data from a static internal buffer to the output file as fast as it
can. This would take away all the prizes for maximum compression and fastest
program. In practice this program will probably not be accepted... however,
it's still possible to create, say, a dictionary-based compression program
which contains an optimal dictionary (created after 2 weeks' runtime on an
IBM 3090) which compresses the test data far better than anything else in
existence. Even without resorting to this sort of thing, the results will
*still* be biased: everyone will no doubt be using the supplied data files to
test their compressors, thus unconsciously tuning their compressors to the
particular data being compressed.

2. The type of system the compressor will be run on (a 25MHz 386 with 128K
cache and 8MB RAM). This system will *not* determine who can create the best
compressor. What it will determine is who has access to high-end PC hardware;
who has mastered the intricacies of EMS/XMS programming; and only then who
can write the best compressor. Most people won't have access to this kind of
system - there'll be a lot of students using low-end (i.e. affordable) ATs
and Mac SEs and the like with 640K or 1MB of RAM, which means (unless they've
achieved some amazing breakthrough in compression technology) they might as
well not bother if they have to compete with finite-context compressors
running in 6MB of RAM.

The other problem with this kind of setup is somewhat similar: let's say the
winner is chosen, an order-5 PPM compressor which runs in 6MB of RAM.
Everyone will look at it, think "Very nice", and then go back to using PKZIP,
compress, and StuffIt....

(BTW, I'm not writing this criticism out of "sour grapes": the hardware
requirements are no problem for me. They are, however, unreachable for
virtually everyone else I know. This means I could submit a ho-hum compressor
which, due to the fact that it has huge amounts of memory to run in, will
outperform a much better compressor which runs in around 512K. My compressor
may be completely impractical, but it would win under the given
conditions....)

#flame off

Any comments, anyone?

Peter.

Peter_Gutmann@kcbbs.gen.nz || peter@nacjack.gen.nz || pgut1@cs.aukuni.ac.nz
                  (In order of decreasing reliability)

Warning! Something large, scaly, and with fangs a foot long lives between
<yoursite> and <mysite>. Every now and then it kills and eats messages.
If you don't receive a reply within a week, try resending...
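To make the "optimal dictionary" point concrete, a crude sketch of such a
pre-tuned static-dictionary compressor might look like the following. The
phrase table and one-byte codes are invented for illustration; a real entry
would build the table offline from the known test files and would also need
an escape mechanism for literal bytes above 0x7F.

/* Sketch of a compressor with a dictionary tuned in advance to known data.
 * The phrases below are invented; the point is only that codes 0x80..0xFF
 * stand for whole strings chosen offline. */
#include <stdio.h>
#include <string.h>

static const char *phrases[] = { " the ", " of the ", " compression ", " and " };
#define NPHRASES (sizeof phrases / sizeof phrases[0])

int main(void)
{
    char line[4096];
    char *p;
    size_t i, len;

    while (fgets(line, sizeof line, stdin) != NULL) {
        for (p = line; *p != '\0'; ) {
            for (i = 0; i < NPHRASES; i++) {
                len = strlen(phrases[i]);
                if (strncmp(p, phrases[i], len) == 0) {
                    putchar((int)(0x80 + i));   /* emit one-byte phrase code */
                    p += len;
                    break;
                }
            }
            if (i == NPHRASES)
                putchar(*p++);                  /* no match: literal byte */
        }
    }
    return 0;
}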
d5kwedb@dtek.chalmers.se (Kristian Wedberg) (04/09/91)
In article <1991Apr8.160133.22070@kcbbs.gen.nz> Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on
> A few days ago I finally managed to get hold of the Feb'91 DDJ. There were
>two things about the compression competition that disturbed me:
>
>1. The test data is known in advance. It is thus possible to write a
>compression program which will compress it arbitrarily well.

Hello Peter!

The rules in the competition reminded me of a joke about the validation of
Ada compilers: to be certified, a compiler simply had to recognize some 2500
programs and output the correct code. General functionality wasn't
specified...

However, if you send for the test data, I think you'll only receive samples
of text, binary, image files etc., not the actual test data ('test' is the
ambiguous word in the article). Also, they only want the one disc, which is
hardly enough for the real thing. But this could easily be resolved if a kind
soul would post these files (together with the application form).

>2. The type of system the compressor will be run on (a 25MHz 386 with 128K
>cache and 8MB RAM). This system will *not* determine who can create the best
>compressor. What it will determine is who has access to high-end PC hardware;

Here I agree with you: it should have been geared towards practical schemes,
not theoretical ones (since it's more a PROGRAMMING contest than an
algorithmic one). On a Unix machine you won't have all those megabytes to
yourself, and your compressor will slowly kill your drives since most of it
is paged out...

	kitte
sigma@jec302.its.rpi.edu (Kevin J Martin) (04/09/91)
Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on
> A few days ago I finally managed to get hold of the Feb'91 DDJ. There were
>two things about the compression competition that disturbed me:
>
>1. The test data is known in advance...
[other valid complaints deleted]

I just got back to looking at the article on arithmetic coding - specifically
the listings in the back. The programs as presented are incomplete! The
description in the article would lead one to believe that the given listings
implement a functional 2nd-order adaptive arithmetic coder. However, there
are several files missing, such as bitio.c, bitio.h, coder.c, and coder.h.

Luckily, the complete listings are apparently available on simtel20 and its
mirroring sites. I haven't checked those listings yet, though.

-- 
Kevin Martin
sigma@rpi.edu
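For anyone stuck without the missing files, the bit I/O layer of an
arithmetic coder is usually a very small module. The sketch below shows the
typical shape only; the names and details are my own guess, not necessarily
what the DDJ listings use.

/* Rough sketch of a typical arithmetic-coder bit I/O layer: accumulate bits
 * into a one-byte buffer and flush it whenever it fills. */
#include <stdio.h>

static int bit_buffer = 0;   /* bits accumulated so far         */
static int bit_count  = 0;   /* how many bits are in bit_buffer */

void output_bit(FILE *out, int bit)
{
    bit_buffer = (bit_buffer << 1) | (bit & 1);
    if (++bit_count == 8) {              /* a full byte: write it out */
        putc(bit_buffer, out);
        bit_buffer = bit_count = 0;
    }
}

void flush_bits(FILE *out)
{
    while (bit_count != 0)               /* pad the final byte with zeros */
        output_bit(out, 0);
}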
sigma@jec302.its.rpi.edu (Kevin J Martin) (04/09/91)
sigma@jec302.its.rpi.edu (Kevin J Martin) writes:
>Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>>#flame on
>> A few days ago I finally managed to get hold of the Feb'91 DDJ. There were
>>two things about the compression competition that disturbed me:
>>
>>1. The test data is known in advance...
>[other valid complaints deleted]
>
>Luckily, the complete listings are apparently available on simtel20 and its
>mirroring sites. I haven't checked those listings yet, though.

I've looked through that archive now. The arithmetic coding all seems to be
there. They also offer the sample files for the compression contest.

What I'd always found fishy about their offer to send them out on floppy disk
was that the files are supposed to be a couple of megabytes each. Sure
enough, what they hadn't made clear before is that the files they send you
are small samples of the sort of thing you might have to compress. They do
NOT apparently send you the actual test data!

But just to confuse the issue a bit more, their listing of the files they
supply doesn't match up with the files they actually supply in this archive!
I kind of wish they weren't so haphazard with this sort of thing.

-- 
Kevin Martin
sigma@rpi.edu
brad@looking.on.ca (Brad Templeton) (04/09/91)
Actually, the DDJ contest did allow you to submit a Unix program.

However, it was silly to let people have the inputs in advance. I am sure
that some user, or several, will submit a program that outputs one byte
(there were 3 possible test files) and just copies over the files. (Harder
to do this under DOS, since you can't put megabytes in an executable.)

And I am sure the editors will say, "Cute, but obviously not what we
intended," and award the prize money to somebody else. C'est la guerre.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario
519/884-7473
sigma@sun.ipl.rpi.edu (Kevin Martin) (04/11/91)
byron@archone.tamu.edu (Byron Rakitzis) writes:
>Just a comment here: last year, the Princeton CS department sponsored just
>such a competition, i.e., the assignment was to compress and decompress a
>given dictionary. The competition was graded on the sum of (1) the size of
>the compressed dictionary file and (2) the size of the decompressing
>executable. It was quite possible, and some entrants even found it desirable,
>to set (1) to zero and simply submit a huge program which produced the given
>output.

I don't see how this follows. If the huge program submitted simply does a
massive copy of the dictionary to output, what has been achieved? Nothing,
because your executable would be at least as big as the original. Anyone who
had written a program which did ANY degree of compression would beat your
entry. Just simple RLE could make the dictionary smaller (maybe), or if there
aren't any 8-bit characters, just change all the 8-bit codes to seven bits
and assume a high zero. That's 12.5% compression right there, and if the
dictionary is any decent size, the program will be much smaller than 1/8 of
the dictionary.

>Not that this is a valid way of producing general-purpose compression
>algorithms (I think that is the point of the DDJ competition? I can't really
>infer it from your article, and I have not read the journal myself), but it
>does have its place.

DDJ is trying to get general-purpose compression algorithms, although they
don't make that clear. I don't really think something which requires 8MB of
memory, even under Unix, is general-purpose. Something which fits in a 512K
data space (640K MS-DOS minus overhead) would be interesting indeed, unless
it takes days to run.

-- 
Kevin Martin
sigma@ipl.rpi.edu
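The "assume a high zero" scheme Kevin mentions really is that simple; a rough
sketch of the packing side follows. It assumes the input contains no bytes >=
0x80 and pads the final byte with zeros; the unpacker would reverse the
shifts.

/* Sketch of packing 8-bit bytes holding 7-bit ASCII into 7/8 the space.
 * Assumes no input byte has its high bit set; a real entry would verify
 * that and record the original length so the unpacker can stop exactly. */
#include <stdio.h>

int main(void)
{
    int c;
    int held = 0;    /* bits waiting to be written      */
    int nbits = 0;   /* how many valid bits are in held */

    while ((c = getchar()) != EOF) {
        held = (held << 7) | (c & 0x7F);   /* append 7 data bits */
        nbits += 7;
        if (nbits >= 8) {                  /* a full byte is available */
            nbits -= 8;
            putchar((held >> nbits) & 0xFF);
            held &= (1 << nbits) - 1;      /* keep only the leftover bits */
        }
    }
    if (nbits > 0)                         /* pad the tail with zero bits */
        putchar((held << (8 - nbits)) & 0xFF);
    return 0;
}

Eight input characters become seven output bytes, which is where the flat
12.5% saving comes from.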
rdippold@cancun.qualcomm.com (Ron Dippold) (04/12/91)
In article <1991Apr09.012358.16300@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>However, it was silly to let people have the inputs in advance. I am sure
>that some user, or several, will submit a program that outputs one byte
>(there were 3 possible test files) and just copies over the files. (Harder
>to do this under DOS, since you can't put megabytes in an executable.)

No problem at all. Just use overlays. In fact, what you could easily do is
have a 300-or-so-byte executable portion followed by umpteen megabytes of
data. The loader will load just the executable part and away you go.
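For illustration, the "tiny loader plus appended data" idea could be sketched
like this. DATA_OFFSET and the use of argv[0] to locate the executable are
assumptions for the sketch; a real DOS entry would read the EXE header to
find where the loaded image ends.

/* Sketch of a "decompressor" that is just a small program with the real
 * data appended to its own executable file on disk.  DATA_OFFSET is a
 * made-up placeholder for wherever the appended data begins. */
#include <stdio.h>

#define DATA_OFFSET 4096L     /* hypothetical end of the loaded image */

int main(int argc, char **argv)
{
    FILE *self = fopen(argv[0], "rb");
    int c;

    if (self == NULL || fseek(self, DATA_OFFSET, SEEK_SET) != 0) {
        fprintf(stderr, "can't reopen my own executable\n");
        return 1;
    }
    while ((c = getc(self)) != EOF)   /* "decompress": copy appended data */
        putchar(c);
    fclose(self);
    return 0;
}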