[comp.compression] DDJ compression competition FLAME

byron@archone.tamu.edu (Byron Rakitzis) (04/08/91)

In article <1991Apr8.160133.22070@kcbbs.gen.nz> Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on  
>  A few days ago I finally managed to get hold of the Feb'91 DDJ.  There were 
>two things about the compression competition that disturbed me:  
>  
>1. The test data is known in advance.  It is thus possible to write a  
>compression program which will compress it arbitrarily well.  Since the output
>of the program is known in advance, no information is being transmitted, thus 
>the data can be compressed to 0 bytes.  In theory I could submit a program  
>which, for compression, creates a 0 output byte file; for decompression, it  
>copies data from a static internal buffer to the output file as fast as it can.

Just a comment here: last year, the Princeton CS department sponsored just
such a competition, i.e., the assignment was to compress and uncompress
a given dictionary. The competition was graded according to the sum of
(1) the compressed dictionary file, and (2) the size of the decompressing
executable. It was quite possible, and some even found it desirable to set
(1) to zero and to simply submit a huge program which produced a given output.

Not to say that this is a valid way of producing general-purpose compression
algorithms (I think this is the point of the competition in DDJ? I can't really
infer this from your article, and I have not read the journal myself) but it
does have its place.

Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) (04/08/91)

#flame on  
  A few days ago I finally managed to get hold of the Feb'91 DDJ.  There were 
two things about the compression competition that disturbed me:  
  
1. The test data is known in advance.  It is thus possible to write a  
compression program which will compress it arbitrarily well.  Since the output
of the program is known in advance, no information is being transmitted, thus 
the data can be compressed to 0 bytes.  In theory I could submit a program  
which, for compression, creates a 0 output byte file; for decompression, it  
copies data from a static internal buffer to the output file as fast as it can.
This would take away all the prizes for max.compression and fastest program.  
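The degenerate entry described above takes only a few lines. Here is a sketch in modern Python (purely illustrative, not from the thread; the stand-in corpus and function names are invented): the "compressor" emits zero bytes, and the "decompressor" reproduces the data from a buffer baked into the program itself.

```python
# Sketch of the degenerate entry: the test data is known at build time,
# so it is compiled straight into the program. Names and corpus invented.
TEST_DATA = b"the quick brown fox jumps over the lazy dog\n" * 3  # stand-in corpus

def compress(data: bytes) -> bytes:
    """'Compress' the known test data to zero bytes."""
    assert data == TEST_DATA, "only works on the one file it was built for"
    return b""

def decompress(compressed: bytes) -> bytes:
    """Reproduce the test data from the internal buffer, no input needed."""
    assert compressed == b""
    return TEST_DATA

packed = compress(TEST_DATA)
print(len(packed))                       # 0 bytes "transmitted"
print(decompress(packed) == TEST_DATA)   # True
```

Since the output is fully determined in advance, no information crosses the channel at all, which is exactly why the ratio is unbeatable and the scheme is useless for anything else.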
   In practice this program will probably not be accepted....however it's still
possible to create, say, a dictionary-based compression program which contains
an optimal dictionary (created after 2 weeks runtime on an IBM 3090) which  
compresses the test data far better than anything else in existence.  Even  
without resorting to this sort of thing, the results will *still* be biased:  
Everyone will no doubt be using the supplied data files to test their  
compressors, thus unconsciously tuning their compressors to the particular data
being compressed.  
  
2. The type of system the compressor will be run on (a 25MHz 386 with 128K  
cache and 8MB RAM).  This system will *not* determine who can create the best 
compressor.  What it will determine is who has access to high-end PC hardware;
who has mastered the intricacies of EMS/XMS programming; and only then who can
write the best compressor.  Most people won't have access to this kind of  
system - there'll be a lot of students using low-end (ie affordable) AT's and 
Mac SE's and the like with 640K or 1M of RAM, which means (unless they've  
achieved some amazing breakthrough in compression technology) they might as  
well not bother if they have to compete with finite-context compressors running
in 6MB of RAM.  
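The memory appetite of finite-context modelling is easy to see: a naive order-k byte model keeps a counter for every (context, next-byte) pair, i.e. 256^k contexts times 256 counters. A back-of-the-envelope sketch (Python, one byte per counter assumed; real PPM implementations store only the contexts actually seen, which is why order-5 in ~6MB is even possible):

```python
# Full-table size for a naive order-k byte-context model:
# 256**k contexts, each holding 256 next-byte counters.
def naive_table_bytes(order: int, bytes_per_counter: int = 1) -> int:
    return (256 ** order) * 256 * bytes_per_counter

for k in range(4):
    print(k, naive_table_bytes(k))
# 0 256
# 1 65536        (64K)
# 2 16777216     (16M -- a full table already exceeds the contest's 8MB)
# 3 4294967296   (4G -- hence real PPM stores only observed contexts)
```

The point stands either way: the higher the order, the more RAM the model wants, and an entrant limited to 640K simply cannot play that game.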
  The other problem with this kind of setup is somewhat similar:  Let's say the
winner is chosen, an order-5 PPM compressor which runs in 6MB of RAM.  Everyone
will look at it, think "Very nice", and then go back to using PKZIP, compress,
and StuffIt....  
  
(BTW I'm not writing this criticism out of "sour grapes" reasons:  The hardware
requirements are no problem for me.  They are, however, unreachable for  
virtually everyone else I know.  This means I could submit a ho-hum compressor
which, due to the fact that it has huge amounts of memory to run in, will  
outperform a much better compressor which runs in around 512K.  My compressor 
may be completely impractical, but it would win under the given  
conditions....).  
#flame off  
  
Any comments, anyone?  
  
Peter.  
  
 Peter_Gutmann@kcbbs.gen.nz || peter@nacjack.gen.nz || pgut1@cs.aukuni.ac.nz  
                     (In order of decreasing reliability)  
Warning!  
  Something large, scaly, and with fangs a foot long lives between <yoursite> 
and <mysite>.  Every now and then it kills and eats messages.  If you don't  
receive a reply within a week, try resending...  

d5kwedb@dtek.chalmers.se (Kristian Wedberg) (04/09/91)

In article <1991Apr8.160133.22070@kcbbs.gen.nz> Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on  
>  A few days ago I finally managed to get hold of the Feb'91 DDJ.  There were 
>two things about the compression competition that disturbed me:  
>  
>1. The test data is known in advance.  It is thus possible to write a  
>compression program which will compress it arbitrarily well.  

Hello Peter! The rules in the competition reminded me of a joke about
the validation of Ada compilers; to be certified, a compiler simply had to
recognize some 2500 programs and output the correct code. General
functionality wasn't specified...

However, if you send for the test data, I think you'll only receive samples
of text, binary, image files etc., not the actual test data ('test' is the
ambiguous word in the article). Also, they only ask for one disc, which
is hardly enough for the real thing.
But this could easily be resolved if a kind
soul would post these files (together with the application form).

>2. The type of system the compressor will be run on (a 25Mhz 386 with 128K  
>cache and 8MB RAM).  This system will *not* determine who can create the best 
>compressor.  What it will determine is who has access to high-end PC hardware;

Here I agree with you: it should have been geared towards practical schemes,
not theoretical ones (since it's more a PROGRAMMING contest than an
algorithmic one). On a Unix machine you won't have all those megabytes
to yourself, and your compressor will slowly kill your drives since most of it
is paged out...

	kitte

sigma@jec302.its.rpi.edu (Kevin J Martin) (04/09/91)

Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>#flame on  
>  A few days ago I finally managed to get hold of the Feb'91 DDJ.  There were 
>two things about the compression competition that disturbed me:  
>  
>1. The test data is known in advance...
[other valid complaints deleted]

I just got back to looking at the article on arithmetic coding -
specifically the listings in the back.  The programs as presented are
incomplete!  The description in the article would lead one to believe that
the given listings implement a functional 2nd order adaptive arithmetic
coder.  However, there are several files missing, such as bitio.c, bitio.h,
coder.c, and coder.h.

Luckily, the complete listings are apparently available on simtel20 and its
mirroring sites.  I haven't checked those listings yet, though.

-- 
Kevin Martin
sigma@rpi.edu

sigma@jec302.its.rpi.edu (Kevin J Martin) (04/09/91)

sigma@jec302.its.rpi.edu (Kevin J Martin) writes:
>Peter_Gutmann@kcbbs.gen.nz (Peter Gutmann) writes:
>>#flame on  
>>  A few days ago I finally managed to get hold of the Feb'91 DDJ.  There were 
>>two things about the compression competition that disturbed me:  
>>  
>>1. The test data is known in advance...
>[other valid complaints deleted]
>
>Luckily, the complete listings are apparently available on simtel20 and its
>mirroring sites.  I haven't checked those listings yet, though.

I've looked through that archive now.  The arithmetic coding seems to all
be there.  They also offer the sample files for the compression contest.
What I'd always found fishy about their offer to send them out on floppy
disk was that the files are supposed to be a couple of megabytes each.
Sure enough, what they hadn't made clear before is that the files they send
you are small samples of the sort of thing you might have to compress.
They do NOT apparently send you the actual test data!  But just to confuse
the issue a bit more, their listing of files they supply doesn't match up
with the files they actually supply in this archive!

I kind of wish they wouldn't be so haphazard with this sort of thing.

-- 
Kevin Martin
sigma@rpi.edu

brad@looking.on.ca (Brad Templeton) (04/09/91)

Actually the DDJ contest did allow you to submit a Unix program.

However, it was silly to let people have the inputs in advance.  I am
sure that some user, or several, will submit a program that outputs
one byte (there were 3 possible test files) and just copies over the
files.   (Harder to do this under DOS since you can't put megabytes in
an executable)

And I am sure the editors will say, "cute, but obviously not what we
intended" and award the prize money to somebody else.

C'est la guerre.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

sigma@sun.ipl.rpi.edu (Kevin Martin) (04/11/91)

byron@archone.tamu.edu (Byron Rakitzis) writes:
>Just a comment here: last year, the Princeton CS department sponsored just
>such a competition, i.e., the assignment was to compress and uncompress
>a given dictionary. The competition was graded according to the sum of
>(1) the compressed dictionary file, and (2) the size of the decompressing
>executable. It was quite possible, and some even found it desirable to set
>(1) to zero and to simply submit a huge program which produced a given output.

I don't see how this follows.  If the huge program submitted simply does a
massive copy of the dictionary to output, what has been achieved?  Nothing,
because your executable would be at least as big as the original.  Anyone
who had written a program which did ANY degree of compression would beat
your entry.  Just simple RLE could make the dictionary smaller (maybe), or,
if there aren't any 8-bit characters, recode each byte in seven bits with the
high bit assumed zero.  That's 12.5% compression right there, and if the
dictionary is any decent size, the program will be much smaller than 1/8 the
size of the dictionary.
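The seven-bit trick is only a few lines of code. A sketch in Python (illustrative; the MSB-first packing order chosen here is an arbitrary convention, not anything from the thread):

```python
def pack7(data: bytes) -> bytes:
    """Pack 7-bit bytes (high bit zero) into a 7/8-size bitstream, MSB-first."""
    bits = bitcount = 0
    out = bytearray()
    for b in data:
        assert b < 0x80, "input must be 7-bit clean"
        bits = (bits << 7) | b          # accumulate 7 bits per input byte
        bitcount += 7
        while bitcount >= 8:            # emit every complete output byte
            bitcount -= 8
            out.append((bits >> bitcount) & 0xFF)
    if bitcount:                        # flush leftover bits, zero-padded
        out.append((bits << (8 - bitcount)) & 0xFF)
    return bytes(out)

text = b"dictionary " * 800             # 8800 bytes of 7-bit text
packed = pack7(text)
print(len(packed) / len(text))          # 0.875, i.e. 12.5% smaller
```

Even this trivial transform beats the copy-the-dictionary entry, which is the point: any real compression at all wins against a score of 1.0.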

>Not to say that this is a valid way of producing general-purpose compression
>algorithms (I think this is the point of the competition in DDJ? I can't really
>infer this from your article, and I have not read the journal myself) but it
>does have its place.

DDJ is trying to get general-purpose compression algorithms, although they
don't make that clear.  I don't really think something which requires 8Mb
memory, even under Unix, is general-purpose.  Something which fits in a
512K dataspace (640K MSDOS minus overhead) would be interesting indeed,
unless it takes days to run.
-- 
Kevin Martin
sigma@ipl.rpi.edu

rdippold@cancun.qualcomm.com (Ron Dippold) (04/12/91)

In article <1991Apr09.012358.16300@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>However, it was silly to let people have the inputs in advance.  I am
>sure that some user, or several, will submit a program that outputs
>one byte (there were 3 possible test files) and just copies over the
>files.   (Harder to do this under DOS since you can't put megabytes in
>an executable)

No problem at all.  Just use overlays.  In fact, what you could easily do
is have a 300 or so byte executable portion followed by umpteen megabytes
of data.  The loader will load just the executable part and away you go.
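The same layout works anywhere a program can read its own file: a tiny stub in front, megabytes of payload appended behind it, and the stub seeks past itself at run time. A toy Python illustration of the idea (the 4-byte length header and stub/payload split are invented for the example; a real DOS .EXE would arrange this via the header's load-image size field):

```python
import io
import struct

# Build an "executable" image: a short stub followed by a big payload.
stub = b"STUB" * 75                     # ~300-byte "executable portion"
payload = b"the entire test file, verbatim " * 1000
image = struct.pack("<I", len(stub)) + stub + payload

# The "loader" reads only the declared stub; the stub then seeks past
# itself and streams the appended payload straight to the output.
f = io.BytesIO(image)
(stub_len,) = struct.unpack("<I", f.read(4))
f.seek(4 + stub_len)                    # skip the part the loader loaded
recovered = f.read()
print(recovered == payload)             # True
```

So the DOS executable-size objection evaporates: only the 300-odd bytes of stub ever need to fit in memory, no matter how many megabytes ride along behind it.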