[comp.compression] 1) decompression speeds, 2) rounding.

ross@spam.ua.oz.au (Ross Williams) (06/27/91)

Two minor notes on data compression details:

Decompression Times
-------------------
When quoting speeds of decompression, is everyone agreed that the
speed should be given relative to the decompressed data? That is, if I
say that my decompression algorithm runs at 50K/second, does everyone
agree that I mean it is WRITING 50K/second, not READING
50K/second?

Averages
--------
I recently polished my algorithm test harness on my Macintosh and ran
up against a problem with averaging.

My test harness runs the algorithm under test on a suite of files
(e.g. the calgary corpus). For each file, it records the compression
performance (e.g. 3.48 bits per byte) in a floating point variable. It
also prints out the variable to two decimal places (as seems to have
become standard).

The problem arises when I get to the end of the corpus and want to
print out the average of the compression performances. The question
is: should I print out the average of the internal, highly accurate
floating point numbers, or the average of the printed results, which
have already been rounded to 2dp?

I did some analysis and worked out that the "deep" average (the
rounding of the average of the unrounded values) and the "shallow"
average (the rounding of the average of the rounded values) can differ
by up to one unit in the last rounded digit (Example: try 1.5 and 2.5
rounding to 0dp).
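
For concreteness, here is a minimal C sketch of the two averages (this
is an illustration, not my harness code; the per-file figures are made
up and chosen to stay away from exact halfway cases, so the tie-breaking
rule of the rounding doesn't matter):

#include <stdio.h>
#include <math.h>

/* round x to one decimal place (half up) */
static double round1(double x) { return floor(x * 10.0 + 0.5) / 10.0; }

int main(void)
{
    double v[] = { 1.26, 1.26, 1.21 };   /* made-up per-file results */
    int n = sizeof v / sizeof v[0], i;
    double sum = 0.0, rsum = 0.0;

    for (i = 0; i < n; i++) {
        sum  += v[i];          /* full-precision values           */
        rsum += round1(v[i]);  /* values as they would be printed */
    }
    printf("deep    average: %.1f\n", round1(sum  / n));  /* prints 1.2 */
    printf("shallow average: %.1f\n", round1(rsum / n));  /* prints 1.3 */
    return 0;
}

The two disagree in the last printed digit even though no individual
value was a borderline case.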

Choosing the deep average means that if anyone attacks my statistics
with a calculator at a later date, they might conclude that I made a
mistake. Choosing the shallow average means losing information within
the specified rounding range unnecessarily.

This problem must arise in science all the time. Is there an accepted
convention? When pressed, I ended up implementing the shallow average
(averaging the rounded results) to avoid having people think I made a
mistake. (This implementation was made difficult by a rounding error
in the printf in my C library!!!)

Ross Williams
ross@spam.ua.oz.au

cthombor@umnd-cs-gw.d.umn.edu (Clark Thomborson) (06/27/91)

As Ross guessed, the question of how to ``average'' a series of performance
figures has indeed been studied a lot.

My reply is based on Chapters 1 and 2 of Hennessy & Patterson's new text,
{\it Computer Architecture: A Quantitative Approach}, Morgan-Kaufmann, 1990.
This book has lots of great stuff in it, beyond its discussion of
performance averaging, including a chapter on what I expect is the next
``standard'' workstation architecture, a RISC cpu with a Cray-like
floating-point pipe.  No, I am not getting a kickback on their sales!

In brief, the hard part of performance averaging is to figure out
the application.  It makes sense to report the total time required to
process a representative benchmark dataset.  Perhaps this would be a
directory with a representative mix of files.  This view lays bare the
inherent problem in interpreting ``average performance'': if you measure
your program's runtime on a directory that contains a lot of C source files,
and my intended use is for decompressing a single binary boot file, your
performance figure is of little use to me.

I disagree with Hennessy & Patterson on relatively minor points,
in that I think one should report mins and maxes, as well as averages.
I also like speed figures (Kbytes/second in the data compression business,
MIPS in the architecture business) more than they seem to: I can interpret
speeds more or less directly into a *rough* runtime estimate for my own
application, but I can't always figure out the appropriate scaling factor
to convert their ``total time'' into a runtime estimate for my application.
For example, if you reported the range of decompression speeds you observed
on various types of files, I would probably be able to decide if your
compressor is fast enough for my binary boot file application.  If, instead,
you reported just a total time (or an average speed) for compressing a
directory with a ``representative mix'' of binaries, text files, and
picture files, I would be worried.  Perhaps the binaries ran a lot faster
or slower than the others (for some reason).  In this case, the total time
or average speed would not tell me much about my application.  If all
the files run at about the same speed, reporting mins and maxes will
reveal this; totals and averages will leave me in the dark.

Hennessy and Patterson point out that ``total time'' has a significant
advantage: it's relatively hard to cheat on its definition.  For example,
the MIPS rating of a cpu has become discredited, due to the widespread
practice of choosing one's benchmark to maximize the reported figure.
In your context, reporting total time would dispose of the question of
whether one should measure decompression speed as time/input_size or
time/output_size.  I would argue that you should still report speeds,
but *clearly state your definition*.  You should also report your runtime
conditions: cpu, clock speed, computer manufacturer and model number,
operating system name and version number, compiler name and version number,
compiler optimization setting, etc.  If you are measuring compression
performance on a directory full of files, the IO configuration of your
system becomes important: at a minimum, you should disclose whether
you are using a floppy disk or a hard disk, but the disk model number
and its bus type (SCSI? ESDI?) are also important.
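
To make that concrete, here is a sketch of the sort of measurement and
report I have in mind; decompress() is just a hypothetical stand-in for
whatever routine is under test, and the point is to state the definition
of the speed figure alongside the raw numbers:

#include <stdio.h>
#include <time.h>

/* hypothetical routine under test: returns the number of bytes written */
extern long decompress(const char *infile, const char *outfile);

int main(void)
{
    clock_t t0 = clock();
    long out_bytes = decompress("news.Z", "news");
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("total CPU time: %.2f seconds\n", secs);
    printf("raw output size: %ld bytes\n", out_bytes);
    printf("speed = output bytes / CPU second: %.0f\n", out_bytes / secs);
    /* also report cpu, clock rate, OS, compiler, optimization level,
       and the disk configuration, as discussed above */
    return 0;
}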

As for your question on ``deep'' versus ``shallow'' averages, the accepted
practice is to carry out all calculations at high precision, then round
just before printing.  As you noticed, this means your published data will
not always look consistent, although it does mean that every number is as
accurately reported as possible.  The only exception I have noticed to
this rule is that an accountant will sometimes fudge the rounding of a
number, so that the printed (and accurately rounded) total is consistent
with the sum of the (perhaps inaccurately rounded) terms.
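
For the curious, here is a sketch of that accountant's trick with
made-up terms (largest-remainder style: the term whose rounding
discarded the most absorbs the discrepancy). I would not recommend it
for compression tables, but it shows exactly what gets fudged:

#include <stdio.h>
#include <math.h>

#define N 3

int main(void)
{
    double v[N] = { 1.114, 2.114, 3.114 };   /* made-up terms             */
    long   r[N];                             /* rounded terms, hundredths */
    long   rsum = 0, rtotal;
    double total = 0.0, resid, worst = -1.0;
    int    i, w = 0;

    for (i = 0; i < N; i++) {
        total += v[i];
        r[i] = (long)floor(v[i] * 100.0 + 0.5);   /* round half up        */
        rsum += r[i];
        resid = v[i] * 100.0 - r[i];              /* what rounding lost   */
        if (resid > worst) { worst = resid; w = i; }
    }
    rtotal = (long)floor(total * 100.0 + 0.5);    /* accurately rounded total */
    r[w] += rtotal - rsum;   /* the fudge; a fuller version would also
                                handle negative discrepancies and spread
                                larger ones over several terms */

    for (i = 0; i < N; i++)
        printf("term %d: %.2f\n", i, r[i] / 100.0);
    printf("total : %.2f  (matches the sum of the printed terms)\n",
           rtotal / 100.0);
    return 0;
}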

							Clark

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/28/91)

The ``average speed'' taken by a compressor over different types of
input files is meaningless. Do you know anybody who compresses exactly
as many bytes of ``book1'' per day as of ``obj1'', and exactly as many
of ``news'' and ``trans'' and ``progp''? I don't. (Worse than that, it's
a harmonic average, so even if someone *does* compress exactly as much
of each type of file, the ``average speed'' will not equal his real
speed. ``Average speed'' means if you spend the same amount of *time* on
each type of text, how much do you get compressed overall. Ugh.)
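
Concretely, with made-up numbers:

#include <stdio.h>

int main(void)
{
    /* hypothetical corpus: one big slow file, one small fast file */
    double bytes[] = { 700000.0, 20000.0 };
    double secs[]  = {     70.0,     0.2 };
    double mean = 0.0, tb = 0.0, ts = 0.0;
    int i, n = 2;

    for (i = 0; i < n; i++) {
        mean += (bytes[i] / secs[i]) / n;   /* the corpus "average speed" */
        tb   += bytes[i];
        ts   += secs[i];
    }
    printf("mean of per-file speeds: %.0f bytes/sec\n", mean);     /* 55000  */
    printf("total bytes / total sec: %.0f bytes/sec\n", tb / ts);  /* ~10256 */
    return 0;
}

The quoted ``average speed'' of 55K/sec is more than five times what a
user who actually has to push both files through would see.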

Most people deal with some combination of ``book1'' and ``progc''
(archives) and ``news''. Their average will be very different from the
overall average on any given text corpus. The *only* meaningful
statements you can make are about the boundary cases: ``The worst
observed speed from this compressor is 30K/sec, the best observed speed
is 60K/sec,'' and then how it does on each file type.

In fact, one excellent way to display benchmarks with varying parameters
(the parameter in this case being the input file type) is the following:
(1) Select the benchmarks to be displayed: in this case, compression
effectiveness and compression time, on a two-dimensional chart. (2) For
each value of the parameter, plot a labelled point at the result. (3)
Draw the convex hull of the points; i.e., connect the outer dots. Repeat
for each compression method under consideration.

Now what do the statistics mean on this chart? Each point reflects a
result for one method on one file type. The top of the convex hull for
one method is the maximum speed; the bottom is the minimum speed; the
left is the maximum effectiveness; the right is the minimum
effectiveness. The ``spread'' of the convex hull, i.e., its area, shows
how sensitive the method is to the file type.

No matter what kind of files someone compresses, his results will always
be a convex combination of some of those outer points. Plotting the
``average'' speed or ``average'' effectiveness means that you're taking
the center of mass of the points---as if everyone compressed just as
much of every kind of file---and ignoring the interesting regions on the
edge of the convex hull.
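
If you would rather compute the hull than eyeball it, the standard
monotone-chain construction is a few lines of C. The points below are
made up (one per file type), with x = bits/byte and y = Kbytes/sec:

#include <stdio.h>
#include <stdlib.h>

#define NP 5

typedef struct { double x, y; } pt;   /* x = bits/byte, y = Kbytes/sec */

static int cmp(const void *a, const void *b)
{
    const pt *pa = a, *pb = b;
    if (pa->x != pb->x) return pa->x < pb->x ? -1 : 1;
    if (pa->y != pb->y) return pa->y < pb->y ? -1 : 1;
    return 0;
}

/* > 0 if the turn o -> a -> b is counterclockwise */
static double cross(pt o, pt a, pt b)
{
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

int main(void)
{
    pt p[NP] = { {2.1, 40}, {3.5, 55}, {4.8, 60}, {2.5, 30}, {3.0, 52} };
    pt hull[2 * NP + 1];
    int i, k = 0, lower;

    qsort(p, NP, sizeof p[0], cmp);

    for (i = 0; i < NP; i++) {                   /* lower chain */
        while (k >= 2 && cross(hull[k-2], hull[k-1], p[i]) <= 0) k--;
        hull[k++] = p[i];
    }
    lower = k + 1;
    for (i = NP - 2; i >= 0; i--) {              /* upper chain */
        while (k >= lower && cross(hull[k-2], hull[k-1], p[i]) <= 0) k--;
        hull[k++] = p[i];
    }
    for (i = 0; i < k - 1; i++)                  /* last point repeats the first */
        printf("(%.1f bits/byte, %.0f K/sec)\n", hull[i].x, hull[i].y);
    return 0;
}

Everything inside the printed polygon is some mixture of file types;
everything interesting is on its boundary.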

End of sermon A. Begin sermon B.

Even without meaningless averages there's way too much information for
one person to take in all at once. Another dose of reality: People don't
really want to get some abstract numeric measure of ``how good'' a
compressor is on all types of text. They know beforehand that they're
only interested in news, say, or progc.

SO SHOW THE RESULTS FOR *ALL* COMPRESSORS ON *ONE* FILE IN *ONE* PLACE!

Someone who's deciding what to do with his newsfeed wants to see a
single row or column showing how *every* compressor does on a sample of
news. Witness this excerpt from my Y coding paper:

     __bib__book1__book2____geo_makelog___news___obj1___obj2_paper1_paper2_
Z12  54094 394419 325147  78026   34757 230765  16528 160099  29433  40881
Z14  46817 357387 281354  77696   25647 202594  14048 138521  25077  37196
Z16__46528_332056_250759__77777___25647_182121__14048_128659__25077__36161_
MW___41040_336793_230862__80296____8299_168652__13273_109266__22749__34234_
AP2  47056 389702 297205  84582   20925 219665  13824 134547  26937  39415
AP6  40770 338046 261270  79471   14762 190502  13824 123323  22413  34637
AP3__40311_322178_228978__80106____8821_167896__13825_113296__22414__33320_
Y2   46882 363339 287110  80817   28159 212617  13858 141783  26131  38037
Y6   40874 320622 256578  76275   21220 185097  13858 125900  22452  33671
Y3___40456_306813_229851__76695___14411_168287__13859_114323__22453__32733_

See how easy it is to choose a news compressor? You can see immediately
that MW (squeeze, 300000-string table), Y3 (yabba -m300000), and AP3
(whap -m300000) have approximately the same effectiveness, while Z16
(compress -b16) is only slightly better than Y6 (yabba -m65533). This
table could be extended to any length for any number of compressors and
it would still be usable. You could even add compression and
decompression speeds, rankings between the compressors, etc. What's
important is that all the information for a single file type be
collected in one place.

Now does anyone really see a need for average compressor speeds or
average performance?

In article <894@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:
> When quoting speeds of decompression, is everyone agreed that the
> speed should be given relative to the decompressed data?

Yes. Otherwise the number cannot be compared directly to the compression
speed. Also, speed is a measure independent of the actual compression
method; speed per compressed byte is not.

> For each file, it records the compression
> performance (e.g. 3.48 bits per byte) in a floating point variable.

Why do you persist in that ugly bits-per-byte measure? People care
about how much space they save. Take out the factor of 8 already.
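
In other words (taking Ross's 3.48 bits per byte as the example):

#include <stdio.h>

int main(void)
{
    double bpb = 3.48;                    /* Ross's example figure */
    printf("output/input ratio: %.3f\n", bpb / 8.0);               /* 0.435 */
    printf("space saved: %.1f%%\n", 100.0 * (1.0 - bpb / 8.0));    /* 56.5% */
    return 0;
}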

  [ do you average before rounding or after? ]

Never do anything with rounded numbers unless you absolutely have to.

> I did some analysis and worked out that the "deep" average (the
> rounding of the average of the unrounded values) and the "shallow"
> average (the rounding of the average of the rounded values) can differ
> by up to one unit in the last rounded digit (Example: try 1.5 and 2.5
> rounding to 0dp).

1.5 and 2.5 both round to 2, so the average is the same either way.

> Choosing the deep average means that if anyone attacks my statistics
> with a calculator at a later date, they might conclude that I made a
> mistake.

Why do you think charts published in newspapers always have a disclaimer
like ``Percentages may not add up to 100% because of rounding''? Anyway,
if you do the sensible thing and include the original data (how many
bytes of output? how much time did it take to your machine's clock
precision?), nobody can possibly attack your results.

---Dan