[comp.archives] [compression] Re: Astronomical data compression

dwells@fits.cx.nrao.edu (Don Wells) (04/07/91)

Archive-name: compression/astro/bsplit-compress/1991-04-03
Archive-directory: fits.cx.nrao.edu:/FITS/HST/ [192.33.115.8]
Original-posting-by: dwells@fits.cx.nrao.edu (Don Wells)
Original-subject: Re: Astronomical data compression
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

In article <4638@dftsrv.gsfc.nasa.gov> warnock@stars.gsfc.nasa.gov
(Archie Warnock) writes:

   In article <1991Mar27.021241.6339@magnus.acs.ohio-state.edu>,
   henden@hpuxa.acs.ohio-state.edu (Arne A. Henden) writes... 
   >  One technique that we wanted to try, but have never taken
   >the time to program, is to use bit plane compression.  For the

   I've looked into a variant on this idea - just by dividing the image 
   into the high- and low-order bytes and comparing the compression factor 
   this way with that for the entire (virgin) image.  Used a couple of 
   standard PC-type compression programs like PKZIP.  It helped, but not as 
   much as I'd have hoped.  Typically, the resulting compressed images were 
   about 10% - 15% smaller than if I just left the image alone.  You might 
   do better by breaking things up into individual bit-planes, but the last 
   few planes would be so noisy, you might not.  

Last November a German astronomer asked me to compress several HST
images. The result of my experiments is in the anonFTP server on
fits.cx.nrao.edu [192.33.115.8] in directory /FITS/HST. The first of
the six files which I processed is:

       5158080 Oct 19 14:42 w0bs0102t_cvt.c0h
       4380234 Nov 15 16:29 w0bs0102t_cvt.c0h.Z
       3088384 Nov 16 00:16 w0bs0102t_cvt.c0h.tar

The .Z is just "compress" [LZW], which got 15% in this case. The
".tar" contains:

           931 Nov 15 23:56 1990 README
           674 Nov 16 00:08 1990 Makefile
           923 Nov 15 22:53 1990 bmerge.c
           917 Nov 15 22:14 1990 bsplit.c
        495707 Nov 16 00:11 1990 w0bs0102t_cvt.c0h.0.Z
       2579040 Nov 16 00:11 1990 w0bs0102t_cvt.c0h.1

The original file has been split by bsplit.c into even and odd bytes.
The even bytes compressed by 80%, but the odd (noisy low order) bytes
were incompressible. Program bmerge.c can zipper the 3.1_MB of files
back together to re-create the original 5.2_MB file. In this case the
technique removed 40% of the original bits (half of 80%). For an FP
file you could get 25% immediately by splitting into four streams so
that the favorable statistics of the exponent bytes could be
exploited. In a binary table (visibility data) it would pay to split
the rows into many separate byte streams to exploit the differing
statistics of the various columns and of the bytes inside those
columns. The multiple bytestream notion is a special case of the idea
of splitting the stream into bitstreams and compressing them
separately.
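
For readers who want to try this on their own data, here is a minimal
sketch of the even/odd split. It is not the bsplit.c from the archive
(fetch that for the real thing), just an illustration of the idea; the
same loop with N output files instead of two gives the four-stream FP
split or the per-column split for binary tables.

/* bsplit sketch: de-interleave stdin into two byte streams.
 * Illustrative only; the actual bsplit.c in /FITS/HST may differ. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *even, *odd;
    int c;

    if (argc != 3) {
        fprintf(stderr, "usage: bsplit evenfile oddfile < input\n");
        return 1;
    }
    even = fopen(argv[1], "wb");
    odd  = fopen(argv[2], "wb");
    if (even == NULL || odd == NULL) {
        perror("bsplit: fopen");
        return 1;
    }
    /* even offsets (high-order bytes of 16-bit pixels) go to one file,
     * odd offsets (the noisy low-order bytes) go to the other */
    while ((c = getchar()) != EOF) {
        putc(c, even);
        if ((c = getchar()) == EOF)
            break;
        putc(c, odd);
    }
    fclose(even);
    fclose(odd);
    return 0;
}

Run "compress" on the two pieces and keep whichever parts actually
shrink; bmerge.c (or an analogous zipper loop that reads the two files
alternately) restores the original byte-for-byte.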


   Bottom line seems to be 
   that it's easy (and fairly fast) to get the first 50% or so.  Recoding 
   from 16-bit numbers to 8-bit differences gets you that much, and only 
   costs a single addition per pixel to restore.  The hard work starts if 
   you want more.

I agree with Archie's remarks about the virtues of finite differences.
My purpose in this posting is to point out that he may have been a bit
too pessimistic about the efficacy of simple even-odd bytestream
compression. 
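
For completeness, here is a sketch of the 16-bit-to-8-bit difference
recoding Archie describes. The escape convention for differences that
do not fit in a signed byte is my own addition for illustration, not
anything from his code.

/* First-difference recoding sketch. Differences between successive
 * 16-bit pixels usually fit in a signed byte; those that do not are
 * written as an escape byte followed by the raw 16-bit pixel.
 * Restoring an in-range pixel costs a single addition, as noted above.
 * Illustrative only. */

#define ESCAPE (-128)   /* reserved: a full 16-bit pixel follows */

/* encode n pixels; returns the number of bytes written to out */
long diff_encode(short *pix, long n, signed char *out)
{
    long i, j = 0;
    short prev = 0;
    int d;

    for (i = 0; i < n; i++) {
        d = pix[i] - prev;
        if (d > -128 && d < 128) {
            out[j++] = (signed char) d;
        } else {
            out[j++] = ESCAPE;
            out[j++] = (signed char) (((unsigned short) pix[i]) >> 8);
            out[j++] = (signed char) (pix[i] & 0xff);
        }
        prev = pix[i];
    }
    return j;
}

/* decode nbytes of differences back into pixels; returns pixel count */
long diff_decode(signed char *in, long nbytes, short *pix)
{
    long i = 0, k = 0;
    short prev = 0;

    while (i < nbytes) {
        if (in[i] == ESCAPE) {
            prev = (short) ((((unsigned char) in[i+1]) << 8)
                            | (unsigned char) in[i+2]);
            i += 3;
        } else {
            prev = (short) (prev + in[i++]);  /* one addition per pixel */
        }
        pix[k++] = prev;
    }
    return k;
}

On well-behaved images nearly every difference fits in a byte, which is
where the easy first 50% comes from.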
--

Donald C. Wells             Associate Scientist        dwells@nrao.edu
National Radio Astronomy Observatory                   +1-804-296-0277
Edgemont Road                                     Fax= +1-804-296-0278
Charlottesville, Virginia 22903-2475 USA            78:31.1W, 38:02.2N