sns@lo-fan.caltech.edu (Sam Southard Jr.) (03/23/91)
I have a topic/question that should be suitable for this newsgroup. I am
a software engineer working on the software for the Keck Telescope (the
new 10m in Hawaii). One of the things I am doing is taking the data from
the CCD to a VMEbus controller crate (Sun 1E running VxWorks) and from
there to a Sun 4/470 over the Ethernet.

Each image can be up to 4096x4096 pixels (a 2x2 mosaic of 2048x2048
CCDs), each pixel being 16 bits. Obviously, if this kind of data is
going over the Ethernet, we want to compress it as much as possible. The
compression I can do is restricted in a few ways, including:

1) It cannot take much CPU, since some real-time control of the
instrument is being done by the same VxWorks box. "Much" is a term that
is yet to be defined.

2) Simple schemes like counting the number of times a pixel occurs in a
row and compressing that to a single occurrence and a count are out,
because there is a random element to the data (cosmic rays, etc.), which
means that it is unlikely that there will be many cases where this
method would compress the data.

I have thought of a few methods which I can try, but I am interested in
looking at as many ways as possible. I have lots of actual data to try
out various compression methods, and a reasonable amount of CPU to burn
in choosing the algorithm (or two or three) which will actually be used.

Does anyone have any suggestions?
--
Sam Southard, Jr.
{sns@deimos.caltech.edu|{backbone}!cit-vax!deimos!sns}
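[Editor's sketch, in Python with made-up pixel values, of why point 2
holds: one cosmic-ray hit splits an otherwise uniform row into three
runs, so (value, count) pairs stop paying off as soon as the data gets
speckley.]

```python
def rle_encode(row):
    """Run-length encode a sequence as [value, count] pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

clean = [100] * 8                                   # flat background
noisy = [100, 100, 100, 4000, 100, 100, 100, 100]   # one cosmic-ray hit
print(len(rle_encode(clean)))   # 1
print(len(rle_encode(noisy)))   # 3
```

With 16-bit pixels, three runs cost six stored numbers against eight
raw pixels; a few more hits per row and the "compressed" form is larger
than the original.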
philip@beeblebrox.dle.dg.com (Philip Gladstone) (03/25/91)
>>>>> On 23 Mar 91 01:35:57 GMT, sns@lo-fan.caltech.edu (Sam Southard Jr.) said:
Sam> I have a topic/question that should be suitable for this newsgroup. I am a
Sam> software engineer working on the software for the Keck Telescope (the new 10m
Sam> in Hawaii). One of the things I am doing is taking the data from the CCD to
Sam> a VMEbus controller crate (Sun 1E running VxWorks) and from there to a
Sam> Sun 4/470 over the ethernet.
Sam> I have thought of a few methods which I can try, but I am interested in looking
Sam> at as many ways as possible. I have lots of actual data to try out various
Sam> compression methods, and a reasonable amount of CPU to burn in choosing the
Sam> algorithm (or two or three) which will actually be used.
Sam> Does anyone have any suggestions?
The first question that comes to mind is whether you want a lossless
scheme or a lossy one.
If you want lossless compression, then I must ask you why? You already
admit that there is noise in the input signal and I bet you get noise
introduced from atmospheric turbulence.
If you want lossy compression, then you have to decide how much loss
you are prepared to accept. In general the lossy algorithms do much
better at compressing pictures than the lossless ones; the improvement
can be a factor of 10.
The whole issue of whether lossy as opposed to lossless compression is
'better' is being discussed ad nauseam in various other groups, so I
don't want to start it here. But the question as to which is
appropriate for particular applications is relevant.
Philip
--
Philip Gladstone Dev Lab Europe, Data General, Cambridge, UK
"I really wish I'd listened to what my mother told me when I was young."
"Why, what did she tell you?"
"I don't know, I didn't listen."
sns@lo-fan.caltech.edu (Sam Southard Jr.) (03/26/91)
In article <PHILIP.91Mar24144205@beeblebrox.dle.dg.com>, philip@beeblebrox.dle.dg.com (Philip Gladstone) writes:
|> >>>>> On 23 Mar 91 01:35:57 GMT, sns@lo-fan.caltech.edu (Sam Southard Jr.) said:
|>
|> The first question that comes to mind is whether you want a lossless
|> scheme or a lossy scheme?
|>
|> If you want lossless compression, then I must ask you why? You already
|> admit that there is noise in the input signal and I bet you get noise
|> introduced from atmospheric turbulence.

This question has been asked by many people who have sent me responses
to my original message (BTW: Thanks!). I apologize for not answering it
in my initial posting - since I was so involved with the thing, the
answer was so obvious to me that I didn't even bother addressing the
issue.

The answer is: I need lossless compression. This code is going in the
actual instrument. Astronomers would not be happy to be informed that
they will get anything other than the actual data, bit for bit. The
decision which was made (and I agree with it) was that the astronomer
should decide what to do with the data. One of the reasons for this is
that these instruments are often used to look at very faint light
sources and features which are quite often fairly close to the noise
level. It would be extremely difficult for anyone but the individual
astronomer to decide if a particular small delta somewhere was
significant. Imagine what an astronomer would do to me if he found out
that the spectral line he based his thesis on might in fact have been
merely noise introduced by a compression algorithm!

|> If you want lossy compression, then you have to decide how much loss
|> you are prepared to accept. In general the lossy algorithms do much
|> better at compressing pictures than the lossless ones. The improvement
|> can be a factor of 10 better.

Unfortunately, they are completely unacceptable in this case.
xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (03/26/91)
sns@deimos.caltech.edu (Sam Southard Jr.) writes:

> I have a topic/question that should be suitable for this newsgroup. I
> am a software engineer working on the software for the Keck Telescope
> (the new 10m in Hawaii). One of the things I am doing is taking the
> data from the CCD to a VMEbus controller crate (Sun 1E running
> VxWorks) and from there to a Sun 4/470 over the ethernet.
>
> Each image can be up to 4096x4096 pixels (a 2x2 mosaic of 2048x2048
> CCDs), each pixel being 16 bits. Obviously, if this kind of data is
> going over the Ethernet, we want to compress it as much as possible.
> The compression I can do is restricted in a few ways, including:
>
> 1) It cannot take much CPU, since some real-time control of the
> instrument is being done by the same VxWorks box. "Much" is a term
> that is yet to be defined.
>
> 2) Simple schemes like counting the number of times a pixel occurs in
> a row and compressing that to a single occurrence and a count are out,
> because there is a random element to the data (cosmic rays, etc.),
> which means that it is unlikely that there will be many cases where
> this method would compress the data.
>
> I have thought of a few methods which I can try, but I am interested
> in looking at as many ways as possible. I have lots of actual data to
> try out various compression methods, and a reasonable amount of CPU to
> burn in choosing the algorithm (or two or three) which will actually
> be used.
>
> Does anyone have any suggestions?

Sure, else what's the point of responding. ;-)

Data which is primarily of one (background) value, but which contains
enough speckley noise to defeat run-length or repeated-string style
(LZW, etc.) encodings, such as your CCD data or other scanned real-world
images of sparse data (engineering line drawings, e.g.), is an ideal
candidate for arithmetic encoding.
Arithmetic encoding gains its efficiency by being able to effectively
encode a symbol set which is severely skewed to a few of the possible
values (preferably one, where it is capable of encoding the prevalent
symbol in a small fraction of one bit), but which may in a large dataset
contain at least one of _each_ symbol. It is quite immune to speckley
noise, because it encodes symbol by symbol (here, half-pixel by
half-pixel) rather than run by run, so a speckle just bumps the count
for a nearly empty bin by one, having little other effect on the
compression, rather than spoiling a run every time a speckle appears.

The down side is that the implementation I know is quite CPU intensive.
I understand there are much more effective methods, but I've been quite
impressed with the effectiveness of the simple routine in the June, 1987
CACM on data of this type. It might suit your needs for speed, or be
effective enough in compression that you would consider dedicating a box
to the compression task just to make use of this slower technique.

I have the code from the CACM article, with the modification of ANSI C
prototypes, which I've compiled under a C compiler with prototypes, and
under BSD Unix, but not both at once. I'd be glad to pass the code as is
to a few requests, but even happier to pass it on to someone who would
make sure it compiles under a modern Unix C compiler with the given
makefile, and then post it here or put it up for FTP access or both. It
has the typical BSD long file names, so it would be a little extra work
to unpack and use on a system with 14-character file name limits; a bit
of work by the recipient on the shar file with an editor, and then on
the #include statements and makefile when the stuff is unpacked, will
take care of that.
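[Editor's sketch of the interval-narrowing idea behind arithmetic
coding. This is a toy exact-fraction version in Python, not the CACM
integer implementation (which adds scaled counts, renormalization, and
an adaptive model); the 95%/5% background/speckle split is an assumed
illustration of skewed CCD statistics.]

```python
from fractions import Fraction
import math

def ac_encode(symbols, probs):
    """Narrow the interval [low, low+width) once per symbol."""
    cum, c = {}, Fraction(0)
    for s, p in probs.items():     # cumulative prob below each symbol
        cum[s] = c
        c += p
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * cum[s]
        width *= probs[s]
    # any value in [low, low+width) identifies the whole message
    return low + width / 2, width

def ac_decode(value, n, probs):
    """Invert the narrowing: find which subinterval the value fell in."""
    cum, c, order = {}, Fraction(0), []
    for s, p in probs.items():
        cum[s] = c
        c += p
        order.append(s)
    out, low, width = [], Fraction(0), Fraction(1)
    for _ in range(n):
        scaled = (value - low) / width
        for s in reversed(order):  # largest cum[s] <= scaled wins
            if scaled >= cum[s]:
                break
        out.append(s)
        low += width * cum[s]
        width *= probs[s]
    return out

# 95% background pixels ('b'), 5% speckle ('x')
probs = {'b': Fraction(19, 20), 'x': Fraction(1, 20)}
msg = list('bbbbbbbbbbbbbbbbbbxb')
code, width = ac_encode(msg, probs)
assert ac_decode(code, len(msg), probs) == msg
bits = math.ceil(-math.log2(width)) + 1
print(bits, "bits for", len(msg), "symbols")
```

The final interval width is the product of the symbol probabilities, so
the prevalent symbol costs only -log2(19/20), about 0.07 bits, per
occurrence: the "fraction of a bit" Kent mentions.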
I also have a clever modification which replaces the "slide the highest
frequency bin to the front of a linearly searched list for efficient
access" frequency count table of the CACM implementation with a
heap-based table; it works but is not quite in such clean shape, if
anyone is interested. This is faster for a less skewed set of symbol
frequencies, such as you might get by using it to replace the Huffman
coding part of lharc, or by compressing common text, where the
order(log(symbol set size)) access to a heap bin beats the order(symbol
set size) access of the linearly searched table to find a bin, but it
loses in the current case, where the expected table search would be very
close to order(1) given the highly skewed frequencies.

Both sets of routines still have the general aspect of the CACM code,
which was written for tutorial rather than speed purposes (it does a
subroutine call per input or output bit, e.g.), so there is lots of room
to speed either routine up to make it less grim than my current
experience. Again, I understand arithmetic coding compression has moved
on long past this code, but it is still a good place to start, because
it has a great set of documentation in the CACM article to assist
understanding.

I haven't touched this stuff since June, from mental health problems, so
I'd be just as happy to pass it on to someone interested in pursuing the
matter further.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
henden@hpuxa.acs.ohio-state.edu (Arne A. Henden) (03/27/91)
In article <1991Mar23.013557.28151@nntp-server.caltech.edu> sns@deimos.caltech.edu (Sam Southard Jr.) writes:
>I have a topic/question that should be suitable for this newsgroup.
[ ..much deleted!]
>[compressing astronomical images]
> Obviously, if this kind of data is going over the
>Ethernet, we want to compress it as much as possible.
>
>Does anyone have any suggestions?
>
>Sam Southard, Jr.
>{sns@deimos.caltech.edu|{backbone}!cit-vax!deimos!sns}

Most astronomical images rattle around some low value with a few spikes
reaching near saturation (stars). Others have proposed some interesting
techniques, such as using a difference model. One technique that we
wanted to try, but have never taken the time to program, is bit-plane
compression. For the example above, divide the image into 16 1-bit
images and then run-length compress each image. You will find a very
high compression ratio on the high-order bits and essentially zero
compression on the low-order bits. I'd bet the average would be about 40
percent compression, but the task would be quite CPU intensive.

We were originally going to use this technique for our remote-observing
quick-look images, but find that run-length compression with an
adjustable threshold works well enough for now, and intend to buy a pair
of boards using the C-Cube chip in the near future to try a different
approach.

Arne Henden
Astronomy Dept.
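[Editor's sketch of the bit-plane idea on synthetic data; the background
level and noise range are made up, and run counts stand in for the full
run-length coder, since fewer runs means better RLE compression.]

```python
import random

def bit_plane(pixels, k):
    """Extract bit k of every 16-bit pixel as a 0/1 sequence."""
    return [(p >> k) & 1 for p in pixels]

def rle_runs(bits):
    """Number of runs in the plane (transitions + 1)."""
    runs = 1
    for a, b in zip(bits, bits[1:]):
        if a != b:
            runs += 1
    return runs

random.seed(1)
# synthetic CCD row: background around ADU 100 with a little noise
pixels = [100 + random.randrange(8) for _ in range(1024)]
for k in (15, 7, 0):
    print("bit", k, "runs:", rle_runs(bit_plane(pixels, k)))
# high-order planes are a single run (all zero); bit 0 is near-random,
# with roughly one run per two pixels
```

This is the behavior Arne predicts: the top planes compress almost to
nothing while the noise planes are incompressible, and every pixel must
be visited 16 times, which is where the CPU cost comes from.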
warnock@stars.gsfc.nasa.gov (Archie Warnock) (03/28/91)
In article <1991Mar27.021241.6339@magnus.acs.ohio-state.edu>, henden@hpuxa.acs.ohio-state.edu (Arne A. Henden) writes...
> One technique that we wanted to try, but have never taken
>the time to program, is to use bit plane compression. For the

Hi Arne :-) Long time, eh?

I've looked into a variant on this idea - just by dividing the image
into the high- and low-order bytes and comparing the compression factor
this way with that for the entire (virgin) image. Used a couple of
standard PC-type compression programs like PKZIP. It helped, but not as
much as I'd have hoped. Typically, the resulting compressed images were
about 10%-15% smaller than if I just left the image alone. You might do
better by breaking things up into individual bit-planes, but the last
few planes would be so noisy, you might not.

Bottom line seems to be that it's easy (and fairly fast) to get the
first 50% or so. Recoding from 16-bit numbers to 8-bit differences gets
you that much, and only costs a single addition per pixel to restore.
The hard work starts if you want more.

----------------------------------------------------------------------------
-- Archie Warnock    Internet: warnock@stars.gsfc.nasa.gov
-- ST Systems Corp.  SPAN: STARS::WARNOCK
-- NASA/GSFC         "Unix - JCL for the 90s"
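[Editor's sketch of the difference recoding Archie describes; the sample
row is made up, and a real implementation would need an escape code for
the occasional delta that does not fit in 8 bits.]

```python
def to_diffs(pixels):
    """Keep the first pixel verbatim, then successive differences.
    (A real scheme needs an escape code for deltas outside 8 bits.)"""
    return [pixels[0]] + [b - a for a, b in zip(pixels, pixels[1:])]

def from_diffs(diffs):
    """Restore: one addition per pixel, as noted above."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

row = [1003, 1005, 1002, 1004, 1900, 1004]  # one cosmic-ray spike
d = to_diffs(row)
assert from_diffs(d) == row
print(d)  # most deltas are tiny; only the spike needs more than 8 bits
```

Since neighboring sky pixels differ mostly by readout noise, nearly
every delta fits in a byte, which is where the "easy first 50%" on
16-bit data comes from.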
dwells@fits.cx.nrao.edu (Don Wells) (04/03/91)
In article <4638@dftsrv.gsfc.nasa.gov> warnock@stars.gsfc.nasa.gov (Archie Warnock) writes:
>In article <1991Mar27.021241.6339@magnus.acs.ohio-state.edu>, henden@hpuxa.acs.ohio-state.edu (Arne A. Henden) writes...
>> One technique that we wanted to try, but have never taken
>>the time to program, is to use bit plane compression. For the
>
>I've looked into a variant on this idea - just by dividing the image
>into the high- and low-order bytes and comparing the compression factor
>this way with that for the entire (virgin) image. Used a couple of
>standard PC-type compression programs like PKZIP. It helped, but not as
>much as I'd have hoped. Typically, the resulting compressed images were
>about 10% - 15% smaller than if I just left the image alone. You might
>do better by breaking things up into individual bit-planes, but the
>last few planes would be so noisy, you might not.

Last November a German astronomer asked me to compress several HST
images. The result of my experiments is in the anonFTP server on
fits.cx.nrao.edu [192.33.115.8] in directory /FITS/HST. The first of the
six files which I processed is:

  5158080 Oct 19 14:42 w0bs0102t_cvt.c0h
  4380234 Nov 15 16:29 w0bs0102t_cvt.c0h.Z
  3088384 Nov 16 00:16 w0bs0102t_cvt.c0h.tar

The .Z is just "compress" [LZW], which got 15% in this case. The ".tar"
contains:

      931 Nov 15 23:56 1990 README
      674 Nov 16 00:08 1990 Makefile
      923 Nov 15 22:53 1990 bmerge.c
      917 Nov 15 22:14 1990 bsplit.c
   495707 Nov 16 00:11 1990 w0bs0102t_cvt.c0h.0.Z
  2579040 Nov 16 00:11 1990 w0bs0102t_cvt.c0h.1

The original file has been split by bsplit.c into even and odd bytes.
The even bytes compressed by 80%, but the odd (noisy low-order) bytes
were incompressible. Program bmerge.c can zipper the 3.1_MB of files
back together to re-create the original 5.2_MB file. In this case the
technique removed 40% of the original bits (half of 80%).
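[Editor's sketch of what bsplit.c/bmerge.c presumably do; their behavior
is inferred from the description above, and the big-endian byte order
and pixel values here are assumptions for illustration.]

```python
import struct

def bsplit(data):
    """Split a byte stream into even-offset and odd-offset bytes."""
    return data[0::2], data[1::2]

def bmerge(even, odd):
    """Zipper the two streams back together (even-length data assumed)."""
    out = bytearray()
    for e, o in zip(even, odd):
        out += bytes((e, o))
    return bytes(out)

# big-endian 16-bit pixels hovering near a small background value
pixels = [100, 103, 101, 99, 102, 100]
raw = struct.pack(">%dH" % len(pixels), *pixels)
even, odd = bsplit(raw)
assert bmerge(even, odd) == raw
print(even)  # the high-order bytes: all zero, highly compressible
print(odd)   # the low-order bytes: carry all the noise
```

For low-background data the even (high-order) stream is nearly constant,
which is why "compress" could remove 80% of it while the odd stream
stayed incompressible.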
For a FP file you could get 25% immediately by splitting into four
streams so that the favorable statistics of the exponent bytes could be
exploited. In a binary table (visibility data) it would pay to split the
rows into many separate byte streams to exploit the differing statistics
of the various columns and of the bytes inside those columns. The
multiple-bytestream notion is a special case of the idea of splitting
the stream into bitstreams and compressing them separately.

>Bottom line seems to be that it's easy (and fairly fast) to get the
>first 50% or so. Recoding from 16-bit numbers to 8-bit differences gets
>you that much, and only costs a single addition per pixel to restore.
>The hard work starts if you want more.

I agree with Archie's remarks about the virtues of finite differences.
My purpose in this posting is to point out that he may have been a bit
too pessimistic about the efficacy of simple even-odd bytestream
compression.
--
Donald C. Wells          Associate Scientist        dwells@nrao.edu
National Radio Astronomy Observatory                +1-804-296-0277
Edgemont Road                                  Fax= +1-804-296-0278
Charlottesville, Virginia 22903-2475 USA       78:31.1W, 38:02.2N