[alt.comp.compression] Compression of 16-bit sound files.

rog@speech.kth.se (Roger Lindell) (04/03/91)

Hello,

I would like to know if there exist any good, moderately fast compression
programs that will compress 16-bit sound files by a large amount. These files
are stored for archival purposes, and therefore I need lossless compression.
I have tried compress, freeze and yabba; I get the best compression with
freeze, but it takes the longest time both to compress and decompress. What I
would like is a program that has equal or better compression than freeze and
faster (preferably much faster) decompression. Btw, this is for a Unix machine.


--
Roger Lindell			rog@speech.kth.se
Phone: +46 8 790 75 73		Fax: +46 8 790 78 54
Dept. of Speech Communication and Music Acoustics
Royal Institute of Technology	Sweden

phillips@sparky.mit.edu (Mike Phillips) (04/11/91)

>I would like to know if there exist any good, moderately fast compression
>programs that will compress 16-bit sound files by a large amount. These files
>are stored for archival purposes, and therefore I need lossless compression.
>I have tried compress, freeze and yabba; I get the best compression with
>freeze, but it takes the longest time both to compress and decompress. What I
>would like is a program that has equal or better compression than freeze and
>faster (preferably much faster) decompression. Btw, this is for a Unix machine.


We found that we have been able to get fairly good compression (about 50-60%
for most of our data) by simply packing the numbers into fewer bits.  The
compression program writes out a little header for each chunk of speech saying
"here come X words packed into Y bits per word".  This works well because much
of speech is pretty quiet.
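
Roughly, the idea in C (a sketch only, not the actual program in the
tarfile mentioned below; the chunk size and the final bit-flush are
simplified):

    #include <stdio.h>

    static unsigned long acc = 0;         /* bit accumulator */
    static int fill = 0;                  /* bits currently in acc */

    static void put_bits(FILE *f, unsigned v, int n)
    {
        acc = (acc << n) | (v & ((1UL << n) - 1));
        fill += n;
        while (fill >= 8) {               /* emit whole bytes */
            fill -= 8;
            putc((int)((acc >> fill) & 0xff), f);
        }
    }

    /* Bits needed for a signed 16-bit sample, sign bit included. */
    static int width(int s)
    {
        unsigned m = (s < 0) ? ~(unsigned)s : (unsigned)s;
        int n = 1;
        while (m) { n++; m >>= 1; }
        return n;
    }

    /* Pack one chunk of at most 256 samples. */
    void pack_chunk(FILE *out, const short *x, int n)
    {
        int i, w, bits = 1;
        for (i = 0; i < n; i++)
            if ((w = width(x[i])) > bits)
                bits = w;
        putc(n - 1, out);        /* header: "here come n words ... */
        putc(bits, out);         /* ... packed into 'bits' bits"   */
        for (i = 0; i < n; i++)
            put_bits(out, (unsigned)(unsigned short)x[i], bits);
        /* (a real packer would flush the last partial byte of acc) */
    }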

It's simple, it's fast, it seems to work pretty well and it's free.

You can get this via anonymous ftp from lightning.lcs.mit.edu (18.27.0.147)

It's all in pub/compression/tarfile.Z

Mike (phillips@goldilocks.lcs.mit.edu)

agulbra@siri.unit.no (Arnt Gulbrandsen) (04/15/91)

  Most sound data doesn't include much in the top octave, so you can
compress the sound data to a bit over 50% by adding together pairs of
adjacent items. The result needs one more bit, has half as many items,
and sounds much the same. 16-bit 44kHz becomes 17-bit 22kHz (or 18-bit
11kHz, but then the signal starts to deteriorate.)
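
In C that is just the following (a sketch; the 17-bit sums are held in
longs for simplicity -- and note this is lossy, unlike what the
original poster asked for):

    /* Sum adjacent pairs: half as many items, one extra bit each.
       16-bit 44kHz in, 17-bit 22kHz out. */
    long halve(const short *in, long n, long *out)
    {
        long i, j = 0;
        for (i = 0; i + 1 < n; i += 2)
            out[j++] = (long)in[i] + in[i + 1];  /* fits in 17 bits */
        return j;                /* number of 17-bit output items */
    }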

Arnt Gulbrandsen,
University of Trondheim, Norway
agulbra@siri.unit.no

rogerc@thebox.rain.com (Roger Conley) (04/17/91)

From what I've read, the standard compression for sound files is ADPCM;
2X and 4X compression are standard. I believe the method uses relative
amplitude changes instead of absolute values, since the difference
between two adjacent samples should usually be small.

I think CDs use this technique.
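
The non-adaptive core of the idea looks like this in C (a sketch; real
ADPCM also adapts a quantizer step size and re-quantizes the
differences, which is what makes it lossy and is omitted here):

    /* First-order difference coding: each sample is stored as the
       change from its predecessor, which is usually small. */
    void delta_encode(const short *x, long n, long *d)
    {
        long i;
        short prev = 0;
        for (i = 0; i < n; i++) {
            d[i] = (long)x[i] - prev;
            prev = x[i];
        }
    }

    void delta_decode(const long *d, long n, short *x)
    {
        long i, acc = 0;
        for (i = 0; i < n; i++) {
            acc += d[i];
            x[i] = (short)acc;   /* exactly reverses the encoder */
        }
    }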

d88-jwa@byse.nada.kth.se (Jon W{tte) (04/21/91)

In article <> rogerc@thebox.rain.com (Roger Conley) writes:

   From what I've read, the standard compression for sound files is ADPCM;
   2X and 4X compression are standard. I believe the method uses relative

   I think CDs use this technique.

I'm quite sure that CDs use a standard, "straight" storage method.
Otherwise you would lose about the one advantage that CDs have over
LPs, sound-wise: good response to transients (spikes) and other
high-energy, high-frequency sound.

Furthermore, CDs use absolute encoding, which is a shame, since
relative coding (still 16 bit) would give a lot better dynamics -
at a little loss in transients, of course ...

No, CDs aren't compressed. Rather, they're expanded for error
correction (added redundancy). Whether this is Hamming codes or some
sort of parity / checksum / CRCs, I do not know.

(Please. No comments about CD / LP, Okay ? At least put them in
rec.audio...)

--
          I remain: h+@nada.kth.se (Jon W{tte) (Yes, a brace !)
"It's not entirely useless. It came in this great cardboard box !" - Calvin
 "Life should be more like TV. I think all women should wear tight clothes,
       and all men should carry powerful handguns" - Calvin, again

madler@nntp-server.caltech.edu (Mark Adler) (04/21/91)

Jon W{tte (no, your modem is fine--it's a brace) writes:
>> I'm quite sure that CDs use a standard, "straight" storage method.
>> Otherwise you would lose about the one advantage that CDs have over
>> LPs, sound-wise: good response to transients (spikes) and other
>> high-energy, high-frequency sound.

Yes, they store 16-bit samples (one per channel) with no compression.
However, they could have done a lossless compression, using differential
methods, and gotten about twice the time (well over two hours) on a CD.

>> No, CDs aren't compressed. Rather, they're expanded for error
>> correction (added redundancy). Whether this is Hamming codes or some
>> sort of parity / checksum / CRCs, I do not know.

They use Reed-Solomon codes.  As I recall, it is a rate-3/4 code (which
means one-fourth of the bits are for error correction) over 8-bit
symbols, with some amount of interleaving.  These codes have excellent
burst-correction capabilities, so macroscopic scratches on the CD
result in no loss of data.

Mark Adler
madler@pooh.caltech.edu

myhui@bnr.ca (Michael Hui) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
[...]
>Yes, they store 16-bit samples (one per channel) with no compression.
>However, they could have done a lossless compression, using differential
>methods, and gotten about twice the time (well over two hours) on a CD.

I wonder why no compression was used? Certainly the IC technology at that time
was advanced enough to have made it a cheap proposition. The FIR (I guess...)
filters used to interpolate between samples in most CD players must take up at
least as much silicon as a delta-modulation decoder.

      Michael MY Hui      Ottawa Canada      myhui@bnr.ca

abl@thevenin.ece.cmu.edu (Antonio Leal) (04/21/91)

myhui@bnr.ca (Michael Hui) writes:
>
>In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>[...]
>>Yes, they store 16-bit samples (one per channel) with no compression.
>>However, they could have done a lossless compression, using differential
>>methods, and gotten about twice the time (well over two hours) on a CD.
>
>I wonder why no compression was used? Certainly the IC technology at that time
>was advanced enough to have made it a cheap proposition. The FIR (I guess...)
>filters used to interpolate between samples in most CD players must take up at
>least as much silicon as a delta-modulation decoder.

Let's go easy here. These are distinct issues:

1 - Use of delta modulation - the converter is _much_ cheaper than the
  16-bit D/As, which are finicky as hell.  But unless you jack up the
  data rate to equivalent levels, it's a lossy compression - transients
  may get it in the neck.  (A sketch of plain delta modulation follows
  after this list.)  Never mind honest audio engineers' objections;
  can you imagine what the baboons in the mystic audio business would
  have made of it?  We got enough of a circus with a bullet-proof
  PCM scheme as it was ...
  Incidentally, the "1-bit converter" business is delta mod: do some
  manipulation on the 16-bit samples and feed a delta converter at a
  rate high enough to sound good.  Sell it as a major improvement (well,
  it _is_ guaranteed to be monotonic, which 16-bit DACs should, but
  may not, be).

2 - Lossless compression, e.g. Huffman or LZW.  Even assuming that
  compression and error correction didn't get in each other's hair,
  can you say "enough memory and computational power to decode
  a data stream at 2*16*44.1 = 1411.2 kbit/s"?
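
To make item 1 concrete, plain delta modulation is just this (a C
sketch; the fixed step size is exactly what hurts transients at low
data rates):

    /* Plain delta modulation: one bit per (oversampled) input sample.
       The decoder integrates the same +/- steps to track the signal. */
    #define STEP 64                        /* fixed step size */

    void dm_encode(const short *x, long n, unsigned char *bit)
    {
        long i, est = 0;
        for (i = 0; i < n; i++) {
            bit[i] = (x[i] > est);         /* 1: step up, 0: step down */
            est += bit[i] ? STEP : -STEP;  /* decoder mirrors this */
        }
    }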

Probably this should move over to rec.audio ...

--
Antonio B. Leal			Dept. of Electrical and Computer Engineering
Bell: [412] 268-2937		Carnegie Mellon University
Net: abl@ece.cmu.edu		Pittsburgh, PA. 15213   U.S.A.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>They use Reed-Solomon codes.  As I recall, it is a rate-3/4 code (which
>means one-fourth of the bits are for error correction) over 8-bit
>symbols, with some amount of interleaving.  These codes have excellent
>burst-correction capabilities, so macroscopic scratches on the CD
>result in no loss of data.

Correct.

According to Pohlmann, the interleaving, the multiple error-correction
sets (before and after interleaving), and the eight-to-fourteen
modulation (EFM, used to keep the difference between the number of 1s
and 0s small) mean that the actual audio data occupies only about 1/4
of the possible data space on an audio CD.

The result is that a CD player that handles all of the levels of
error correction can correct a bad burst of 3874 bits (2.5mm) or
conceal a bad burst of 13,282 bits (7.7mm).

On the compression side, I suspect that when the audio CD was being
designed, processor technology wasn't advanced enough.  The extra
cost of adding a processor to do the decompression was probably
just too high.  The same types of problems befell MIDI.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.020231.8109@bmerh408.bnr.ca> myhui@bnr.ca (Michael Hui) writes:
>I wonder why no compression was used? Certainly the IC technology at that time
>was advanced enough to have made it a cheap proposition. The FIR (I guess...)
>filters used to interpolate between samples in most CD players must take up at
>least as much silicon as a delta-modulation decoder.

Advanced enough, yes, but I doubt it was cheap and rugged enough.  As
it was, many people didn't buy CD players for years because of high
cost and questionable reliability.

Also, the goal wasn't to come up with a storage mechanism that would
hold tons of data, but to come up with a good replacement for a vinyl
record.  All they had to do was to create a medium that would hold 35
minutes worth of music, and they went much further than that.

As it stands, very few artists fill up CDs.  At best, they add a few
B-sides or EP songs to help justify the added cost.

I haven't studied the data headers, but I suspect that there's no
reason that a compressed audio CD standard couldn't be developed.
It's just a question of whether there's a need.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>However, they could have done a lossless compression, using differential
>methods, and gotten about twice the time (well over two hours) on a CD.

Are you sure about that?  Can you do lossless compression of sound in
general with differential methods?  How do you handle big zero-crossings
of fairly high-frequency square-ish waves efficiently?

tmb@ai.mit.edu (Thomas M. Breuel) (04/22/91)

   [why don't CD's use compression]

   2 - Lossless compression, e.g. Huffman or LZW.  Even assuming that
     compression and error correction didn't get in each other's hair,
     can you say "enough memory and computational power to decode
     a data stream at 2*16*44.1 = 1411.2 kbit/s"?

With fixed codes, decoding requires little more hardware than a ROM
and a register. To achieve some additional robustness to errors,
you probably want to include a synchronization code every few
hundred bits.
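
For instance, a fixed Huffman code can be decoded by walking a small
table one bit at a time: the table plays the ROM, and the current
state is the register.  A sketch in C (the 3-symbol code here is
purely illustrative):

    #include <stdio.h>

    /* Each "ROM" entry: next state on a 0 or 1 bit; negative = leaf. */
    static const int rom[2][2] = {
        /* state 0 */ { -1,  1 },  /* '0'  -> symbol A, '1' -> state 1 */
        /* state 1 */ { -2, -3 },  /* '10' -> symbol B, '11' -> C      */
    };

    void decode(const unsigned char *bits, int nbits)
    {
        int i, state = 0;                 /* the single register */
        for (i = 0; i < nbits; i++) {
            int next = rom[state][bits[i] & 1];
            if (next < 0) {               /* leaf: emit the symbol */
                putchar('A' - 1 - next);
                state = 0;                /* back to the root */
            } else
                state = next;
        }
    }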

madler@nntp-server.caltech.edu (Mark Adler) (04/22/91)

In article <1991Apr21.163913.2249@smsc.sony.com> dce@smsc.sony.com (David Elliott) writes:
>In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>>However, they could have done a lossless compression, using differential
>>methods, and gotten about twice the time (well over two hours) on a CD.
>
>Are you sure about that?  Can you do lossless compression of sound in
>general with differential methods?  How do you handle big zero-crossings
>of fairly high-frequency square-ish waves efficiently?

I'm sure that you can get about 2:1 compression losslessly on the average.
It would vary with the source material, of course.  There is no way to
get 2:1 all the time, over short pieces of sound.  But some regions will
be less, some more.  CD players can already handle some buffering to
turn the varying rate of data off the disc into an exactly constant rate
to the D/A converters.  Come to think of it, this may explain why it
wasn't done--it would require more RAM in the decoder.  As was pointed
out, the processing requirements for decompression would be minimal.
But the buffering may not be.

Mark Adler
madler@pooh.caltech.edu

aipdc@castle.ed.ac.uk (Paul Crowley) (04/22/91)

In article <1991Apr21.020231.8109@bmerh408.bnr.ca> myhui@bnr.ca (Michael Hui) writes:
>I wonder why no compression was used? 

One possibility: some parts of the music could be more compressible than
others.  That means that you might get 50k of data one second and 20k
the next. But you can't change the speed the disc spins at quickly:
you have to get the same amount of data in one second as the next. 
Therefore no compression.
                                         ____
\/ o\ Paul Crowley aipdc@castle.ed.ac.uk \  /
/\__/ Part straight. Part gay. All queer. \/

tmb@ai.mit.edu (Thomas M. Breuel) (04/22/91)

   One possibility: some parts of the music could be more compressible than
   others.  That means that you might get 50k of data one second and 20k
   the next. But you can't change the speed the disc spins at quickly:
   you have to get the same amount of data in one second as the next. 
   Therefore no compression.

Data compression only speeds up transfers (for regions that compress
poorly, you simply turn it off), so data compression never causes the
disk reads to fall behind.  On the other hand, you can easily stop and
restart reading from a CD when data is coming in too fast.  Maybe data
compression would increase the amount of buffering needed somewhat,
but it is unlikely that the effect would be large enough to make
the use of data compression impractical even on low-end systems.

jk87377@cc.tut.fi (Juhana Kouhia) (04/24/91)

Has anyone researched the possibility of using fractal compression
(IFS) on music?  I can see that music contains a lot more
self-similarity than pictures do; for example, listen to disco hit
music.  The same goes for some symphonies.

It might be possible to compress a file by finding those 'same'
waveforms, saving one of them, and storing the others relative to the
saved waveform -- since the others differ only slightly from the saved
one, only a few bits are needed to store each relative waveform.
(Huh, got that?  My English is not good; well, you might have noted
that already. :-)
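
Something like this, maybe (a C sketch of the storage step only;
finding which passages match is the hard part and is not shown):

    /* Code a repeated passage as small residuals against a stored
       reference waveform; small residuals then pack into few bits. */
    void residual(const short *ref, const short *seg, long n, long *res)
    {
        long i;
        for (i = 0; i < n; i++)
            res[i] = (long)seg[i] - ref[i];  /* small if passages match */
    }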

Maybe a 1:2 compression ratio... or worse.

Juhana Kouhia