[comp.compression] Compression of 16-bit sound files.

rog@speech.kth.se (Roger Lindell) (04/03/91)

Hello,

I would like to know if there exist any good and moderately fast compression
programs that will compress 16-bit sound files by a large amount. These files
are stored for archiving purposes, and therefore I need non-lossy compression.
I have tried compress, freeze and yabba; I get the best compression with
freeze, but it takes the longest time both to compress and decompress. What I
would like is a program that has equal or better compression than freeze and
faster (preferably much faster) decompression. By the way, this is for a Unix
machine.

--
Roger Lindell			rog@speech.kth.se
Phone: +46 8 790 75 73		Fax: +46 8 790 78 54
Dept. of Speech Communication and Music Acoustics
Royal Institute of Technology	Sweden

lance@motcsd.csd.mot.com (lance.norskog) (04/04/91)

rog@speech.kth.se (Roger Lindell) writes:

>Hello,

>I would like to know if there exist any good and moderately fast compression
>programs that will compress 16-bit sound files by a large amount. These files
>are stored for archiving purposes, and therefore I need non-lossy compression.
> ... [ standard compressors are slow and don't do very well ]

The standard naive sound compressor just saves the deltas between samples,
storing the deltas as a stream of 2-, 3-, 4-, ..., N-bit records.

What I have noticed playing around with voice files is that they have lots
of "flip" deltas that cross zero, e.g. "100,-101,104,-99".  It might work
well to save the deltas AND flag the zero-crossings, so that a stream like
"100,+4,+6,-7,C-9" decodes to "100,104,110,103,-112" (the "C" marks a sign
flip applied before the delta).

Lance Norskog

phillips@sparky.mit.edu (Mike Phillips) (04/11/91)

>I would like to know if there exist any good and moderately fast compression
>programs that will compress 16-bit sound files by a large amount. These files
>are stored for archiving purposes, and therefore I need non-lossy compression.
>I have tried compress, freeze and yabba; I get the best compression with
>freeze, but it takes the longest time both to compress and decompress. What I
>would like is a program that has equal or better compression than freeze and
>faster (preferably much faster) decompression. By the way, this is for a Unix
>machine.


We found that we have been able to get fairly good compression (about 50-60%
for most of our data) by simply packing the numbers into fewer bits.  The
compression program writes out a little header for each chunk of speech
saying "here come X words packed into Y bits per word".  This works well
because much of speech is pretty quiet.

It's simple, it's fast, it seems to work pretty well, and it's free.
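
In case the idea isn't clear, a minimal C sketch of the chunk-width
computation (my own reconstruction -- the program on the ftp site below
may well differ in the details):

/* For each chunk, find the smallest two's-complement width that holds
 * every sample; the header then says "n words in y bits each".  The
 * bit-level packing itself is omitted. */

static int width(short v)                /* bits needed to hold v */
{
    int w = 1;                           /* sign bit */
    long x = (v < 0) ? ~(long)v : (long)v;
    while (x) { w++; x >>= 1; }
    return w;
}

int chunk_width(const short *s, int n)
{
    int i, y = 1;
    for (i = 0; i < n; i++) {
        int w = width(s[i]);
        if (w > y) y = w;
    }
    return y;
}

Quiet passages need few bits per word, so the average drops well below 16.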

You can get this via anonymous ftp from lightning.lcs.mit.edu (18.27.0.147);
it's all in pub/compression/tarfile.Z.

Mike (phillips@goldilocks.lcs.mit.edu)

agulbra@siri.unit.no (Arnt Gulbrandsen) (04/15/91)

  Most sound data doesn't include much in the top octave, so you can
compress the sound data to a bit over 50% by adding together pairs of
adjacent samples.  The result needs one bit more per item, has half as
many items, and sounds much the same.  16-bit 44 kHz becomes 17-bit
22 kHz (or 18-bit 11 kHz, but then the signal starts to deteriorate).
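
A tiny C sketch of the idea (note that this is lossy -- the top octave
is simply gone):

/* Halve the sample rate by summing adjacent samples.  The sums need
 * 17 bits, so they are returned in longs here; a real coder would
 * pack 17-bit fields. */
void halve(const short *in, long n, long *out)
{
    long i;
    for (i = 0; i + 1 < n; i += 2)
        out[i / 2] = (long)in[i] + (long)in[i + 1];
}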

Arnt Gulbrandsen,
University of Trondheim, Norway
agulbra@siri.unit.no

daemon@felix.UUCP (The devil himself) (04/15/91)

In article <rog.670690472@color> rog@speech.kth.se (Roger Lindell) writes:
>Hello,
>
>I would like to know if there exist any good and moderately fast compression
>programs that will compress 16-bit sound files by a large amount.
From: peterg@felix.UUCP (Peter Gruenbeck)

I did some electronics tinkering about 10 years ago with an OKI
semiconductor chip set for digital sound capture and synthesis. It
was essentially an A/D converter and digital signal processor which
compressed the sounds on the fly using a method called ADPCM.

ADPCM, which stands for Adaptive Differential Pulse Code Modulation,
basically codes each sound sample as a 3- or 4-bit difference (+-4 or +-8
amplitude levels) from the previous sample.  This is based on the assumption
that the sounds being recorded (speech and music) are more or less
continuous.  This technology is now widely available in digital telephone
answering machines.  Using this method, 15 seconds of speech at an 8 kHz
sample rate can be compressed into about 64 KB of memory.
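
For the curious, a much-simplified ADPCM-style encoder in C (a generic
sketch, not the OKI chip's actual algorithm; real ADPCM codecs use
carefully tuned step tables):

#include <stdlib.h>

/* Each sample becomes a 4-bit code: a sign bit plus a 3-bit step
 * count relative to the previous *decoded* value.  The step size
 * adapts: up when codes saturate, down when they stay small. */
void adpcm_encode(const short *in, long n, unsigned char *codes)
{
    long pred = 0, step = 16, i;
    for (i = 0; i < n; i++) {
        long diff = (long)in[i] - pred;
        long sign = (diff < 0) ? 8 : 0;
        long mag  = labs(diff) / step;
        if (mag > 7) mag = 7;
        codes[i] = (unsigned char)(sign | mag);
        pred += (sign ? -mag : mag) * step;     /* track the decoder */
        if (pred >  32767) pred =  32767;
        if (pred < -32768) pred = -32768;
        if (mag >= 6 && step < 16384) step *= 2;   /* crude adaptation */
        else if (mag <= 1 && step > 1) step /= 2;
    }
}

The decoder mirrors the pred/step updates, so no side information is
needed beyond the 4-bit codes themselves.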

-- Pete Gruenbeck
--

The poor grammar, spelling errors, and errors        o  o 
in usage are included with a purpose: I write          ^ 
something for everybody.                             (---)

campbell@wookumz.gnu.ai.mit.edu (Paul Campbell) (04/16/91)

As far as sound compression goes, why not do a Fast Fourier Transform or
a Discrete Cosine Transform to get the data into the frequency domain, and
then compress that after quantizing it?  It should end up much smaller after
compression, because the frequency conversion of anything but a noise
spectrum should introduce lots of zeroes.  The reverse transform is not that
hard in either case.  You do introduce a significant loss in speed, but a
lot of sound editing is easier in the frequency form, and the original goal
was compression of the data, after all.
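
A naive C sketch of the transform-and-quantize step (O(N^2) DCT for
clarity; a real implementation would use a fast transform, and the
quantizer step q would need tuning):

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 256

/* DCT-II of one block, then uniform quantization.  After quantizing,
 * many coefficients become zero, which is what a back-end entropy
 * coder exploits.  Note that the quantization makes this lossy. */
void dct_quantize(const short *in, short *out, double q)
{
    int k, n;
    for (k = 0; k < N; k++) {
        double sum = 0.0;
        for (n = 0; n < N; n++)
            sum += in[n] * cos(M_PI * (n + 0.5) * k / N);
        out[k] = (short) floor(sum / q + 0.5);  /* q must be large enough
                                                   not to overflow short */
    }
}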

rogerc@thebox.rain.com (Roger Conley) (04/17/91)

From what I've read, the standard compression for sound files is ADPCM.
2X and 4X compression is standard. I believe the method uses relative
amplitude changes instead of absolute values, since the differences
between two adjacent samples should usually be small.

I think CDs use this technique.

jj@alice.att.com (jj, like it or not) (04/18/91)

Those of you who care about the subject should read
Johnston's paper in JSAC 1988 about audio coding,
Stoll and Dehery's ICC 1990 paper, and Brandenburg's
1989 paper in ICASSP for starters.

You're oversimplifying a whole body of recent work.
-- 
       -------->From the pyrolagnic keyboard of jj@alice.att.com<--------
Copyright alice!jj 1991,  all rights reserved,  except transmission  by USENET and
like free facilities granted.  Said permission is granted only for complete copies
that include this notice.    Use on pay-for-read services specifically disallowed.

aipdc@castle.ed.ac.uk (Paul Crowley) (04/18/91)

In article <14938@life.ai.mit.edu> campbell@wookumz.gnu.ai.mit.edu (Paul Campbell) writes:
>As far as sound compression goes, why not do a Fast Fourier transform or
>a Discrete Cosine Transform to get it to a frequency spectrum and then
>compress that after quantizing it? 

You could probably fit a curve to the noise values and send that!
(meaningful noise-like sound does crop up a lot)
                                         ____
\/ o\ Paul Crowley aipdc@uk.ac.ed.castle \  /
/\__/ Part straight. Part gay. All queer. \/

d88-jwa@byse.nada.kth.se (Jon W{tte) (04/21/91)

In article <> rogerc@thebox.rain.com (Roger Conley) writes:

   From what I've read, the standard compression for sound files is ADPCM.
   2X and 4X compression is standard. I believe the method uses relative

   I think CDs use this technique.

I'm quite sure that CDs use a standard, "straight" storage method.
Otherwise you would lose about the one advantage that CDs have over
LPs, sound-wise: good response to transients (spikes) and other
high-energy, high-frequency sound.

Furthermore, CDs use absolute encoding, which is a shame, since
relative coding (still 16 bit) would give a lot better dynamics -
at a little loss in transients, of course ...

No, CDs aren't compressed. Rather, they're expanded for error
correction (added redundancy). Whether this is Hamming codes or some
sort of parity / checksum / CRC, I do not know.

(Please. No comments about CD / LP, Okay ? At least put them in
rec.audio...)

--
          I remain: h+@nada.kth.se (Jon W{tte) (Yes, a brace !)
"It's not entirely useless. It came in this great cardboard box !" - Calvin
 "Life should be more like TV. I think all women should wear tight clothes,
       and all men should carry powerful handguns" - Calvin, again

madler@nntp-server.caltech.edu (Mark Adler) (04/21/91)

Jon W{tte (no, your modem is fine--it's a brace) writes:
>> I'm quite sure that CDs use a standard, "straight" storage method.
>> Otherwise you would lose about the one advantage that CDs have over
>> LPs, sound-wise: good response to transients (spikes) and other
>> high-energy, high-frequency sound.

Yes, they store 16-bit samples (one per channel) with no compression.
However, they could have done a lossless compression, using differential
methods, and gotten about twice the time (well over two hours) on a CD.

>> No, CDs aren't compressed. Rather, they're expanded for error
>> correction (added redundancy). Whether this is Hamming codes or some
>> sort of parity / checksum / CRC, I do not know.

They use Reed-Solomon codes.  As I recall, it is a rate-3/4 code (which
means one-fourth of the bits are for error correction) over 8-bit symbols,
with some amount of interleaving.  These codes have excellent burst
correction capabilities, so macroscopic scratches on the CD result in no
loss of data.

Mark Adler
madler@pooh.caltech.edu

myhui@bnr.ca (Michael Hui) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
[...]
>Yes, they store 16-bit samples (one per channel) with no compression.
>However, they could have done a lossless compression, using differential
>methods, and gotten about twice the time (well over two hours) on a CD.

I wonder why no compression was used? Certainly the IC technology at that time
was advanced enough to have made it a cheap proposition. The FIR (I guess...)
filters used to interpolate between samples in most CD players must take up at
least as much silicon as a delta-modulation decoder.

      Michael MY Hui      Ottawa Canada      myhui@bnr.ca

abl@thevenin.ece.cmu.edu (Antonio Leal) (04/21/91)

myhui@bnr.ca (Michael Hui) writes:
>
>In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>[...]
>>Yes, they store 16-bit samples (one per channel) with no compression.
>>However, they could have done a lossless compression, using differential
>>methods, and gotten about twice the time (well over two hours) on a CD.
>
>I wonder why no compression was used? Certainly the IC technology at that time
>was advanced enough to have made it a cheap proposition. The FIR (I guess...)
>filters used to interpolate between samples in most CD players must take up at
>least as much silicon as a delta-modulation decoder.

Let's go easy here. These are distinct issues:

1 - Use of delta modulation - the converter is _much_ cheaper than the
  16-bit D/As, which are finicky as hell.  But, unless you jack up the
  data rate to equivalent levels, it's a lossy compression - transients
  may get it in the neck.  Never mind honest audio-engineer objections:
  can you imagine what the baboons in the mystic audio business would
  have made of it?  We got enough of a circus with a bullet-proof
  PCM scheme as it was ...
  Incidentally, the "1-bit converter" business is delta mod.  Do some
  manipulation on the 16-bit samples, and feed a delta converter at a
  rate high enough to sound good.  Sell as a major improvement (well,
  it _is_ guaranteed to be monotonic, which 16-bit ADCs should, but
  may not, be).

2 - Lossless compression, e.g. Huffman or LZW.  Even assuming that
  compression and error correction didn't get in each other's hair,
  can you say "enough memory and computational power to decode
  a data stream at 2*16*44.1 kbit/s"?

Probably this should move over to rec.audio ...

--
Antonio B. Leal			Dept. of Electrical and Computer Engineering
Bell: [412] 268-2937		Carnegie Mellon University
Net: abl@ece.cmu.edu		Pittsburgh, PA. 15213   U.S.A.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>They use Reed-Solomon codes.  As I recall, it is a rate-3/4 code (which
>means one-fourth of the bits are for error correction) over 8-bit symbols,
>with some amount of interleaving.  These codes have excellent burst
>correction capabilities, so macroscopic scratches on the CD result in no
>loss of data.

Correct.

According to Pohlmann, the interleaving, the multiple error-correction
stages (applied before and after interleaving), and the eight-to-fourteen
modulation (EFM) used to keep the numbers of 1's and 0's nearly balanced
together reduce the actual audio data to about 1/4 of the raw data space
on a CD.

The results are that a CD player that handles all of the levels of
error correction can correct a bad burst of 3874 bits (2.5mm) or
conceal a bad burst of 13,282 bits (7.7mm).

On the compression side, I suspect that when the audio CD was being
designed that processor technology wasn't advanced enough.  The
extra costs of adding a processor to do the decompression were
probably just too high.  The same types of problems befell MIDI.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.020231.8109@bmerh408.bnr.ca> myhui@bnr.ca (Michael Hui) writes:
>I wonder why no compression was used? Certainly the IC technology at that time
>was advanced enough to have made it a cheap proposition. The FIR (I guess...)
>filters used to interpolate between samples in most CD players must take up at
>least as much silicon as a delta-modulation decoder.

Advanced enough, yes, but I doubt it was cheap and rugged enough.  As
it was, many people didn't buy CD players for years because of high
cost and questionable reliability.

Also, the goal wasn't to come up with a storage mechanism that would
hold tons of data, but to come up with a good replacement for a vinyl
record.  All they had to do was to create a medium that would hold 35
minutes worth of music, and they went much further than that.

As it stands, very few artists fill up CDs.  At best, they add a few
B-sides or EP songs to help justify the added cost.

I haven't studied the data headers, but I suspect that there's no
reason that a compressed audio CD standard couldn't be developed.
It's just a question of whether there's a need.

dce@smsc.sony.com (David Elliott) (04/21/91)

In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>However, they could have done a lossless compression, using differential
>methods, and gotten about twice the time (well over two hours) on a CD.

Are you sure about that?  Can you do lossless compression of sound in
general with differential methods?  How do you handle big zero-crossings
of fairly high-frequency square-ish waves efficiently?

tmb@ai.mit.edu (Thomas M. Breuel) (04/22/91)

   [why don't CD's use compression]

   2 - Lossless compression, e.g. Huffman or LZW.  Even assuming that
     compression and error correction didn't get in each other's hair,
     can you say "enough memory and computational power to decode
     a data stream at 2*16*44.1 kbit/s"?

With fixed codes, decoding requires little more hardware than a ROM
and a register. To achieve some additional robustness to errors,
you probably want to include a synchronization code every few
hundred bits.
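
To illustrate how little hardware "a ROM and a register" really is, here
is a toy C model of table-driven decoding of a fixed prefix code (the
code itself is invented for the example: 0 -> 'a', 10 -> 'b', 11 -> 'c'):

#include <stdio.h>

struct entry { int next; int emit; };      /* emit: -1 = no output yet */

static const struct entry rom[2][2] = {    /* [state][input bit] */
    { { 0, 'a' }, { 1, -1  } },            /* state 0: nothing seen */
    { { 0, 'b' }, { 0, 'c' } },            /* state 1: "1" seen */
};

void decode_bits(const char *bits)         /* bits as "0110..." text */
{
    int state = 0;                         /* the "register" */
    for (; *bits; bits++) {
        struct entry e = rom[state][*bits - '0'];
        if (e.emit != -1)
            putchar(e.emit);
        state = e.next;
    }
    putchar('\n');
}

int main(void)
{
    decode_bits("0101100");                /* prints "abcaa" */
    return 0;
}

In hardware the table lookup is one ROM access per channel bit, which
should be fast enough at CD data rates.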

madler@nntp-server.caltech.edu (Mark Adler) (04/22/91)

In article <1991Apr21.163913.2249@smsc.sony.com> dce@smsc.sony.com (David Elliott) writes:
>In article <1991Apr21.002203.4414@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>>However, they could have done a lossless compression, using differential
>>methods, and gotten about twice the time (well over two hours) on a CD.
>
>Are you sure about that?  Can you do lossless compression of sound in
>general with differential methods?  How do you handle big zero-crossings
>of fairly high-frequency square-ish waves efficiently?

I'm sure that you can get about 2:1 compression losslessly on average.
It would vary with the source material, of course.  There is no way to
get 2:1 all the time over short pieces of sound; some regions will
compress less, some more.  CD players already do some buffering to turn
the varying data rate off the disk into an exactly constant rate into
the D/A converters.  Come to think of it, this may explain why it wasn't
done--it would require more RAM in the decoder.  As was pointed out, the
processing requirements for decompression would be minimal.  But the
buffering may not be.

Mark Adler
madler@pooh.caltech.edu

aipdc@castle.ed.ac.uk (Paul Crowley) (04/22/91)

In article <1991Apr21.020231.8109@bmerh408.bnr.ca> myhui@bnr.ca (Michael Hui) writes:
>I wonder why no compression was used? 

One possibility: some parts of the music could be more compressible than
others.  That means that you might get 50k of data one second and 20k
the next. But you can't change the speed the disc spins at quickly:
you have to get the same amount of data in one second as the next. 
Therefore no compression.
                                         ____
\/ o\ Paul Crowley aipdc@castle.ed.ac.uk \  /
/\__/ Part straight. Part gay. All queer. \/

tmb@ai.mit.edu (Thomas M. Breuel) (04/22/91)

   One possibility: some parts of the music could be more compressible than
   others.  That means that you might get 50k of data one second and 20k
   the next. But you can't change the speed the disc spins at quickly:
   you have to get the same amount of data in one second as the next. 
   Therefore no compression.

Data compression only speeds up transfers (for regions that compress
poorly, you simply turn it off), so data compression never causes the
disk reads to fall behind.  On the other hand, you can easily stop and
restart reading from a CD when data is coming in too fast.  Maybe data
compression would increase the amount of buffering needed somewhat,
but it is unlikely that the effect would be large enough to make
the use of data compression impractical even on low-end systems.

nbvs@cl.cam.ac.uk (Nicko van Someren) (04/22/91)

As I see it there are three reasons why the CD standard people (Philips/Sony)
did not put in compression:

1) They would need extra hardware to decode it, and the cost was high enough
   already.

2) The data rate would end up uneven, because some parts would compress
   better than others.  Do you remember how much RAM cost in 1983?

3) Using compression would make it harder to gloss over the errors that the
   correction hardware could not fix.  If you lose raw data it is probably
   easier to guess what it was.

The whole idea of 'lossless' versus 'lossy' compression of signals that are
to be put into your ear seems a bit silly to me anyway.  The fact is, it has
a lot to do with the person in question.  If you look at the data that gets
stored, most of it uses only about 13 bits, since most music spends its time
at least 10 dB down from the peak level.  16-bit linear is too high a
resolution at high amplitudes and too low a resolution at low ones.  A 12-
or 13-bit log system would give better quality for the dynamic range that
music has and take up only 3/4 of the space (or give you 1/3 more time).
While log ADCs cost more, you only have to make a very few of them, and log
DACs are pretty simple.

So, to go back to the original topic of the thread: if you want to compress
your sound files, try storing 10-bit log values.  That gives you 1.6:1
compression, and I doubt you will notice the 'loss'.
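
Something like this mu-law-style companding in C (my sketch, not
necessarily the exact log curve Nicko has in mind):

#include <math.h>
#include <stdlib.h>

#define MU 255.0

/* 16-bit linear sample to a log code in [-511, 511] (10 bits incl.
 * sign).  Lossy: fine resolution at low levels, coarse at high ones. */
int lin_to_log(short s)
{
    double x = fabs(s / 32768.0);
    double y = log(1.0 + MU * x) / log(1.0 + MU);
    return (int)(y * 511.0) * (s < 0 ? -1 : 1);
}

short log_to_lin(int c)
{
    double y = abs(c) / 511.0;
    double x = (pow(1.0 + MU, y) - 1.0) / MU;
    return (short)((x * 32767.0) * (c < 0 ? -1.0 : 1.0));
}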

Nicko

+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
+-----------------------------------------------------------------------------+

ge@dbf.kun.nl (Ge' Weijers) (04/23/91)

abl@thevenin.ece.cmu.edu (Antonio Leal) writes:
>  Incidentally, the "1-bit converter" business is delta mod.  Do some
>  manipulation on the 16 bit samples, and feed a delta converter at a
>  rate high enough to sound good.  Sell as a major improvement (well,
>  it _is_ guaranteed to be monotonic, which 16-bit ADCs should, but
>  may not, be).

The "Do some manipulation" makes the result less than perfectly monotonic.
A straightforward implementation of deltamodulation would need a terribly high
sampling frequency (2^16 * 44 kHz) so the digital data is preprocessed, and
this process adds noise. Care is taken to put most of the noise above
the 22kHz by a process called 'noise shaping'. I don't know how that works,
as the publications I've seen are less than clear.

Ge'

--
Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         
6525 ED Nijmegen, the Netherlands              tel. +3180652483 (UTC-2)

jk87377@cc.tut.fi (Juhana Kouhia) (04/24/91)

Has anyone researched the possibility of using fractal compression
(IFS) on music?
I would guess that music contains a lot more self-similarity than
pictures do; for example, listen to a disco hit.
The same goes for some symphonies.

It might be possible to compress a file by finding those 'same'
waveforms, saving one of them, and storing the others relative to the
saved waveform -- since the differences between the saved waveform and
the others are minor, only a few bits are needed to store the relative
waveforms.  (My English is not good; you may well have noticed that
already. :-)

Maybe a 2:1 compression ratio... or worse.

Juhana Kouhia

mskuhn@faui09.informatik.uni-erlangen.de (Markus Kuhn) (04/25/91)

Scientists at the University of Erlangen have (as far as I know)
developed a lossy sound compression method for voice, music, etc.
You can select the compression ratio freely.  At 64 kbit/s you
get the hi-fi quality you are used to from radio.  At
128 kbit/s even experts have _very_ big problems hearing any
difference from the original CD data.  The algorithm can be
implemented in real time on a DSP and is under discussion
for use in ISDN telephones.

The human ear has a certain perception threshold for each frequency.
If a sound is weaker than this threshold, you have no chance of
recognizing it.  A loud frequency component raises the perception
threshold for nearby frequencies.  The algorithm uses this effect
to throw away the data you can't hear.

If you are interested in this, I can ask the people here
whether there are any English publications.

Markus

--
Markus Kuhn, Computer Science student -- University of Erlangen, Germany
E-mail: G=Markus;S=Kuhn;OU1=rrze;OU2=cnve;P=uni-erlangen;A=dbp;C=de

weigl@sibelius.inria.fr (Konrad Weigl) (05/05/91)

In article <1991Apr23.221537.21108@cc.tut.fi>, jk87377@cc.tut.fi (Juhana Kouhia) writes:
> 
> Has anyone researched the possibility of using fractal compression
> (IFS) on music?
> I would guess that music contains a lot more self-similarity than
> pictures do; for example, listen to a disco hit.
> The same goes for some symphonies.
> 
> It might be possible to compress a file by finding those 'same'
> waveforms, saving one of them, and storing the others relative to the
> saved waveform --


Look at the newsgroup comp.fractals.
Otherwise, there are quite a few papers on IFS and time series out
there, I believe; you just have to look them up.

As far as compressing music via "waveforms" goes, look at
Gabor transforms or wavelet decomposition:
some French researchers are heavily into this; they call them
"ondelettes", which means wavelets.

Konrad Weigl               Tel. (France) 93 65 78 63
Projet Pastis              Fax  (France) 93 65 78 58
INRIA-Sophia Antipolis     email weigl@mirsa.inria.fr
2004 Route des Lucioles    
B.P. 109
06561 Valbonne Cedex
France