[comp.dsp] sound data compression

garton@cunixa.cc.columbia.edu (Bradford Garton) (09/24/89)

In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>Just by storing the *difference* between
>adjacent samples, and assuming that there are no impulses, a great
>savings in data can be achieved over storing 16 bit *absolute* values.

I've wondered about this before -- is this the technique known as
"delta modulation"?  I seem to recall some DACs made by dbx on the
market some years ago that employed such a scheme.  I gather one
of the problems was that errors would start being compounded.

Brad Garton
Columbia University Music Department
brad@woof.columbia.edu

ashok@atrp.mit.edu (Ashok C. Popat) (09/25/89)

In article <1900@cunixc.cc.columbia.edu> brad@woof.columbia.edu (Brad Garton) writes:
>In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>>Just by storing the *difference* between
>>adjacent samples, and assuming that there are no impulses, a great
>>savings in data can be achieved over storing 16 bit *absolute* values.
>
>I've wondered about this before -- is this the technique known as
>"delta modulation"?

I think it is called "differential PCM" or "DPCM" (actually, it is a
special case of DPCM --- first-order DPCM with a unit prediction
coefficient).  The term "delta modulation" usually refers to the
special case of first-order DPCM in which the prediction error is
quantized with exactly one bit (i.e., two quantization levels).
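
To make the one-bit case concrete, here is a crude C sketch of a delta
modulator and its matching decoder; the fixed step size is just an
arbitrary number, not anything standard:

#include <stdio.h>

#define STEP 256.0   /* fixed step size -- an arbitrary choice */

/* Encode one sample into a single bit and update the staircase
   reconstruction that the decoder will also track. */
static int dm_encode(double x, double *recon)
{
    int bit = (x >= *recon) ? 1 : 0;   /* sign of the prediction error */
    *recon += bit ? STEP : -STEP;
    return bit;
}

/* The decoder mirrors the update exactly. */
static double dm_decode(int bit, double *recon)
{
    *recon += bit ? STEP : -STEP;
    return *recon;
}

int main(void)
{
    double input[8] = { 0, 300, 700, 1200, 1100, 600, 100, -400 };
    double enc = 0.0, dec = 0.0;
    int i;

    for (i = 0; i < 8; i++) {
        int bit = dm_encode(input[i], &enc);
        printf("in %6.0f  bit %d  out %6.0f\n",
               input[i], bit, dm_decode(bit, &dec));
    }
    return 0;
}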

>[...]                    I gather one
>of the problems was that errors would start being compounded.

The trick is to use the difference between the current sample and the
previous reconstructed sample (not the previous original sample).
This prevents the quantization error from building up (in the case of
error-free channels or media).

In general, the previous reconstructed sample is multiplied by a
"prediction coefficient" before the difference is computed.

In the case of error-prone channels (or media), single bit-errors will
propagate over successive samples in DPCM.  For this reason, in
first-order DPCM, the prediction coefficient should be chosen to be
slightly less than unity, so that the decoding system will have its
pole lie strictly inside the unit circle.  This will allow the effects
of transmission/storage errors to die away.
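
You can watch that happen by feeding a decoder all-zero residuals with
a single corrupted value at the start, once with a unit coefficient and
once with a leaky one (a toy C sketch; the numbers mean nothing in
particular):

#include <stdio.h>

/* Feed the decoder all-zero residuals except for a single error of
   size 100 at n = 0, and print the reconstruction that results. */
static void run(double a)
{
    double recon = 0.0;
    int n;

    printf("a = %.2f:", a);
    for (n = 0; n < 8; n++) {
        double eq = (n == 0) ? 100.0 : 0.0;   /* the channel error */
        recon = a * recon + eq;
        printf(" %7.2f", recon);
    }
    printf("\n");
}

int main(void)
{
    run(1.0);   /* unit coefficient: the error never goes away     */
    run(0.9);   /* leaky predictor:  the error decays geometrically */
    return 0;
}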

Actually, there is another reason to choose a prediction coefficient
less than unity.  It turns out that a good approximation to the
coefficient value that minimizes overall mean-square error for
wide-sense stationary (WSS) processes is simply the normalized
autocorrelation at lag 1.  When 8000-Hz sampled speech is modeled as
WSS, this value is usually taken to be around 0.8.
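
Estimating that coefficient from a block of samples is just a ratio of
two inner products.  A sketch in C, assuming the samples are roughly
zero-mean:

#include <stdio.h>

/* Normalized autocorrelation at lag 1:
   sum(x[n] * x[n-1]) / sum(x[n] * x[n]), assuming zero-mean samples. */
static double lag1_coeff(const double *x, int n)
{
    double num = 0.0, den = 0.0;
    int i;

    for (i = 1; i < n; i++)
        num += x[i] * x[i - 1];
    for (i = 0; i < n; i++)
        den += x[i] * x[i];
    return (den > 0.0) ? num / den : 0.0;
}

int main(void)
{
    /* A made-up, slowly varying, roughly zero-mean block of samples. */
    double x[10] = { 1, 3, 6, 5, 2, -1, -4, -6, -5, -2 };

    printf("estimated prediction coefficient: %f\n", lag1_coeff(x, 10));
    return 0;
}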

Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302

jj@alice.att.com (jj) (09/25/89)

In article <1900@cunixc.cc.columbia.edu> brad@woof.columbia.edu (Brad Garton) writes:
>In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>>Just by storing the *difference* between
>
>I've wondered about this before -- is this the technique known as
>"delta modulation"?  I seem to recall some DACs made by dbx on the
>market some years ago that employed such a scheme.  I gather one
>of the problems was that errors would start being compounded.

The scheme is called "Differential PCM", and delta modulation
is a sub-set of the basic technique.

Jayant and Noll, "Digital Coding of Waveforms", has a rather complete,
although mathematically tough, discussion of DPCM and of its various
variants, including Delta Modulation.  Ray Steele has an older book
out on Delta Modulation alone; I don't remember the title off the
top of my head.
-- 
To the Lords of   *Mail to jj@alice.att.com  or alice!jj
Convention        *HASA, Atheist Curmudgeon Division
'Twas Claverhouse *Copyright alice!jj 1989, all rights reserved, except
Spoke             *transmission by USENET and like free facilities granted.

dnwiebe@cis.ohio-state.edu (Dan N Wiebe) (09/26/89)

	The technique you speak of is properly called "Differential Pulse
Code Modulation" (at least it is in Tanenbaum's networking book).  The problem
is not compounded error, at least not with audio (a signal that varies
between -3V and 3V will sound exactly the same if you give it a positive bias
of one volt so that it varies between -2V and 4V, provided no component limits
are exceeded).  The problem is that when you are dealing with sharp
spikes or high frequencies, the compression can't keep up with them.  As a
simple example, consider the case of a 16-bit environment where you have an
instantaneous square transition from 0 to 65535.  If you're sending, say,
eight-bit deltas, for a 50% compression ratio, you can only go up by at most
255 at a time (127 if the deltas carry a sign), so it takes a few hundred
sample times to get clear to 65535, rather than just one if you use 16-bit
absolute values.  This is admittedly an extreme
example, and real-world examples of distortion based on this are much less
severe; enough less, apparently, that it's considered a viable compression
method...
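
	A few lines of C make the point; if the per-sample step is clamped
to what a signed byte can hold (the only assumption here), the
reconstruction chases that transition for several hundred samples:

#include <stdio.h>

int main(void)
{
    long target = 65535;   /* the instantaneous 0-to-65535 transition */
    long recon  = 0;
    int  n      = 0;

    while (recon < target) {
        long delta = target - recon;
        if (delta > 127)                /* signed 8-bit delta limit */
            delta = 127;
        recon += delta;
        n++;
    }
    printf("caught up after %d sample times\n", n);   /* prints 517 */
    return 0;
}
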
	It seems to me that a viable alternative might be a protocol where
you used a seven-bit delta, with a high bit of zero, in most cases, and,
when you needed an absolute number, sent two bytes, where the high bit of
the first one was 1 and the remaining fifteen bits were the absolute value
shifted right one bit.  Of course, this means an irregular data-transfer
rate, which means buffering, and some intermediate sample processing (which,
though, is simple enough to be done quickly by hardware), and maybe that's
too much trouble to go to for a mere 50.5% (or whatever) compression ratio;
I don't know.  Depending on the sample width, the specifics could be changed
to increase the compression ratio.
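
	Here is a C sketch of the encoder side of that scheme.  The
function name and the way it tracks the decoder's view of the previous
sample are my own choices; the format is the one described above (one
byte with the high bit 0 for a seven-bit signed delta, or two bytes with
the high bit 1 carrying the absolute value shifted right one bit):

#include <stdio.h>

/* Emit one byte (7-bit signed delta, high bit 0) when the difference is
   small, or two bytes (high bit 1, then the absolute value shifted right
   one bit) when it isn't.  Returns the byte count and keeps *prev at the
   value the decoder would reconstruct -- exact for deltas, low bit lost
   for escapes. */
static int encode_sample(unsigned value, unsigned *prev, unsigned char *out)
{
    int delta = (int)value - (int)*prev;

    if (delta >= -64 && delta <= 63) {
        out[0] = (unsigned char)(delta & 0x7F);   /* high bit is 0 */
        *prev += delta;
        return 1;
    } else {
        unsigned half = value >> 1;               /* 15 bits survive */
        out[0] = (unsigned char)(0x80 | (half >> 8));
        out[1] = (unsigned char)(half & 0xFF);
        *prev = half << 1;
        return 2;
    }
}

int main(void)
{
    unsigned samples[6] = { 100, 130, 140, 40000, 40020, 39990 };
    unsigned prev = 0;
    unsigned char buf[2];
    int i, total = 0;

    for (i = 0; i < 6; i++) {
        int n = encode_sample(samples[i], &prev, buf);
        printf("sample %5u -> %d byte(s)\n", samples[i], n);
        total += n;
    }
    printf("%d bytes instead of %d\n", total, 6 * 2);
    return 0;
}
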
	As a matter of fact, this could be viewed as a form of adaptive
compression, and possibly could be extended to more than two levels (say, a
two-bit delta, a six-bit delta, a ten-bit delta, or a fourteen-bit delta,
whichever was the smallest you could get away with for the current
difference--with a two-bit tag in front, that would give you a 16-bit
sample space with four-bit, eight-bit, twelve-bit, or sixteen-bit samples,
where most would
be four- and eight-bit and only a few would be sixteen-bit, resulting
possibly in an average compression ratio of between 50% and 75%--I don't know
enough to do the statistics).  The actual average compression ratio could be
easily worked out if I had access to a 'typical' piece of digitized sound...
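
	Selecting the width per sample is just a handful of comparisons.
A C sketch that counts the bits this scheme would spend on a made-up run
of differences, using a two-bit tag plus 2/6/10/14-bit signed deltas (the
particular widths are only the ones suggested above):

#include <stdio.h>

/* Bits needed for one sample under a two-bit tag followed by a
   2-, 6-, 10-, or 14-bit signed delta.  A jump too big even for
   14 bits would have to go out as an absolute value; that case is
   glossed over here and just charged 16 bits. */
static int sample_bits(int delta)
{
    if (delta >= -2   && delta <= 1)    return 2 + 2;    /*  4-bit sample */
    if (delta >= -32  && delta <= 31)   return 2 + 6;    /*  8-bit sample */
    if (delta >= -512 && delta <= 511)  return 2 + 10;   /* 12-bit sample */
    return 2 + 14;                                       /* 16-bit sample */
}

int main(void)
{
    /* A made-up run of differences: mostly small, a couple of jumps. */
    int deltas[8] = { 1, -2, 15, -20, 300, -1, 2000, 3 };
    int i, total = 0;

    for (i = 0; i < 8; i++)
        total += sample_bits(deltas[i]);

    printf("%d bits instead of %d (%.0f%% of the original size)\n",
           total, 8 * 16, 100.0 * total / (8 * 16));
    return 0;
}
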
	I'm probably reinventing the wheel here; has anybody heard any of this
before?