garton@cunixa.cc.columbia.edu (Bradford Garton) (09/24/89)
In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>Just by storing the *difference* between
>adjacent samples, and assuming that there are no impulses, a great
>savings in data can be achieved over storing 16 bit *absolute* values.

I've wondered about this before -- is this the technique known as
"delta modulation"?  I seem to recall some DACs made by dbx on the
market some years ago that employed such a scheme.  I gather one of
the problems was that errors would start being compounded.

Brad Garton
Columbia University Music Department
brad@woof.columbia.edu
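For concreteness, here is a minimal sketch in C of the naive version of
the scheme being asked about: store only a quantized difference from the
previous *original* sample.  The numbers (roughly 16-bit samples, 8-bit
difference codes, a step size of 256) are made up for illustration and
are not from dbx or any of the posts here.  Because the decoder can only
sum the quantized differences, each sample's rounding error is carried
into every later sample, which is exactly the error build-up Brad
mentions.

    /* Naive difference coding (an illustration, not dbx's actual scheme).
     * The encoder differences against the previous ORIGINAL sample, so the
     * decoder's accumulated value drifts by the sum of all rounding errors.
     */
    #define STEP 256.0                 /* assumed quantizer step size */

    static signed char quantize_diff(double d)
    {
        int q = (int)(d / STEP + (d >= 0 ? 0.5 : -0.5));
        if (q >  127) q =  127;        /* clamp to one signed byte */
        if (q < -128) q = -128;
        return (signed char)q;
    }

    void naive_encode(const short *x, signed char *code, int n)
    {
        short prev = 0;
        int i;
        for (i = 0; i < n; i++) {
            code[i] = quantize_diff((double)x[i] - (double)prev);
            prev = x[i];               /* difference against the original */
        }
    }

    void naive_decode(const signed char *code, short *y, int n)
    {
        double recon = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            recon += code[i] * STEP;   /* rounding errors accumulate here */
            if (recon >  32767.0) recon =  32767.0;
            if (recon < -32768.0) recon = -32768.0;
            y[i] = (short)recon;
        }
    }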
ashok@atrp.mit.edu (Ashok C. Popat) (09/25/89)
In article <1900@cunixc.cc.columbia.edu> brad@woof.columbia.edu (Brad Garton) writes:
>In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>>Just by storing the *difference* between
>>adjacent samples, and assuming that there are no impulses, a great
>>savings in data can be achieved over storing 16 bit *absolute* values.
>
>I've wondered about this before -- is this the technique known as
>"delta modulation"?

I think it is called "differential PCM" or "DPCM" (actually, it is a
special case of DPCM --- first-order DPCM with a unit prediction
coefficient).  The term "delta modulation" usually refers to the special
case of first-order DPCM in which the prediction error is quantized with
exactly one bit (i.e., two quantization levels).

>[...] I gather one
>of the problems was that errors would start being compounded.

The trick is to use the difference between the current sample and the
previous reconstructed sample (not the previous original sample).  This
prevents the quantization error from building up (in the case of
error-free channels or media).  In general, the previous reconstructed
sample is multiplied by a "prediction coefficient" before the difference
is computed.

In the case of error-prone channels (or media), single bit-errors will
propagate over successive samples in DPCM.  For this reason, in
first-order DPCM, the prediction coefficient should be chosen to be
slightly less than unity, so that the decoding system will have its pole
lie strictly inside the unit circle.  This will allow the effects of
transmission/storage errors to die away.

Actually, there is another reason to choose a prediction coefficient of
less than unity.  It turns out that a good approximation to the
coefficient value that minimizes overall mean-square error for
wide-sense stationary (WSS) processes is simply the normalized
autocorrelation at lag 1.  When 8000-Hz sampled speech is modeled as
WSS, this value is usually taken to be around 0.8.

Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302
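Here is the same example reworked along the lines Ashok describes: a
first-order DPCM sketch in C in which both encoder and decoder predict
from the previous *reconstructed* sample, scaled by a prediction
coefficient slightly below unity.  The 0.9 coefficient, the uniform
8-bit quantizer, and the step size are assumptions chosen for
illustration only; the point is the recursion, not the particular
numbers.

    /* First-order DPCM sketch (illustrative parameters, not a real codec).
     * predictor = A * previous reconstructed sample; A < 1 puts the decoder
     * pole strictly inside the unit circle, so channel errors die away.
     */
    #define A     0.9                  /* assumed prediction coefficient */
    #define STEP  256.0                /* assumed quantizer step size    */

    static double clip16(double v)     /* keep values in 16-bit range */
    {
        if (v >  32767.0) return  32767.0;
        if (v < -32768.0) return -32768.0;
        return v;
    }

    static int quantize(double e)      /* crude uniform 8-bit quantizer */
    {
        int q = (int)(e / STEP + (e >= 0 ? 0.5 : -0.5));
        if (q >  127) q =  127;
        if (q < -128) q = -128;
        return q;
    }

    void dpcm_encode(const short *x, signed char *code, int n)
    {
        double recon = 0.0;            /* previous reconstructed sample */
        int i;
        for (i = 0; i < n; i++) {
            double pred = A * recon;                /* predict from reconstruction */
            int q = quantize((double)x[i] - pred);  /* quantize prediction error   */
            code[i] = (signed char)q;
            recon = clip16(pred + q * STEP);        /* track what the decoder builds */
        }
    }

    void dpcm_decode(const signed char *code, short *y, int n)
    {
        double recon = 0.0;
        int i;
        for (i = 0; i < n; i++) {
            recon = clip16(A * recon + code[i] * STEP);
            y[i] = (short)recon;
        }
    }

Restricting the code to a single bit (a fixed step of +STEP or -STEP)
turns this same loop into the delta modulator described above.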
jj@alice.att.com (jj) (09/25/89)
In article <1900@cunixc.cc.columbia.edu> brad@woof.columbia.edu (Brad Garton) writes:
>In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>>Just by storing the *difference* between
>
>I've wondered about this before -- is this the technique known as
>"delta modulation"?  I seem to recall some DACs made by dbx on the
>market some years ago that employed such a scheme.  I gather one
>of the problems was that errors would start being compounded.

The scheme is called "Differential PCM", and delta modulation is a
subset of the basic technique.

Jayant and Noll, "Digital Coding of Waveforms", has a rather complete,
although tough math-wise, discussion of DPCM, and of various variants of
DPCM, including delta modulation.  Ray Steele has an older book out on
delta modulation alone; I don't remember the title off the top of my
head.
--
To the Lords of        *Mail to jj@alice.att.com or alice!jj
Convention             *HASA, Atheist Curmudgeon Division
'Twas Claverhouse      *Copyright alice!jj 1989, all rights reserved, except
Spoke                  *transmission by USENET and like free facilities granted.
dnwiebe@cis.ohio-state.edu (Dan N Wiebe) (09/26/89)
The technique you speak of is properly called "Differential Pulse Code
Modulation" (at least it is in Tanenbaum's networking book).

The problem is not compounded error, at least not with audio (a signal
that varies between -3V and 3V will sound exactly the same if you give
it a positive bias of one volt so that it varies between -2V and 4V,
provided no component limits are exceeded).  The problem is that when
you are dealing with sharp spikes or high frequencies, the compression
can't keep up.  As a simple example, consider a 16-bit environment with
an instantaneous square transition from 0 to 65535.  If you're sending,
say, eight-bit deltas, for a 50% compression ratio, you can only go up
by 256 at a time, and it'll take you 256 sample times to get all the way
up to 65535, rather than just one if you use 16-bit absolute values.
This is admittedly an extreme example, and real-world distortion from
this effect is much less severe -- enough less, apparently, that it's
considered a viable compression method.

It seems to me that a viable alternative might be a protocol where you
send a seven-bit delta, with a high bit of zero, in most cases; if you
need an absolute number, you send two bytes, where the high bit of the
first is 1 and the remaining bits are the absolute value shifted right
one bit (see the sketch below).  Of course, this means an irregular
data-transfer rate, which means buffering and some intermediate sample
processing (which, though, is simple enough to be done quickly by
hardware), and maybe that's too much trouble to go to for a mere 50.5%
(or whatever) compression ratio; I don't know.

Depending on the sample width, the specifics could be changed to
increase the compression ratio.  As a matter of fact, this could be
viewed as a form of adaptive compression, and possibly could be extended
to more than two levels: say, a two-bit, six-bit, ten-bit, or
fourteen-bit delta, whichever is the smallest you can get away with
based on the highest frequency component.  With a two-bit tag, that
would give you a 16-bit sample space with four-bit, eight-bit,
twelve-bit, or sixteen-bit codes, where most would be four- and
eight-bit and only a few would be sixteen-bit, resulting possibly in an
average compression ratio of between 50% and 75% -- I don't know enough
to do the statistics.  The actual average compression ratio could be
easily worked out if I had access to a 'typical' piece of digitized
sound.

I'm probably reinventing the wheel here; has anybody heard any of this
before?
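Here is one way the two-level escape format suggested above might look,
under assumptions the post leaves open: unsigned 16-bit samples, signed
7-bit deltas covering -64..+63, and the low bit of a sample being
discarded when the two-byte absolute form is used.  The names and
framing details are hypothetical, not an existing format.

    /* Hypothetical byte-oriented delta/escape format:
     *   0ddddddd            one byte:  signed 7-bit delta (-64..+63)
     *   1aaaaaaa aaaaaaaa   two bytes: 15-bit absolute value = sample >> 1
     * Returns the number of bytes written to out[] (at most 2*n).
     */
    #include <stddef.h>

    size_t delta7_encode(const unsigned short *x, unsigned char *out, size_t n)
    {
        unsigned short prev = 0;       /* decoder's current value */
        size_t i, k = 0;
        for (i = 0; i < n; i++) {
            int d = (int)x[i] - (int)prev;
            if (d >= -64 && d <= 63) {                 /* delta fits in 7 bits */
                out[k++] = (unsigned char)(d & 0x7F);
                prev = (unsigned short)(prev + d);     /* exact */
            } else {                                   /* escape to absolute   */
                unsigned short a = x[i] >> 1;          /* low bit is discarded */
                out[k++] = (unsigned char)(0x80 | (a >> 8));
                out[k++] = (unsigned char)(a & 0xFF);
                prev = (unsigned short)(a << 1);       /* what the decoder sees */
            }
        }
        return k;
    }

    size_t delta7_decode(const unsigned char *in, size_t nbytes, unsigned short *y)
    {
        unsigned short prev = 0;
        size_t k = 0, i = 0;
        while (k < nbytes) {
            unsigned char b = in[k++];
            if (b & 0x80) {                            /* two-byte absolute form  */
                unsigned short a;
                if (k >= nbytes) break;                /* truncated input         */
                a = (unsigned short)(((b & 0x7F) << 8) | in[k++]);
                prev = (unsigned short)(a << 1);
            } else {                                   /* sign-extend 7-bit delta */
                int d = (b & 0x40) ? (int)b - 128 : (int)b;
                prev = (unsigned short)(prev + d);
            }
            y[i++] = prev;
        }
        return i;                                      /* samples decoded */
    }

Smooth signals cost one byte per sample (roughly the 2:1 figure above),
while a step like the 0-to-65535 jump costs a single two-byte escape
instead of a long run of saturated deltas.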