[net.music.synth] Audio Data Reduction Techniques

ed@plx.UUCP (10/07/86)

	Sorry, no knowledge to impart. Only questions...

	In addition to the obvious solution of storing the delta
	between samples, what other techniques can be used to
	cut down on disk storage? I would like to assume that 
	much of the sound being recorded is aperiodic so that 
	"table scanning" would be useless.  Also, I want 14 to 16
	bit quantization.
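
	(By "delta" I mean nothing fancier than the sketch below: store the
	first sample, then successive differences, which stay small for
	smooth material and so pack into fewer bits or entropy-code well.
	Rough illustration only, assuming 16-bit signed samples.)

/* Plain delta (difference) coding of 16-bit samples.  Differences are
 * effectively taken modulo 2^16, so the decoder's running sum recovers
 * the original even when an individual difference wraps around.        */
void delta_encode(const short *in, short *out, int n)
{
    short prev = 0;
    for (int i = 0; i < n; i++) {
        out[i] = (short)(in[i] - prev);   /* difference from previous sample */
        prev = in[i];
    }
}

void delta_decode(const short *in, short *out, int n)
{
    short prev = 0;
    for (int i = 0; i < n; i++) {
        prev = (short)(prev + in[i]);     /* running sum restores the sample */
        out[i] = prev;
    }
}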

	Any ideas?


	Ed Chaban
	Plexus Computers Inc.
	Phone: (408) 943-2226
	Net: sun!plx!ed

jaw@aurora.UUCP (James A. Woods) (10/09/86)

#  Problems worthy of attack prove their worth by hitting back -- Piet Hein

     Since optimal compression of arbitrary information was shown to be
uncomputable by Kolmogorov in the sixties (and independently
by Gregory Chaitin [then a high school student] in J. ACM, 1966),
ad hoc suboptimal techniques have commanded attention.

     For noiseless coding, the NP-complete generalization of Lempel-Ziv
coding by J. Storer doesn't do us much good, but LZ coding itself
(Unix "compress") is asymptotically optimal for stationary correlated
sources and runs in linear time.  Problem is, audio is not too stationary,
and the finitude of the compress code dictionary undermines its
"in-the-limit" theoretical advantage.

     For non-information preserving coding, getting a handle on the
"eigencharacteristics" of audio (or video, for that matter) is a bit easier
when transformed into another domain.  The Karhunen-Loeve transformation is
optimal in the mean-square-error sense but can require cubic time.
The fast Fourier transform has served well for signal source coding,
where compression is achieved across blocks by assigning fewer bits to less
energetic coefficients before the inverse transform.  Frequencies the ear does
not find important during the transform interval may also be zeroed (masked).
For more compression, phase may be quantized more coarsely than magnitude.
Difference methods might be applied adaptively across blocks; of course,
any residue may be coded noiselessly with run-length, Huffman, or LZ
techniques.  Progressive filtering methods (where low-order approximations
to a signal are successively refined) a la the work of Peter Burt
are all the rage in video coding now.  These at least provide a framework
for handling some real-time buffering concerns.
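
     A back-of-the-envelope rendition of the block transform idea follows,
with a slow direct DFT standing in for the FFT and "quantization" reduced to
zeroing the weakest coefficients; a real coder would allocate bits per
coefficient, quantize phase more coarsely than magnitude, and entropy-code
the survivors.

#include <math.h>

#define PI    3.14159265358979323846
#define BLOCK 64                 /* block length; arbitrary for this sketch  */
#define KEEP  16                 /* coefficients retained per block          */

/* Direct O(N^2) DFT of one real block (an FFT would do this in N log N). */
static void dft(const double *x, double *re, double *im)
{
    for (int k = 0; k < BLOCK; k++) {
        re[k] = im[k] = 0.0;
        for (int n = 0; n < BLOCK; n++) {
            double a = -2.0 * PI * k * n / BLOCK;
            re[k] += x[n] * cos(a);
            im[k] += x[n] * sin(a);
        }
    }
}

/* Inverse DFT back to a real block (real part only). */
static void idft(const double *re, const double *im, double *x)
{
    for (int n = 0; n < BLOCK; n++) {
        x[n] = 0.0;
        for (int k = 0; k < BLOCK; k++) {
            double a = 2.0 * PI * k * n / BLOCK;
            x[n] += re[k] * cos(a) - im[k] * sin(a);
        }
        x[n] /= BLOCK;
    }
}

/* Zero all but the KEEP most energetic coefficients ("masking" the rest). */
static void prune(double *re, double *im)
{
    double e[BLOCK], sorted[BLOCK];
    for (int k = 0; k < BLOCK; k++)
        e[k] = sorted[k] = re[k] * re[k] + im[k] * im[k];

    for (int i = 0; i < BLOCK - 1; i++)          /* crude descending sort   */
        for (int j = 0; j < BLOCK - 1 - i; j++)
            if (sorted[j] < sorted[j + 1]) {
                double t = sorted[j];
                sorted[j] = sorted[j + 1];
                sorted[j + 1] = t;
            }

    for (int k = 0; k < BLOCK; k++)              /* keep only the strongest */
        if (e[k] < sorted[KEEP - 1])
            re[k] = im[k] = 0.0;
}

/* One block: transform, discard weak coefficients, invert.  out[] is the
 * lossy reconstruction; in practice only the surviving (index, re, im)
 * triples would be quantized, entropy-coded, and stored.                 */
void transform_block(const double *in, double *out)
{
    double re[BLOCK], im[BLOCK];
    dft(in, re, im);
    prune(re, im);
    idft(re, im, out);
}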

     Bell Labs has, for several years, investigated an approximation to the
above short-time "filter bank" processing, which they call adaptive subband
coding; their current technology can run it in real time to squeeze 64 kbps
speech down to somewhat less than 10 kbps with minimal degradation.  Ask
alice!jj about such, or consult the literature (IEEE Trans. Acoustics ...).
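
     For flavor, the two-band core of the subband idea might look like the
sketch below: a trivial sum/difference (Haar-like) filter pair splits the
signal, and the band with less energy is requantized with fewer bits.  The
Bell Labs coder uses proper quadrature-mirror filters, more bands, and adapts
the allocation as it goes; this is only a toy, not their method.

/* Split x[0..n-1] (n even) into a low band (averages) and a high band
 * (differences); reconstruct with x[2i] ~ low[i]+high[i] and
 * x[2i+1] ~ low[i]-high[i], give or take the bit lost to truncation.   */
void subband_split(const short *x, short *low, short *high, int n)
{
    for (int i = 0; i < n / 2; i++) {
        low[i]  = (short)((x[2*i] + x[2*i + 1]) / 2);   /* smooth part */
        high[i] = (short)((x[2*i] - x[2*i + 1]) / 2);   /* detail part */
    }
}

/* Requantize a band to `bits` bits by clearing low-order bits (assumes
 * the usual arithmetic shift for negative values); a quiet band
 * tolerates a small bit budget, a busy one gets more.                  */
void requantize(short *band, int n, int bits)
{
    int shift = 16 - bits;
    for (int i = 0; i < n; i++)
        band[i] = (short)((band[i] >> shift) << shift);
}

Giving, say, 12 bits to the low band and 4 to the high band spends 16 bits
per input pair, i.e. 8 bits per sample, or half the original 16-bit rate.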

     Other companies have been known to proffer video codecs with rates as
low as 56 kbps, with concomitant quality tradeoffs.  Before its demise a
couple of years back, an outfit named Compusonics claimed compression of 45
minutes of stereo onto a 3.3 megabyte Kodak floppy disk using a pattern
recognition method whose patent was outlined in New Scientist.  The company's
exaggerated claims (coupled with the imminent introduction of read/write
optics, DAT, etc. in Japan) no doubt contributed to its short lifespan.

     Applications for compressed audio on nonmechanical media certainly
exist (e.g. in toys), and I would not be surprised to learn that music
distributors might soon download compressed audio to digital jukeboxes.

     -- James A. Woods (ames!jaw)