gda@creare.creare.UUCP (Gray Abbott) (10/24/89)
I just got back from the 1989 IEEE ASSP Workshop on Audio and Acoustics and thought I'd try to summarize what went on. The workshop was about 3 days long (starting Sunday night and ending Wed. afternoon) and consisted of 7 sessions, 3 invited talks, and 3 demo sessions. The technical sessions occurred sequentially, so everyone could attend everything. The workshop was at Mohonk Mountain House at Lake Mohonk in the Catskills (eat your hearts out).

Sessions were:
  Echo Cancellation, Microphone Signal Processing
  Wideband Audio Coding
  Speech and Signal Processing
  Aids to the Handicapped
  Music Analysis and Synthesis
  Active Noise Control
  Room Acoustics, Binaural Reproduction, Testing

Invited talks:
  J.B. Allen: Applications of Hearing Models to DSP
  R.W. Brodersen: Evolution of VLSI
  X. Rodet: Analysis/Synthesis Models for Music

Demos:
  Wideband Audio Demos
  Music Demos

The opening talk was Jont Allen's, where he explained that almost everything we knew about hearing was wrong. He pointed out that the ear is full of non-linearities. One of the most interesting things he mentioned was that the outer hair cells on the basilar membrane have been shown to be crucial to the sharp tuning characteristics of the cochlea. This was shown in experiments involving "acoustic trauma", where damage to the inner hair cells did not affect membrane tuning (although it eliminated nerve response), while damage to the outer hair cells ruined the tuning. The idea seems to be that the outer hair cells change their compliance in response to motion of the membrane. Anyway, Jont's point was that we need to use more accurate models of the cochlea if we're going to benefit from auditory models in speech coding and so on.

The echo cancellation session was pretty interesting, to me. The main problem being addressed is one that has absorbed a lot of research time over the years: speaker phones.
When you have two people talking to each other over speaker phones, you get all sorts of problems with speaker-microphone feedback and with room reverberation (hearing two rooms). The thing I found interesting is that they seem to be getting close to a solution, using adaptive filtering algorithms like LMS. These filters adapt in real time to cancel feedback, echoes, and so forth. The problem seems to be that they take a long time to converge (~ 1 minute). Some of the papers dealt with subband algorithms, where the signal is broken up into frequency bands, on the theory that the individual bands will be easier to deal with than the fullband signal.

The wideband coding session dealt with compressing music/high-quality speech/data for storage or transmission. Several papers focused on using psychoacoustic/auditory models to determine what part of the signal is inaudible, due to masking, so that it can be discarded without perceptually harming the signal. There was some interesting hardware and some interesting algorithms here, including a wave digital filter implementation of a filterbank, by Ulrich Sauvagerd.

The speech and signal processing session covered topics ranging from D/A hardware to noise reduction using sine-wave models (Quatieri and McAulay).

The second invited talk, by Brodersen, I missed, as I was the first speaker the next morning and needed my beauty sleep :-). It's in the Final Program.

The aids for the handicapped session included my talk on the AKL tactile aid for the deaf, several papers on testing for hearing aids, Levitt and Neuman's paper on using orthogonal polynomials to compress speech frequency data, and Pat Peterson's talk on microphone arrays for directional/steerable hearing aids. The AKL aid, in case you're wondering, turns speech into recognizable spatiotemporal patterns, which are then "displayed" on the skin of a profoundly deaf person. It actually works pretty well when combined with lip reading.
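Back to those adaptive echo cancellers for a moment: to give a flavor of how they work, here's a toy normalized-LMS sketch. This is an illustration of the general technique, not any system presented at the workshop; the tap count, step size, and "echo path" are all made up.

```python
# Toy NLMS echo canceller (illustrative only).  The far-end signal x
# leaks through an unknown "room" filter h into the microphone signal d;
# the adaptive filter w learns h and subtracts its echo estimate.
import random

def nlms_cancel(x, d, taps=8, mu=0.5, eps=1e-8):
    """Return (error signal, final weights) for far-end x, mic signal d."""
    w = [0.0] * taps
    buf = [0.0] * taps              # most-recent-first delay line of x
    e_out = []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # echo estimate
        e = d[n] - y                                  # residual after cancellation
        norm = eps + sum(b * b for b in buf)          # input power (normalization)
        w = [wi + (mu / norm) * e * bi for wi, bi in zip(w, buf)]  # NLMS update
        e_out.append(e)
    return e_out, w

random.seed(1)
h = [0.6, -0.3, 0.1]                                  # unknown echo path
x = [random.uniform(-1, 1) for _ in range(4000)]      # far-end "speech"
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]                          # mic hears only the echo here
e, w = nlms_cancel(x, d)
print(max(abs(v) for v in e[-100:]))                  # residual echo, near zero
```

The slow real-world convergence the speakers complained about comes from long room responses (thousands of taps, not 8) and from speech being a much less friendly excitation than the white noise used above.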
Music Analysis/Synthesis covered all the ways to make "beeps", "boops", and more interesting sounds. I'm afraid these guys have a jargon of their own ("generator functions", etc.) and I didn't follow all of it. I liked their demos, though! Julius Smith, from NeXT, Inc., talked about using the MC56001 in the NeXT machine to generate music.

Active noise control was interesting. I remember when this was mostly a fantasy. Now there are real applications in use, such as factory ducts which use microphones to monitor noise and then play cancellation signals through loudspeakers so no noise leaves the duct. High-speed DSP has made this possible, along with some hard-earned experience in microphone placement. One application apparently in the works is to use the standard hi-fi system in a car to cancel road noise at the passengers' ears, but this project was still "proprietary", so no details. Another application is in vibration control - keeping helicopters from shaking themselves apart.

Xavier Rodet gave an invited talk on electronic music (as done at IRCAM, in Paris), but we were constantly being annoyed by a loud voice leaking into the room from below. It turned out to be Ted Koppel, reporting on the Bay Area earthquake, so after the talk we all retired to various public rooms to watch the news.

The last session was on room acoustics, binaural reproduction, and testing. The hit of the session (and of the demos) was a paper given by Wenzel, Foster, Wightman, and Kistler, on a binaural simulation system. This is part of a fancy workstation project, which is to lead to multi-sensory displays for pilots, air traffic controllers, etc. You put on headphones and a little DSP box turns one or more mono (?) signals into binaural signals, complete with pinna transformations. What's more, the headphones have a sensor, so that if you turn or tilt your head, it switches the transforms to keep the apparent sound sources in the same place, so you can actually "walk around them".
I have to admit that it didn't work very well for me - the sounds were internalized, appearing between my ears. This was probably partly due to pinna mismatch (the pinna filtering was set up in advance of the demo) and partly due to the presentation being anechoic. A little reverb would have helped. Which brings me to another paper, by Lehnert and Blauert, in which they completely simulate a concert hall, using ray tracing, and combine that with head and pinna transforms. Alas, this system is non-real-time (overnight processing on a PC/AT), but it's just what I always wanted to do...
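For what it's worth, the core trick in that demo - rendering a world-fixed source relative to a tracked head - can be caricatured in a few lines. This is a crude interaural-delay panner of my own devising, not the Wenzel/Foster system (which uses measured pinna responses); every constant in it is a guess.

```python
# Toy head-tracked binaural panner (illustrative only).  A world-fixed
# source at azimuth src_az_deg is rendered relative to the listener's
# head azimuth; turning the head changes the relative azimuth, so the
# apparent source stays put in the world.
import math

FS = 8000            # sample rate in Hz (arbitrary for the sketch)
HEAD_R = 0.09        # assumed head radius, metres
C = 343.0            # speed of sound, m/s

def binauralize(mono, src_az_deg, head_az_deg):
    """Crude rendering via interaural time and level differences."""
    rel = math.radians(src_az_deg - head_az_deg)   # azimuth relative to head
    itd = HEAD_R / C * (rel + math.sin(rel))       # Woodworth-style delay, s
    shift = int(round(abs(itd) * FS))              # delay in whole samples
    if shift == 0:                                 # dead ahead: identical ears
        return list(mono), list(mono)
    near = list(mono)                              # ear facing the source
    far = [0.6 * s for s in ([0.0] * shift + list(mono))[:len(mono)]]
    if itd >= 0:                                   # source to the right
        return far, near                           # (left, right)
    return near, far

tone = [math.sin(2 * math.pi * 440 * n / FS) for n in range(800)]
l0, r0 = binauralize(tone, 40, 0)    # head forward, source 40 deg to the right
l1, r1 = binauralize(tone, 40, 40)   # head turned to face the source
```

A real system convolves each source with measured left/right pinna (HRTF) impulse responses and crossfades them as the head sensor updates; the delay-and-attenuate shortcut above is exactly why sketches like this one sound "between the ears".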
jean@maxwell.Concordia.CA ( JEAN GOULET ) (10/25/89)
In article <GDA.89Oct23163049@creare.creare.UUCP> gda@creare.creare.UUCP (Gray Abbott) writes:
>
>The wideband coding session dealt with compressing music/high quality
>speech/data for storage or transmission. Several papers focused on
>using psychoacoustic/auditory models to determine what part of the
>signal is inaudible, due to masking, so that it can be discarded without
>perceptually harming the signal. Some interesting hardware and algorithms
>here, including a wave digital filter implementation a filterbank,
>by Ulrich Sauvagerd.

Do you remember what kind of compression ratio they managed to achieve? I've been looking for ways of recording as much decent-quality audio into my limited RAM as possible, but the papers I've seen focus on minimizing speech bandwidth. They tend to remove as much information from the speech signal as they can get away with, until the words are barely comprehensible.

I've only tried some simple algorithms which can reconstruct the input data exactly after expansion, rather than those which discard parts of the signal. I suppose part of the reason is that I went to so much trouble making my ADC as noise-free as possible that it would seem counterproductive to throw away parts of the ADC's output. Plus, it's hard to anticipate the consequences of doing DSP on that kind of compressed data, since it's likely that some frequency components will have been lost. (Take, for example, pitch-shifting a musical instrument: while you might kill some frequency components because they're perceptually insignificant at the original pitch, that may not be the case after you shift to a new pitch...)

> [...] Which brings me to another
>paper, by Lehnert and Blauert, in which they completely simulate a concert
>hall, using ray tracing, and combine that with head and pinna transforms.
>Alas, this system is non-real-time (overnight processing on a PC/AT), but
>it's just what I always wanted to do...
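(Before I get to the ray-tracing question: to make "reconstruct the input data exactly" concrete, here's a minimal sketch of the simplest such lossless scheme - first-difference, or delta, coding. It's an illustration, not the code I actually run.)

```python
# Delta coding of ADC samples: store each sample as the change from the
# previous one.  Smooth audio has small deltas, which a later entropy
# coder can pack into fewer bits; decoding restores the input bit-exactly.

def delta_encode(samples):
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)   # store the change, not the value
        prev = s
    return out

def delta_decode(deltas):
    acc = 0
    out = []
    for d in deltas:
        acc += d               # running sum restores the original samples
        out.append(acc)
    return out

sig = [0, 3, 7, 8, 6, 2, -1, -3]
enc = delta_encode(sig)
assert delta_decode(enc) == sig    # bit-exact round trip
print(enc)                         # [0, 3, 4, 1, -2, -4, -3, -2]
```

Notice the deltas are smaller than the samples - that's the whole game - but unlike the masking-based coders, nothing about the signal is assumed inaudible.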
I take it that the ray tracing you're talking about is for tracing acoustic waves, and not light waves, as in ray tracing for graphics. I ask because if they really wanted to recreate the concert-hall experience, they'd have to do both kinds of ray tracing. Then they could do the Virtual Reality thing by fitting the user with a helmet carrying stereo headphones and stereo goggles, with each goggle lens displaying a high-quality color image of the scenery around the user, depending on which direction their head is pointing. Guess we'll have to wait until they can fit a battery-operated supercomputer with Gigabytes (Terabytes?) of RAM in a walkman-sized box...

Jean Goulet
Electrical Engineering Class of '89
Concordia University
Montreal, Canada