[comp.dsp] ASSP Workshop on Audio and Acoustics

gda@creare.creare.UUCP (Gray Abbott) (10/24/89)

I just got back from the 1989 IEEE ASSP Workshop on Audio and Acoustics
and thought I'd try to summarize what went on.  The workshop was about
3 days long (starting Sunday night and ending Wed. afternoon) and consisted
of 7 sessions, 3 invited talks, and 3 demo sessions.  The technical sessions
occurred sequentially, so everyone could attend everything.  The workshop
was at Mohonk Mountain House at Lake Mohonk in the Catskills (eat your hearts
out).

Sessions were:

	Echo Cancellation, Microphone Signal Processing
	Wideband Audio Coding
	Speech and Signal Processing
	Aids to the Handicapped
	Music Analysis and Synthesis
	Active Noise Control
	Room Acoustics, Binaural Reproduction, Testing

Invited talks:

	J.B. Allen : Applications of Hearing Models to DSP
	R.W. Brodersen: Evolution of VLSI
	X. Rodet: Analysis/Synthesis Models for Music

Demos:

	Wideband Audio
	Music Demos

The opening talk was Jont Allen's, where he explained that almost
everything we knew about hearing was wrong.  He pointed out that
the ear is full of non-linearities.  One of the most interesting
things he mentioned was that the outer hair cells on the Basilar
membrane have been shown to be crucial to the sharp tuning
characteristics of the cochlea, as shown in experiments involving
"acoustic trauma", where damage to the inner hair cells did
not affect membrane tuning (although it eliminated nerve response)
while damage to outer hair cells ruined the tuning.  The idea
seems to be that the outer hair cells change their compliance
in response to motion of the membrane.  Anyway, Jont's point was
that we need to use more accurate models of the cochlea if we're
going to benefit from auditory models in speech coding and so on.

The echo cancellation session was pretty interesting, to me.  The
main problem being addressed is one that has absorbed a lot of research
time over the years: speaker phones.  When you have two people talking
to each other over speaker phones, you get all sorts of problems with
speaker-microphone feedback and with room reverberation (hearing two
rooms).  The thing that I found interesting is that they seem to be
getting close to a solution, using adaptive filtering algorithms, like
LMS.  These filters actually adapt in real-time to cancel feedback echoes
and so forth.  The problem seems to be that they take a long time to converge
(~ 1 minute).  Some of the papers dealt with subband algorithms, where
the signal is broken up into frequency bands, on the theory that the
individual bands will be easier to deal with than the fullband signal.
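
For anyone who hasn't played with adaptive filters, here's a rough sketch of
the basic full-band LMS idea in C -- my own illustration, not the algorithm
from any of the papers.  The far-end signal feeding the loudspeaker goes
through an adaptive FIR filter that tries to model the echo path; the filter's
prediction is subtracted from the microphone signal, and the residual drives
the weight update.  The tap count and step size are made-up numbers.

/* Rough sketch of a full-band LMS echo canceller (illustration only).
 * x is the far-end signal feeding the loudspeaker, d is the microphone
 * pickup (echo plus local speech); the residual e is what gets sent back
 * to the far end.
 */
#include <stdio.h>

#define TAPS 256          /* length of the echo-path model (assumed) */
#define MU   0.005f       /* adaptation step size (assumed)          */

static float w[TAPS];     /* adaptive filter weights, start at zero */
static float xbuf[TAPS];  /* delay line of recent far-end samples   */

float lms_cancel(float x, float d)
{
    int i;
    float y = 0.0f, e;

    /* shift the far-end delay line and insert the new sample */
    for (i = TAPS - 1; i > 0; i--)
        xbuf[i] = xbuf[i - 1];
    xbuf[0] = x;

    /* estimate the echo with the current filter */
    for (i = 0; i < TAPS; i++)
        y += w[i] * xbuf[i];

    e = d - y;            /* residual: ideally just the local talker */

    /* LMS update: nudge each weight along the error gradient */
    for (i = 0; i < TAPS; i++)
        w[i] += MU * e * xbuf[i];

    return e;
}

int main(void)            /* toy test: "echo" = far end, delayed and halved */
{
    float xfar, mic, out = 0.0f;
    float hist[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    long n;

    for (n = 0; n < 20000; n++) {
        xfar = (float)((n * 7919L) % 200) / 100.0f - 1.0f;  /* test signal */
        mic  = 0.5f * hist[3];                              /* fake echo   */
        hist[3] = hist[2]; hist[2] = hist[1];
        hist[1] = hist[0]; hist[0] = xfar;
        out = lms_cancel(xfar, mic);
    }
    printf("residual after adaptation: %g\n", out);
    return 0;
}

The subband schemes run essentially this same update separately in each
frequency band, so each adaptive filter is shorter -- which is presumably
part of why the individual bands are easier to deal with than the fullband
signal.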

The wideband coding session dealt with compressing music/high quality
speech/data for storage or transmission.  Several papers focused on
using psychoacoustic/auditory models to determine what part of the
signal is inaudible, due to masking, so that it can be discarded without
perceptually harming the signal.  Some interesting hardware and algorithms
here, including a wave digital filter implementation of a filterbank,
by Ulrich Sauvagerd.
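
To give a flavor of the masking idea, here's a cartoon of my own (not how any
of the presented coders actually works): estimate the energy in each analysis
band, derive a masking threshold for each band from its loud neighbours, and
spend quantizer bits only where the signal pokes above the threshold.  All the
numbers below (32 bands, 6 dB/band spreading, 10 dB masker offset, ~6 dB per
bit) are placeholders.

/* Cartoon of masking-based bit allocation (illustration only). */
#include <stdio.h>
#include <math.h>

#define NBANDS 32

void allocate_bits(const double energy_db[NBANDS], int bits[NBANDS])
{
    double mask[NBANDS];
    int b, k;

    for (b = 0; b < NBANDS; b++) {
        mask[b] = -96.0;                      /* absolute threshold floor */
        for (k = 0; k < NBANDS; k++) {
            double dist, spread;
            if (k == b)
                continue;                     /* only neighbours mask     */
            dist = (k > b) ? (k - b) : (b - k);
            spread = energy_db[k] - 6.0 * dist - 10.0;
            if (spread > mask[b])
                mask[b] = spread;
        }
    }
    for (b = 0; b < NBANDS; b++) {
        double smr = energy_db[b] - mask[b];  /* signal-to-mask ratio */
        bits[b] = (smr <= 0.0) ? 0            /* inaudible: discard   */
                               : (int)ceil(smr / 6.0);
    }
}

int main(void)
{
    double e[NBANDS];
    int bits[NBANDS], b;

    for (b = 0; b < NBANDS; b++)              /* one loud band, the rest quiet */
        e[b] = (b == 5) ? 60.0 : 0.0;
    allocate_bits(e, bits);
    for (b = 0; b < NBANDS; b++)
        printf("%d ", bits[b]);
    printf("\n");
    return 0;
}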

The speech and signal processing session covered topics ranging from D/A
hardware to noise reduction using sine-wave models (Quatieri and McAulay).

The second invited talk, by Brodersen, I missed, as I was the first speaker
the next morning and needed my beauty sleep :-).  It's in the Final Program.

The aids for the handicapped session included my talk on the AKL tactile aid
for the deaf, several papers on testing for hearing aids, Levitt and
Neuman's paper on using orthogonal polynomials to compress speech
frequency data, and Pat Peterson's talk on microphone arrays for
directional/steerable hearing aids.  The AKL aid, in case you're
wondering, turns speech into recognizable spatiotemporal patterns,
which are then "displayed" on the skin of a profoundly deaf person.
It actually works pretty well when combined with lip reading.

Music Analysis/Synthesis covered all the ways to make "beeps", "boops",
and more interesting sounds.  I'm afraid these guys have a jargon of their
own ("generator functions", etc.) and I didn't follow all of it.  I liked
their demos, though!  Julius Smith, from NeXT, Inc., talked about using the
MC56001 in the NeXT machine to generate music.

Active noise control was interesting.  I remember when this was mostly a
fantasy.  Now there are real applications in use, such as factory ducts
which use microphones to monitor noise and then play cancellation signals
through loudspeakers so no noise leaves the duct.  High-speed DSP has
made this possible, along with some hard-earned experience in microphone
placement.  One application is apparently to use the standard hi-fi system
in a car to cancel road noise at the passenger's ears, but this project
was still "proprietary", so no details.  Another application is in vibration
control - keeping helicopters from shaking themselves apart.
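
The basic duct setup, as I understand it, is feedforward: a reference
microphone upstream picks up the noise, a controller filters it and drives the
loudspeaker, and an error microphone downstream tells the controller how well
it's doing.  Below is a rough sketch of the textbook filtered-x LMS version of
that loop -- my own illustration, not any of the presented systems; the
secondary-path estimate and all the constants are assumed.

/* Sketch of a single-channel feedforward duct canceller (illustration only).
 * x = upstream reference mic sample, e = downstream error mic sample;
 * the return value drives the anti-noise loudspeaker.  shat[] is a prior
 * estimate of the loudspeaker-to-error-mic ("secondary") path, assumed
 * measured beforehand.
 */
#define WTAPS 128            /* controller length (assumed)           */
#define STAPS 32             /* secondary-path model length (assumed) */
#define MU    0.001f         /* adaptation step size (assumed)        */

static float w[WTAPS];       /* controller weights (start at zero) */
static float xbuf[WTAPS];    /* recent reference samples           */
static float fxbuf[WTAPS];   /* reference filtered through shat[]  */

float anc_sample(float x, float e, const float shat[STAPS])
{
    int i;
    float y = 0.0f, fx = 0.0f;

    /* shift delay lines and insert the new reference sample */
    for (i = WTAPS - 1; i > 0; i--) {
        xbuf[i]  = xbuf[i - 1];
        fxbuf[i] = fxbuf[i - 1];
    }
    xbuf[0] = x;

    /* anti-noise output from the current controller */
    for (i = 0; i < WTAPS; i++)
        y += w[i] * xbuf[i];

    /* reference filtered through the secondary-path estimate */
    for (i = 0; i < STAPS; i++)
        fx += shat[i] * xbuf[i];
    fxbuf[0] = fx;

    /* filtered-x LMS update, driven by the error mic
       (sign convention assumed: error = residual noise + speaker contribution) */
    for (i = 0; i < WTAPS; i++)
        w[i] -= MU * e * fxbuf[i];

    return y;
}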

Xavier Rodet gave his invited talk on electronic music (as done at IRCAM, in Paris),
but we were constantly being annoyed by a loud voice leaking into the room
from below.  It turned out to be Ted Koppel, reporting on the Bay Area
earthquake, so after the talk we all retired to various public rooms to watch
the news.

The last session was on room acoustics, binaural reproduction, and testing.
The hit of the session (and of the demos) was a paper given by Wenzel,
Foster, Wightman, and Kistler, on a binaural simulation system.  This is
part of a fancy workstation project, which is to lead to multi-sensory
displays for pilots, air traffic controllers, etc.  You put on these headphones
and a little DSP box turns one or more mono (?) signals into binaural
signals, complete with pinna transformations.  What's more, the headphones
have a sensor, so that if you turn or tilt your head, it switches the
transforms to keep the apparent sound sources in the same place, so you
can actually "walk around them".  I have to admit that it didn't work
very well with me - the sounds were internalized, appearing between my
ears.  This was probably partly due to pinna mismatch (the pinna filtering
was set up in advance of the demo) and partly due to the presentation being
anechoic.  A little reverb would have helped.  Which brings me to another
paper, by Lehnert and Blauert, in which they completely simulate a concert
hall, using ray tracing, and combine that with head and pinna transforms.
Alas, this system is non-real-time (overnight processing on a PC/AT), but
it's just what I always wanted to do...
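
For the curious, the core of the binaural trick is just convolution: pick a
measured left/right head-related impulse response (HRIR) pair for the source
direction relative to the tracked head, and filter the mono signal through
both.  Here's a bare-bones sketch (my own, with a made-up HRIR table and
direction resolution -- not the actual Wenzel/Foster system):

/* Bare-bones sketch of headphone binaural rendering (illustration only). */
#define HRIR_LEN 128                   /* impulse response length (assumed) */
#define NDIRS    72                    /* one measured pair per 5 degrees   */

static float hrir_l[NDIRS][HRIR_LEN];  /* to be filled from measurements */
static float hrir_r[NDIRS][HRIR_LEN];

static float in_buf[HRIR_LEN];         /* recent mono input samples */

/* Render one mono sample; az_src and az_head are azimuths in degrees. */
void binaural_sample(float mono, float az_src, float az_head,
                     float *out_l, float *out_r)
{
    int i, dir;
    float rel = az_src - az_head;      /* turning the head shifts the
                                          apparent source the other way */
    while (rel < 0.0f)    rel += 360.0f;
    while (rel >= 360.0f) rel -= 360.0f;
    dir = (int)(rel / (360.0f / NDIRS)) % NDIRS;

    for (i = HRIR_LEN - 1; i > 0; i--) /* shift the input delay line */
        in_buf[i] = in_buf[i - 1];
    in_buf[0] = mono;

    *out_l = 0.0f;
    *out_r = 0.0f;
    for (i = 0; i < HRIR_LEN; i++) {   /* direct-form convolution */
        *out_l += hrir_l[dir][i] * in_buf[i];
        *out_r += hrir_r[dir][i] * in_buf[i];
    }
}

Presumably the Lehnert/Blauert hall simulation extends this by tracing the
reflections and filtering each arriving ray through the appropriate head/pinna
transform before summing, which is where the overnight PC/AT run comes in.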

jean@maxwell.Concordia.CA ( JEAN GOULET ) (10/25/89)

In article <GDA.89Oct23163049@creare.creare.UUCP> gda@creare.creare.UUCP (Gray Abbott) writes:
>
>The wideband coding session dealt with compressing music/high quality
>speech/data for storage or transmission.  Several papers focused on
>using psychoacoustic/auditory models to determine what part of the
>signal is inaudible, due to masking, so that it can be discarded without
>perceptually harming the signal.  Some interesting hardware and algorithms
>here, including a wave digital filter implementation of a filterbank,
>by Ulrich Sauvagerd.

Do you remember what kind of compression ratio they managed to achieve?
I've been looking for ways of recording as much decent-quality audio into
my limited RAM as possible, but the papers I've seen focus on minimizing speech
bandwidth.  They tend to remove as much information from the speech
signal as they can get away with, until the words are barely comprehensible.
I've only tried some simple algorithms which can reconstruct the input data
exactly after expansion, rather than those which discard parts of the
signal.  I suppose part of the reason is that I went to so much trouble
making my ADC as noise-free as possible that it would seem counterproductive
to throw away parts of the ADC's output.  Plus it's hard to anticipate the
consequences of doing DSP on that kind of compressed data, since it's likely
that some frequency components will have been lost (take, for example,
pitch-shifting a musical instrument; while you might kill some frequency
components because they're perceptually insignificant at the original pitch,
that may not be the case after you shift to a new pitch...).
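
(For concreteness, the sort of exact scheme I have in mind is sketched below:
first differences of the 16-bit samples, with small differences packed into
single bytes and an escape code for big jumps.  It's crude, and the byte
layout is arbitrary, but it expands back to exactly the original data.)

/* Rough sketch of a simple exact (lossless) audio packer. */
#include <stdio.h>

/* Encode n samples; returns the number of bytes written to out[].
 * Small deltas (-64..63) take one byte, anything else takes three. */
long delta_encode(const short *in, long n, unsigned char *out)
{
    long i, k = 0;
    short prev = 0;

    for (i = 0; i < n; i++) {
        int d = in[i] - prev;
        prev = in[i];
        if (d >= -64 && d < 64) {
            out[k++] = (unsigned char)(d & 0x7F);         /* 0xxxxxxx      */
        } else {
            out[k++] = 0x80;                              /* escape marker */
            out[k++] = (unsigned char)((d >> 8) & 0xFF);  /* high byte     */
            out[k++] = (unsigned char)(d & 0xFF);         /* low byte      */
        }
    }
    return k;
}

long delta_decode(const unsigned char *in, long nbytes, short *out)
{
    long i = 0, k = 0;
    short prev = 0;

    while (i < nbytes) {
        int d;
        if (in[i] == 0x80) {                 /* escaped 16-bit delta      */
            d = (short)((in[i + 1] << 8) | in[i + 2]);
            i += 3;
        } else {                             /* 7-bit delta, sign-extend  */
            d = (in[i] & 0x40) ? (int)in[i] - 128 : in[i];
            i += 1;
        }
        prev = (short)(prev + d);
        out[k++] = prev;
    }
    return k;
}

int main(void)                               /* round-trip sanity check */
{
    short in[6] = { 0, 3, 7, 500, 480, 481 }, back[6];
    unsigned char packed[32];
    long nbytes = delta_encode(in, 6, packed);
    long n = delta_decode(packed, nbytes, back);

    printf("%ld samples -> %ld bytes -> %ld samples, last = %d\n",
           6L, nbytes, n, back[5]);
    return 0;
}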

>       [...]                                   Which brings me to another
>paper, by Lehnert and Blauert, in which they completely simulate a concert
>hall, using ray tracing, and combine that with head and pinna transforms.
>Alas, this system is non-real-time (overnight processing on a PC/AT), but
>it's just what I always wanted to do...

I take it that the ray tracing you're talking about is for tracing acoustic
waves, and not light waves, as in ray tracing for graphics.  I'm asking  
because if they really wanted to recreate the concert hall experience, they'd
have to do both kinds of ray tracing.  Then they could do the Virtual Reality
thing by fitting the user with a helmet having stereo headphones and stereo
goggles, with each goggle lens displaying a high-quality color image of the
scenery around the user depending on which direction their head is pointing.
Guess we'll have to wait until they can fit a battery-operated supercomputer
with Gigabytes (Terabytes?) of RAM in a walkman-sized box...

                                             Jean Goulet
                                             Electrical Engineering
                                             Class of '89
                                             Concordia University
                                             Montreal, Canada