[comp.dsp] Psychoacoustics

gvokalek@augean.OZ (George Vokalek) (11/08/89)

> What happens when the listener's head turns?  What you really need is a pair of
> headphones that can detect head movements.  Then you must adjust the delays and

The ear is a tube with a transducer at one end.  If the sound pattern is
such that a node is present at the eardrum, no sound will be heard.

By turning the head, you change the pressure distribution in the ear canal,
moving away from the node (possibly toward another node at a different
frequency).

Given the speed of sound is 330m/s, a 3kHz sound will have a wavelength
of about 10cm.  Moving the head by several cm therefore represents a
significant fraction of one wavelength, resulting in a significantly
different sound pattern in the ear.
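
A quick Python sketch of that arithmetic (the 3 cm head shift is just an
illustrative stand-in for "several cm"):

c = 330.0                  # speed of sound quoted above, m/s
f = 3000.0                 # 3 kHz
wavelength = c / f         # about 0.11 m, i.e. roughly 10 cm
head_shift = 0.03          # take "several cm" as 3 cm
print("wavelength = %.1f cm" % (wavelength * 100.0))
print("head shift = %.0f%% of one wavelength" % (100.0 * head_shift / wavelength))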

Note that this means you should be able to localise high
frequency sound more accurately than low frequency sound.  That seems
reasonable to me - for instance, it's easy to find a mosquito.
I can't think of any low frequency examples.


..G..

pvo3366@sapphire.OCE.ORST.EDU (Paul O'Neill) (11/13/89)

In article <1989Nov2.180644.28647@sj.ate.slb.com> greg@sj.ate.slb.com (Greg Wageman) writes:

>I don't think this is possible with two speakers in front of the
>person.  ..........

Wrongo.  See below.

>I'm by no means an expert, but my understanding is that the brain
>localizes sound by analyzing the amplitude (volume) and time-delay
>(phase) of direct and reflected sounds as perceived by both ears to ....

You've mentioned 2 mechanisms of localization, but not no. 3.

1]	Amplitude
2]	Phase
3]	Frequency response

Our ears have a different frequency response at different azimuths and 
elevations.  We use the frequency content of arriving sounds as one of
our localization inputs.

Demonstration:  Plug one ear with your hand.  Close your eyes.  Click
the fingernails of your thumb and forefinger on the other hand at 
various positions around your other ear.  Can you localize the clicks?
How?  You're only using one ear.

Ignoring 1] and 2], and building circuits exploiting 3] only can give
surprisingly good and adjustable stereo imaging -- with headphones or
with speakers.  One starts with the known frequency response curves of
the human ear for various azimuths and the known azimuth of the speakers
or headphones.  Filter the sound source such that when it is run through
the ear's "response at Speaker-Azimuth"  it will have the content of
"response at Faked-Azimuth".

I'll see if I can dig out my references on this stuff.  Most is from
Journal of the Acoustical Society of America. (JASA)


Paul O'Neill                 pvo@oce.orst.edu
Coastal Imaging Lab
OSU--Oceanography
Corvallis, OR  97331         503-754-3251

rob@kaa.eng.ohio-state.edu (Rob Carriere) (11/14/89)

In article <13729@orstcs.CS.ORST.EDU> pvo3366@sapphire.OCE.ORST.EDU (Paul
O'Neill) writes: 
>Our ears have a different frequency response at different azimuths and 
>elevations.  We use the frequency content of arriving sounds as one of
>our localization inputs.
>
>Demonstration:  Plug one ear with your hand.  Close your eyes.  Click
>the fingernails of your thumb and forefinger on the other hand at 
>various positions around your other ear.  Can you localize the clicks?
>How?  You're only using one ear.

I agree with the content of the post, but the demonstration is bogus.  Quite
apart from any audio clues as to the location of the sound, you have the clues
arising from the fact that you know where your fingers are.  There is no
obvious way to tell whether or not this extra information is used to
``cheat''. 

A proper demonstration (which works quite well, incidentally) would be to have
a second person produce the sounds.

SR

brianw@microsoft.UUCP (Brian Willoughby) (11/15/89)

>Demonstration:  Plug one ear with your hand.  Close your eyes.  Click
>the fingernails of your thumb and forefinger on the other hand at 
>various positions around your other ear.  Can you localize the clicks?
>How?  You're only using one ear.
>
>Paul O'Neill                 pvo@oce.orst.edu

Are you satisfied that plugging one ear with your hand is TOTALLY
blocking sound from entering that ear?  I'm not.  Try disconnecting your
auditory nerve :-)

The following is a followup that I composed to the original posting.  I
believe my system had trouble sending it.  If this is a duplicate, please
ignore it (it's very long).  I don't mention many DSP techniques, but
once the concepts are revealed it's almost trivial to think of DSP
applications.  I also don't mention the effects of frequency response -
I'll leave that to someone who knows a little more about how sounds are
filtered by the irregular shape of the outer ear.
-------------------------------------------------------------------------

In article <1989Oct31.193130.1685@eddie.mit.edu> rich@eddie.mit.edu (Richard Caloggero) writes:
>     Is anyone out there interested in talking about
>     psycho-acoustics/psycho-acoustic phenomena?
[...]
>                     As it turns out, most of this information is
>spacial in nature.  Psychoacoustics, then, is the study of phenomena
>related to the *realness* of sound.  This *realness* is not only a
>function of frequency response, frequency balance, harmonic
>distortion(s), etc.  It is also a function of things having to do with
>spacial information.  My problem is that I have no name for these
>*spacial things*.  I've heard  them, but I can't talk about my
>experience in concrete terms without the necessary vocabulary.

I think that psycho-acoustics is very relevant to this group because DSP
techniques make it much easier to simulate realistic sound.  It is
currently possible to do a lot of experimenting with sound space using
today's technology, but there is much room for improvement.

>      Can anyone out there shed some light on this muck?  I am
>convinced that it is possible to build a *box* which can take one
>channel and generate a stereo signal which is a representation of the
>original signal plus 3d positional information.  In other words, I can
>generate sound, using just two speakers, which actually comes from
>*behind* the listener even if he/she is facing the speakers.  Does
>anyone agree with this?  Does  anyone know how to build such a thing?

How your brain locates sound sources:

I think that it is very possible with a stationary listener, or a system
which adjusts to the listener's changing position.  (Say, like
headphones?).  In fact, one of my pet peeves is that hi-fi audio salesmen
have been selling *monophonic* subwoofers for years.  They claim that low
frequencies cannot be located by your brain because they are non-
directional.  Actually, your brain uses (at least) two methods of
locating sound sources: delay and amplitude.  Low frequencies are located
by the delay between the arrival of the sound at each ear.  The fact that
low frequencies are less directional helps by presenting approximately
equal amplitudes to each ear, only phase shifted.  Sounds directly ahead
(or behind) arrive simultaneously, while a sound emanating from 90 degrees
to the left or right has the maximum delay, based on the distance between
your ears, the speed of sound, and the period of the frequency.  High
frequencies are more directional (directionality of sound increases with
pitch), and your ear locates higher tones by the difference in amplitude
between each ear.  As with phase shift, amplitude difference is smallest
when straight ahead, and largest to the left or right.  Basically your
head gets in the way of the directional highs, thus lowering the
amplitude as the angle increases.
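
A back-of-the-envelope Python sketch of the delay half of this, using the
common spherical-head shortcut ITD = d/c * sin(azimuth); the 18 cm ear
spacing is my assumption, not a figure from the post:

import numpy as np

c = 343.0     # speed of sound, m/s
d = 0.18      # rough ear-to-ear distance, m (assumed)
for az_deg in (0, 30, 60, 90):
    itd = d / c * np.sin(np.radians(az_deg))
    print("azimuth %2d deg -> ITD %3.0f microseconds" % (az_deg, itd * 1e6))
# 0 degrees gives no delay; 90 degrees gives about 525 us, the most the
# head geometry allows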

Why the brain uses two methods:

Both methods of directional sensing are used together because each breaks
down at different frequencies.  Delay, or phase shift, can only be used
for low frequencies, because as you increase pitch the period of the wave
gets smaller and eventually the wavelength is less than the distance
between your ears.  Your brain can determine the delay when comparing two
versions of the same cycle of a wave, but things get muddy and confused
if the sound completes an entire period in one ear before it arrives at
the other.  Amplitude differences start to diminish with lower pitches
because directionality is also decreasing and eventually there is very
little difference in volume.  The moral of the story is that, in truth,
the *mid range* speakers cannot be located by your ears and brain.
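
The point where the phase cue "gets muddy" can be estimated directly: it is
roughly the frequency whose wavelength equals the ear spacing (again assuming
about 18 cm between the ears):

c, d = 343.0, 0.18     # speed of sound (m/s), assumed ear spacing (m)
print("phase cue ambiguous above roughly %.0f Hz" % (c / d))   # ~1900 Hz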

Binaural Recording:

Binaural recording was based on recording sound from two microphones
which were the same distance apart as the average person's ears (no jokes
about cranium size, please), thus preserving the natural time delays
between the sound 'images' presented to each ear.  A good analogy is 3D
movies, which record two light images with cameras spaced horizontally
like our eyes.  When the two images are delivered independently to each
eye (using polarized lenses set at 90 degree angles), the brain
perceives depth that is not actually there.  For some reason binaural
recording is not as popular as it was, even though we probably have the
processing power to synthesize the effect rather than merely recording it
accurately.

Multi-Monophonic Recording:

Standard audio mixers pan sound using volume only.  There was an article
this year in Electronic Musician magazine which referred to this as
'multi-monophonic recording', which is more accurate.  The result is a
distinct lack of 'spacial' clues.  This is probably why mono subwoofers
sell.  The funny thing is that many audiophiles listen to classical music
which doesn't suffer from volume panning since it is recorded via
microphone.  The EM article described how incorrect miking could destroy
phase shift clues (if the mics were spaced too far apart - often on opposite
sides of the stage) or destroy amplitude cues.  Proper miking results in
good imaging.  Have you ever donned your headphones and panned the sound
all the way to the extreme left or right?  This really bothers me (gives
me a kind of small headache :-), because my brain is trying to compute
what location this sound could be arriving from such that the other ear
would hear nothing at all.  It just doesn't fit the algorithm.

Drawbacks to 3D audio:

Headphones and loudspeakers are two different media, yet we pipe the same
source material through them.  Perfect binaural recordings are ruined
when auditioned over free-standing speakers, because each ear hears
*both* channels.  Your brain re-interprets the delays and amplitude
differences and ends up computing WHERE THE SPEAKERS ARE!  It is possible
to make a recording which sounds 3D over loudspeakers, but the effect
would be different through headphones.

How can you experiment with psycho-acoustics today?

As the EM article mentioned, digital synthesizers can precisely repeat
the same sound based on an algorithm.  If a stereo synth (or two mono
synths connected through MIDI) were programmed so that each channel had a
different delay, then you could create 3D soundscapes.  One of the first
things I tried with my EPS (a year before I read the article, BTW) was to
pan two copies of the same digital sample so that one was hard left and
the other was hard right.  This took very little extra memory, since the
data was shared between the two voices.  I set one channel to ignore the
pitch bender so I could change the time delay in real time.  Think of an
analogy to a record player.  If two identical records are playing on
turntables with matched speeds, then the sounds will be in sync.
Grabbing one platter and slowing that channel down for an instant before
letting go would make that record lag with a slight delay after it
returned to normal speed.  The pitch bender on the sampler did the same,
so I was able to move a sound around the room (without a volume change)
and leave it there by releasing the pitch bender to standard speed.  If
I needed to make the pitch bender channel advance ahead of the channel
which was ignoring the bender, then moving the pitch up for an instant
and then releasing it would do the trick.  This was pretty cumbersome,
but there is much you can do with this idea.  Many digital delay
processors can listen to MIDI controllers and change their delay in real
time.  If the programmable EPS MIDI controllers were used to affect the
volume of each wave, then one of these delay processors could cause the
phase shift to track along with volume changes.  I sure hope there are
a few synth owners reading comp.dsp.
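
A small Python sketch of the turntable/pitch-bend trick: a momentary rate
offset on one channel integrates into a time offset that remains after the
rate returns to normal (the numbers below are illustrative, not taken from
the EPS experiment):

import numpy as np

fs = 48000                                    # control rate, samples/s (assumed)
t = np.arange(fs) / fs                        # one second of time
rate_offset = np.where((t > 0.2) & (t < 0.3), -0.01, 0.0)  # 1% slower for 100 ms
lag = np.cumsum(rate_offset) / fs             # accumulated inter-channel offset, s
print("lag after release: %.2f ms (and it stays)" % (abs(lag[-1]) * 1e3))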

Carver makes a 'Sonic-Hologram' device which tricks your brain into
treating loudspeakers like headphones, i.e. each ear hears only its
respective channel.  By computing the distance to the optimum listener
position, requiring the speakers to be placed a certain distance apart
and allowing for the speed of sound, Carver delays each channel the
correct amount, inverts the signal's polarity and mixes it with the
opposite channel.  This results in any sound from the *right* speaker
which hits the *left* ear being canceled out by an inverted copy of the
same wave.  The effect is so easy to accomplish that many portables have
a 'stereo wide' switch - although how you could fully appreciate the
effect while both you and the box are moving is beyond me...
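
A rough one-pass Python sketch of that cancellation idea.  The delay and gain
below are placeholders standing in for the geometry-derived values, not
Carver's actual parameters, and a real implementation also has to account for
the cancellation signal itself crossing over to the other ear:

import numpy as np

def cancel_crosstalk(left, right, fs=48000, extra_path_delay_s=0.0003, gain=0.85):
    """Feed a delayed, attenuated, inverted copy of each channel into the
    other, so the crosstalk that wraps around the head is (partly) cancelled."""
    l = np.asarray(left, dtype=float)
    r = np.asarray(right, dtype=float)
    d = max(1, int(round(extra_path_delay_s * fs)))
    out_l, out_r = l.copy(), r.copy()
    out_l[d:] -= gain * r[:-d]   # cancel right-speaker sound reaching the left ear
    out_r[d:] -= gain * l[:-d]   # cancel left-speaker sound reaching the right ear
    return out_l, out_r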

Drawback #2:

Any 3D loudspeaker system will be dependent upon the position of the
listener (I think) unless someone designs an adaptive system which
monitors and adjusts to your movements.  Imagine one of those flight
simulator helmets with headphones and a computer which relocates
objects as you turn and walk around (such a thing is in the works,
but it still won't solve the free-standing loudspeaker problem).

Other experimental ideas:

Back when digital delays started to become affordable, I dreamed up a
multi-channel mixer system which had an independent adjustable delay for
each channel which tracked the pan pot.  The delay would be set so that
either channel could be delayed with respect to the other.  Thus, both
amplitude and phase would change in unison - as they do when an object
moves around you.  I thought this would be an expensive device, but I was
thinking of combining an analog mixer and digital delay.  With a totally
digital mixer, it would be interesting to allow very short time delays
in order to synthesize a binaural-style recording.
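
A Python sketch of that pan-pot-with-tracking-delay idea: constant-power
gains plus the same d/c * sin(azimuth) delay as before.  All of the parameter
values are assumptions chosen just to make the sketch run:

import numpy as np

def pan_with_delay(mono, azimuth_deg, fs=48000, ear_spacing=0.18, c=343.0):
    """Pan a mono signal by adjusting both gain and inter-channel delay.
    azimuth_deg: -90 = hard left, 0 = centre, +90 = hard right."""
    az = np.radians(azimuth_deg)
    theta = (az + np.pi / 2) / 2                 # map [-90, +90] deg onto [0, 90] deg
    g_left, g_right = np.cos(theta), np.sin(theta)
    itd = ear_spacing / c * np.sin(az)           # positive: left ear lags
    lag = int(round(abs(itd) * fs))
    left = np.concatenate([np.zeros(lag if itd > 0 else 0), g_left * mono])
    right = np.concatenate([np.zeros(lag if itd < 0 else 0), g_right * mono])
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=1)       # (samples, 2) stereo buffer

# e.g. place a burst of noise 40 degrees to the right:
stereo = pan_with_delay(np.random.randn(48000) * 0.1, azimuth_deg=40)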

Other areas:

My experience has only been with time delays and amplitude changes.  I
wonder how much processing your brain does to reverse compute the
changes in the sound as it passes through our irregularly shaped outer
ear.  i.e. is it possible that a sound from behind is distinguished
by the path it takes around your outer ear?  I have read of an individual
who was convinced that he could coax stereo sound out of a monophonic
speaker!  I assume that he thought he knew how the brain interprets
frequency info.  He released a few recordings; perhaps someone else knows
the name of this person?

>     How do you use your ears?  Most people use them for spoken
>communication, and listening to music.  How many out there use them for
>navigation?  Since I am blind, I use audio information to *see* my
>environment.

I have often been walking in the dark down a familiar hallway, where I
was expecting a doorway at some distance, only to stop a few feet short
because I *thought* that I was at the door already.  It is usually around
two feet short of the door, so I figure my unconscious is taking in clues
and warning me to stop at a safe distance.  I've wondered if these
'clues' were sound-based, but sometimes there is faintly detectable
light.  Don't ask me why I occasionally walk around in the dark...

>-- 
>						-- Rich (rich@eddie.mit.edu).

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP

martin@cod.NOSC.MIL (Douglas W. Martin) (11/16/89)

     In the past few weeks, there have been several articles
about human perception of sound, and how such
psychoacoustic cues are interpreted spatially.
The two areas I wish to address are those of binaural recording,
and the obstacle detection sense used by many blind people
for navigation.  I myself am totally blind,
use this obstacle sense extensively, and have an MS
in acoustics from Penn State.
     The ability to detect obstacles, find doorways, estimate the size of
rooms, etc., was first discussed in the literature by
D. Diderot in 1749.  He thought that a blind person could
judge the proximity of bodies by "the action of air on his face."
The sensation of approaching an obstacle is somewhat like light
pressure on the face.  Thus, this sense has been misnamed
"facial vision" in much of the early literature.
     The obstacle sense is more accurately referred to as
echolocation; it is an auditory perception.   Confirmation that this is an
auditory sense and not some kind of "facial vision" was first
obtained by researchers at Cornell University in the late
1940's.  Obstacles can be detected either
passively (using reflections of ambient sound in the room) or
actively (using a self-generated noise such as a click or
whistle).  Learning to use this echolocation is
sudden and insightful rather than gradual; a person merely needs to learn
what to listen for.  Of course, the use of this perception
is not limited to blind people.  Anyone can easily demonstrate
this perception.  Simply close your eyes, and walk with hard shoes
on a hard floor toward a wall.  You should sense the presence of the
wall before actually contacting it.  However, walking barefoot across a
carpeted floor will usually result in impact with the wall, because
there is much less reflected sound to work with.

     Many of the parameters of this echo detection capability were
quantified by Charles Rice and his  colleagues at Stanford in
the mid and late 1960's.  The ability to detect an object depends on its
size, distance, and reflectivity.  Rice found that blind people could
detect obstacles spanning an angle of about four degrees.
Area ratios between disks as small as 1.06 to one could be discriminated.
Some subjects could also reliably discriminate circles, squares, and
triangles using their echolocation.  Large obstacles can be detected at 
distances exceeding ten to fifteen metres.   Distance cues appear to be related to
both pitch and loudness, and directional cues result from the
same auditory localization phenomena described by
earlier articles in this group, mainly interaural
time and amplitude differences.


     It was mentioned in an earlier article that binaural
recordings can be made by separating two microphones
by a distance equal to the diameter of the head.  Actually, this
is not sufficient to make a binaural recording; it will
only make a stereo recording.  In order to obtain the
binaural effect of localization, an obstacle (like a head) must be
present between the microphones.  This is necessary to create the auditory 
shadow which is critical for high-frequency localization.
When listening to a stereo recording through headphones, the sound
image is "lateralized", as opposed to being "localized" as it is
with a true binaural recording.
In lateralization, with stereo headphones, the sound image appears
to be coming from somewhere inside the head, often closer to
either left or right, but still within the head.
When listening with headphones to a true binaural
recording, the sound image is "out there in space" with a
perceived distance and direction.  Again, to make a binaural
recording, it is necessary to have a head-sized obstacle
between the microphones.  The actual shape of the obstacle, the presence
of hair or facial features, and other similar factors are not
very critical.  However, if sounds are to be localized
in elevation as well as in azimuth, there must be a reflecting
surface below the head, e.g. a torso.

     It has been mentioned that the ear has a different frequency
response for sounds arriving from different angles.  In fact, the
structure of the pinna (outer ear) is such that an impinging sound wave
undergoes multiple reflections in the pinna before reaching the eardrum.
The amplitudes and relative time delays associated with these multiple
reflections are, of course, angle dependent.
An excellent paper on this topic was published by Wayne
Batteau, 1965, in Proceedings of the Royal Society,
London.
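
A toy Python version of that angle-dependent reflection, reduced to a single
delayed copy.  The 80-microsecond delay and 0.6 gain are made-up illustrative
values, not Batteau's measurements:

import numpy as np

def pinna_reflection(signal, fs=48000, delay_us=80.0, gain=0.6):
    """Add one delayed, attenuated copy of the signal.  The comb-filter
    notches this creates move in frequency as delay_us changes with source
    angle, which is the spectral cue described above."""
    sig = np.asarray(signal, dtype=float)
    d = max(1, int(round(delay_us * 1e-6 * fs)))
    out = sig.copy()
    out[d:] += gain * sig[:-d]
    return out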

     I have hundreds of references in all these areas: blind
echolocation, binaural hearing and recording with dummy heads,
and sound transformations in the outer ear.
If there is interest, I will compile a bibliography
as I have time, and will send it to anyone who wants it.

Doug Martin   martin@nosc.mil
Naval Ocean Systems Center,
San Diego, ca 92152.
phone: (619) 553-3659.

bloch@mandrill.ucsd.edu (Steve Bloch) (11/17/89)

gvokalek@augean.OZ (George Vokalek) writes:
>...Moving the head by several cm therefore represents a
>significant fraction of one wavelength, resulting in a significantly
>different sound pattern in the ear.
>
>Note that this means you should be able to localise high
>frequency sound more accurately than low frequency sound.  That seems
>reasonable to me - for instance, it's easy to find a mosquito.
>I can't think of any low frequency examples.

Well, this makes sense as long as the wavelengths don't get shorter
than the diameter of the head; after that everything goes to hell.

But in general, that's pretty good, sonny, but that ain't the way I
heerd it.  Way I heerd it, you can more easily localize high BAND-
WIDTH sound than low BANDWIDTH sound.  A mosquito is an example of
this too, of course.  A good example (since we're talking wildlife)
is from ornithology: many common songbirds use a high but nearly pure
whistle when a predator shows up and they want to warn one another
without being located, but they use a wide-band "chuck" when the
intruder is one they think they can scare away; this way they can
find one another easily and gang up on it.

"Writers are a funny breed -- I should know." -- Jane Siberry

bloch%cs@ucsd.edu

bloch@mandrill.ucsd.edu (Steve Bloch) (11/17/89)

brianw@microsoft.UUCP (Brian Willoughby) writes:
>I wonder how much processing your brain does to reverse compute the
>changes in the sound as it passes through our irregularly shaped outer
>ear.  i.e. is it possible that a sound from behind is distinguished
>by the path it takes around your outer ear?

I seem to remember a discussion in Runstein & Huber _Modern_Recording_
_Techniques_ that described fairly precisely the delays stemming from
reflection from outer and inner pinnae of the ear, and in particular
that reflections from one set of pinnae gave predominantly front/back
information, the other predominantly up/down information.  The book's
at home, so I don't have the figures here.

"Writers are a funny breed -- I should know." -- Jane Siberry

bloch%cs@ucsd.edu