lehnert@aea.e-technik.uni-bochum.de (Hilmar Lehnert) (04/25/91)
Hi everybody,

One of the sensory modalities which I believe to be vital for many VR applications is our sense of hearing. This aspect has not been discussed very much in this newsgroup, so I would like to add some philosophy on the auditory part of VR.

Summary: (For those who do not like to read long articles.) Treating sound is as complex as treating light. Human perception of the auditory environment is very complex, especially because all spatial information is squeezed through only two input channels. The auditory part of VR is nowhere near solved; we've just scratched the surface.

What's the performance of our ears? (mainly literature stuff)

Our auditory system offers a dynamic range of 120 dB (even more for boom-car drivers) over a frequency range of about 10 octaves. Sounds can be perceived even if the signal-to-noise ratio is down to -85 dB. The spatial coverage is 360x180 degrees, the spatial resolution better than one degree (azimuth). The sensitivity is close to the physical limit; a few dB more and you would hear the thermal noise of the air particles. Hearing can't be switched off (that may have saved your life a couple of times), and the auditory sense is probably the most important sensory modality for communication.

Our auditory system has only two input channels, and the human head can be viewed as a stereo microphone with special directional characteristics caused by the influence of pinna, head, shoulders and torso. These characteristics are called head transfer functions (HTF). During the process of perception, the HTF are used to determine the direction of incidence and, to a certain extent, also the distance of a sound source. The HTF modify the spectrum of a sound source drastically according to the direction of incidence; we've measured variations at one ear of up to 60 dB. Surprisingly enough, we do not perceive a change of timbre for different directions.
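To make the HTF idea concrete, here is a minimal sketch (not taken from any of the systems mentioned; the impulse responses below are made-up placeholders) of how a mono signal is turned into a binaural one by convolving it with a left/right pair of head-related impulse responses:

```python
# Minimal sketch: placing a mono source at a given direction by
# convolving it with a left/right head-related impulse response pair.
# Real impulse responses come from measurements of pinna/head/torso
# filtering; the arrays here are crude hypothetical stand-ins.
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair -> (left, right) signals."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right

# Toy example: a unit-impulse "source" and fake HRIRs that only model
# the interaural time and level difference for a source off to the left.
mono = np.zeros(256)
mono[0] = 1.0
hrir_left = np.array([1.0])                          # near ear: full level, no delay
hrir_right = np.concatenate([np.zeros(30), [0.5]])   # far ear: ~0.7 ms later (at 44.1 kHz), quieter
left, right = binauralize(mono, hrir_left, hrir_right)
```

A real display would select (or interpolate) a measured HRIR pair per azimuth/elevation instead of these two toy filters.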
It seems that we are not only able to recognize directions, but also to do a kind of inverse filtering of the HTF when perceiving timbre.

How do we perceive the environment with our ears?

In any natural environment, the sound emitted by a source is reflected by the surrounding surfaces, and thus an image of the environment is created in our perceptual space. Changes in room size or shape, in the acoustic properties of absorbing, reflecting or scattering surfaces, or in the directivity of a sound source cause clearly audible (and sometimes dramatic) changes in our perception. We hear significant differences when listening to the same source in the same direction but in different rooms, even if the rooms have the same reverberation times.

The process of human perception of the auditory environment is not well understood yet. Probably the human auditory system analyzes certain reflection patterns, temporal and spatial reflection densities, long- and short-term interaural correlations, energy decay processes, and a lot more. The propagation of sound in an environment is as complex as it is for light, or maybe even more complex, because the wavelength of sound varies from 2 cm to 20 m, so most surfaces reflect neither geometrically nor diffusely, but something in between.

What about the current real-time 3D audio displays?

Systems like the Convolvotron (Crystal River), Focal Point (Bo Gehring) or the binaural mixing console (Head Acoustics) create localization cues by filtering the signal with the HTF for the desired direction of incidence, specified by the azimuth and elevation angles. That allows placing the auditory event on the surface of a (more or less distorted) sphere around the listener's head. Well, I personally have some difficulties with the term 3D in this case. Azimuth plus elevation makes two; what about number three, which is, of course, distance in this case.
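To illustrate what the missing third coordinate might involve, here is a toy sketch of three commonly cited distance cues: the inverse-distance level drop, high-frequency air absorption, and the direct-to-reverberant energy ratio. The filter mapping and the reverb level are illustrative guesses of mine, not measured data or any shipped system's method:

```python
# Toy sketch of three distance cues, with made-up parameters.
import numpy as np

def apply_distance_cues(signal, r, reverb_level=0.2):
    """Shape a direct-path signal by simple distance cues (r in metres, >= 1)."""
    direct = signal / r                   # inverse-distance level drop
    # Crude air absorption: a one-pole low-pass whose damping grows with
    # distance. The 0.02*r mapping is an arbitrary illustrative choice.
    alpha = min(0.9, 0.02 * r)
    lp = np.empty_like(direct)
    acc = 0.0
    for i, x in enumerate(direct):
        acc = (1.0 - alpha) * x + alpha * acc
        lp[i] = acc
    # Diffuse room energy stays roughly constant with distance, so the
    # direct-to-reverberant ratio falls as the source moves away.
    drr = np.sum(lp ** 2) / (reverb_level ** 2 * len(lp))
    return lp, drr

sig = np.ones(1000)                       # steady toy test signal
near, drr_near = apply_distance_cues(sig, 1.0)
far, drr_far = apply_distance_cues(sig, 10.0)
```

The point of the sketch is only that several independent quantities change with distance; azimuth/elevation filtering alone touches none of them.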
Human perception of distance is maybe more complex than just judging the loudness of a signal, especially if a sound source has no 'natural' loudness. I believe that without adding some more sophisticated distance cues, no real 3-D image can be achieved. I would rather call these systems 2D (a surface), just as I would call mono 0D (a point) and intensity stereo 1D (a line). Another point is that no environmental information is encoded into the audio signals (with the possible exception of AKG's CAP machine). Only the direct sound is simulated. That corresponds to an anechoic chamber, which is neither a very natural situation nor suitable to give you a feeling of 'being there'. Still, I think that these systems do a great job given the hardware capabilities and the knowledge presently available. They surely are a big step in the right direction.

What could the auditory part of a VR system look like?

When designing VR systems, one should keep in mind that the objects the virtual world is made of have different properties for different sensory modalities. The properties relevant to the auditory sense are the geometric extent and the acoustic properties (e.g. reflectance, wall impedance, degree of diffusion, transmittance, etc.). Each sound source should be assigned a directivity. Just as a rendering process is needed for visualization, an "acoustic rendering" has to be performed for auralization. Current research projects (I know of labs in Sweden, Denmark, France and Germany) deal with so-called binaural room simulation systems. These systems model the sound field using some approximations and perform the auralization of the results using binaural technology. The results sound more like real-world signals than pure direct-sound simulations do. However, these systems are far from operating in real time, because they deal with several thousand reflections.
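One common approximation behind such room simulation systems is the image-source model: each wall reflection is replaced by a mirror-image source that contributes a delayed, attenuated copy of the direct sound. A first-order sketch for a rectangular room (room dimensions and positions below are made up, and wall absorption is omitted for brevity):

```python
# Sketch of the image-source idea: mirror the source across each wall
# of a shoebox room and treat every image as an extra delayed source.
# All geometry below is invented for illustration; absorption is omitted.
import numpy as np

C = 343.0  # speed of sound in m/s

def first_order_images(src, room):
    """Mirror src=(x,y,z) across the 6 walls of a room=(Lx,Ly,Lz) box."""
    images = []
    for axis, L in enumerate(room):
        for wall in (0.0, L):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]
            images.append(tuple(img))
    return images

def echo_times(listener, sources):
    """Arrival time (s) and 1/r amplitude for each (image) source."""
    out = []
    for s in sources:
        r = float(np.linalg.norm(np.array(s) - np.array(listener)))
        out.append((r / C, 1.0 / r))
    return out

room = (5.0, 4.0, 3.0)
src = (1.0, 2.0, 1.5)
listener = (4.0, 2.0, 1.5)
direct = echo_times(listener, [src])[0]
echoes = echo_times(listener, first_order_images(src, room))
```

Iterating the mirroring to higher orders is what produces the several thousand reflections mentioned above, which is exactly where the real-time budget breaks down.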
Conclusion:

Some of the articles recently posted may give the impression that the auditory part of VR is no big deal any more, and that only the problem of individual corrections remains to be solved. This is not true. Creating natural-sounding or even authentic auditory environments in real time is a very difficult task, and a lot of research work (mainly psychoacoustics and work on the effects of combining different sensory modalities) is still required. The interaction of sound with any environment is as complex as it is for light, and humans are able to perceive a lot of that interaction very well. The auditory sense can strongly support the feeling of "being there" in VR, because it does the same job in the real world.

A simple way to check the quality of a simulation is to simulate the room you're in, play the results to any listener and wait for the reaction. If he/she turns in the right direction and says "oops, who's talking there?", you're on the right path.

Hilmar Lehnert (lehnert@aea.e-technik.uni-bochum.de)

Re: Where are the women (minorities in this group)? I must admit I'm male and white, but at least I'm European.
--