[comp.music] Timbre Perception and Orchestration

sandell@ils.nwu.edu (Greg Sandell) (06/16/91)

Vance Maverick writes:

> To reverse the decay of comp.music into rec.music.synth.backup, Greg
> Sandell proposes chasing all the synth people away, and then says such
> researchers as he have no time to contribute.  I think the latter is
> the real problem -- the only way to influence the tone of the newsgroup
> is by positive contributions. 

Bravo, Vance.  I'll make a contribution too.  Get ready, though, it's
150 or so lines long...

For the last three years I have been doing research on the perception of 
musical timbre, with an eye towards learning something about orchestration.
I am working on a Ph.D. in Music Theory at Northwestern University.
Orchestration is not considered normal territory for a music theorist,
but I don't think that this is a very good state of affairs for the
field.  The current interests which dominate music theory, still mostly
pitch set theory and Schenker theory, further the belief that the key
to unlocking a piece's meaning is in discovering some tightly organized
network of pitch relations.  As a result, works that are masterpieces in
part due to their orchestration, yet which exhibit pitch structures resistant
to both pitch set theory and Schenker theory, are in effect snubbed by music
theorists.  And this is in stark contrast to the interests of composers,
who are quick to recognize effective exploitation of the timbral 
domain and at times exhibit a language for discussing qualities of
sounds that theorists are not privy to.

So what can a theorist do to bring orchestration into the fold of
music theory?  Remember, theorists are creatures who shun vague,
qualitative descriptions of personal listening experiences (although
lately narrative analyses have come in fashion) and seek to create and
use tools that categorize and quantify the elements which make up
a piece of music.  My choice was to focus on that aspect of orchestration
pertaining to choosing combinations of instruments for concurrent
presentation.  This is one of the great mysteries of orchestration which 
fascinates many musicians, and which is clearly not merely a pragmatic
issue of `instrumentation.'  Next, I chose to restrict the field of
interest to largely homophonic combinations:  melodies in unison or
other semi-fixed intervals, or vertical sonorities.  (This is purely
practical; research has to start somewhere.)  But before any categorizing
or quantifying of this domain can begin, we need to know how such
combinations are evaluated in orchestration practice.

Many orchestration manuals, especially the ones by Rimsky-Korsakov,
Rogers and Piston, spend a great deal of time instructing the student
how to choose timbres that "blend."  With few exceptions, the use of the
term suggests a fusion phenomenon:  blended combinations are those
in which the timbral line of demarcation cannot be distinguished, e.g.,
a cello and bass clarinet which merge into some hybrid, single timbral
quality.  At the other end of this spectrum are combinations which clearly
separate into distinct timbres (tuba and piccolo, to name an extreme
example).  Other than the obvious factors that separate timbres (gross
differences in attack time, intonation, etc.), what are the acoustics
underlying the phenomenon of blend?  Although orchestration manuals offer
plenty of prescriptions (do's and don'ts) and examples ("Ravel did it,
so it must be good"), they offer no metric for evaluating any particular
instance.  High-speed analysis and visualization of timbre by computer
is here and it's cheap; Rimsky-Korsakov's and Joseph Schillinger's dream
of a scientific basis for orchestration is within reach, or at least,
the ability to investigate its possibility is.  (I might add that Piston
and one or two others deplored the idea of a systematization of
orchestration, but there will always be conservatives.)  

The ideal orchestration manual would offer the student not only isolated cases,
but empower him or her with (to borrow a concept from my current
employer, Roger Schank) case-based reasoning.  Students could
generalize from the acoustical properties which lead to the good blend 
between cello and bass clarinet and apply the principle
to other combinations:  for example, perhaps low-register violin and
English horn blend for analogous reasons.  While blend is certainly not
the only quality one evaluates in concurrent timbres, I decided to
focus on blend and investigate it in a series of perceptual experiments
using musical listeners.

I ran three experiments using synthesized musical instrument tones
(same tones as in John Grey's 1975 dissertation).  Tones were presented in
concurrent pairs, and listeners rated how well they "blended".  Poor
blend was defined as the case where individual timbres could be
clearly heard (say, piccolo and tuba) and good blend was defined
as the case where the timbres fused to form a single sonic impression
(say, French horn and trumpet).  A single trial in one of my 
experiments consisted of two instruments playing one short note each, 
in rhythmic unison.  

I explored instruments in unison and in minor thirds, and timbral
modifications such as "artificially bright", "increased inharmonicity",
and unequal intensity levels to observe the effect on "blend."
The main findings have to do with the centroid (level of brightness/
darkness) and the length of the attack portion.  (Examples of dark timbres 
are:  bassoon, French horn; bright timbres are oboe, clarinet in clarion 
register; instruments of moderate brightness include flute and trumpet.)
For unisons, dark tones blend best of all, and the blend steadily worsens as 
a function of the increasing brightness of one or both of the tones in a 
concurrent pair.  This was true in the case of artificially darkened or
brightened tones:  for example, you can improve the blend of oboe
and clarinet by artificially darkening the oboe (like turning up
the "bass knob" on a stereo).  Analogous effects to darkening and
brightening were found with respect to length of attack time:  the
longer the attack time of one or both of the tones, the worse
the blend.  However, the effect for attack time is not as strong as
the effect for centroid.  For non-unison intervals (the minor third),
an additional factor for centroid played a role:  instruments that
were close in centroid (similar degrees of bright/darkness) blended
well.  So, although the presence of low centroids strongly influenced
the blend of the pair, pairs not including an instrument with a low
centroid blended fairly well nonetheless if they were similarly
bright.  Still other factors appeared to involve mechanisms of
auditory stream segregation:  instruments with highly correlated
amplitude envelopes (also, centroid envelopes, the change in centroid
over the duration of the tone) tended to blend well, while non-correlated
envelopes led to greater separation.
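(To make the envelope-correlation idea concrete: the degree to which two
sampled amplitude envelopes "move together" can be measured with an ordinary
Pearson correlation.  Here is a toy Python sketch; the envelopes below are
invented for illustration and are not data from the experiments:

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# three hypothetical amplitude envelopes, sampled at equal intervals
env_a = [0.0, 0.6, 1.0, 0.8, 0.5, 0.2]   # fast attack, steady decay
env_b = [0.0, 0.5, 0.9, 0.9, 0.6, 0.3]   # similar shape -> high correlation
env_c = [0.0, 0.1, 0.3, 0.7, 1.0, 0.9]   # slow swell -> low correlation

print(pearson(env_a, env_b))   # close to 1
print(pearson(env_a, env_c))   # close to 0
```

On this picture, the first pair would tend toward blend, the third toward
separation.)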

(Note: Unfortunately this medium does not allow me to present all the data
which led me to come to these conclusions; this requires a number of
high-resolution graphs, tables of correlation coefficients, and a lot of 
space devoted to explaining exactly how each analysis was done.  But the 
statistical methods employed in coming to these conclusions 
included regression, correlation, t-tests, and Multidimensional 
Scaling, and the measures of statistical significance were 
those used in standard psychological experimentation.  Eventually 
the dissertation will be available through standard channels
(NU's dissertations are carried in University Microfilms) or directly from
me, for those who want to follow up on the details.  The dissertation
will be completed in December 1991.)

John Grey's landmark 1975 dissertation in timbre perception ("An
Exploration of Musical Timbre", Stanford University) proposed a
timbre space of three dimensions, pertaining to (and I am somewhat
generalizing here) centroid, attack and harmonic synchrony.  These
dimensions emerged from a statistical analysis of timbral 
similarity judgments from a group of musical listeners.  The results of 
the present experiments, although involving a different task,
largely corroborate Grey's timbre space.  A "blend space" can
be constructed for each instrument with centroid, attack duration and
envelope correlation as three of the primary dimensions.

The experiments I ran were obviously limited in musical breadth, so
my findings only suggest a beginning for further research in timbre
perception and orchestration.  First of all, one would also want
to investigate blend using melodies rather than isolated
note-pairs, and other intervals should be investigated.  Furthermore,
other ways of evaluating concurrent timbres could be explored.  For
one thing, the relative salience of two timbres in a pair is an
important factor:  loud flute and soft trumpet is very different from
soft flute and loud trumpet.  Part of how we may evaluate those combinations
depends on which of the two qualities dominates the sum perception of
the sound (e.g., flute in the former, trumpet in the latter).  Next,
it is expected that masking among harmonics or masking due to noisy
aspects of instruments (air streams and bow scrapes) play an important
role in evaluating combinations.  One frequent observation in orchestration
manuals is that the flute somehow "softens" the effect of other 
instrumental combinations (for example, Rimsky-Korsakov says the flute
can soften the harsh combination of oboes and clarinets; see p. 78 of
his orchestration manual).  The way in which some timbres seem to 
modify others is a point of interest to many musicians, and experiments
to explicate these effects would make an interesting investigation.
Finally, one can investigate the acoustic dissonance of a pair of
timbres based on Plomp and Levelt's measures of "roughness" (in fact,
I haven't eliminated this as a possibility in my dissertation).

Greg Sandell  (sandell@ils.nwu.edu)
Northwestern Computer Music
Northwestern University, Evanston, IL

p.s. I call my research area "Concurrent Timbre."  Eliot ("Music Mediates
Mind[tm]") Handelman, if you're reading this, I could use some advice
on how I can patent this phrase and license it for profit... :-)

-- 
Greg Sandell
sandell@ils.nwu.edu

curt@cynic.wimsey.bc.ca (Curt Sampson) (06/17/91)

Greg, I found your posting on the timbre perception research you've
been doing quite fascinating.

One thing that I am struck by, however, is the rather unquantified
nature of your descriptions of the timbre of various instruments.
For example, you call a clarinet "dark" and an oboe "bright."

Not having seen any of the details of your research, it could well
be that you have quantified the timbres much better there than in
your summary.

If not, have you considered doing a Fourier analysis of the various
instruments and looking for correlations in the data from that?  My
ears tell me that as well as some instruments being darker or brighter
than others (that is, the average energy in the harmonics well above
the fundamental being higher or lower), different instruments often have very
different distributions of the energy within those upper harmonics.
Some instruments have a lot of energy concentrated in a few harmonics
(such as the oboe--or so my ears tell me :-)) and some have their
energy spread out more evenly over many harmonics (piano).  I suspect
that these differing distributions would make quite a difference in the
blending characteristics (and recognition characteristics, for that
matter).  
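This kind of analysis is easy to prototype.  The Python sketch below (my
own toy code, with an invented four-harmonic test tone) recovers per-harmonic
amplitudes from a synthesized waveform by projecting onto a sine and cosine
at each harmonic frequency, which is all a Fourier analysis of a periodic
tone amounts to:

```python
import math

def harmonic_amplitude(signal, sr, freq):
    """Amplitude of the component at `freq` Hz via naive Fourier projection
    (exact when the signal spans an integer number of periods of `freq`)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

sr, f0 = 8000, 100                    # sample rate and fundamental, in Hz
true_amps = [0.2, 0.3, 1.0, 0.25]     # energy concentrated in one upper harmonic
n = (sr // f0) * 20                   # exactly 20 periods of the fundamental
signal = [sum(a * math.sin(2 * math.pi * f0 * (k + 1) * i / sr)
              for k, a in enumerate(true_amps))
          for i in range(n)]

measured = [harmonic_amplitude(signal, sr, f0 * (k + 1))
            for k in range(len(true_amps))]
```

The `measured` list comes back equal to `true_amps` (to rounding error), so
the same projection applied to a recorded tone would give the energy
distribution across harmonics that I'm speculating about above.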

Another thing to look at would be the amount and distribution of
non-harmonic energy in an instrument's sound (the scraping of the bow,
and the like).

This might lead to some interesting experiments with computer-generated
tones of varying harmonic structure.  Synthesized tones created with a
decent additive synthesizer would give you far more flexibility when
testing blends of various kinds.  It would also provide a good control
in that one would expect that synthesized waveforms with characteristics
similar to acoustic instruments would generate similar results when
blended for listeners.  That is to say, if you have two waveforms with
a concentrated peak in the upper harmonics and they don't blend, but
two acoustic waveforms with a concentrated peak in the upper harmonics
do blend, there's obviously something else we should be looking for as
an important factor in blending.

So perhaps you could do a few experiments in this area too.  It's only
June, so I'm sure that you'll have plenty of time to research this
whole area and fit that into a brief Appendix in your dissertation.  :-)

cjs
-- 
Curt Sampson            | "This sound system comes to you with fuel injection.
curt@cynic.uucp         |  Toes tapping, the unthinking masses dance to a new
curt@cynic.wimsey.bc.ca |  tune...."		--Gary Clail

maverick@mahogany.Berkeley.EDU (Vance Maverick) (06/18/91)

Sounds interesting.  I'd love to see orchestration get a better rap
in music theory.  I'll bet (for example) that most analyses of the
"Tombeau" from /Pli Selon Pli/ look at its pitch content, even though
(at least for this listener) it's the orchestration that makes it go*
-- particularly since Boulez personally takes the position that
timbre is just icing on the cake of pitch.

Will you have time, in your thesis, to take on the pedagogical
aspects of the teaching of orchestration?  Surely the goal of an
orchestration course is to enable the composer to hear the
combination of instruments mentally; such apparent prescriptions as
R-K's dictum about flutes softening the combination of clarinets and
oboes may serve their real function when, armed with knowledge of a
score, the student listens for this effect, and hears, not
"softness", but the sound of flutes, clarinets and oboes.  Do you
think computer representations of the sounds of instruments are far
enough along that we could write software to help teach composers
this skill of the mental ear?

	Vance

* until it stops going, before the ludicrous entrance of the voice....

sandell@ils.nwu.edu (Greg Sandell) (06/18/91)

In article <1991Jun17.030934.499@cynic.wimsey.bc.ca>, curt@cynic.wimsey.bc.ca (Curt Sampson) writes:
> 
> One thing that I am struck by, however, is the rather unquantified
> nature of your descriptions of the timbre of various instruments.
> For example, you call a clarinet "dark" and an oboe "bright."

The posting itself never said anything about the clarinet being "dark,"
but, I get your point.

> 
> Not having seen any of the details of your research, it could well
> be that you have quantified the timbres much better there than in
> your summary.

The scale that I used for brightness and darkness was "centroid."  Centroid
refers to the distribution of spectral energy in a complex sound.  You 
calculate it by weighting each frequency component by its amplitude, 
summing all such values, and dividing the total by the sum of the amplitudes
alone.  The division step factors out overall amplitude and leaves a single
frequency which identifies the midpoint of spectral energy concentration.

Consider the following 4-harmonic spectra, each with a fundamental
of 100 Hz.  The amplitude scale shown is linear.  The spectra are
identical except for the third harmonic.  In the latter spectrum,
the distribution of spectral energy is shifted slightly higher
in frequency. 

    8                                    8         8
    |                                    |         |
    |                                    |         |
    |    6                               |    6    |
    |    |                               |    |    |
    |    |                               |    |    |
    |    |    4                          |    |    |
    |    |    |                          |    |    |
    |    |    |                          |    |    |
    |    |    |    2                     |    |    |    2
    |    |    |    |                     |    |    |    |
    |    |    |    |                     |    |    |    |
    |    |    |    |                     |    |    |    |
    |____|____|____|___________          |____|____|____|___________
   100  200  300  400                   100  200  300  400

The first spectrum's centroid is calculated as:

((8*100)+(6*200)+(4*300)+(2*400)) / (8+6+4+2) = 4000/20 = 200 Hz.  The second
spectrum, if you work it out, yields a higher centroid (216.7 Hz).
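For the terminally curious, the same calculation as a few lines of Python
(just the two toy spectra from the figure; the function and variable names
are my own):

```python
def centroid(freqs, amps):
    """Amplitude-weighted mean frequency of a spectrum (linear amplitudes)."""
    return sum(f * a for f, a in zip(freqs, amps)) / sum(amps)

freqs = [100, 200, 300, 400]      # harmonics of the 100 Hz fundamental
first = [8, 6, 4, 2]              # first spectrum above
second = [8, 6, 8, 2]             # second spectrum (stronger 3rd harmonic)

print(centroid(freqs, first))     # 200.0
print(centroid(freqs, second))    # 216.66...
```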

This measure has been used with great success in perceptual experiments
of timbre; that is to say, the magnitudes of listeners' evaluations of
timbres of different degrees of brightness and darkness are frequently
paralleled by (correlated with) the centroids for those sounds.  Research
showing these results has been reported in Grey (1975), Grey & Gordon (1978)
and Wessel (1978).  I think that the first experiments to show its
perceptual significance were by Lichte (1941) and von Bismarck (1974).
Beauchamp (1982) provides the most explicit published definition of
centroid.  (Citations below.)

> different instruments often have very
> different distributions of the energy within those upper harmonics.
> Some instruments have a lot of energy concentrated in a few harmonics
> (such as the oboe--or so my ears tell me :-)) and some have their
> energy spread out more evenly over many harmonics (piano).  I suspect
> that these differing distributions would make quite a difference in the
> blending characteristics (and recognition characteristics, for that
> matter).  

Right you are.  Centroid is a statistical convenience, but obviously
an impoverished representation of timbre.  I have experimented with
other ways of comparing spectra but haven't found any especially
effective ones yet.  One thing I haven't tried, which is suggested to me
by what you say here, is defining some upper frequency region and taking
*its* centroid.  I'll let you know what I learn.  But there is another
representation of spectrum which collapses it down into three values
(rather than one, as in centroid).  This is the "tristimulus method"
by Pollard & Jansson (1982).  They break up the spectrum into:  percentage
of energy in the fundamental, percentage of energy in harmonics 2-5,
and percentage of energy in all harmonics above 5.  I haven't yet found
a great use for this measure, myself.
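The breakdown itself is trivial to compute.  A sketch (my own toy code,
taking "energy" as squared linear amplitude, which is one plausible reading
of the definition; consult Pollard & Jansson for their exact formulation):

```python
def tristimulus(amps):
    """Split spectral energy into three fractions: fundamental, harmonics
    2-5, and all harmonics above 5.  `amps` lists harmonic amplitudes
    starting with the fundamental; energy is taken as amplitude squared."""
    energy = [a * a for a in amps]
    total = sum(energy)
    return (energy[0] / total,
            sum(energy[1:5]) / total,
            sum(energy[5:]) / total)

# a made-up 7-harmonic spectrum, heavy at the bottom
t1, t2, t3 = tristimulus([8, 6, 4, 2, 1, 1, 1])
print(t1, t2, t3)   # three fractions summing to 1.0
```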

> 
> Another thing to look at would be the amount and distribution of
> non-harmonic energy in an instrument's sound (the scraping of the bow,
> and the like).

In my study, I account for this by quantifying the amount of precedent
noise at attack time.  This turns out to be a pretty strong cue for
blend.  However, I found that a more general measure, the duration of
the attack time, matched more closely to the judgments.

>
> This might lead to some interesting experiments with computer-generated
> tones of varying harmonic structure.  Synthesized tones created with a
> decent additive synthesizer would give you far more flexibility when
> testing blends of various kinds.  

The "John Grey tones" that I used *were* additive synthesis descriptions
of the sound, by the way...that's what made it possible for me to analyze
higher-level acoustic properties such as harmonic synchrony, inharmonicity,
etc.

> It would also provide a good control
> in that one would expect that synthesized waveforms with characteristics
> similar to acoustic instruments would generate similar results when
> blended for listeners.  That is to say, if you have two waveforms with
> a concentrated peak in the upper harmonics and they don't blend, but
> two acoustic waveforms with a concentrated peak in the upper harmonics
> do blend, there's obviously something else we should be looking for as
> an important factor in blending.

Well, centroid comes in "first place" in my experiment, but of course
it's not the only acoustic factor.  It would not be hard to magnify
the differences in attack characteristics and envelope similarity
to override what should be a "good blend" from the perspective of
spectrum content. 

> So perhaps you could do a few experiments in this area too.  It's only
> June, so I'm sure that you'll have plenty of time to research this
> whole area and fit that into a brief Appendix in your dissertation.  :-)

If one of my committee members dies on me, you'll be the first person
I call.... :-)

Thanks for your response!

-- 
Greg Sandell
sandell@ils.nwu.edu


Here are the sources I cited:

Beauchamp, J.W. (1982).  Synthesis by spectral amplitude and 'Brightness' 
matching of analyzed musical instrument tones. Journal of the Audio 
Engineering Society  30, 396-406.

Grey, J.M., & Gordon, J.W. (1978).  Perceptual effects of spectral 
modifications on musical timbres. Journal of the Acoustical Society of America
63, 1493-1500.

von Bismarck, G. (1974a).  Timbre of steady sounds: a factorial investigation 
of its verbal attributes. Acustica  30, 146.

Lichte, W.H. (1941)  "Attributes of complex tones,"  Journal of Experimental 
Psychology 28, 455-480.

Wessel, D.L. (1978).  Low dimensional control of musical timbre. Tech. Rept. 12,
IRCAM, Paris.

Pollard, H.F. and Jansson, E.V. (1982), "A tristimulus method for the
specification of musical timbre."  Acustica 51, 162-171.

eliot@phoenix.Princeton.EDU (Eliot Handelman) (06/18/91)

In article <2118@anaxagoras.ils.nwu.edu> sandell@ils.nwu.edu (Greg Sandell) writes:
;
;p.s. I call my research area "Concurrent Timbre."  Eliot ("Music Mediates
;Mind[tm]") Handelman, if you're reading this, I could use some advice
;on how I can patent this phrase and license it for profit... :-)

I made it down this far, anyhow. 

There is no distinction between music and theory.

I don't have to LISTEN to a piece of music in order to find out
how it goes. I am conversant with mountains of music sufficiently
"the same" that my thumb can listen: I can read a 12 or 15 minute 
long orchestra piece in about 10 or 15 seconds.

It's more difficult to scan CD's, but it can be done. Speeds 
roughly 20 to 30 times specs, especially for slow computer music,
are completely adequate to the aim of framing a few quick
perceptions. Bear in mind: not the realism of these perceptions,
only their formation, is of interest.

This faculty is less pronounced in some theorists.

mig@cunixb.cc.columbia.edu (Meir) (06/18/91)

* * * * * *  ====================== Meir Green
 * * * * * * ====================== (Internet) mig@cunixb.cc.columbia.edu
* * * * * *  ====================== meir@msb.com  mig@asteroids.cs.columbia.edu
 * * * * * * ====================== (Amateur Radio) N2JPG

sandell@ils.nwu.edu (Greg Sandell) (06/18/91)

In article <1991Jun17.170258.17498@agate.berkeley.edu>, maverick@mahogany.Berkeley.EDU (Vance Maverick) writes:

> -- particularly since Boulez personally takes the position that
> timbre is just icing on the cake of pitch.

Can you think of a particular source where he says this?  

> Will you have time, in your thesis, to take on the pedagogical
> aspects of the teaching of orchestration?  Surely the goal of an
> orchestration course is to enable the composer to hear the
> combination of instruments mentally; such apparent prescriptions as
> R-K's dictum about flutes softening the combination of clarinets and
> oboes may serve their real function when, armed with knowledge of a
> score, the student listens for this effect, and hears, not
> "softness", but the sound of flutes, clarinets and oboes.  

What are the mechanics of the orchestrator's ear, though?  When
you hear bass clarinet and cello, does your mind automatically recognize
it as a learned sound, "bass clarinet and cello", or does it first decompose
the sound into "bass clarinet" and "cello"?  Well maybe for such
frequently used combinations as that one (especially for dramatic effect
in late-19th cent. opera), the first mechanism applies.  But what about
the infinite number of other timbre combinations (different instruments,
dynamics, registers, etc.)?  If I want to learn from someone else's
orchestration (whether I have just the recording or the score as well),
I need to be able to (1) decompose the sound, and (2) hypothesize about
the process behind the sum effect.  That's what the listener does with
the flutes/clarinets/oboes example.

I think a lot has been said about visual perception of color mixture
which pertains to the issue of timbre mixture.  I have experienced
firsthand some surprising effects while playing with colors on a color
computer monitor.  Say you have text on top of a background, and you
want to find a combination of colors for foreground (the text) and
background.  Suppose I found a foreground color I like and I'm
sampling various backgrounds.  I swear that somehow different background
colors shift the hue, saturation and brilliance of the foreground colors!
Perceptually, of course, they do...because of nifty things like Mach
bands and the eye's natural tendency to supply the complementary hue
of each color (i.e. when you look at a bright red light, close your
eyes and see green).  What we need in orchestration is an explanation of
how certain timbres affect others in the perceptual ear.  I think the
two modalities are very analogous in the subject of mixture.

But to answer your question about pedagogy, I am going to provide 
a review of several English-language orchestration manuals of the
current century, but mainly concerning what they say about evaluating
concurrent timbres.  Besides, very few have much to say about how
a student should gradually acquire a good ear for orchestration.  One
of the only exceptions is a curious little article by J. Ott, "A new
approach to orchestration," THE INSTRUMENTALIST 23/9 (April 1969),
pp. 53-55.  He suggests that students embark on an exploration of their
own personal timbre space by vocally imitating all the instruments of
the orchestra and categorizing them according to the vowels they
use to make the sounds.

> think computer representations of the sounds of instruments are far
> enough along that we could write software to help teach composers
> this skill of the mental ear?

It would be great, wouldn't it?  There are certainly a lot of sounds
available on compact disk of orchestral instruments (McGill and 
ProSonus), but the number of instrumental sounds you could store in a
sound library will always be minuscule compared to what performers
can do.  But even if there were a limited set of online orchestral
instrument sounds available on a menu-driven system for combining
sounds, think how much you could learn about combinations that you
didn't know before.

> 
> 	Vance
> 
Thanks for helping make this an interesting discussion and for getting
my gears turning.

- Greg

Greg Sandell
sandell@ils.nwu.edu

lseltzer@bhupali.esd.sgi.com (Linda Seltzer) (06/18/91)

Vance, you raised an interesting issue.  Too many musical analyses focus
on pitch issues as if music were a flat document on a page instead of an
interaction of people playing instruments.  We should be paying more
attention to the sense of ensemble, the interaction among performers, etc.
Not that I'm against analysis of pitch content, but maybe the pendulum has
swung too far in that direction.  The same for pop music.  The notion of
"tracks" has influenced things to the point of virtually obliterating any
sense of dialog among performers.  Such dialog is clearly present in
earlier styles such as old Boogie Woogie recordings.