[rec.audio.high-end] Oversampling for the curious, the furious, and the damned

max@uwm.UUCP (Max Hauser) (06/24/91)

You may wish to save this article either for interested friends or for
future reposting as necessary since my readership of these newsgroups is
irregular.


Introduction:

This is a broad and not-very-technical online summary of CD oversampling,
antidotal to the lies and nonsense served up in consumer-audio retailing
and an alternative to the well-intentioned but misleading or fractional
explanations often seen in the popular audio press (including the so-called
serious or high-end publications).  This synopsis is less detailed but much
broader than my earlier "technical summary," written for engineers and posted
to rec.audio in 1987 and occasionally since.

If you want more depth of information on these topics, my colleague Prasanna
Shah recently published a dense technical overview of audio oversampling in
the popular magazine Audio in January.  I published a long technical tutorial
(not specific to CD or DAT products) in the Journal of the Audio Engineering
Society, January/February 1991 (vol. 39 no. 1/2 pp. 3-26).  A lighter and
shorter overview of mine is also available as Preprint #2973 from the Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165 USA,
Telephone 212-661-8528, FAX 212-682-0477.  This preprint was recently
recommended by the popular magazine Stereophile.  AES charges $5 for such
preprints ($4 for members), $3 for journal-article copies and $10 for back
issues, and is pretty efficient about getting them into your hands if you
FAX them a request with V. or MC. credit-card information (I did so recently
and the copies arrived in the mail in about three days).  These papers
contain many further references.  Some of these sources emphasize the A/D
rather than D/A path but the core principles are identical (the circuit
implementation of each section interchanges between analog and digital).

Do not be alarmed if the following summary takes an approach different from
what you read elsewhere.  There are details here not usually mentioned in
popular summaries (or even in the research literature).


A. Thumbnail Sketch of Oversampling

Signals such as audio, stored digitally, entail a finite *sampling rate*
(44.1 kilosamples/sec for the 12-cm CD) whereas in their natural (analog)
form they are continuous-time waveforms (you can think of this usefully as
an "infinite" sampling rate).  The circuitry that regenerates the
continuous-time analog output in a CD player has two major tasks:  
translating a stream of digital numbers into analog values ("conversion")
and also bridging between the finite sampling rate of the digital sequence
and the "infinite" sampling rate of the outside world (that is, restoring
a correct continuous waveform from discrete samples) -- "reconstruction."

Non-oversampling conversion-reconstruction (C-R) systems make the transition
from finite to "infinite" sampling rate in one step, while oversampling
systems do it through one or more intermediate sampling rates (higher than
the original, but still finite).  Although the details may not be obvious at
this point, producing these intermediate signals with elevated sampling
rates is a purely digital process and can thus be performed predictably and
repeatably.  It does, however, require technologies in which digital logic
is very cheap, which is why it was unattractive until recent years, even
though the basic techniques have been known since the 1950s and in embryonic
forms since the time of the Second World War.

Not only the reconstruction process but also separately the conversion
process (bits into volts) benefits from the use of an intermediate sampling
rate on the way to continuous time.  Designers can orchestrate elegant
mathematical tricks to trade a deliberately higher sampling rate for lower
required resolution in internal digital-to-analog converter (DAC) circuitry.
This in turn tends to render the analog part of the C-R chain simpler and
more tolerant of component fluctuations.  Moreover, in practice, oversampling
C-R systems blend the two tasks of conversion and reconstruction so that
they overlap in actual hardware, unlike a classical, non-oversampling
system.  The subjects of this paragraph are extremely
complex and seductively counterintuitive even to well-trained engineers,
and they habitually garner the most imaginative misinterpretations in
popular-press writing.
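The trade just described can be put into rough numbers with the standard
textbook linear model of an oversampled quantizer.  Assume white quantization
noise of fixed total power, an order-L noise-shaping transfer function
(1 - z^-1)^L (L = 0 meaning no shaping at all), and an ideal lowpass filter at
the original band edge; the in-band noise power then falls by roughly
(2L+1) * OSR^(2L+1) / pi^(2L), where OSR is the oversampling ratio.  The
Python sketch below (the function name is a hypothetical choice) converts
that factor into equivalent extra bits.  It is an idealized estimate only:
it ignores modulator stability and every circuit imperfection, which is
where the real engineering lies.

```python
import math


def ideal_extra_bits(osr: float, order: int) -> float:
    """Idealized extra bits of in-band resolution gained by oversampling.

    Assumes the textbook linear model: white quantization noise, a noise
    transfer function (1 - z^-1)^order (order 0 means no noise shaping),
    the small-angle approximation sin(x) ~ x (rough at low osr), and a
    brick-wall lowpass filter at the original band edge.  Ignores loop
    stability and all circuit imperfections.
    """
    L = order
    # Factor by which the in-band quantization-noise power drops.
    noise_reduction = (2 * L + 1) * osr ** (2 * L + 1) / math.pi ** (2 * L)
    # One extra bit corresponds to a factor of 4 in noise power (~6.02 dB).
    return 0.5 * math.log2(noise_reduction)


if __name__ == "__main__":
    for order in (0, 1, 2):
        for osr in (4, 64, 256):
            bits = ideal_extra_bits(osr, order)
            print(f"L={order} OSR={osr:3d}: {bits:5.1f} extra bits")
```

Under these assumptions each doubling of OSR buys half a bit without noise
shaping and about 1.5 bits with first-order shaping, and 4:1 oversampling
with first-order shaping comes to roughly 2.1 bits, in the same neighborhood
as the two extra bits mentioned for the Philips chip set in Section C.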

An oversampling conversion-reconstruction (C-R) system in practice normally
contains a series of four major blocks.  The first is a sampling-rate-
increasing digital filter, the second a digital quantization-management
subsystem or "noise-shaping modulator," the third a DAC circuit _per se_
and the fourth an analog lowpass filter.  A classical, or non-oversampling,
system lacks the first two blocks, but is far more demanding of the last
two blocks, which are analog circuits that largely determine the
performance and subtler behaviors of the signal path.  (That's the whole
reason for oversampling, in a nutshell.)
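To make the ordering of the four blocks concrete, here is a deliberately
crude Python sketch that chains one stand-in for each: a sample-repeating
placeholder where a real interpolator's digital lowpass filter would go, a
first-order one-bit noise-shaping modulator, an ideal two-level DAC, and a
one-pole discrete-time model of the analog output filter.  All function
names and parameter values (oversampling ratio 64, filter coefficient 0.01)
are hypothetical choices for illustration, not taken from any real player.
Only the first two blocks are digital arithmetic; the last two stand in for
analog circuits whose imperfections this sketch does not model at all.

```python
# A deliberately crude model of the four-block oversampling C-R chain.
# All names and parameter values are hypothetical, chosen for illustration.


def rate_increase_placeholder(x, osr):
    # Block 1 stand-in: repeat each sample osr times.  A real interpolator
    # removes the resulting spectral images with a sharp digital lowpass
    # filter; that filtering is omitted here.
    return [v for v in x for _ in range(osr)]


def one_bit_modulator(x):
    # Block 2: first-order one-bit noise-shaping (delta-sigma) loop.
    # Input must stay within [-1, +1] for the loop to remain stable.
    integ, y, out = 0.0, 0.0, []
    for v in x:
        integ += v - y                  # accumulate input minus fed-back output
        y = 1.0 if integ >= 0.0 else -1.0
        out.append(y)
    return out


def ideal_dac(bits, vref=1.0):
    # Block 3: ideal two-level DAC, mapping each bit to +vref or -vref.
    return [b * vref for b in bits]


def analog_lowpass_model(v, alpha):
    # Block 4: discrete-time model of a one-pole RC lowpass filter.
    y, out = 0.0, []
    for s in v:
        y += alpha * (s - y)
        out.append(y)
    return out


def c_r_chain(x, osr=64, alpha=0.01):
    upsampled = rate_increase_placeholder(x, osr)   # digital
    bits = one_bit_modulator(upsampled)             # digital
    volts = ideal_dac(bits)                         # analog in a real player
    return analog_lowpass_model(volts, alpha)       # analog in a real player
```

For example, c_r_chain([0.3] * 100) yields 6400 output values that settle
near 0.3, with a small ripple left over from the one-bit stream.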

By the way, these four blocks reflect a combination of traditionally
separate specialties in electrical engineering, each with a different
intuition and set of assumptions about what is technically difficult or
important.  This is why you will find many different explanations of
oversampling (some of them seemingly in conflict) even from competent
specialists.  The first of the four blocks is generically a digital
filter, the second a quantizer (or quantized feedback system), the third
a precision analog circuit and the fourth an analog filter, and most or
all are realized in integrated circuitry.  Thus, for example, someone
familiar with digital filtering will usually focus on the first of the four
blocks, and when asked for more information will instinctively steer you
to the general digital-filter literature (which unfortunately is extremely
dilute on this subject).  In reality an oversampling C-R chain entails
intimate concert between all four blocks and between multiple technical
specialties -- none alone is sufficient to explain what is going on.


B. Interpolator

The first block, the sampling-rate-increasing digital filter, in an
oversampling C-R system is commonly nicknamed an "interpolator."  This
jargon is triply unfortunate.  First, almost everybody unfamiliar with
multirate digital filtering assumes from the name, incorrectly, that this
block performs "interpolation" in the common mathematical sense (such as
linear or polynomial interpolation between data points).  Actually the name
is a specialized digital-filtering coinage subtly but crucially different.
Second and third, as if that weren't trouble enough, the term
"interpolative" is sometimes applied in two further ways to oversampling
C-R systems (one of these usages is a subset of the other).  More details
about this and other glorious terminological pitfalls are in my recent AES
Journal paper.

Here is the briefest sketch of how the rate-increasing filter works.  The
objective is to convert a signal at a sampling rate like 44.1 ks/s to a
signal at a higher sampling rate *without* changing the information content.
Mathematically this is a well-defined and tractable problem.  If you just
take the original sequence of samples and insert after each of them, for
example, three new samples (with value zero, or holding the last old-sample
value, or almost anything else intelligent) then you will obtain a new
sequence at four times the original rate.  In the frequency domain, however,
this new sequence will include new high-frequency replicas (images) of the
original signal's spectrum.  A digital lowpass filter will remove these
images and leave a signal spectrally identical to the original.  In the time
domain, you will now see a higher-rate sequence that will look like the
original but with the "right" new samples smoothly inserted between old.
(In actual practice the "insertion" of new samples is NOT a separate
step as above, but is incorporated into the digital-filter arithmetic.)
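The rate-increasing filter can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the design of any particular
CD-player chip: the lowpass filter is a Hann-windowed sinc chosen for
simplicity (the filter-length parameter half_len and all function names are
hypothetical), signals are floating-point rather than 16-bit integers, and
speed is ignored.  upsample_direct does the literal two-step procedure,
inserting osr - 1 zeros after each sample and then filtering;
upsample_polyphase gives the same output but never touches the zeros, which
is one way the insertion can be folded into the filter arithmetic.

```python
import math


def interpolation_taps(osr, half_len=8):
    """Hann-windowed sinc lowpass filter for raising the rate by osr.

    Cutoff sits at the original Nyquist frequency.  Passband gain is about
    osr, so output samples keep the input's amplitude (the inserted zeros
    would otherwise lower the average level by that factor)."""
    center = half_len * osr
    taps = []
    for n in range(2 * center + 1):
        t = (n - center) / osr             # time in original sample periods
        sinc = 1.0 if n == center else math.sin(math.pi * t) / (math.pi * t)
        hann = 0.5 + 0.5 * math.cos(math.pi * (n - center) / (center + 1))
        taps.append(sinc * hann)
    return taps


def upsample_direct(x, taps, osr):
    # Literal version: insert osr-1 zero samples after each input sample,
    # then run the result through the FIR lowpass filter.
    stuffed = []
    for v in x:
        stuffed.append(v)
        stuffed.extend([0.0] * (osr - 1))
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * stuffed[n - k]
        out.append(acc)
    return out


def upsample_polyphase(x, taps, osr):
    # Same output, but only the taps that line up with real (nonzero)
    # input samples are used, so no zeros are stored or multiplied.
    out = []
    for n in range(len(x) * osr):
        acc = 0.0
        for k in range(n % osr, len(taps), osr):
            j = (n - k) // osr             # index of the original sample
            if j < 0:
                break
            acc += taps[k] * x[j]
        out.append(acc)
    return out
```

With this particular filter the original samples reappear unchanged in the
output (delayed by half_len original sample periods), because its tap
values are one at the center and zero at every other multiple of osr; the
new samples fall smoothly in between.  That is a convenient property of this
choice of filter, not a requirement of interpolators in general.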


C. Multibit vs. single-bit vs. MASH etc.

All four of the major blocks of an oversampling C-R system, outlined in
Section A, admit endless variations, opportunities for design cleverness,
and high/low quality choices, and account for practical performance and
manufacturing-cost differences among CD players.  (All of which players,
incidentally, appear more or less indistinguishable in the rudimentary
and unrevealing standard specifications -- peak SNR, frequency response,
step response -- commonly published.)  A lot of fuss and advertising copy
are however devoted to one particular design difference, the organization
of the noise-shaping modulator (and consequently the format of the
internal DAC circuit(s)).  The common organizations are:

-- Multibit feedback noise shaping.  The modulator properly predistorts
(noise-shapes) the oversampled digital signal sent to a lower-resolution
multibit DAC so that when properly analog-postfiltered its output will
yield the full 16-bit resolution stored on the CD.  This is the oldest
scheme common in consumer products, widely popularized by the NV Philips
SAA 7030 / TDA 1540 chip set (1983) with a 14-bit internal DAC and 4:1
oversampling yielding 16-bit final resolution.  (For more details on how
4:1 relates to two additional bits, see either of my AES papers above.)

-- One-bit feedback noise shaping (called "Bitstream" by Philips and
"delta-sigma" by the research community [Note 1]).  Same as the previous
but taken to an extreme: the internal DAC has only one bit of resolution
and the 16-bit net D-to-A resolution is accomplished by the oversampling,
noise-shaping and postfiltering process.  This approach requires a higher
oversampling factor, such as 128 or 256, other things being equal.

-- Feedforward or multistage noise shaping (abbreviated "MASH" [Note 2] by
Nippon Telegraph and Telephone, or NTT).  A series of small one-bit feedback noise
shapers, each of which operates on a quantization-error (residue) output
from the previous stage, while simultaneously the quantized outputs are
combined properly to form the main output.  "MASH" data converters are
definitely not "one-bit" data converters in a meaningful sense, as I've
explained in more depth in print, although they are commonly made up of
one-bit subsections and this sometimes causes confusion.
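The three organizations just listed can be sketched in Python under
idealized assumptions: perfect arithmetic, first-order loops only, and
ideal quantizers.  Commercial parts, including those named above, differ in
loop order, word lengths and much else, and all function names here are
hypothetical.  requantize_multibit drops resolution by rounding to a coarser
step and feeds each rounding error back into the next sample; one_bit_stage
is the simplest delta-sigma loop; mash_1_1 cascades two one-bit stages, the
second working on the negated quantization error of the first, and
recombines their outputs.  In every case the output is the input plus a
quantization error that has been differenced, that is, pushed toward high
frequencies where the postfilter can remove it.

```python
def requantize_multibit(x, step):
    """Multibit feedback noise shaping, first order (error feedback).

    Rounds each sample to a multiple of step after subtracting the
    previous rounding error, so that y[n] = x[n] + e[n] - e[n-1].
    With 16-bit integer input, step=4 leaves 14 significant bits."""
    err, out = 0, []
    for v in x:
        target = v - err                     # pre-distort by the last error
        q = step * round(target / step)      # coarse (e.g. 14-bit) value
        err = q - target                     # error to feed back next time
        out.append(q)
    return out


def one_bit_stage(x):
    """First-order one-bit noise-shaping loop (delta-sigma modulator).

    Returns the +/-1 output and the quantization error e[n], with
    y[n] = x[n] + e[n] - e[n-1].  Input should stay within [-1, +1]."""
    integ, y_prev = 0.0, 0.0
    ys, errs = [], []
    for v in x:
        integ += v - y_prev
        y = 1.0 if integ >= 0.0 else -1.0
        ys.append(y)
        errs.append(y - integ)
        y_prev = y
    return ys, errs


def mash_1_1(x):
    """Two cascaded one-bit stages (a first-order/first-order MASH).

    The second stage quantizes minus the first stage's error; adding the
    first difference of its output cancels e1, leaving
    y[n] = x[n] + e2[n] - 2*e2[n-1] + e2[n-2]."""
    y1, e1 = one_bit_stage(x)
    y2, _ = one_bit_stage([-e for e in e1])
    out, y2_prev = [], 0.0
    for a, b in zip(y1, y2):
        out.append(a + b - y2_prev)          # y1 + (1 - z^-1) y2
        y2_prev = b
    return out
```

Note that the recombined MASH output, after its first sample, takes four
levels (-3, -1, +1, +3) rather than two even though each stage is one-bit,
which is the point made above about MASH converters not being one-bit
converters in a meaningful sense.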

Each of these competing modulator topologies has technical strengths and
weaknesses that are very involved and do not lend themselves to summary.
The signal fidelity in each of them can be excellent but depends on
different sets of circuit elements.  It is all a matter of "second-order"
electrical effects; if the components are all perfect (as they invariably
are assumed to be, in popular explanations of this subject matter), then
all the techniques work equally well.  Very broadly, however, I would say
that the one-bit designs have the fewest subtle distortion vulnerabilities.
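One commonly cited reason for that judgment is easy to show numerically,
though it is only one of several factors.  A one-bit DAC has just two output
levels, and any two points lie on a straight line, so mismatch between those
levels can only produce a gain or offset error, not distortion.  A multibit
DAC's levels, once mismatched, generally no longer lie on a line, and that
curvature distorts the signal.  The Python sketch below fits a least-squares
line to a set of DAC levels and reports the worst departure from it.  The
level values and the function name are hypothetical, and the sketch says
nothing about timing or pulse-shape errors, which one-bit designs must still
handle.

```python
def max_nonlinearity(levels):
    """Worst deviation of DAC output levels from their least-squares
    straight line, taking the digital codes as 0, 1, ..., len(levels)-1.

    Gain and offset errors leave this at zero; only departures from a
    straight line (which cause distortion) make it nonzero."""
    n = len(levels)
    codes = range(n)
    mean_c = (n - 1) / 2
    mean_v = sum(levels) / n
    var_c = sum((c - mean_c) ** 2 for c in codes)
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(codes, levels))
    slope = cov / var_c
    intercept = mean_v - slope * mean_c
    return max(abs(v - (intercept + slope * c)) for c, v in zip(codes, levels))


# Hypothetical levels, each about 1% away from its nominal value.
one_bit_levels = [-0.99, 1.02]                 # nominal -1, +1
two_bit_levels = [-3.02, -0.99, 1.01, 2.97]    # nominal -3, -1, +1, +3

print(max_nonlinearity(one_bit_levels))   # essentially zero (rounding only)
print(max_nonlinearity(two_bit_levels))   # about 0.02, i.e. nonzero
```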


D. What does it mean for sound

The electrical specifications of an oversampling C-R system depend on
innumerable component values and design choices and are in no way simply
predictable from whether the internal modulator uses, for example, MASH or
Bitstream or some other topology.  Still less predictable are perceptual
fidelity measures, which of course are the ultimate figures of merit in
audio and other human-interface electronics (a point taken for granted not
only by musicians but also, of course, by competent engineering researchers
and by the graduate communication-theory texts, such as Jayant and Noll).

This does not, of course, prevent glib advertising copywriters and cult
audio pundits from directly linking this or that audible property to the
glamorous _au courant_ labels like MASH and delta-sigma (just as it is
thought very fashionable to talk about Fast Fourier Transforms, no matter
how ignorantly or irrelevantly, in stock-price analysis, or about chaos
theory in management -- I could go on and on, and I do).  Such writers
might even be right, but they almost certainly don't know it -- the audible
differences are much more likely due to the oblique dependence of the C-R
topology on other design choices in the player, or to the quality of
analog-digital ground isolation, or to the choice of output-filter op amps.


Notes from the text:

Note 1:  "Delta-sigma" modulation and data conversion (the inventors' term)
was unintentionally rechristened "sigma-delta" at the Bell Telephone
Laboratories in 1963 and this reversal has propagated through many paper
titles, so you will see both names in use.  No difference is intended.
I have made efforts to redress this reversal and the principals are now in
accord.  My recent JAES paper mentions this and I have further details if
anyone professionally interested sends a mailing address.

Note 2:  Some people dislike the acronym MASH for MultistAge noise SHaping,
though there certainly are endless well-known precedents (UNIted nations
ChildrEn's Fund; GEheime STAatsPOlizei).  When its coiners introduced "MASH"
in the US in 1986 a colleague remarked to me that MUSH was better on acronym
style.  I think however that MUSH would have less audio-marketing cachet.

The technique now dubbed "MASH" by NTT has existed in various forms since
long before its recent popularization by Toshio Hayashi et al. in February
1986 (this origin itself is usually misattributed to a later paper by
Uchimura et al.).  I have antecedents going back at least to 1969. 
Multibit feedback noise shaping is even older, due to Cutler in 1954.


Max W. Hauser     {mips,philabs,pyramid}!prls!max     prls!max@mips.com

Copyright (c) 1991 by Max W. Hauser.  All rights reserved.

"In graduate school, books are called `Introduction to ...' while in high
school, they are called `All About ...'   This seminar is `All About'
high-speed vertical-amplifier design."     -- Einar Traa, Tektronix, 1978


(PS:  Experience with the Usenet compels the following cautions.  This 
summary is a terse sketch of a complicated and counterintuitive subject.
Abundant clarification and amplification of details is available to anyone
who will take the trouble to obtain and study the references cited at the
beginning, and if necessary take the further trouble to acquire, or else
discuss them with someone having, the technical background on which the
entire subject is built.  If you cannot be bothered to take these steps
then please do not ask me to do them for you.  Also, I regret that I cannot
offer audio-equipment recommendations.)