[comp.dsp] Adjust-Speed CD player?

dnwiebe@cis.ohio-state.edu (Dan N Wiebe) (09/22/89)

	I know nothing about DSP other than what I've figured out
on my own, based on simple common sense, but it seems to me that
there's a better way to lower pitch than just doubling samples.
Seems to me that you should interpolate them; if you have three
samples, 1000, 500, 250, and you want to make five samples out of
them, it seems to me that instead of doing 1000, 1000, 500, 500, 250
you should do 1000, 750, 500, 375, 250.  That's a reasonably simple
add-and-shift-right-one-bit algorithm that shouldn't take too long
and would preserve more of the fidelity than sample doubling.
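[A quick sketch of that add-and-shift scheme, as illustrative code only (in modern Python rather than anything period-appropriate):]

```python
def interpolate_double(samples):
    """Double the length of an integer sample list by inserting the
    midpoint of each adjacent pair -- the add-and-shift-right-one-bit
    trick described above."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) >> 1)   # (a + b) / 2 via add and shift right
    out.append(samples[-1])
    return out

print(interpolate_double([1000, 500, 250]))   # [1000, 750, 500, 375, 250]
```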
	Also, if you're going to remove samples, I think you
shouldn't use a simple kill-every-nth-sample procedure.  It seems
to me that certain samples (local maxima and minima) are more
important than others.  Use a five- or seven-sample queue, where
you consider the middle one for removal.  If it's a local minimum
or maximum, zap one of the ones next to it instead.  That's a bit
more complicated, but I think a 10 MHz 8086 could probably keep up
with a 30 kHz 16-bit sample stream.
	Of course, this just addresses the problem of compression
or expansion of the waveform, not constant-pitch/variable-speed or
vice versa.  If we could do that in real time, to certain components
of a sound and not others (for example, vary the pitch of spoken
vowels but not consonants, or the pitch of a violin string but not
the pitch of the bow "scrape"), we could probably ditch acoustic
instruments altogether and replace them with samples.
	Again, while I'm very interested in this field, I am by no
stretch of the imagination anything remotely resembling an
authority, so keep your flames gentle :-).

mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/23/89)

> 	I know nothing about DSP other than what I've figured out
> on my own, based on simple common sense, but it seems to me that
> there's a better way to lower pitch than just doubling samples.
> Seems to me that you should interpolate them; if you have three
> samples, 1000, 500, 250, and you want to make five samples out of
> them, it seems to me that instead of doing 1000, 1000, 500, 500, 250
> you should do 1000, 750, 500, 375, 250.  That's a reasonably simple
> add-and-shift-right-one-bit algorithm that shouldn't take too long
> and would preserve more of the fidelity than sample doubling.

This is certainly one of the better methods for `slowing down' a sampled
source; however, I suggest interpolating between the samples by simply
convolving the data stream with a sinc function.  One could easily set up
an FIR filter with variable coefficients.  The main advantages are 1) you can
interpolate at virtually any sample increment you want (i.e. you can do more
than just generate intermediate samples), 2) by the nature of the operation,
the result will be phase linear (i.e. no phase distortion), and 3) you can
achieve greater interpolation accuracy, depending on the length of the filter.

There are disadvantages to this approach, the most important being the
cost of calculating the interpolation filter coefficients on the fly.
The alternative is to pre-calculate the coefficients and store them in
ROM, for example.  All of this depends on what range of `slowing down'
you want to support.  Another problem is that you are generating more
samples than the source is producing, so you'll have to throttle the
input if you wish to output the results of the interpolation at the
same rate as the input.  As you can see, all of this is application
dependent.

> 	Also, if you're going to remove samples, I think you
> shouldn't use a simple kill-every-nth-sample procedure...

If you want to throw away samples, you *really* need to filter the data before
doing so, otherwise you will see (hear) aliasing of the data, depending upon
the spectra of the input and how often you are throwing away samples.  When
you decimate any sampled data set, you must low-pass filter the data at half
the new sample rate (Nyquist rule) unless you are sure that the data has
no spectral components above half the new sample rate.
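[A small numerical illustration of that aliasing -- my own demo, with an assumed 1 kHz sample rate: a 400 Hz tone decimated by two without filtering lands at 100 Hz, since the new Nyquist frequency is only 250 Hz.]

```python
import math

fs = 1000          # original sample rate, Hz (assumed for this demo)
f_in = 400         # input tone, above the new Nyquist rate of 250 Hz
decim = 2          # keep every 2nd sample -> new rate of 500 Hz

x = [math.sin(2 * math.pi * f_in * n / fs) for n in range(1000)]
y = x[::decim]     # naive decimation: no low-pass filter first

# At the new 500 Hz rate, the 400 Hz tone is indistinguishable from a
# 100 Hz tone with inverted phase (500 - 400 = 100): the alias.
alias = [-math.sin(2 * math.pi * 100 * n / 500) for n in range(len(y))]
worst = max(abs(a - b) for a, b in zip(y, alias))   # essentially zero
```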

All this said, I don't think this is the optimal method for tone shifting,
however it might work for `fast/slow-forward' effects.  If you wish to shift
the tones while retaining the same sample rate, I would suggest some sort of
frequency scaling algorithm, perhaps by doing a digital mix with a reference
(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
followed by a carrier and lower sideband suppression (Hilbert transform filters
are very easy to implement digitally).  At a fast glance, I think this might
work well for moving the spectra of an audio source up/down some arbitrary
frequency, and should be doable with some of the common DSP chips currently
available.
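[For the curious, here is a rough single-tone sketch of that scheme -- editor's toy code, not Mike's implementation: a windowed FIR Hilbert transformer supplies the quadrature component, and a complex mix then shifts the tone up by f_shift with the unwanted sideband cancelled.]

```python
import math

def hilbert_taps(half=32):
    """Hamming-windowed FIR approximation of an ideal Hilbert
    transformer (odd-indexed taps 2/(pi*k), even taps zero)."""
    n_taps = 2 * half + 1
    taps = []
    for n in range(n_taps):
        k = n - half
        ideal = 2.0 / (math.pi * k) if k % 2 else 0.0
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * w)
    return taps

fs, f_in, f_shift, N = 8000, 800, 100, 2000
x = [math.cos(2 * math.pi * f_in * n / fs) for n in range(N)]

h = hilbert_taps()
half = (len(h) - 1) // 2
# Quadrature arm: Hilbert-transformed input (cos -> sin), delayed by
# the filter's group delay of `half` samples.
xq = [sum(h[j] * x[n - j] for j in range(len(h)) if 0 <= n - j < N)
      for n in range(N)]
# In-phase arm: the input itself, delayed to match.
xi = [0.0] * half + x[:N - half]

# Complex mix, keeping the real part: Re{(xi + j*xq) * e^(j*w_shift*n)}.
# The lower sideband cancels, leaving a single tone at f_in + f_shift.
y = [xi[n] * math.cos(2 * math.pi * f_shift * n / fs)
     - xq[n] * math.sin(2 * math.pi * f_shift * n / fs)
     for n in range(N)]
```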

Mike Horne
Visual Systems Group
Tektronix, Inc.
mhorne@ka7axd.wv.tek.com

wass@Apple.COM (Steve Wasserman) (09/23/89)

In article <4653@orca.WV.TEK.COM> mhorne%ka7axd.wv.tek.com@relay.cs.net writes:
>> 	I know nothing about DSP other than what I've figured out
	... much stuff deleted ...
>source, however, I suggest interpolating between the samples by simply
>convolving the data stream with a sinc function.
>

There are two separate problems to be solved here.

First: if you spin the CD faster than usual, what do you do with the
extra samples?  For example: if a CD is sped up such that 52.9
Ksamples/second are read (which represents an increase in speed of 6/5
or 20%), 8.8 "extra" Ksamples accumulate every second.  I am assuming,
of course, that the sound will be reconstructed by circuitry which
operates at a constant 44.1 KHz (or some oversampling multiple
thereof, I suppose).  The reason I make this assumption is that it
would be difficult to construct a variable analog reconstruction
filter able to handle a large range of possible sampling
speeds (say plus or minus five times the original sampling frequency).
This problem is called "sample rate conversion" or something similar
in textbooks.

I don't think that it can be said that any one sample is "more
important" than any other because it is a local minimum or
maximum.  In fact, the suggested method of sparing these samples
would introduce random noise into the signal.

In general, it is easy to convert between two sampling rates that are
rational multiples of each other (hence, I chose 6/5 in my example).
The first step is to interpolate the signal by the numerator ...
convert it to a sampling rate six times the original in my example.
This is simply done by inserting five zeros after each sample and then
applying a digital filter.  Zero padding has the effect of replicating
the original spectrum a number of times.  A filter is used to remove
the unwanted copies of the original spectrum.  Sorry I can't think of
a good way to draw spectra using text only, but diagrams would be
helpful here.

The next step is to filter out all (or most) spectral energy which
would be "aliased" when throwing away the unneeded samples.  This
involves applying another filter to quiet the components above the
Nyquist rate of the signal after the extras are thrown out.  After
this has been done, four of every five samples can be safely thrown
out without distorting the signal.  (always the same four out of the five).

In practice, the two filters can be combined, so the procedure is:
zero-pad, filter, and then throw away the unneeded samples.  People
have found more clever ways of doing this in some circumstances, but
in theory, this way is as good as any.  Obviously, if you want
to change the sampling rate by 7724/137, you have a problem.
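[As an illustration of that zero-pad / filter / discard recipe, here is a rough sketch (editor's code; a real design would use a properly designed filter with many more taps, and would fold the steps into a polyphase structure):]

```python
import math

def resample_6_5(x, taps=127):
    """Resample x by the ratio 6/5: zero-stuff by 6, low-pass filter
    with a Hamming-windowed sinc, then keep every 5th sample."""
    L, M = 6, 5
    # 1) zero-stuff: insert L - 1 zeros after each input sample
    up = []
    for s in x:
        up.append(float(s))
        up.extend([0.0] * (L - 1))
    # 2) low-pass at 1/max(L, M) of the upsampled Nyquist rate,
    #    with gain L to make up for the energy lost to the zeros
    cutoff = 1.0 / max(L, M)
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = (cutoff * math.sin(math.pi * cutoff * k) / (math.pi * cutoff * k)
                 if k else cutoff)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(L * ideal * w)
    # 3) convolve, keeping only every M-th output sample
    y = []
    for i in range(0, len(up), M):
        acc = 0.0
        for j, c in enumerate(h):
            if 0 <= i - j < len(up):
                acc += c * up[i - j]
        y.append(acc)
    return y
```

A constant input comes back out (approximately) unchanged, but with 6/5 as many samples.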

>All this said, I don't think this is the optimal method for tone shifting,
>however it might work for `fast/slow-forward' effects.  If you wish to shift
>the tones while retaining the same sample rate, I would suggest some sort of
>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),

The second problem is: once you've thrown away the right number of
samples, how do you make the pitch sound right?  Note that a mere
frequency translation by digitally mixing in a reference frequency is
not exactly what's required to make everything right again.  Spinning
the CD faster EXPANDS the spectrum of the original sound in frequency
-- it doesn't just shift it (unless, of course, you are looking on a
log scale :-).  To prove this to yourself, imagine a recording of two
notes: concert A (440 Hz) and the note one octave above it (880 Hz).  When
we increase the CD speed by 20%, these two frequencies become
528 and 1056 Hz.  Assume that we've thrown away the proper number of
samples from the original recording.  Now, if we mix the resultant
signal with an 88 Hz signal (528 - 440 = 88) and do the proper
filtering, we'll get 440 and 968 Hz ... oops, they don't sound like
octaves any more.
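[The arithmetic, spelled out -- a toy calculation, not a signal-processing routine:]

```python
speedup = 6 / 5                         # the 20% speed increase
notes = [440.0, 880.0]                  # concert A and its octave
sped = [f * speedup for f in notes]     # [528.0, 1056.0]
shift = sped[0] - notes[0]              # 88.0 Hz
mixed = [f - shift for f in sped]       # back to [440.0, 968.0]
print(mixed[1] / mixed[0])              # 2.2 -- no longer the 2:1 octave
```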

Theoretically, what needs to be done is to compress the spectrum of
the speeded-up sound down to its original size.  This can be done by
applying the previously discussed interpolation/decimation method to
the FFT samples of the signal (I think) and then inverse transforming
and playing the signal out at the original sampling rate.  I'm sure
that somebody has come up with a computationally superior method to
the one I have suggested.


(note: invert this discussion if you want to talk about slowing a
recording down.)



>> 	Also, if you're going to remove samples, I think you
>> shouldn't use a simple kill-every-nth-sample procedure...
>
>If you want to throw away samples, you *really* need to filter the data before
>doing so, otherwise you will see (hear) aliasing of the data, depending upon
>the spectra of the input and how often you are throwing away samples.  When
>you decimate any sampled data set, you must low-pass filter the data at half
>the new sample rate (Nyquist rule) unless you are sure that the data has
>no spectral components above half the new sample rate.
>
>followed by a carrier and lower sideband suppression (Hilbert transform filters
>are very easy to implement digitally).  At a fast glance, I think this might
>work well for moving the spectra of an audio source up/down some arbitrary
>frequency, and should be doable with some of the common DSP chips currently
>available.
>
>Mike Horne
>Visual Systems Group
>Tektronix, Inc.
>mhorne@ka7axd.wv.tek.com


-- 
swass@apple.com

brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)

In article <61860@tut.cis.ohio-state.edu> dnwiebe@cis.ohio-state.edu (Dan N Wiebe) writes:
>
>there's a better way to lower pitch than just doubling samples.
>Seems to me that you should interpolate them; [...]
>[...]  That's a reasonably simple
>add-and-shift-right-one-bit algorithm that shouldn't take too long
>and would preserve more of the fidelity than sample doubling.

That would work fine for a fixed shift downward of exactly 1 octave.
A good analog filter set at the proper frequency (1/4 sampling rate)
could also do the "interpolation".

>	Also, if you're going to remove samples, I think you
>shouldn't use a simple kill-every-nth-sample procedure.  It seems
>to me that certain samples (local maxima and minima) are more
>important than others.  Use a five- or seven-sample queue, where
>you consider the middle one for removal.  If it's a local minimum
>or maximum, zap one of the ones next to it instead.  That's a bit
>more complicated, but I think a 10MHz 8086 could probably keep up
>with a 30Khz 16-bit sample stream.

You would be surprised how slow the 8086 is when repeating a moderately
complex operation thirty thousand times a second.  A 10 MHz 8086 has an
instruction rate of only slightly greater than 500 kIPS (that's a rough
estimate, folks, but I'm sure that it's below 1 MIPS), so with 30 ksamples
per second, you would have to implement your algorithm in only about 16
instructions.  The 8086 would have a hard time just examining that much
data, much less altering it.

What you've described, at least the end result of maintaining local
maxima and minima, could be done using curve fitting techniques (which
I know very little about), but would certainly require a faster
processor.

>	Again, while I'm very interested in this field, I am by no
>stretch of the imagination anything remotely resembling an
>authority, so keep your flames gentle :-).

No flame intended, in fact I'm open to suggestions and ideas no matter
where they come from.

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP

d88-jwa@nada.kth.se (Jon Wätte) (09/24/89)

In article <4653@orca.WV.TEK.COM> mhorne%ka7axd.wv.tek.com@relay.cs.net writes:

>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>followed by a carrier and lower sideband suppression (Hilbert transform filter
>are very easy to implement digitally).  At a fast glance, I think this might
>work well for moving the spectra of an audio source up/down some arbitrary
>frequency, and should be doable with some of the common DSP chips currently
>available.

As I've said before: that's not scaling, that's OFFSET!  You can't do that
to MUSIC, because music has a relative overtone spectrum.  Consider:

440 Hz + 880 Hz make a (very simple) harmonic note.

Shift up 100 Hz:

540 Hz + 980 Hz makes two unrelated sine tones!!!  And imagine the effect
this has on complex waveforms like a violin or a piano ... SHUDDER!

h+@nada.kth.se
-- 
The only way to get rid of temptation is to yield to it.

mhorne@ka7axd.wv.tek.com (Michael T. Horne) (09/25/89)

In a recent article by d88-jwa@nada.kth.se (Jon Wätte):
>>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>>followed by a carrier and lower sideband suppression...
>
>As I've said before: that's not scaling, that's OFFSET ! You can't do that
>to MUSIC, because music has a realative overtone spectra.

Ah.  Having arrived at this discussion in midstream (without the benefit of
context), I believe that the question is about scaling an arbitrary `musical'
(from the viewpoint of the average person) sequence up or down in frequency
while retaining the `musical-ness' of it. :)  I still stand by my suggestion
of shifting an arbitrary spectrum up/down in frequency by doing a digital
mix, however, I agree that it will not generate the desired effect for use
in scaling music.  It would appear that the `vocoder' method may provide
arbitrary control (e.g. true (multiplicative) scaling) for generating such
desired effects.

>And imagine the effect this has on complex waveforms like a violin or a
>piano...

Could you please elaborate on the typical spectra of a given note on a
piano or violin?

Mike

Michael T. Horne                                      VSG/ITD, Tektronix, Inc.
mhorne@ka7axd.wv.tek.com                                        (503) 685-2077

gda@creare.creare.UUCP (Gray Abbott) (09/26/89)

The most common term in the literature for changing the speed of
a signal without changing the pitch is "time scaling".  Some researchers'
names I vaguely remember are Jones (at Rice U.) and Quatieri (at MIT
Lincoln Labs).  Follow them to other references.  Check the IEEE ASSP
journals.

Jones had a TMS32010 algorithm which could process speech in real-time.
Quatieri uses (I think) a cosine transform method, which I "hear" is
very good, even on music, but I haven't "heard" it.

The FFT approach would be to use a Short-time Fourier transform (see
Rabiner and Schafer's _Digital Processing of Speech Signals_), with
modifications to the reconstruction algorithm to allow for compressed/expanded
time scales.  Any modifications to the STFT usually degrade the signal.
It's probably a little beyond current DSP chips (probably not beyond custom
military systems), but not too far out of reach.

The inverse problem of pitch shifting was correctly addressed by another
poster: you interpolate and decimate to get a rational shift.  Note that
musicians can be very sensitive to small errors in pitch.  This is why
some digital synthesizers use variable rate sampling instead of digital
pitch shifting.  Achieving high-quality shifts can require several megabytes
of buffer space for the interpolate/decimate operations.  Fortunately,
it doesn't require a whole lot of computation, if done right.

It's been a while since I've thought about any of this, and I don't have
my notes at hand, so don't take it all as gospel.


						Gray Abbott
						...dartvax!creare!gda

elliott@optilink.UUCP (Paul Elliott x225) (09/26/89)

In article <4653@orca.WV.TEK.COM>, mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) writes:
> <deleted>
> All this said, I don't think this is the optimal method for tone shifting,
> however it might work for `fast/slow-forward' effects.  If you wish to shift
> the tones while retaining the same sample rate, I would suggest some sort of
> frequency scaling algorithm, perhaps by doing a digital mix with a reference
> (digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
> followed by a carrier and lower sideband suppression (Hilbert transform filters
> are very easy to implement digitally).  At a fast glance, I think this might
> work well for moving the spectra of an audio source up/down some arbitrary
> frequency, and should be doable with some of the common DSP chips currently
> available.

Pitch shifting like this doesn't work (well), since it alters all the harmonic 
relationships.  For example: a 1000 Hz / 2000 Hz octave, after shifting 100 Hz
becomes a 1100 Hz / 2100 Hz pair -- definitely not an octave.  If you have the
opportunity, listen to single-sideband voice (ham radio is a good place).  You
can only shift the pitch a small amount (by mis-tuning the receiver) before
virtually destroying the intelligibility of the voice, whereas with true pitch
shifting (all frequencies multiplied by n), voice remains reasonably
understandable over several octaves.

Of course if you think shifting-by-adding messes up voice, you should hear what
it does to music!  Definitely not Hi-Fi -- more like atonal bagpipes :-)

... Paul


-- 
Paul M. Elliott      Optilink Corporation     (707) 795-9444
         {pyramid,pixar,tekbspa}!optilink!elliott
"I used to think I was indecisive, but now I'm not so sure."

ingoldsb@ctycal.COM (Terry Ingoldsby) (09/27/89)

In article <4671@orca.WV.TEK.COM>, mhorne@ka7axd.wv.tek.com (Michael T. Horne) writes:
> In a recent article by d88-jwa@nada.kth.se (Jon W{tte):
> >>frequency scaling algorithm, perhaps by doing a digital mix with a reference
> >>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
> >>followed by a carrier and lower sideband suppression...
> >
> >As I've said before: that's not scaling, that's OFFSET ! You can't do that
> >to MUSIC, because music has a realative overtone spectra.
> 
> context), I believe that the question is about scaling an arbitrary `musical'
> (from the viewpoint of the average person) sequence up or down in frequency
> while retaining the `musical-ness' of it. :)  I still stand by my suggestion
...
> >And imagine the effect this has on complex waveforms like a violin or a
...
> Could you please elaborate on the typical spectra of a given note on a
> piano or violin?

Actually, it doesn't matter what instrument (including the human voice) you
pick.  They all contain harmonics and overtones which (to sound right) must
be multiples of the primary frequency.  In addition, instruments like the
piano have chords.  These notes have (generally) frequency relationships
that must be preserved.  An offset will *not* do this.  (It might be
interesting to hear the result.)  Basically, the shift must vary with
frequency to be accurate.  This still might be doable in frequency space:
a simple shift of the frequency function would produce an offset, so
shifting the higher frequencies more than the lower ones might do the trick.


-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/28/89)

> In general, it is easy to convert between two sampling rates that are
> rational multiples of each other (hence, I chose 6/5 in my example).
> The first step is to interpolate the signal by the numerator ...
> 
> The next step is to filter out all (or most) spectral energy which
> would be "aliased" when throwing away the unneeded samples...
>
> Obviously, if you want to change the sampling rate by 7724/137, you have
> a problem.

The main merit of the system I discussed earlier is that you *can*
interpolate by any rational increment, subject to the accuracy of the
word size you use to represent the increment.  Resampling the input by a
7724/137 ~= 56.38X sampling rate increase is easy.  You aren't limited
to small, rational increments in sampling rate, since you are directly
calculating the actual interpolated sample values from the original
samples.  By evaluating the sin(x)/x function at the correct locations
(which need not be integer locations) you can interpolate *any* point
between elements in the data set.  The only hard part in using this type
of interpolator is calculating the sinc coefficients.  If you have a
non-repetitive interpolation increment (i.e. if you need to recalculate
the sinc coefficients for each new interpolated data point), the
bottleneck in such a system is the sinc calculation.  However, there are
methods for minimizing this hit, mostly by trading off accuracy for
computation time, since the accuracy of your result already depends on
how many bits of resolution you have, how much noise is on the data, the
length of the FIR filter you use to calculate the interpolated value,
etc.
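[A sketch of this kind of interpolator -- editor's code, with a Hann taper standing in for whatever window a real design would use:]

```python
import math

def sinc_interp(x, t, half_width=16):
    """Estimate the band-limited signal underlying the samples x[]
    at fractional index t, by evaluating a Hann-tapered sinc kernel
    centred on t -- the coefficients are computed on the fly."""
    n0 = int(math.floor(t))
    acc = 0.0
    for n in range(max(0, n0 - half_width + 1),
                   min(len(x), n0 + half_width + 1)):
        k = t - n                       # fractional distance to sample n
        sinc = math.sin(math.pi * k) / (math.pi * k) if k else 1.0
        win = 0.5 + 0.5 * math.cos(math.pi * k / half_width)
        acc += x[n] * sinc * win
    return acc

# Resampling by any increment -- rational or not -- is then just a
# matter of stepping t by that increment:
xs = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(200)]
est = sinc_interp(xs, 100.37)           # value "between" samples 100 and 101
```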

This is very similar to how an analog reconstruction filter `interpolates' the
functional values between the known data points output by a DAC.  A
perfect (low pass) reconstruction filter has a sin(x)/x impulse response,
and its response is convolved with the known data points to generate all
`points' between them.  Less than perfect (read: real) reconstruction filters
usually have impulse responses that have the general shape of a sinc function
(though not exactly), but are usually sufficient for correctly reconstructing
the output waveforms.

I have implemented several interpolators using this scheme, and they work
very well and can be easily implemented on a DSP chip set.

By the way, for those of you interested in multirate DSP, I recommend obtaining
a copy of "Multirate Digital Signal Processing," by Crochiere and Rabiner
(Prentice Hall).  This text provides an excellent background on the topic that
Steve discussed in his earlier article.  Also, a chapter in "Advanced Topics
in Digital Signal Processing," by Lim and Oppenheim (also Prentice Hall)
provides a good introduction to the same topic.

> -- 
> swass@apple.com

Mike
mhorne@ka7axd.wv.tek.com