dnwiebe@cis.ohio-state.edu (Dan N Wiebe) (09/22/89)
	I know nothing about DSP other than what I've figured out on my own,
based on simple common sense, but it seems to me that there's a better way
to lower pitch than just doubling samples.  Seems to me that you should
interpolate them; if you have three samples, 1000, 500, 250, and you want
to make five samples out of them, instead of doing 1000, 1000, 500, 500,
250 you should do 1000, 750, 500, 375, 250.  That's a reasonably simple
add-and-shift-right-one-bit algorithm that shouldn't take too long and
would preserve more of the fidelity than sample doubling.

	Also, if you're going to remove samples, I think you shouldn't use
a simple kill-every-nth-sample procedure.  It seems to me that certain
samples (local maxima and minima) are more important than others.  Use a
five- or seven-sample queue, where you consider the middle one for
removal.  If it's a local minimum or maximum, zap one of the ones next to
it instead.  That's a bit more complicated, but I think a 10 MHz 8086
could probably keep up with a 30 kHz 16-bit sample stream.

	Of course, this just addresses the problem of compression or
expansion of the waveform, not constant-pitch/variable-speed or vice
versa.  If we could do that in real time, to certain components of a
sound and not others (for example, vary the pitch of spoken vowels but
not consonants, or the pitch of a violin string but not the pitch of the
bow "scrape"), we could probably ditch acoustic instruments altogether
and replace them with samples.

	Again, while I'm very interested in this field, I am by no
stretch of the imagination anything remotely resembling an authority, so
keep your flames gentle :-).
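In code (not from the original post), the add-and-shift interpolation described above might look like this minimal sketch; the function name `upsample2` is mine:

```python
def upsample2(samples):
    """Double the sample count by inserting the average of each
    adjacent pair: (a + b) >> 1 is the add-and-shift-right-one-bit
    step described in the post."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) >> 1)
    out.append(samples[-1])
    return out

print(upsample2([1000, 500, 250]))  # [1000, 750, 500, 375, 250]
```

Three samples become five, matching the 1000, 750, 500, 375, 250 example in the post.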
mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/23/89)
> I know nothing about DSP other than what I've figured out on my own,
> based on simple common sense, but it seems to me that there's a better
> way to lower pitch than just doubling samples.  Seems to me that you
> should interpolate them; if you have three samples, 1000, 500, 250, and
> you want to make five samples out of them, it seems to me that instead
> of doing 1000, 1000, 500, 500, 250 you should do 1000, 750, 500, 375,
> 250.  That's a reasonably simple add-and-shift-right-one-bit algorithm
> that shouldn't take too long and would preserve more of the fidelity
> than sample doubling.

This is certainly one of the better methods for `slowing down' a sampled
source; however, I suggest interpolating between the samples by simply
convolving the data stream with a sinc function.  One could easily set up
an FIR filter with variable coefficients.  The main advantages are:
1) you can interpolate at virtually any sample increment you want (i.e.
you can do more than just generate intermediate samples), 2) by the
nature of the operation, the result will be phase linear (i.e. no phase
distortion), and 3) you can achieve greater interpolation accuracy,
depending on the length of the filter.

There are disadvantages to this scheme, the most important being whether
or not you calculate the interpolation filter coefficients on the fly.
The alternative is to pre-calculate the coefficients, storing them in
ROM, for example.  All of this depends on what range of `slowing down'
you want to support.  Another problem is that you are generating more
samples than the source is generating, so you'll have to throttle the
input if you wish to output the results of the interpolation at the same
rate as the input.  As you can see, all of this is application dependent.

> Also, if you're going to remove samples, I think you shouldn't use a
> simple kill-every-nth-sample procedure...
If you want to throw away samples, you *really* need to filter the data
before doing so; otherwise you will see (hear) aliasing of the data,
depending upon the spectra of the input and how often you are throwing
away samples.  When you decimate any sampled data set, you must low-pass
filter the data at half the new sample rate (Nyquist rule) unless you are
sure that the data has no spectral components above half the new sample
rate.

All this said, I don't think this is the optimal method for tone
shifting; however, it might work for `fast/slow-forward' effects.  If you
wish to shift the tones while retaining the same sample rate, I would
suggest some sort of frequency scaling algorithm, perhaps by doing a
digital mix with a reference (digital) carrier (i.e. ref = 100 Hz for a
100 Hz shift upward in frequency), followed by carrier and lower sideband
suppression (Hilbert transform filters are very easy to implement
digitally).  At a fast glance, I think this might work well for moving
the spectrum of an audio source up/down some arbitrary frequency, and
should be doable with some of the common DSP chips currently available.

Mike Horne
Visual Systems Group
Tektronix, Inc.
mhorne@ka7axd.wv.tek.com
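The aliasing described above is easy to quantify.  As a small illustrative sketch (the function name is mine, not from the post), here is where a tone lands if you decimate without the required low-pass filter:

```python
def alias_freq(f, fs):
    """Apparent frequency of an f Hz tone when sampled at fs Hz:
    anything above fs/2 folds back down into the 0..fs/2 band."""
    f = f % fs
    return min(f, fs - f)

# Decimating 44.1 kHz audio by 2 gives a 22.05 kHz rate.  Without a
# low-pass filter at 11.025 kHz, a 15 kHz component folds down:
print(alias_freq(15000.0, 22050.0))  # 7050.0
```

A 15 kHz component reappears as an audible 7.05 kHz tone, which is exactly the distortion the pre-decimation low-pass filter exists to prevent.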
wass@Apple.COM (Steve Wasserman) (09/23/89)
In article <4653@orca.WV.TEK.COM> mhorne%ka7axd.wv.tek.com@relay.cs.net
writes:
>> I know nothing about DSP other than what I've figured out
... much stuff deleted ...
>source, however, I suggest interpolating between the samples by simply
>convolving the data stream with a sinc function.

There are two separate problems to be solved here.

First: if you spin the CD faster than usual, what do you do with the
extra samples?  For example: if a CD is sped up such that 52.9
Ksamples/second are read (which represents an increase in speed of 6/5,
or 20%), 8.8 "extra" Ksamples accumulate every second.  I am assuming, of
course, that the sound will be reconstructed by circuitry which operates
at a constant 44.1 KHz (or some oversampling multiple thereof, I
suppose).  The reason I make this assumption is that it would be
difficult to construct a variable analog reconstruction filter able to
handle a large range of possible sampling speeds (say, plus or minus five
times the original sampling frequency).  This problem is called "sample
rate conversion" or something similar in textbooks.

I don't think it can be said that any one sample is "more important" than
any other sample because it is a local minimum or maximum.  In fact, the
method of not dropping these samples, as suggested, would introduce
random noise into the signal.

In general, it is easy to convert between two sampling rates that are
rational multiples of each other (hence, I chose 6/5 in my example).  The
first step is to interpolate the signal by the numerator ... convert it
to a sampling rate six times the original in my example.  This is simply
done by adding five zeros between every sample and then applying a
digital filter.  Zero padding has the effect of replicating the original
spectrum a number of times.  A filter is used to remove the unwanted
copies of the original spectrum.  Sorry I can't think of a good way to
draw spectra using text only, but diagrams would be helpful here.
The next step is to filter out all (or most) spectral energy which would
be "aliased" when throwing away the unneeded samples.  This involves
applying another filter to quiet the components above the Nyquist rate of
the signal after the extras are thrown out.  After this has been done,
four of every five samples can be safely thrown out without distorting
the signal (always the same four out of the five).  In practice, the two
filters can be combined, so the procedure is: zero-pad, filter, and then
throw away the unneeded samples.  People have found more clever ways of
doing this in some circumstances, but in theory, this way is as good as
any.  Obviously, if you want to change the sampling rate by 7724/137, you
have a problem.

>All this said, I don't think this is the optimal method for tone shifting,
>however it might work for `fast/slow-forward' effects.  If you wish to shift
>the tones while retaining the same sample rate, I would suggest some sort of
>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),

The second problem is: once you've thrown away the right number of
samples, how do you make the pitch sound right?  Note that a mere
frequency translation by digitally mixing in a reference frequency is not
exactly what's required to make everything right again.  Spinning the CD
faster EXPANDS the spectrum of the original sound in frequency -- it
doesn't just shift it (unless, of course, you are looking on a log scale
:-).  To prove this to yourself, imagine a recording of two notes:
concert A (440 Hz) and one octave above it (880 Hz).  When we increase
the CD speed by 20%, these two frequencies are changed to 528 and 1056
Hz.  Assume that we've thrown away the proper number of samples from the
original recording.  Now, if we mix the resultant signal with an 88 Hz
signal (528 - 440 = 88) and do the proper filtering, we'll get 440 Hz and
968 Hz ...
oops, they don't sound like octaves any more.

Theoretically, what needs to be done is to compress the spectrum of the
sped-up sound down to its original size.  This can be done by applying
the previously discussed interpolation/decimation method to the FFT
samples of the signal (I think) and then inverse transforming and playing
the signal out at the original sampling rate.  I'm sure that somebody has
come up with a computationally superior method to the one I have
suggested.  (Note: invert this discussion if you want to talk about
slowing a recording down.)

>> Also, if you're going to remove samples, I think you
>> shouldn't use a simple kill-every-nth-sample procedure...
>
>If you want to throw away samples, you *really* need to filter the data before
>doing so, otherwise you will see (hear) aliasing of the data, depending upon
>the spectra of the input and how often you are throwing away samples.  When
>you decimate any sampled data set, you must low-pass filter the data at half
>the new sample rate (Nyquist rule) unless you are sure that the data has
>no spectral components above half the new sample rate.
>
>followed by a carrier and lower sideband suppression (Hilbert transform filters
>are very easy to implement digitally).  At a fast glance, I think this might
>work well for moving the spectra of an audio source up/down some arbitrary
>frequency, and should be doable with some of the common DSP chips currently
>available.
>
>Mike Horne
>Visual Systems Group
>Tektronix, Inc.
>mhorne@ka7axd.wv.tek.com
--
swass@apple.com
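Steve's octave example works out numerically as follows (a quick check using his 6/5 speed-up and 88 Hz mix; the variable names are mine):

```python
speed = 6 / 5                       # CD spun 20% fast
octave = [440.0, 880.0]             # concert A and the octave above
sped = [f * speed for f in octave]  # spectrum EXPANDS multiplicatively
mixed = [f - 88.0 for f in sped]    # mix down by 528 - 440 = 88 Hz

print(sped)                 # [528.0, 1056.0] -- still a 2:1 octave
print(mixed)                # [440.0, 968.0]
print(mixed[1] / mixed[0])  # 2.2 -- no longer an octave
```

Speeding up multiplies every frequency, preserving the 2:1 ratio; mixing subtracts a constant, which restores the lower note but leaves the upper one at 968 Hz instead of 880 Hz.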
brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)
In article <61860@tut.cis.ohio-state.edu> dnwiebe@cis.ohio-state.edu
(Dan N Wiebe) writes:
>
>there's a better way to lower pitch than just doubling samples.
>Seems to me that you should interpolate them; [...]
>[...] That's a reasonably simple
>add-and-shift-right-one-bit algorithm that shouldn't take too long
>and would preserve more of the fidelity than sample doubling.

That would work fine for a fixed shift downward of exactly one octave.  A
good analog filter set at the proper frequency (1/4 the sampling rate)
could also do the "interpolation".

> Also, if you're going to remove samples, I think you
>shouldn't use a simple kill-every-nth-sample procedure.  It seems
>to me that certain samples (local maxima and minima) are more
>important than others.  Use a five- or seven-sample queue, where
>you consider the middle one for removal.  If it's a local minimum
>or maximum, zap one of the ones next to it instead.  That's a bit
>more complicated, but I think a 10MHz 8086 could probably keep up
>with a 30Khz 16-bit sample stream.

You would be surprised how slow the 8086 is when repeating a moderately
complex operation thirty thousand times a second.  A 10 MHz 8086 has an
instruction rate of only slightly greater than 500 kIPS (that's a rough
estimate, folks, but I'm sure that it's below 1 MIPS), and with 30
ksamples per second, you would have to implement your algorithm in only
about 16 instructions.  The 8086 would have a hard time just examining
that much data, much less altering it.  What you've described, at least
the end result of maintaining local maxima and minima, could be done
using curve fitting techniques (which I know very little about), but
would certainly require a faster processor.

> Again, while I'm very interested in this field, I am by no
>stretch of the imagination anything remotely resembling an
>authority, so keep your flames gentle :-).

No flame intended; in fact, I'm open to suggestions and ideas no matter
where they come from.
Brian Willoughby
UUCP:     ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet: microsoft!brianw@uunet.UU.NET
      or: microsoft!brianw@Sun.COM
Bitnet:   brianw@microsoft.UUCP
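As a quick check of Brian's instruction-budget estimate above (his rough numbers, not measured figures):

```python
instr_per_sec = 0.5e6  # ~500 kIPS for a 10 MHz 8086 (rough estimate)
sample_rate = 30000    # 16-bit samples per second
budget = instr_per_sec / sample_rate
print(budget)          # ~16.7 instructions per sample
```

At roughly 16 instructions per sample, even the memory traffic of a five- or seven-sample queue would consume most of the budget before any comparison logic runs.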
d88-jwa@nada.kth.se (Jon W{tte) (09/24/89)
In article <4653@orca.WV.TEK.COM> mhorne%ka7axd.wv.tek.com@relay.cs.net
writes:
>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>followed by a carrier and lower sideband suppression (Hilbert transform filters
>are very easy to implement digitally).  At a fast glance, I think this might
>work well for moving the spectra of an audio source up/down some arbitrary
>frequency, and should be doable with some of the common DSP chips currently
>available.

As I've said before: that's not scaling, that's OFFSET!  You can't do
that to MUSIC, because music has a relative overtone spectrum.

Consider: 440 Hz + 880 Hz make a (very simple) harmonic note.  Shift by
100 Hz: 540 + 940 Hz makes two unrelated sine notes!!!  And imagine the
effect this has on complex waveforms like a violin or a piano ...
SHUDDER!

h+@nada.kth.se
--
The only way to get rid of temptation is to yield to it.
mhorne@ka7axd.wv.tek.com (Michael T. Horne) (09/25/89)
In a recent article by d88-jwa@nada.kth.se (Jon W{tte):
>>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>>followed by a carrier and lower sideband suppression...
>
>As I've said before: that's not scaling, that's OFFSET!  You can't do that
>to MUSIC, because music has a relative overtone spectrum.

Ah.  Having arrived at this discussion in midstream (without the benefit
of context), I believe that the question is about scaling an arbitrary
`musical' (from the viewpoint of the average person) sequence up or down
in frequency while retaining the `musical-ness' of it.  :)

I still stand by my suggestion of shifting an arbitrary spectrum up/down
in frequency by doing a digital mix; however, I agree that it will not
generate the desired effect for use in scaling music.  It would appear
that the `vocoder' method may provide arbitrary control (e.g. true
(multiplicative) scaling) for generating such desired effects.

>And imagine the effect this has on complex waveforms like a violin or a
>piano...

Could you please elaborate on the typical spectrum of a given note on a
piano or violin?

Mike

Michael T. Horne
VSG/ITD, Tektronix, Inc.
mhorne@ka7axd.wv.tek.com
(503) 685-2077
gda@creare.creare.UUCP (Gray Abbott) (09/26/89)
The most common term in the literature for changing the speed of a signal
without changing the pitch is "time scaling".  Some researchers' names I
vaguely remember are Jones (at Rice U.) and Quatieri (at MIT Lincoln
Labs).  Follow them to other references; check the IEEE ASSP journals.
Jones had a TMS32010 algorithm which could process speech in real time.
Quatieri uses (I think) a cosine transform method, which I "hear" is very
good, even on music, but I haven't "heard" it.

The FFT approach would be to use a short-time Fourier transform (see
Rabiner and Schafer's _Digital Processing of Speech Signals_), with
modifications to the reconstruction algorithm to allow for
compressed/expanded time scales.  Any modification to the STFT usually
degrades the signal.  It's probably a little beyond current DSP chips
(though probably not beyond custom military systems), but not too far out
of reach.

The inverse problem of pitch shifting was correctly addressed by another
poster: you interpolate and decimate to get a rational shift.  Note that
musicians can be very sensitive to small errors in pitch.  This is why
some digital synthesizers use variable-rate sampling instead of digital
pitch shifting.  To achieve high-quality shifts can require several
megabytes of buffer space for the interpolate/decimate operations.
Fortunately, it doesn't require a whole lot of computation, if done
right.

It's been a while since I've thought about any of this, and I don't have
my notes at hand, so don't take it all as gospel.

Gray Abbott
...dartvax!creare!gda
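As a much cruder illustration of time scaling than the STFT methods cited above, here is a naive overlap-add sketch; all names and parameters are mine, and real time-scale modifiers do considerably more work (e.g. synchronized overlap-add, or the STFT reconstruction mentioned in the post) to avoid phase artifacts between frames:

```python
import math

def ola_stretch(x, rate, frame=256, hop=64):
    """Naive overlap-add time scaling: read Hann-windowed frames from
    the input at a hop of hop*rate samples, and overlap-add them into
    the output at a hop of hop samples.  rate > 1 plays faster,
    rate < 1 slower; pitch is roughly unchanged because each frame is
    copied verbatim rather than resampled."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame)
           for n in range(frame)]
    out_len = int(len(x) / rate) + frame
    out = [0.0] * out_len
    norm = [1e-12] * out_len   # running sum of window weights
    t = 0
    while int(t * rate) + frame <= len(x):
        start = int(t * rate)  # analysis position advances rate x faster
        for n in range(frame):
            out[t + n] += x[start + n] * win[n]
            norm[t + n] += win[n]
        t += hop
    # divide out the summed window so overlaps don't change the level
    return [o / w for o, w in zip(out, norm)]
```

With `rate = 0.5` the output is about twice as long as the input; the frame-splicing artifacts this simple version produces are exactly why the literature moved to synchronized and transform-domain methods.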
elliott@optilink.UUCP (Paul Elliott x225) (09/26/89)
In article <4653@orca.WV.TEK.COM>, mhorne@ka7axd.WV.TEK.COM (Michael T.
Horne) writes:
> <deleted>
> All this said, I don't think this is the optimal method for tone shifting,
> however it might work for `fast/slow-forward' effects.  If you wish to shift
> the tones while retaining the same sample rate, I would suggest some sort of
> frequency scaling algorithm, perhaps by doing a digital mix with a reference
> (digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
> followed by a carrier and lower sideband suppression (Hilbert transform
> filters are very easy to implement digitally).  At a fast glance, I think
> this might work well for moving the spectra of an audio source up/down some
> arbitrary frequency, and should be doable with some of the common DSP chips
> currently available.

Pitch shifting like this doesn't work (well), since it alters all the
harmonic relationships.  For example: a 1000 Hz / 2000 Hz octave, after
shifting 100 Hz, becomes a 1100 Hz / 2100 Hz pair -- definitely not an
octave.

If you have the opportunity, listen to single-sideband voice (ham radio
is a good place).  You can only shift the pitch a small amount (by
mis-tuning the receiver) before virtually destroying the intelligibility
of the voice, whereas with true pitch shifting (all frequencies
multiplied by n), voice remains reasonably understandable over several
octaves.

Of course, if you think shifting-by-adding messes up voice, you should
hear what it does to music!  Definitely not hi-fi -- more like atonal
bagpipes :-) ...

Paul
--
Paul M. Elliott   Optilink Corporation   (707) 795-9444
{pyramid,pixar,tekbspa}!optilink!elliott
"I used to think I was indecisive, but now I'm not so sure."
ingoldsb@ctycal.COM (Terry Ingoldsby) (09/27/89)
In article <4671@orca.WV.TEK.COM>, mhorne@ka7axd.wv.tek.com (Michael T.
Horne) writes:
> In a recent article by d88-jwa@nada.kth.se (Jon W{tte):
> >>frequency scaling algorithm, perhaps by doing a digital mix with a reference
> >>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
> >>followed by a carrier and lower sideband suppression...
> >
> >As I've said before: that's not scaling, that's OFFSET !  You can't do that
> >to MUSIC, because music has a realative overtone spectra.
>
> Ah.  Having arrived at this discussion in mid stream (without the benefit of
> context), I believe that the question is about scaling an arbitrary `musical'
> (from the viewpoint of the average person) sequence up or down in frequency
> while retaining the `musical-ness' of it.  :)  I still stand by my suggestion
...
> >And imagine the effect this has on complex waveforms like a violin or a
...
> Could you please elaborate on the typical spectra of a given note on a
> piano or violin?

Actually, it doesn't matter what instrument (including the human voice)
you pick.  They all contain harmonics and overtones which, to sound
right, must be multiples of the fundamental frequency.  In addition,
instruments like the piano can play chords; those notes have frequency
relationships that generally must be preserved.  An offset will *not* do
this.  (It might be interesting to hear the result.)  Basically, the
shift must vary with frequency to be accurate.

This still might be doable in frequency space.  A simple shift of the
frequency function would produce an offset; shifting the higher
frequencies more than the lower might do the trick.

--
Terry Ingoldsby            ctycal!ingoldsb@calgary.UUCP
Land Information Systems   or
The City of Calgary        ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/28/89)
> In general, it is easy to convert between two sampling rates that are
> rational multiples of each other (hence, I chose 6/5 in my example).
> The first step is to interpolate the signal by the numerator ...
>
> The next step is to filter out all (or most) spectral energy which
> would be "aliased" when throwing away the unneeded samples...
>
> Obviously, if you want to change the sampling rate by 7724/137, you have
> a problem.

The main merit of the system I discussed earlier is that you *can*
interpolate by any rational increment, subject to the accuracy of the
word size you use to represent the increment.  Resampling the input by a
7724/137 ~= 56.38X sampling rate increase is easy.  You aren't limited to
small rational increments in sampling rate, since you are directly
calculating the actual interpolated sample values from the original
samples.  By evaluating the sin(x)/x function at the correct locations
(which need not be integer locations) you can interpolate *any* point
between elements in the data set.

The only hard part in using this type of interpolator is how you
calculate the sinc coefficients.  If you have a non-repetitive
interpolation increment (i.e. if you need to recalculate the sinc
coefficients for each new interpolated data point), the bottleneck in
such a system resides in the sinc calculation.  However, there are
methods for minimizing this hit, mostly by trading off accuracy for
computation time, since your original data set accuracy is dependent upon
how many bits of resolution you have, how much noise is on the data, the
length of the FIR filter you use to calculate the interpolated value,
etc.

This is very similar to how an analog reconstruction filter
`interpolates' the functional values between the known data points output
by a DAC.  A perfect (low-pass) reconstruction filter has a sin(x)/x
impulse response, and its response is convolved with the known data
points to generate all `points' between them.
Less-than-perfect (read: real) reconstruction filters usually have
impulse responses with the general shape of a sinc function (though not
exactly), but are usually sufficient for correctly reconstructing the
output waveform.  I have implemented several interpolators using this
scheme; they work very well and can be easily implemented on a DSP chip
set.

By the way, for those of you interested in multirate DSP, I recommend
obtaining a copy of "Multirate Digital Signal Processing," by Crochiere
and Rabiner (Prentice-Hall).  This text provides an excellent background
on the topic that Steve discussed in his earlier article.  Also, a
chapter in "Advanced Topics in Signal Processing," by Lim and Oppenheim
(also Prentice-Hall) provides a good introduction to the same topic.

> --
> swass@apple.com

Mike
mhorne@ka7axd.wv.tek.com
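A sketch of the sinc interpolator discussed in this post, computing Hann-windowed coefficients on the fly; the names, the window choice, and the 16-tap filter length are illustrative assumptions, and for ratios below 1 a real implementation would also lower the filter cutoff to honor the Nyquist rule quoted earlier (and would likely precompute the coefficients in a table, as the post suggests):

```python
import math

def wsinc(x, half_width):
    """Hann-windowed sinc coefficient at an offset of x samples."""
    if abs(x) >= half_width:
        return 0.0
    s = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return s * (0.5 + 0.5 * math.cos(math.pi * x / half_width))

def interp_at(samples, t, half_width=8):
    """Estimate the bandlimited signal value at fractional index t by
    convolving nearby samples with the windowed-sinc kernel."""
    n0 = int(math.floor(t))
    acc = 0.0
    for n in range(n0 - half_width + 1, n0 + half_width + 1):
        if 0 <= n < len(samples):
            acc += samples[n] * wsinc(t - n, half_width)
    return acc

def resample(samples, ratio):
    """Resample by any ratio (6/5, 7724/137, ...): step through the
    input at increments of 1/ratio and interpolate at each point,
    which need not be an integer location."""
    out, t, step = [], 0.0, 1.0 / ratio
    while t <= len(samples) - 1:
        out.append(interp_at(samples, t))
        t += step
    return out
```

At integer positions the kernel reduces to an impulse, so original samples pass through unchanged; in between, the convolution plays the same role as the analog reconstruction filter described above.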