P85025@BARILVM.BITNET (Doron Shikmoni) (09/21/89)
In response to a question on rec.audio about the availability of CD players in which the music "speed" can be changed while maintaining the pitch, I posted an article which seems to have stirred a flood of responses. I have been out of the office since, so I haven't had time to reply to some of the messages. To start with, this article is posted to both rec.audio and the newly born comp.dsp (welcome!), with followups solicited into comp.dsp only.

In my posting, I said that the issue of digitally varying the speed of music while maintaining pitch (or vice versa - same problem) is theoretically hard, is somewhat simpler for speech (and is being done for speech), and probably cannot be solved "perfectly" (i.e., for hi-fi music). Many responses followed.

A major part of the responses missed the point of the question and replied "it's easy", "it's done in turntables and tape decks", "there's a Technics CD player that does it" and so forth. Please, read the original question. It is easy - very easy - to change the speed of the reproduction the same way it is done in variable-pitch tape decks or turntables. All that's required is a change in signal frequency. But that changes the pitch along with the speed, which is exactly what the question ruled out.

Others suggested dropping samples (to change output speed). To change by 1%, drop 1 out of each 100. Of course, this is doable; but what will happen to the music? Try to draw the new curve when you drop 30% of the samples (or double 30% of them to achieve the opposite effect). Is this hi-fi? Not in my opinion...

Others suggested spectrum analysis and FFT to move from the time domain to the frequency domain and vice versa. (1) Can this really be done in real time with today's DSP technology? I would doubt that, although I must admit I'm not very familiar with state-of-the-art DSP chips. And (2) as I understand it (I might be wrong here - Fourier analysis is not one of my stronger areas), this process has to be done one "quantum" at a time - it is not a continuous process. You will still have distortion when you connect the reconstructed parts in the time domain; either you will introduce new harmonics or you will lose information. This is the *theoretical* view; I don't know about tolerance - that is, whether you can make this process "good enough" for hi-fi music processing.

The examples given by some people (dictaphones, speech synthesisers, speech distortion units) do not preserve the sound quality. So again, this does not answer the original question.

Regards, Doron
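For concreteness, here is a minimal sketch of the sample-dropping and sample-doubling idea criticized above (Python/NumPy assumed; an illustration only, not anything proposed in the thread). Both functions change speed and pitch together, and the unfiltered dropping/doubling is what produces the distortion:

    import numpy as np

    def crude_speedup(x, drop_every=100):
        # Play ~1% faster by discarding every drop_every-th sample.
        # No filtering is applied, so the removed samples show up as
        # broadband distortion in the output.
        keep = np.ones(len(x), dtype=bool)
        keep[drop_every - 1::drop_every] = False
        return x[keep]

    def crude_slowdown(x, repeat_every=100):
        # Play ~1% slower by repeating every repeat_every-th sample.
        idx = np.arange(repeat_every - 1, len(x), repeat_every)
        return np.insert(x, idx, x[idx])

Dropping 30 samples in every 100, as in Doron's example, removes whole fractions of each waveform cycle, which is why the result is nowhere near hi-fi.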
sandell@batcomputer.tn.cornell.edu (Gregory Sandell) (09/22/89)
In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>
>Others suggested dropping samples (to change output speed). To change
>by 1%, drop 1 out of each 100. Of course, this is doable; but what
>will happen to the music? Try to draw the new curve when you drop
>30% of the samples (or double 30% of them to achieve the opposite
>effect). Is this hi-fi? Not in my opinion...
>

I think that the choice of putting a speed-variation option on a CD player should not be constrained by the requirement that it be hi-fi. What are people going to be using the feature for, anyway? My particular use, since I'm a musician, is that I'd want to use the feature to SLOW DOWN the music in order to transcribe or learn by ear what a musician is playing. Other people may want to play spoken-word CDs at higher speed so they can assimilate information more quickly. In both of these cases, I don't think the user is really going to care that the audio quality is distinguishable from normal playback.

Greg Sandell
ggs@ulysses.homer.nj.att.com (Griff Smith) (09/22/89)
In article <89264.171306P85025@BARILVM.BITNET>, P85025@BARILVM.BITNET (Doron Shikmoni) writes:
| Others suggested spectrum analysis and FFT to move from time domain
| to frequency domain and vice versa....
| ... this process should be made on
| a "quantum" at a time - it's not a continuous process. You will
| still have distortion when you connect the reconstructed parts
| in the time domain...
| I don't know about tolerance - that is, if you can make this process
| "good enough" for hi-fi music processing.

It works. I don't know the details of how it was done, but I have heard music reconstructed this way (without time change or pitch shift). It was indistinguishable from the original.
-- 
Griff Smith     AT&T (Bell Laboratories), Murray Hill
Phone:          1-201-582-7736
UUCP:           {most AT&T sites}!ulysses!ggs
Internet:       ggs@ulysses.att.com
samd@Apple.COM (Sam Dicker) (09/22/89)
In article <12190@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
>In article <89264.171306P85025@BARILVM.BITNET>, P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>| Others suggested spectrum analysis and FFT to move from time domain
>| to frequency domain and vice versa....
>| ... this process should be made on
>| a "quantum" at a time - it's not a continuous process. You will
>| still have distortion when you connect the reconstructed parts
>| in the time domain...
>| I don't know about tolerance - that is, if you can make this process
>| "good enough" for hi-fi music processing.
>
>It works. I don't know the details of how it was done, but I have
>heard music reconstructed this way (without time change or pitch
>shift). It was indistinguishable from the original.

I've heard this done with samples of certain musical instruments using a *phase vocoder*, which incorporates an FFT. Is an FFT alone adequate for all hi-fi music processing?

Sam Dicker
samd@apple.com
(408) 974-6490 (voicemail)
---
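For readers who haven't met the term: a phase vocoder time-stretches (or pitch-shifts) by taking overlapping FFT frames, stepping through them at a different rate than they are resynthesized, and accumulating phase so the partials stay continuous across frame boundaries. A bare-bones sketch follows (my own illustration in Python/NumPy; a listenable implementation also needs window normalization and phase locking):

    import numpy as np

    def phase_vocoder_stretch(x, rate, n_fft=2048, hop=512):
        # Stretch x in time by a factor of 1/rate (rate > 1 plays faster)
        # without changing pitch.
        win = np.hanning(n_fft)
        n_frames = 1 + (len(x) - n_fft) // hop
        stft = np.array([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                         for i in range(n_frames)]).T           # bins x frames
        expected = 2 * np.pi * hop * np.arange(stft.shape[0]) / n_fft
        phase = np.angle(stft[:, 0])
        frames_out = []
        for t in np.arange(0, n_frames - 1, rate):               # fractional frame index
            i = int(t)
            a, b = stft[:, i], stft[:, i + 1]
            mag = (1 - (t - i)) * np.abs(a) + (t - i) * np.abs(b)
            frames_out.append(mag * np.exp(1j * phase))
            dphi = np.angle(b) - np.angle(a) - expected          # deviation from bin centre
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))     # wrap to [-pi, pi]
            phase += expected + dphi                             # advance accumulated phase
        y = np.zeros(len(frames_out) * hop + n_fft)
        for k, frame in enumerate(frames_out):                   # overlap-add resynthesis
            y[k * hop:k * hop + n_fft] += win * np.fft.irfft(frame, n=n_fft)
        return y

Played back at the original sample rate, the output keeps the original pitch at the new duration; combining this with ordinary resampling gives a pitch shift at constant duration instead.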
d88-jwa@nada.kth.se (Jon W{tte) (09/22/89)
In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>In response to a question on rec.audio, about the availability of CD
>players in which the music "speed" can be changed while maintaining
>the pitch, I posted an article which seems to have stirred a flood
>To start with, this article is posted to both rec.audio and the newly
>born comp.dsp (welcome!), with followups solicited into comp.dsp only.

So this is going to comp.dsp. Oh, well, I'm changing the Newsgroups, but FOLLOW UP TO COMP.DSP from this message.

>A major part of the responses missed the point of the question, and
>replied "it's easy", "it's done in turntables and tape decks", "there's
>a Technics CD that does it" and so forth. Please, read the original

Yes, I've suggested the Technics SL-P1200 in combination with a digital effects unit with pitch bend, to get the music back to the original pitch.

>effect). Is this hi-fi? Not in my opinion...

No, and I clearly stated so in my posts. Many people seem to be unaware of how basic music theory, sound theory and digital sound theory work and interact. Thank god for comp.dsp, where all will be revealed to the wondering mob ;')

h+@nada.kth.se
-- 
Death is Nature's way of saying 'slow down'.
brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)
In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
[...]
>
>Others suggested dropping samples (to change output speed). To change
>by 1%, drop 1 out of each 100. Of course, this is doable; but what
>will happen to the music? Try to draw the new curve when you drop
>30% of the samples (or double 30% of them to achieve the opposite
>effect). Is this hi-fi? Not in my opinion...

This also changes time along with pitch, and it is the poor man's resampling method. My first attempts at a variable-speed sample player on my Apple II used this method. If the original sample data were taken at a much higher rate than the playback rate, then this method isn't *too* bad. Usually distortion is heard - more for non-integral changes in sampling rate. For example, dropping *exactly* half the samples to raise the pitch an octave causes little distortion, but a semitone up or down is horrible.

>Others suggested spectrum analysis and FFT to move from the time domain
>to the frequency domain and vice versa. (1) Can this really be done in
>real time with today's DSP technology? I would doubt that, although
>I must admit I'm not very familiar with state-of-the-art DSP chips.
>And (2) as I understand it (I might be wrong here - Fourier analysis
>is not one of my stronger areas), this process has to be done one
>"quantum" at a time - it is not a continuous process. You will
>still have distortion when you connect the reconstructed parts
>in the time domain; either you will introduce new harmonics or
>you will lose information. This is the *theoretical* view;
>I don't know about tolerance - that is, whether you can make this process
>"good enough" for hi-fi music processing.
>
>Doron

You're right. The problem with FFTs is that they need a number of points to work on. No matter how fast your 1000-point FFT is, you still have to wait until another 1000 points are available. Under that constraint you don't have a continuously changing spectrum, but one which is only updated after N new sample points have been input.

I read about a technique for a sliding-window FFT. It was still an N-point FFT (say 1000 points), but as each new sample was input, the FFT was recalculated. This method is also much faster for continuous data input, because only the end points figure into the calculation. With a 1000-point FFT, for example, the new transform is computed as a function only of the newest point just added and the oldest point which "falls out" of the 1000-point buffer.

The author mentioned that a problem was initializing the running data, but for music I didn't see a problem. He stated that there were two methods for starting the conversion:

A - Execute a normal 1000-point FFT after filling the array with 1000 samples, and then compute new FFTs by the sliding-window technique as each new sample arrives.

B - Start with an array of zeroes, and assume that the FFT is not a true reflection of the input data until 1000 sliding-window-style FFTs have been computed.

The latter approach basically generates FFT output as if the input signal had begun after 1000 zero-valued samples. I think that for musical applications the delay of N/(sample rate) seconds would be unnoticeable, and the FFT output would appear to be valid instantly.

I believe that this article was in Electronic Design News.

Brian Willoughby
UUCP:     ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet: microsoft!brianw@uunet.UU.NET
     or:  microsoft!brianw@Sun.COM
Bitnet    brianw@microsoft.UUCP
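The update Brian describes is what is now usually called a sliding DFT: each bin is rotated by one bin's worth of phase and corrected by the sample entering and the sample leaving the window. A small sketch (my own, in Python/NumPy; a long-running version also needs to guard against the floating-point drift caused by the repeated complex multiplications):

    import numpy as np

    def sliding_dft(samples, N=1000):
        # Yield the N-point DFT of the most recent N samples after every
        # new input sample, at O(N) work per sample instead of a fresh
        # O(N log N) FFT. Starts from an all-zero window (method B above),
        # so the output is only "true" after the first N samples.
        twiddle = np.exp(2j * np.pi * np.arange(N) / N)
        buf = np.zeros(N)                 # circular buffer of the last N samples
        X = np.zeros(N, dtype=complex)    # running DFT of the buffer contents
        pos = 0
        for x_new in samples:
            x_old = buf[pos]              # the sample that "falls out"
            buf[pos] = x_new
            pos = (pos + 1) % N
            # Sliding-DFT recurrence: X_k <- (X_k + x_new - x_old) * e^{j*2*pi*k/N}
            X = (X + x_new - x_old) * twiddle
            yield X                       # equals np.fft.fft of the last N samples, oldest first

Starting instead from a real 1000-point FFT of the first full buffer (method A above) only changes the initialization; the per-sample update is identical.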
brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)
In article <8909@batcomputer.tn.cornell.edu> sandell@tcgould.tn.cornell.edu (Gregory Sandell) writes:
> I think that the choice of putting a speed-variation option on
>a CD player should not be constrained by the requirement that it
>be hi-fi. What are people going to be using the feature for,
>anyway? My particular use, since I'm a musician, is that I'd
>want to use the feature to SLOW DOWN the music in order to
>transcribe or learn by ear what a musician is playing. Other
>people may want to play spoken CDs at higher speed so they can
>assimilate information quicker. In both of these cases, I don't
>think the user really is going to care that the audio quality is
>distinguishable from normal playback.
>
>Greg Sandell

If the CD playback speed were varied by changing the sample output rate, instead of maintaining a constant output rate and throwing away samples, then this distortion wouldn't occur. I can't see any advantage to dropping samples just to maintain the same conversion rate. You are still left with the more difficult problem of what to do about the rate of data coming *from the disk itself*; if you solve that, then simply changing the conversion rate is trivial. Basically, I'm saying that it's too easy to avoid the distortion from dropping samples, so why do it?

On a side note, I have heard that someone has developed a compression scheme to fit sixteen times as much data on a CD as is currently done. If you think about the typical audio waveform, you'll understand that it is easy to compress. Just by storing the *difference* between adjacent samples, and assuming that there are no impulses, a great saving in data can be achieved over storing 16-bit *absolute* values.

The problem with storing sixteen times as much sound on a CD is that the CD must still be accessed at the data rates it was designed for. In other words, the player would be getting 16 times too much data at any given time. Solution: read the CD from front to back sixteen times, each time converting a different block of data. Data frames in the CD format are broken into 16 blocks, and the player just cycles through these.

It's too bad that the music company Southworth recently went bankrupt. They had announced a Macintosh II-based set of cards which employed similar compression schemes. They cited 30 minutes of stereo audio on a 40 MB hard disk with 20-bit samples at a rate of 192 kHz per channel.

Brian Willoughby
UUCP:     ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet: microsoft!brianw@uunet.UU.NET
     or:  microsoft!brianw@Sun.COM
Bitnet    brianw@microsoft.UUCP
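The difference-coding idea above is the basis of delta/DPCM-style schemes. A toy sketch (my own, Python/NumPy; not the actual scheme Brian heard about) showing why smooth audio needs far fewer bits per difference than per absolute sample, as long as there are no impulses:

    import numpy as np

    def delta_encode(samples):
        # Store the first sample plus successive differences; for smooth
        # audio the differences are small and fit in fewer bits.
        samples = np.asarray(samples, dtype=np.int32)
        return samples[0], np.diff(samples)

    def delta_decode(first, deltas):
        # Rebuild the original samples with a running sum of the differences.
        return np.concatenate(([first], first + np.cumsum(deltas)))

    # A loud 440 Hz tone sampled at 44.1 kHz with 16-bit values:
    n = np.arange(44100)
    x = np.round(20000 * np.sin(2 * np.pi * 440 * n / 44100)).astype(np.int32)
    first, d = delta_encode(x)
    assert np.array_equal(delta_decode(first, d), x)
    print(int(np.abs(d).max()))   # about 1250: ~12 bits signed instead of 16

A scheme promising 16:1 would have to combine this with much more aggressive coding; the sketch only shows the basic difference trick.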
d88-jwa@nada.kth.se (Jon W{tte) (09/24/89)
In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>In article <8909@batcomputer.tn.cornell.edu> sandell@tcgould.tn.cornell.edu (Gregory Sandell) writes:
>On a side note, I have heard that someone has developed a compression
>scheme to fit sixteen times as much data on a CD as is currently done.
>The problem with storing sixteen times as much sound on a CD is that
>the CD must still be accessed at the data rates it was designed for.
>In other words, the player would be getting 16 times too much data at
>any given time. Solution: read the CD from front to back sixteen times,
>each time converting a different block of data. Data frames in the CD
>format are broken into 16 blocks, and the player just cycles through these.

No, no. You still have exactly as many bits on the CD as before; only the redundancy of the info kept there is minimized. If you read the CD at the original speed and decompress it quickly enough, you'll have 16 times the sampling rate, not 16 times the playing time...

The problem is: there's no standard. You can't get your CD player to recognize the compressed input, but if it's doable in real time, sampling synthesizers would benefit from this.

h+@nada.kth.se
-- 
Today is the tomorrow you worried about yesterday
rich@eddie.MIT.EDU (Richard Caloggero) (09/26/89)
In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
... ... ...
>It's too bad that the music company Southworth recently went bankrupt.
>They had announced a Macintosh II-based set of cards which employed
>similar compression schemes. They cited 30 minutes of stereo audio on
>a 40 MB hard disk with 20-bit samples at a rate of 192 kHz per channel.
>
>Brian Willoughby

Wow, maybe certain people got nervous about the potential ramifications a system such as this has with respect to the *recording industry*. [Where are those DAT/writable-CD systems anyway -- guess all us musicians should move to Japan!] :-)

(I don't want to start a big flame about this, but it's been a sour spot with me for quite some time.)
-- 
                                         -- Rich (rich@eddie.mit.edu).

   The circle is open, but unbroken.  Merry meet, merry part,
   and merry meet again.
toma@hpsad.HP.COM (Tom Anderson) (09/26/89)
>>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>>followed by carrier and lower sideband suppression (Hilbert transform filters
>>are very easy to implement digitally). At a fast glance, I think this might
>>work well for moving the spectrum of an audio source up/down some arbitrary
>>frequency, and should be doable with some of the common DSP chips currently
>>available.
>
>As I've said before: that's not scaling, that's OFFSET! You can't do that
>to MUSIC, because music has a relative overtone spectrum. Consider:
>
>440 Hz + 880 Hz make a (very simple) harmonic note.
>
>Shift 100 Hz:
>
>540 + 940 Hz makes two sine notes !!! And imagine the effect this has on
>complex waveforms like a violin or a piano ... SHUDDER !

The above technique has the advantage that it doesn't rely on an FFT, so windowing issues are avoided. The hardware is also easier than an FFT.

It seems that an FFT is really called for, so that the frequency shift can be made on a logarithmic frequency axis. An interesting question is: how many FFT points are required? I think you need to know the maximum frequency to be represented, the minimum frequency to be shifted, and the smallest amount of shift. To shift from C0 at 16.35 Hz to C#0 at 17.32 Hz requires a shift of about 1 Hz, so the FFT bins should be spaced by about 1 Hz. An FFT with this spacing covering 0 Hz - 20 kHz would need about 20,000 points. To keep the fidelity high, one transform every few milliseconds or so would be required. Such a brute-force technique gets expensive quickly.

It seems like you need a logarithmic frequency axis. I have often wished for an FFT-type algorithm with logarithmic frequency spacings. Does anyone know of one?
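The quoted mixer-plus-sideband-suppression scheme amounts to forming the analytic signal with a Hilbert transform and multiplying by a complex exponential: every component moves by the same fixed number of Hz, which is exactly why it wrecks harmonic ratios. A sketch of that offset operation (my own; Python with NumPy/SciPy assumed):

    import numpy as np
    from scipy.signal import hilbert

    def frequency_offset(x, shift_hz, fs):
        # Shift every spectral component of x up by shift_hz.
        # hilbert() returns the analytic signal (negative frequencies
        # removed), so multiplying by exp(j*2*pi*shift_hz*t) slides the
        # whole spectrum; taking the real part gives the shifted audio.
        t = np.arange(len(x)) / fs
        return np.real(hilbert(x) * np.exp(2j * np.pi * shift_hz * t))

    # The quoted example: 440 Hz + 880 Hz is harmonic; offset by 100 Hz
    # it becomes 540 Hz + 940 Hz, which is not.
    fs = 44100
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    y = frequency_offset(x, 100.0, fs)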
esker@abaa.uucp (Lawrence Esker) (09/27/89)
Responding to another article first: the terms FFT and real-time are oxymoronic. To do an FFT in real time would mean computing the full FFT every sample period, then scaling and inverse-transforming in the same sample period. Maybe a super parallel processor could do it, if you have the money. The design of the FFT algorithm assumes you have access to all samples simultaneously to do the calculation. It is not geared toward one-sample-at-a-time calculation. To do that, one must revert to the original DFT algorithm with a FIFO: add the effect of the current sample and remove the effect of the (current - N)th sample.

In article <7813@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>>Others suggested spectrum analysis and FFT to move from the time domain
>>to the frequency domain and vice versa. [...]
>> as I understand it ... this process has to be done one
>>"quantum" at a time - it is not a continuous process. [...]
>>Doron
>You're right. The problem with FFTs is that they need a number of points
>to work on. No matter how fast your 1000-point FFT is, you still have to
>wait until another 1000 points are available. [...]
>I read about a technique for a sliding-window FFT. It was still an
>N-point FFT (say 1000 points), but as each new sample was input, the FFT
>was recalculated. This method is also much faster for continuous data input,
>because only the end points figure into the calculation. With a 1000-point
>FFT, for example, the new transform is computed as a function only of
>the newest point just added and the oldest point which "falls out" of
>the 1000-point buffer. [...]
>I believe that this article was in Electronic Design News.
>Brian Willoughby

Yes, the sliding FFT looked like a great design invention until you studied it more closely and realized it was simply the original Discrete Fourier Transform (DFT) restated in a different way. Since the FFT is an algorithmic shortcut to the DFT, it made me chuckle to see the DFT used to perform the FFT, albeit under the new name of sliding FFT.
-- 
---------- Lawrence W. Esker ----------

Modern Amish: Thou shalt not need any computer that is not IBM compatible.

UseNet Path: __!mailrus!sharkey!itivax!abaa!esker  ==  esker@abaa.UUCP
jensen@bessel.eedsp.gatech.edu (P. Allen Jensen) (09/30/89)
An FFT can be done in real time if you allow a delay between input and output for startup. You collect the first set of samples, then do the FFT on it while collecting the next set of samples in parallel. The FFT must be done in no more than the time it takes to collect a set of samples. You then have a pipeline doing the FFT and collecting the next set of samples, with a delay of one window (frame) time.

Anyone see any problems with that?

P. Allen Jensen
Georgia Tech, School of Electrical Engineering, Atlanta, GA  30332
USENET: ...!{allegra,hplabs,ihnp4,ulysses}!gatech!eedsp!jensen
INTERNET: jensen@eedsp.gatech.edu
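A sketch of the pipeline described above (my own, in Python/NumPy; the frames here are processed sequentially, but the point is that the transform of frame k only has to finish before frame k+1 has been collected, so the output lags the input by one frame):

    import numpy as np

    def framed_fft_pipeline(x, frame_len=1024, process=lambda spectrum: spectrum):
        # Collect fixed-size frames; while frame k+1 is being acquired,
        # frame k is transformed, processed in the frequency domain, and
        # inverse-transformed. Output trails input by one frame.
        out = np.zeros(len(x) + frame_len)
        for k in range(len(x) // frame_len):
            frame = x[k * frame_len:(k + 1) * frame_len]
            spectrum = np.fft.rfft(frame)
            y = np.fft.irfft(process(spectrum), n=frame_len)
            start = (k + 1) * frame_len        # emitted one frame late
            out[start:start + frame_len] = y
        return out

This is also where Doron's "connecting the reconstructed parts" problem lives: butting independently processed, non-overlapping frames together produces boundary artifacts, which is why practical systems window and overlap the frames instead.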
brianw@microsoft.UUCP (Brian Willoughby) (10/02/89)
In article <474@eedsp.gatech.edu> jensen@bessel.eedsp.gatech.edu (P. Allen Jensen) writes:
>An FFT can be done in real time if you allow a delay between input and
>output for startup. You collect the first set of samples, then do the FFT
>on it while collecting the next set of samples in parallel. The FFT must
>be done in no more than the time it takes to collect a set of samples.
>You then have a pipeline doing the FFT and collecting the next set of
>samples, with a delay of one window (frame) time.
>
>Anyone see any problems with that?

I don't think that the delay would be the big problem, but IF the goal is to subsequently compute an inverse transform for the purpose of creating audio, then you must realize that you have lost information about the original signal. The problem is that the result of such a series of FFT calculations is not a continuous picture of the frequency spectrum, but an average over a relatively long (compared to the sample period) stretch of time. It is true that the FFT probably has enough info to give you the original signal back, but if you do any processing on the data in the frequency domain, then you are very likely to come up with incorrect (read: distorted) results because of the averaging effects of the FFT.

I'm still curious about whether the sliding FFT, or some version of the DFT, is capable of generating frequency-domain data detailed enough, at the same rate as the sampled data, to accurately reconstruct a slightly modified (but equal quality) version of the incoming audio samples.

If you just want to display the data, you are fine. In fact, the HP spectrum analyser we had at NCSU had a noticeable delay, but the information on the display still updated faster than I could react to what I saw.

>P. Allen Jensen
>Georgia Tech, School of Electrical Engineering, Atlanta, GA  30332
>USENET: ...!{allegra,hplabs,ihnp4,ulysses}!gatech!eedsp!jensen
>INTERNET: jensen@eedsp.gatech.edu

Brian Willoughby
UUCP:     ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet: microsoft!brianw@uunet.UU.NET
     or:  microsoft!brianw@Sun.COM
Bitnet    brianw@microsoft.UUCP