[comp.dsp] Adjust-Speed CD player??

P85025@BARILVM.BITNET (Doron Shikmoni) (09/21/89)

In response to a question on rec.audio about the availability of CD
players in which the music "speed" can be changed while maintaining
the pitch, I posted an article which seems to have stirred a flood
of responses. I have been out of the office since, so I haven't had
time to reply to some of the messages.

To start with, this article is posted to both rec.audio and the newly
born comp.dsp (welcome!), with followups solicited into comp.dsp only.

In my posting, I said that the issue of digitally varying the speed
of music while maintaining pitch (or vice versa - same problem) is
theoretically hard, is somewhat simpler for speech (and is being done
for speech), and probably cannot be solved "perfectly" (i.e., for hi-fi
music). Many responses followed.

A major part of the responses missed the point of the question, and
replied "it's easy", "it's done in turntables and tape decks", "there's
a Technics CD player that does it" and so forth. Please, read the original
question. It is easy - very easy - to change the speed of the reproduction
the same way it is done in variable-pitch tape decks or turntables. All
that's required is a change in signal frequency.

Others suggested dropping samples (to change output speed). To change
by 1%, drop 1 out of each 100. Of course, this is doable; but what
will happen to the music? Try to draw the new curve when you drop
30% of the samples (or double 30% of them to achieve the opposite
effect). Is this hi-fi? Not in my opinion...
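[Editor's note: the splice distortion described above is easy to see numerically. A minimal sketch in Python/NumPy, an illustration of the idea only, not anyone's actual player firmware:]

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)        # one second of A440

# "Speed up" by 1%: drop 1 out of every 100 samples.
keep = np.ones(len(x), dtype=bool)
keep[::100] = False
y = x[keep]

# The output is ~1% shorter, but every 99 samples the waveform is
# spliced; the pitch rises along with the speed, and the splice
# discontinuities get worse as the drop ratio grows toward 30%.
print(len(x), len(y))
```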

Others suggested spectrum analysis and FFT to move from the time domain
to the frequency domain and vice versa. (1) Can this really be done in
real time with today's DSP technology? I would doubt that, although
I'm not very familiar with state-of-the-art DSP chips, I must admit.
And (2) as I understand it (I might be wrong here - Fourier analysis
is not one of my stronger areas), this process must be performed one
"quantum" at a time - it's not a continuous process. You will
still have distortion when you connect the reconstructed parts
in the time domain; either you will introduce new harmonics or
you will lose information. This is the *theoretical* view;
I don't know about tolerance - that is, whether you can make this
process "good enough" for hi-fi music processing.

The examples given by some people (dictaphones, speech synthesisers,
speech distortion units) do not preserve the sound quality. So again,
this does not answer the original question.

Regards
Doron

sandell@batcomputer.tn.cornell.edu (Gregory Sandell) (09/22/89)

In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>
>Others suggested to drop samples (to change output speed). To change
>by 1%, drop 1 out of each 100. Of course, this is doable; but what
>will happen to the music? Try to draw the new curve when you drop
>30% of the samples (or double 30% of them to achieve the opposite
>effect). Is this hi-fi? Not to my opinion...
>
	I think that the decision to put a speed-variation option on
a CD player should not be constrained by the requirement that it
be hi-fi.  What are people going to be using the feature for,
anyway?  My particular use, since I'm a musician, is that I'd
want to use the feature to SLOW DOWN the music in order to
transcribe or learn by ear what a musician is playing.  Other
people may want to play spoken CDs at higher speed so they can
assimilate information more quickly.  In both of these cases, I don't
think the user really is going to care whether the audio quality is
distinguishable from normal playback.

Greg Sandell

ggs@ulysses.homer.nj.att.com (Griff Smith) (09/22/89)

In article <89264.171306P85025@BARILVM.BITNET>, P85025@BARILVM.BITNET (Doron Shikmoni) writes:
| Others suggested spectrum analysis and FFT to move from time domain
| to frequency domain and vice versa....
| ... this process should be made on
| a "quantum" at a time - it's not a continuous process. You will
| still have distortion when you connect the reconstructed parts
| in the time domain...
| I don't know about tolerance - that is, if you can make this process
| "good enough" for hi-fi music processing.

It works.  I don't know the details of how it was done, but I have
heard music reconstructed this way (without time change or pitch
shift).  It was indistinguishable from the original.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

samd@Apple.COM (Sam Dicker) (09/22/89)

In article <12190@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
>In article <89264.171306P85025@BARILVM.BITNET>, P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>| Others suggested spectrum analysis and FFT to move from time domain
>| to frequency domain and vice versa....
>| ... this process should be made on
>| a "quantum" at a time - it's not a continuous process. You will
>| still have distortion when you connect the reconstructed parts
>| in the time domain...
>| I don't know about tolerance - that is, if you can make this process
>| "good enough" for hi-fi music processing.
>
>It works.  I don't know the details of how it was done, but I have
>heard music reconstructed this way (without time change or pitch
>shift).  It was indistinguishable from the original.

I've heard this done with samples of certain musical instruments
with a *phase vocoder* which incorporates an FFT.
Is an FFT alone adequate for all hi-fi music processing?
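[Editor's note: the phase vocoder mentioned above can be sketched in a few dozen lines. This is an editor's toy illustration of the general technique in Python/NumPy, with arbitrary parameter choices, not the code behind any product discussed in the thread:]

```python
import numpy as np

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Toy phase vocoder: stretch x in time by 1/rate without
    changing pitch (rate < 1 slows down, rate > 1 speeds up)."""
    win = np.hanning(n_fft)
    ana_hop = hop * rate                       # analysis hop (samples)
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # bin freqs
    positions = np.arange(0, len(x) - n_fft, ana_hop)
    out = np.zeros(len(positions) * hop + n_fft)
    phase = None
    prev_angle = None
    for i, p in enumerate(positions):
        frame = x[int(p):int(p) + n_fft] * win
        spec = np.fft.rfft(frame)
        angle = np.angle(spec)
        if phase is None:
            phase = angle.copy()
        else:
            # deviation of the measured phase advance from each bin's
            # expected advance over one analysis hop, wrapped to +-pi
            dev = angle - prev_angle - omega * ana_hop
            dev -= 2 * np.pi * np.round(dev / (2 * np.pi))
            inst_freq = omega + dev / ana_hop  # rad/sample, per bin
            phase += inst_freq * hop           # advance by synthesis hop
        prev_angle = angle
        out[i * hop:i * hop + n_fft] += np.fft.irfft(
            np.abs(spec) * np.exp(1j * phase)) * win
    return out
```

On a steady tone this changes duration while keeping pitch; on transient-rich hi-fi material it smears attacks, which is exactly the kind of artifact Doron worries about above.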

Sam Dicker        samd@apple.com        (408) 974-6490 (voicemail)
---

d88-jwa@nada.kth.se (Jon Wätte) (09/22/89)

In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>In response to a question on rec.audio, about the availability of CD
>players in which the music "speed" can be changed while maintaining
>the pitch, I posted an article which seems to have stirred a flood

>To start with, this article is posted to both rec.audio and the newly
>born comp.dsp (welcome!), with followups solicited into comp.dsp only.

So this is going to comp.dsp. Oh, well, I'm changing the Newsgroups,
but FOLLOW UP TO COMP.DSP from this message.

>A major part of the responses missed the point of the question, and
>replied "it's easy", "it's done in turntables and tape decks", "there's
>a Technics CD that does it" and so forth. Please, read the original

Yes, I've suggested the Technics SL-P1200 in combination with a digital
effects unit with pitch bend, to get the music back to the original pitch.

>effect). Is this hi-fi? Not to my opinion...

No, and I clearly stated so in my posts. Many people seem to be unaware
of how basic music theory, sound theory and digital sound theory work
and interact. Thank god for comp.dsp where all will be revealed to the
wondering mob ;')

h+@nada.kth.se

-- 
Death is Nature's way of saying 'slow down'.

brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)

In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
[...]
>
>Others suggested to drop samples (to change output speed). To change
>by 1%, drop 1 out of each 100. Of course, this is doable; but what
>will happen to the music? Try to draw the new curve when you drop
>30% of the samples (or double 30% of them to achieve the opposite
>effect). Is this hi-fi? Not to my opinion...

This also changes time with pitch, and is the poor man's resampling
method.  My first attempts at a variable-speed sample player on my Apple
II used this method.  If the original sample data were taken at a much
higher rate than the playback rate, then this method isn't *too* bad.
Usually distortion is heard - more for non-integral changes in sampling
rate.  For example, dropping *exactly* half the samples to raise the pitch
an octave causes little distortion, but a semitone up or down is horrible.
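[Editor's note: the integer vs. non-integer distinction is easy to check numerically. An editor's sketch of the point in Python/NumPy, not Brian's Apple II code:]

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)    # one second at 200 Hz

# Drop exactly half the samples: x[::2] is still a pure sine, and
# played back at fs it sounds one clean octave up (400 Hz).
y = x[::2]

# A semitone (ratio 2**(1/12), about 1.0595) has no exact decimation
# pattern, so naive dropping splices the waveform and distorts.
```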

>Others suggested spectrum analysis and FFT to move from time domain
>to frequency domain and vice versa. (1) Can this really be done in
>real time with today's DSP technology? I would doubt that, although
>I'm not very familiar with state of the art DSP chips, I must admit.
>and (2) as I understand it (I might be wrong here - Fourier stuff
>is not one of my stronger parts), this process should be made on
>a "quantum" at a time - it's not a continuous process. You will
>still have distortion when you connect the reconstructed parts
>in the time domain; either you will introduce new harmonics or
>you will lose information. This is in the *theoretical* view;
>I don't know about tolerance - that is, if you can make this process
>"good enough" for hi-fi music processing.
>
>Doron

You're right.  The problem with FFTs is that they need a number of points
to work on.  No matter how fast your 1000 point FFT is, you still have to
wait until another 1000 points are available.  Based on this assumption,
you don't have a continuously changing spectrum, but one which is only
updated after N new sample points are input.

I read about a technique for a sliding window FFT.  It was still an
N-point FFT (say 1000), but as each new sample was input the FFT is
recalculated.  This method is also much faster for continuous data input,
because only the end points figure into the calculation.  With a 1000
point FFT example, the new transform is computed as a function only of
the newest point just added, and the oldest point which "falls out" of
the 1000 point buffer.  The author mentioned that a problem was
initializing the running data, but for music I didn't see a problem.
He stated that there were two methods for starting the conversion:
A - Execute a normal 1000 point FFT after filling the array with 1000
samples, and then compute new FFTs by the sliding window technique as
each new sample arrives.
B - Start with an array of zeroes, and assume that the FFT is not a
true reflection of the input data until 1000 sliding window-style FFTs
have been computed.

The latter approach basically generates FFT output as if the input were
an impulse starting after 1000 zero-valued samples.

I think that for musical applications, the delay of N/(sample rate) seconds
would be unnoticeable, and the FFT output would appear to be valid instantly.
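[Editor's note: the per-sample update described above is what is now usually called the sliding DFT. A minimal single-bin version in Python/NumPy, an editor's sketch of the recurrence using startup method B (a zero-filled buffer):]

```python
import numpy as np

def sliding_dft_bin(x, N, k):
    """Track DFT bin k of the most recent N samples, updated once per
    incoming sample: when x[n] enters and x[n-N] leaves the window,
    X_k <- (X_k + x[n] - x[n-N]) * exp(+2j*pi*k/N).
    The buffer starts as zeros (startup method B), so the first N
    outputs are a transient, as the original post notes."""
    w = np.exp(2j * np.pi * k / N)
    buf = np.zeros(N)
    Xk = 0j
    out = []
    for n, xn in enumerate(x):
        oldest = buf[n % N]        # this slot holds x[n - N]
        buf[n % N] = xn
        Xk = (Xk + xn - oldest) * w
        out.append(Xk)
    return np.array(out)
```

After the first N samples, each output equals bin k of an ordinary N-point DFT of the latest window, at a cost of one complex multiply and two adds per sample per bin.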

I believe that this article was in Electronic Design News (EDN).

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP

brianw@microsoft.UUCP (Brian Willoughby) (09/23/89)

In article <8909@batcomputer.tn.cornell.edu> sandell@tcgould.tn.cornell.edu (Gregory Sandell) writes:
>	I think that the choice of putting speed-variation option on
>a CD-player should not be constrained by the requirement that it
>be hi-fi.  What are people going to be using the feature for,
>anyway?  My particular use, since I'm a musician, is that I'd
>want to use the feature to SLOW DOWN  the music in order to 
>transcribe or learn by ear what a musician is playing.  Other
>people may want to play spoken CDs at higher speed so they can
>assimilate information quicker.  In both of these cases, I don't
>think the user really is going to care that the audio quality is
>distinguishable from normal playback.
>
>Greg Sandell

If the CD playback speed were varied by changing the sample output rate,
instead of maintaining a constant output rate and throwing away samples,
then this distortion wouldn't occur.  I can't see any advantage to
dropping samples just to maintain the same conversion rate.  You are
still left with the more difficult problem of what to do about the rate
of data coming *from the disk itself*.  If you solve that, then simply
changing the conversion rate is trivial.

Basically, I'm saying that it's too easy to avoid the distortion from
dropping samples, so why do it?

On a side note, I have heard that someone has developed a compression
scheme to fit sixteen times as much data on a CD as is currently done.
If you think about the typical audio waveform, you'll understand that
it is easy to compress.  Just by storing the *difference* between
adjacent samples, and assuming that there are no impulses, a great
savings in data can be achieved over storing 16 bit *absolute* values.
The problem with storing sixteen times as much sound on a CD is that
the CD must still be accessed at the data rates it was designed for.
In other words, they are getting 16 times too much data at any given
time.  Solution: read the CD from front to back sixteen times, each time
converting a different block of data.  Data frames on the CD format are
broken into 16 blocks, and the player just cycles through these.
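[Editor's note: the difference-coding idea can be shown in a few lines. An editor's sketch of plain delta coding in Python/NumPy; the 16x scheme mentioned above would need much more aggressive techniques than this:]

```python
import numpy as np

def delta_encode(samples):
    """Replace each sample with its difference from the previous one
    (the first delta is the first sample itself).  For typical audio,
    with no impulses, the differences are small and fit in fewer bits
    than 16-bit absolute values."""
    return np.diff(samples, prepend=0)

def delta_decode(deltas):
    return np.cumsum(deltas)

x = np.array([1000, 1005, 1003, 990, 995])   # slowly varying samples
d = delta_encode(x)                          # [1000, 5, -2, -13, 5]
assert (delta_decode(d) == x).all()          # lossless round trip
assert abs(d[1:]).max() < abs(x).max()       # deltas span a smaller range
```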

It's too bad that the music company Southworth recently went bankrupt.
They had announced a Macintosh II-based set of cards which employed
similar compression schemes.  They cited 30 minutes of stereo audio on
a 40 MB hard disk with 20-bit samples at a rate of 192 kHz per channel.

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP

d88-jwa@nada.kth.se (Jon Wätte) (09/24/89)

In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>In article <8909@batcomputer.tn.cornell.edu> sandell@tcgould.tn.cornell.edu (Gregory Sandell) writes:
>On a side note, I have heard that someone has developed a compression
>scheme to fit sixteen times as much data on a CD as is currently done.

>The problem with storing sixteen times as much sound on a CD is that
>the CD must still be accessed at the data rates it was designed for.
>In other words, they are getting 16 times too much data at any given
>time.  Solution: read the CD from front to back sixteen times, each time
>converting a different block of data.  Data frames on the CD format are
>broken into 16 blocks, and the player just cycles through these.

No no. You still have exactly as many bits on the CD as before; only
the redundancy of the info kept there is minimized. If you read the
CD at the original speed, and decompress it quickly enough, you'll have
16 times higher sampling speed, not 16 times longer play...

The problem is, there's no standard. You can't get your CD player to
recognize the compressed input; but if it's doable in real time,
sampling synthesizers would benefit from this.

h+@nada.kth.se
-- 
Today is the tomorrow you worried about yesterday

rich@eddie.MIT.EDU (Richard Caloggero) (09/26/89)

In article <7814@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
... ... ...
>It's too bad that that music company Southworth recently went bankrupt.
>They had announced a Macintosh II-based set of cards which employed
>similar compression schemes.  They cited 30 minutes of stereo audio on
>a 40 M hard disk with 20 bit samples at a rate of 192 kHz per channel.
>
>Brian Willoughby
>UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
>InterNet:       microsoft!brianw@uunet.UU.NET
>  or:           microsoft!brianw@Sun.COM
>Bitnet          brianw@microsoft.UUCP


    Wow, maybe certain people got nervous about the
potential ramifications a system such as this has with respect
to the *recording industry*. [Where are those DAT/writable-CD
systems anyway -- guess all us musicians should move to Japan!]
:-) (I don't want to start a big flame about this, but it's been
a sore spot with me for quite some time).
-- 
						-- Rich (rich@eddie.mit.edu).
	The circle is open, but unbroken.
	Merry meet, merry part,
	and merry meet again.

toma@hpsad.HP.COM (Tom Anderson) (09/26/89)

>>frequency scaling algorithm, perhaps by doing a digital mix with a reference
>>(digital) carrier (i.e. ref = 100 Hz for a 100 Hz shift upward in frequency),
>>followed by a carrier and lower sideband suppression (Hilbert transform filter
>>are very easy to implement digitally).  At a fast glance, I think this might
>>work well for moving the spectra of an audio source up/down some arbitrary
>>frequency, and should be doable with some of the common DSP chips currently
>>available.
>
>As I've said before: that's not scaling, that's OFFSET!  You can't do that
>to MUSIC, because music has a relative overtone spectrum. Consider:
>
>440 Hz + 880 Hz make a (very simple) harmonic note.
>
>Shift 100 Hz:
>
>540 + 940 Hz makes two sine notes !!! And imagine the effect this has on
>complex waveforms like a violin or a piano ... SHUDDER !

The above technique has the advantage that it doesn't rely on an FFT, so
that windowing issues are avoided.  The hardware is also simpler than an
FFT's.  Still, it seems that an FFT is really called for, so that the
frequency shift can be made on a logarithmic frequency axis.

An interesting question is:  how many FFT points are required?  I think
that you need to know the maximum frequency to be represented, the
minimum frequency to be shifted, and the smallest amount of shift.  To
shift from C0 at 16.35Hz to C#0 at 17.32Hz requires a shift of about
1Hz, so the FFT bins should be spaced by about 1Hz.  An FFT with this
spacing covering 0Hz-20kHz would need about 20,000 points.  To keep the
fidelity high, one transform every few milliseconds or so would be
required.  Such a brute force technique gets expensive quickly.
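[Editor's note: the bin count above can be checked with two lines of arithmetic; the 20 kHz span and semitone-at-C0 figures are from the post itself:]

```python
# Bin spacing is fixed by the smallest shift needed: a semitone at C0.
spacing = 17.32 - 16.35          # ~0.97 Hz between C0 and C#0
bins = 20000 / spacing           # bins needed to cover 0-20 kHz
print(round(spacing, 2), round(bins))
```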

It seems like you need a logarithmic frequency axis.  I have often
wished for an FFT type algorithm with logarithmic frequency spacings.
Does anyone know of one?

esker@abaa.uucp (Lawrence Esker) (09/27/89)

In response to another article first: the terms FFT and real-time are
oxymoronic.  To do an FFT in real time would mean computing the full FFT
every sample period, then scaling and inverse-transforming in the same
sample period.  Maybe a super parallel processor could do it, if you have
the money.

The design of the FFT algorithm assumes you have access to all samples
simultaneously to do the calculation.  It is not geared toward calculating
one sample at a time.  To do this one must revert to the original DFT
algorithm with a FIFO.  This adds the effect of the current sample and
removes the effect of the (current - N)th sample.

In article <7813@microsoft.UUCP> brianw@microsoft.UUCP (Brian Willoughby) writes:
>In article <89264.171306P85025@BARILVM.BITNET> P85025@BARILVM.BITNET (Doron Shikmoni) writes:
>>Others suggested spectrum analysis and FFT to move from time domain
>>to frequency domain and vice versa. [...]
>>        as I understand it ... this process should be made on
>>a "quantum" at a time - it's not a continuous process. [...]

>>Doron

>You're right.  The problem with FFTs is that they need a number of points
>to work on.  No matter how fast your 1000 point FFT is, you still have to
>wait until another 1000 points are available. [...]

>I read about a technique for a sliding window FFT.  It was still an
>N-point FFT (say 1000), but as each new sample was input the FFT is
>recalculated.  This method is also much faster for continuous data input,
>because only the end points figure into the calculation.  With a 1000
>point FFT example, the new transform is computed as a function only of
>the newest point just added, and the oldest point which "falls out" of
>the 1000 point buffer. [...]

>I believe that this article was in the Electronic Design News.

>Brian Willoughby

Yes, the sliding-FFT looked like a great design invention until you studied
it more closely and realized it was simply the original Discrete Fourier
Transform (DFT) restated in a different way.  Since the FFT is an algorithmic
shortcut to the DFT, it made me chuckle to see the DFT used to perform the
FFT, albeit under a new name of sliding-FFT.
--
---------- Lawrence W. Esker ----------  Modern Amish: Thou shalt not need any
                                         computer that is not IBM compatible.
UseNet Path: __!mailrus!sharkey!itivax!abaa!esker  ==  esker@abaa.UUCP

jensen@bessel.eedsp.gatech.edu (P. Allen Jensen) (09/30/89)

An FFT can be done in real time if you assume that real time can include
a delay between input and output for startup.  Then you get the first set
of samples, do the FFT, and get the next set of samples in parallel.  The
FFT must be done in less than or equal to the time it takes to get a set
of samples.  You then have a pipeline doing the FFT and getting the next
set of samples, with a delay of one window (frame) time.

Anyone see any problems with that?

P. Allen Jensen
Georgia Tech, School of Electrical Engineering, Atlanta, GA  30332
USENET: ...!{allegra,hplabs,ihnp4,ulysses}!gatech!eedsp!jensen
INTERNET: jensen@eedsp.gatech.edu

brianw@microsoft.UUCP (Brian Willoughby) (10/02/89)

In article <474@eedsp.gatech.edu> jensen@bessel.eedsp.gatech.edu (P. Allen Jensen) writes:
>An FFT can be done in real-time if you assume that real-time can include
>a delay between input and output for startup.  Then you get the first set
>of samples, do the FFT and get the next set of samples in paralle.  The
>FFT must be done in less than or equal to the time to get the first set
>of samples.  You then have a pipeline doing FFT and getting the next set
>of samples with a delay of one window (frame) time.
>
>Anyone see any problems with that ?

I don't think that the delay would be the big problem, but IF the goal is
to subsequently compute an inverse transform for the purpose of creating
audio, then you must realize that you have lost information about the
original signal.  The problem is that the result of such a series of FFT
calculations is not a continuous picture of the frequency spectrum, but
an average over a relatively long (compared to the sample rate) period of
time.  It is true that the FFT probably has enough info to give you the
original signal back, but if you do any processing on the data in the
frequency domain, then you are very likely to come up with incorrect
(read: distorted) results because of the averaging effects of the FFT.

I'm still curious about whether the sliding-FFT, or some version of the
DFT, is capable of generating detailed enough frequency domain data at
the same rate as the sampled data for the purpose of accurately
reconstructing a slightly modified (but equal quality) version of the
incoming audio samples.

If you just want to display the data, you are fine.  In fact, the
HP spectrum analyser we had at NCSU had a noticeable delay, but the
information on the display still updated faster than I could react to
what I saw.

>P. Allen Jensen
>Georgia Tech, School of Electrical Engineering, Atlanta, GA  30332
>USENET: ...!{allegra,hplabs,ihnp4,ulysses}!gatech!eedsp!jensen
>INTERNET: jensen@eedsp.gatech.edu

Brian Willoughby
UUCP:           ...!{tikal, sun, uunet, elwood}!microsoft!brianw
InterNet:       microsoft!brianw@uunet.UU.NET
  or:           microsoft!brianw@Sun.COM
Bitnet          brianw@microsoft.UUCP