[comp.music] Manipulating sampled sounds.

dc@max.berkeley.edu (03/19/90)

One approach you could take that I have not seen mentioned in the various
replies to your query in the newsgroup is the brute force method of FFTing
the entire stream, truncating the spectrum (i.e. setting everything over
5500 Hz to zero), FFTing back, and picking every fifth sample.  This is
in some sense the "best" way to do it, and the more usual FIR approach is
a decent approximation to this.  This method probably sounds intractable
but in some cases can actually be faster than the FIR approach, particularly
if you want good accuracy.  For example, if your samples are 16 bits and
you want to preserve you S/N, you need a filter that's good to 96 dB, and
if you don't want to roll off too far below the Nyquist frequency, then you
are looking at a very long FIR filter, and consequently a long time spent
in filtering.  If you can fit your whole tune on disk, there are good 
algorithms for doing the FFT, and the IEEE DSP book that someone else pointed
you toward has a canned program for doing this that I've had a lot of luck
with.  It would probably be more sensible to do it a chunk at a time, but
this would involve thinking carefully about how to best window the FFT and
how the quality of the resulting filter compares to a FIR.

Dave Cottingham

symon@lhotse.cs.unc.edu (James Symon) (03/19/90)

In article <00933E08.7E4E7D40@max.berkeley.edu>, dc@max.berkeley.edu writes:
> One approach you could take that I have not seen mentioned in the various
> replies to your query in the newsgroup is the brute force method of FFTing
> the entire stream, truncating the spectrum (i.e. setting everything over
> 5500 Hz to zero), FFTing back, and picking every fifth sample. 

Just what I was going to suggest. Simple and clean. But why do you
need to pick every fifth sample? I'm not familiar with the standard
inverse transform algorithms, but once you're in the frequency domain
can't you just do your "FFTing back" with whatever sampling period you
want?

Jim Symon			| symon@cs.unc.edu
Computer Science Dept		| {uunet, decvax}!mcnc!unc!symon
Chapel Hill, NC  27599-3175	| (919) w:962-1893 h:968-1024

	***Don't use "r" or my header line address***

tjr@cbnewsc.ATT.COM (thomas.j.roberts) (03/20/90)

From article <00933E08.7E4E7D40@max.berkeley.edu>, by dc@max.berkeley.edu:
> One approach you could take that I have not seen mentioned in the various
> replies to your query in the newsgroup is the brute force method of FFTing
> the entire stream, truncating the spectrum (i.e. setting everything over
> 5500 Hz to zero), FFTing back, and picking every fifth sample.  This is
> in some sense the "best" way to do it, and the more usual FIR approach is
> a decent approximation to this.  This method probably sounds intractable
> but in some cases can actually be faster than the FIR approach, particularly
> if you want good accuracy.  For example, if your samples are 16 bits and
> you want to preserve you S/N, you need a filter that's good to 96 dB, and
> if you don't want to roll off too far below the Nyquist frequency, then you
> are looking at a very long FIR filter, and consequently a long time spent
> in filtering.  If you can fit your whole tune on disk, there are good 
> algorithms for doing the FFT, and the IEEE DSP book that someone else pointed
> you toward has a canned program for doing this that I've had a lot of luck
> with.  It would probably be more sensible to do it a chunk at a time, but
> this would involve thinking carefully about how to best window the FFT and
> how the quality of the resulting filter compares to a FIR.
> 
> Dave Cottingham

Obviously, you have never done this.

Performing an FFT on an entire sample has serious mathematical difficulties.
Round-off will become important, as will the quantization error. Assuming
a 10-second 55,000 Hz sample (550,000 samples), 16-bit arithmetic is
inadequate, and floating-point will be necessary. On my PC such an
analysis would take days. If you are considering a SONG (several minutes),
a single FFT is hopeless.

	[Of course, if you have a DSP chip with floating point, megabytes
	of memory, and appropriate software, this could be much more feasible.]

If you split it up, you will still need huge FFTs (and floating point).

	[At middle C (~256 Hz), a semitone is approximately 15 Hz, so you will
	need at least 1.5 Hz of resolution in your FFT to preserve pitches (10
	Cents). With 55,000 Hz sampling, this will require a 32,768-point FFT.
	If you want to preserve pitches below middle C, you will need even
	bigger FFTs (twice as many points for every octave down).]

If you split it up, you will also have the difficult problem of stitching
the multiple IFFTs back together - The (necessary) windowing function will
make this ambiguous (the window goes to zero at the points where stitching
is performed), so you will need to overlap your FFT intervals (usually
by 50%). Then you have to eliminate, or at least minimize the phase 
discontinuities (which become amplitude discontinuities during stitching)
between sample regions. This is a non-trivial problem, because there is
a LARGE number of frequency components which ALL need to be smoothed
(or averaged).

When all is said and done, it is MUCH simpler to use an FIR lowpass filter
before sub-sampling your data. A 128-point filter gives very sharp cutoff,
and on my PC can filter 500,000 samples in a few minutes (16-bit arithmetic).
While such a lowpass filter may LOOK unsuitable (its stopband is only 40 dB
below its passband), in practice it works fine (filter a signal with no
component in the passband, large sinewave in its stopband, sub-sample and
re-FFT and/or play it back - the aliased sinewave is at the noise floor).
Of course, a general FIR routine will be more useful and flexible than the
special multiple-FFT/stitching/smoothing software.

	[An IIR filter might do even better, but I have never made the effort
	to understand or implement such beasts.]


Tom Roberts
att!ihlpl!tjrob