[comp.dsp] FFT vs ARMA

news@ti-csl.csc.ti.com (USENET News System) (11/20/89)

From: oh@m2.csc.ti.com (Stephen Oh)
Path: m2!oh

>No.  If there's significant noise, you want to use ARMA estimators -- the MA
>process zeros are necessary for accurately representing the white noise
>itself.  See, e.g., Steven M. Kay, Modern Spectral Estimation, ISBN
>0-13-598582-X, p. 131 -- he suggests using an ARMA(2,2) model to estimate
>the spectra of a sinusoid (actually an AR(2) process) in white noise.

Hmmm, good point. You are talking about the following system:

x(t) = a1 x(t-1) + a2 x(t-2) + n(t)
y(t) = x(t) + e(t)
where n(t) and e(t) are white noise processes.

Then, y(t) = a1 x(t-1) + a2 x(t-2) + n(t) + e(t)
	   = a1 (y(t-1) - e(t-1)) + a2 (y(t-2) - e(t-2)) + n(t) + e(t)
	   = a1 y(t-1) + a2 y(t-2) + n(t) + e(t) - a1 e(t-1) - a2 e(t-2)

This equation looks like an ARMA(2,2) model, even though it is not exactly the
ARMA(2,2) we want. But, what the heck, n(t) and e(t) are mutually independent
white noise processes. So, if e(t) is the dominating noise process, then
y(t) approximately obeys an ARMA(2,2) process, which is not EXACTLY an AR(2) model.
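The effect is easy to check numerically. Here is a quick numpy sketch (the coefficient values, noise levels, and sample counts are all made up for illustration) that generates x(t) and y(t) as above and fits AR(2) coefficients to each by Yule-Walker. The fit to y is pulled strongly toward zero, which is just what you would expect if y(t) really follows an ARMA(2,2) rather than an AR(2) model:

```python
import numpy as np

rng = np.random.default_rng(1)
N, burn = 20000, 500
a1, a2 = 1.5, -0.9               # hypothetical AR(2) coefficients

# x(t) = a1 x(t-1) + a2 x(t-2) + n(t)
n = rng.standard_normal(N + burn)
x = np.zeros(N + burn)
for t in range(2, N + burn):
    x[t] = a1 * x[t-1] + a2 * x[t-2] + n[t]
x = x[burn:]                     # discard the start-up transient

# y(t) = x(t) + e(t), with e white and independent of n
y = x + 3.0 * rng.standard_normal(N)

def yw_ar2(z):
    """Yule-Walker estimate of AR(2) coefficients from one realization."""
    r = [np.dot(z[:len(z)-k], z[k:]) / len(z) for k in range(3)]
    R = np.array([[r[0], r[1]], [r[1], r[0]]])
    return np.linalg.solve(R, np.array([r[1], r[2]]))

a_clean = yw_ar2(x)   # close to (1.5, -0.9)
a_noisy = yw_ar2(y)   # attenuated: the pure AR(2) model no longer fits
print(a_clean, a_noisy)
```

The attenuation comes from the observation noise adding its variance to the zero-lag autocorrelation only, exactly the MA-part effect derived above.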

>No.  Note that ARMA models are no more "statistically stable" than DFTs --
>small variations in the input data may have large effects on the model
>parameters.  See Kay p. 331, Figure 10.6(b) for an example.  

Then what about Kay p. 327, Figure 10.5(b), which shows another algorithm with
the same ARMA model? But I agree that this kind of parametric approach is
sensitive to small variations. As far as I understand, if you choose a
higher order for the ARMA model, you are safer from those effects.

>The reason that
>the FFT is done in segments usually has to do with available input storage
>or desired output resolution, and the reason that the segments are
    ^^^^^^^^^^^^^^^^^^^^^^^^^

What do you mean by desired output resolution by segments?  The case where you
want lower resolution?  The resolution achievable by an FFT is only 1/N, where
N is the number of samples. That's why we call AR or ARMA approaches "high
resolution frequency estimators."
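A quick numerical illustration of that 1/N limit (the frequencies and sample count here are arbitrary choices): two tones separated by half an FFT bin merge into a single peak, while tones several bins apart stay cleanly separated.

```python
import numpy as np

N = 100                          # FFT bin spacing = 1/N = 0.01
t = np.arange(N)

def top_two_bins(f1, f2):
    """Return the indices of the two largest periodogram bins."""
    x = np.cos(2*np.pi*f1*t) + np.cos(2*np.pi*f2*t)
    P = np.abs(np.fft.rfft(x))**2
    return sorted(int(k) for k in np.argsort(P)[-2:])

close = top_two_bins(0.200, 0.205)   # separation 0.005 = half a bin
far   = top_two_bins(0.200, 0.300)   # separation 0.100 = ten bins
print(close, far)
```

In the close case the two strongest bins are adjacent (one smeared peak); in the far case they sit at the two true frequencies.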

>But then again, it depends on what you want to estimate.  If you "know" that
>your input data consists of two sinusoids in white noise, all the cost
>tradeoffs change, and I wouldn't be surprised to find that the ARMA model is
>cheaper, because you can use a very low-order model (ARMA(4,4)) to get a
>good estimate.  The DFT *is* the ML spectral estimator, in the absence of
>any a priori model whatsoever of the input data, and it's very cheap to
>compute via the FFT.  If you have a data model, the DFT is, as I understand
>it, a poorer choice.

Again, in terms of PSD resolution, parametric approaches are *A LOT* better
than FFT-based PSD estimates.

>And again, if you only have 16 data points, and can't obtain more, order
>analysis is really uninteresting, since the size of the implicit constants
>dominates, and besides, neither method takes significant time.  There,
>questions of what exactly it is you're looking for in those 16 data points
>become dominant, and will usually govern your choice of analytic techniques.
>

True. 
+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Stephen Oh         oh@csc.ti.com     |  Texas Instruments     |
|  Speech and Image Understanding Lab.  | Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+

ashok@atrp.mit.edu (Ashok C. Popat) (11/23/89)

In article <98990@ti-csl.csc.ti.com> oh@m2.UUCP (Stephen Oh) writes:
>
>Again, in terms of PSD resolution, parametric approaches are *A LOT* better
>than FFT-based PSD estimates.
>

In applications, you don't always have a good a priori formal model.
Unless you have a formal model that's *useful* for your application,
parametric estimation is worthless.

Suppose I gave you some data (say 10^6 samples) and told you that the
source was ergodic, but nothing else.  How would you estimate the
spectrum?  If you used an ARMA model, how would you decide what the
order of the model should be?  Wouldn't you have much more confidence
in an averaged-periodogram (i.e., DFT-based) estimate?  I would.

Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302

rob@kaa.eng.ohio-state.edu (Rob Carriere) (11/23/89)

In article <1989Nov22.170850.21777@athena.mit.edu>, ashok@atrp.mit.edu (Ashok
C. Popat) writes: 
> In applications, you don't always have a good a priori formal model.
> Unless you have a formal model that's *useful* for your application,
> parametric estimation is worthless.
> 
> Suppose I gave you some data (say 10^6 samples) and told you that the
> source was ergodic, but nothing else.  How would you estimate the
> spectrum?  If you used an ARMA model, how would you decide what the
> order of the model should be?  Wouldn't you have much more confidence
> in an averaged-periodogram (i.e., DFT-based) estimate?  I would.

Not necessarily.  The DFT is quite good at some things, not at others.  If you
give me recorded data that I can play with for a while, I would probably run an
FFT, several different periodograms, ARMA or Prony models of several orders,
and whatever else the data made me feel like.  After doing all that, I'd feel
reasonably confident I could tell you something about your data.

If averaged periodograms showed different behavior in different segments of
the data, that means you also want to look at parametric models over subsets
of the data.

In short, if I knew that little, no ONE technique would make me happy.  And
finally, the fact that the DFT is non-parametric does not mean that you aren't
making assumptions about the data (in fact, you're assuming periodicity --
something that doesn't always make sense either).

SR
"But the real reason is, I just like to play."

ashok@atrp.mit.edu (Ashok C. Popat) (11/27/89)

In article <3589@quanta.eng.ohio-state.edu> rob@kaa.eng.ohio-state.edu (Rob Carriere) writes:
>In article <1989Nov22.170850.21777@athena.mit.edu>, ashok@atrp.mit.edu (Ashok
>C. Popat) writes: 
>> In applications, you don't always have a good a priori formal model.
>> Unless you have a formal model that's *useful* for your application,
>> parametric estimation is worthless.
>> 
>> Suppose I gave you some data (say 10^6 samples) and told you that the
>> source was ergodic, but nothing else.  How would you estimate the
>> spectrum?  If you used an ARMA model, how would you decide what the
>> order of the model should be?  Wouldn't you have much more confidence
>> in an averaged-periodogram (i.e., DFT-based) estimate?  I would.
>
>Not necessarily.  The DFT is quite good at some things, not at others.  If you
>give me recorded data that I can play with for a while, I would probably run an
>FFT, several different periodograms, ARMA or Prony models of several orders,
>and whatever else the data made me feel like.  After doing all that, I'd feel
>reasonably confident I could tell you something about your data.

     Sounds reasonable on the surface --- try a few well known techniques,
     then sort of mentally average the results to conclude something about
     the data.  The problem is that there is absolutely no justification
     for trying some of the techniques.  What you want is a consistent,
     unbiased estimate of the spectrum of an (unknown) ergodic random
     process, given a bunch of samples.  Averaging periodograms (e.g.,
     Welch's method) gives you a consistent, asymptotically unbiased
     estimate.  What a parametric technique gives you depends strongly on
     the assumed model (which isn't given as part of the problem).
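The contrast Popat draws -- a single raw periodogram versus an average of windowed, overlapped periodograms a la Welch -- can be sketched in a few lines of numpy (the segment length and overlap here are arbitrary choices, and the source is white noise so the true spectrum is flat):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)        # white noise: flat true spectrum

def welch(x, seg=256):
    """Average of Hann-windowed, half-overlapped periodograms."""
    w = np.hanning(seg)
    U = np.dot(w, w)                 # window power, to keep the level unbiased
    starts = range(0, len(x) - seg + 1, seg // 2)
    P = [np.abs(np.fft.rfft(w * x[s:s+seg]))**2 / U for s in starts]
    return np.mean(P, axis=0)

single = np.abs(np.fft.rfft(x))**2 / len(x)   # one raw periodogram
avg    = welch(x)

# relative spread across bins (skip DC and Nyquist)
def spread(P):
    return np.std(P[1:-1]) / np.mean(P[1:-1])

print(spread(single), spread(avg))
```

The averaged estimate hugs the flat true spectrum far more tightly than the single periodogram does, which is the consistency being claimed.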

>If averaged periodograms showed different behavior in different segments of
>the data, that means you also want to look at parametric models over subsets
>of the data.

     Nope, ergodicity implies stationarity.  You'd have to attribute the
     behavior to chance.

>In short, if I knew that little, no ONE technique would make me happy.  And

     Any consistent, unbiased, and efficient estimate should make you
     happy.  An estimate based on unfounded assumptions should not.

>finally, the fact that the DFT is non-parametric does not mean that you aren't
>making assumptions about the data (in fact, you're assuming periodicity --
>something that doesn't always make sense either)

     You are making assumptions, but periodicity isn't one of them.
     Remember, DFT-based spectral estimation *doesn't* mean simply
     computing the DFT of the data.  In fact, it is well known that a
     single periodogram (the magnitude squared of the DFT) is a
     particularly shitty spectral estimate, since it is biased, and since
     its variance doesn't diminish with the amount of data you use (see
     Oppenheim and Schafer, Chapt.  11).  DFT-based spectral estimation
     usually involves some sort of averaging of short, modified
     periodograms.  Now, an argument can be made that this type of
     estimation also assumes something about the data, but the concept is
     subtle.  I suggest Ronald Christensen's "Entropy Minimax Sourcebook"
     for philosophical/technical discussions of problems in statistical
     inference.
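The non-diminishing variance of the raw periodogram is easy to demonstrate (white noise and these particular record lengths are arbitrary choices): the relative spread of the periodogram bins stays near 1 no matter how many samples go in.

```python
import numpy as np

rng = np.random.default_rng(3)

def rel_spread(N):
    """Relative spread of one raw periodogram of N white-noise samples."""
    x = rng.standard_normal(N)
    P = np.abs(np.fft.rfft(x))**2 / N
    P = P[1:-1]                      # skip DC and Nyquist
    return np.std(P) / np.mean(P)

print([round(rel_spread(N), 2) for N in (1024, 4096, 16384)])
```

More data gives more bins, not better bins: each bin's estimate remains roughly exponentially distributed, so its standard deviation stays comparable to its mean.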

     Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302

oh@m2.csc.ti.com (Stephen Oh) (11/27/89)

In article <1989Nov22.170850.21777@athena.mit.edu> ashok@atrp.mit.edu (Ashok C. Popat) writes:
>In article <98990@ti-csl.csc.ti.com> oh@m2.UUCP (Stephen Oh) writes:
>>
>>Again, in terms of PSD resolution, parametric approaches are *A LOT* better
>>than FFT-based PSD estimates.
>>
>
>In applications, you don't always have a good a priori formal model.
>Unless you have a formal model that's *useful* for your application,
>parametric estimation is worthless.
>
>Suppose I gave you some data (say 10^6 samples) and told you that the
>source was ergodic, but nothing else.  How would you estimate the
>spectrum?  If you used an ARMA model, how would you decide what the
>order of the model should be?  Wouldn't you have much more confidence
>in an averaged-periodogram (i.e., DFT-based) estimate?  I would.
>
>Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302


Your assumption is too strong.  You have 10^6 samples with ergodicity?
What if you have 10^6 samples that are only wide-sense stationary?
What if you have 10^6 samples that are only partially w.s.s.?

BTW, I said that parametric approaches are better than FFTs in terms of
resolution. If we have only 100 samples and the separation of two frequencies
is less than 0.01, there is no way to resolve the two frequencies using any
FFT-based method. But AR or ARMA can. :-) :-)
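For instance, here is a toy linear-prediction (Prony-style) sketch -- noiseless for simplicity, with made-up frequencies -- in which two tones only 0.005 apart (half an FFT bin at N = 100) are recovered almost exactly from the roots of the prediction polynomial:

```python
import numpy as np

N, p = 100, 4                         # 100 samples, order-4 model (2 real tones)
n = np.arange(N)
f_true = (0.200, 0.205)               # separation 0.005 < 1/N: FFT can't split them
x = np.cos(2*np.pi*f_true[0]*n) + np.cos(2*np.pi*f_true[1]*n)

# least-squares linear prediction: x[k] ~ sum_{i=1..p} a[i] x[k-i]
A = np.column_stack([x[p-i:N-i] for i in range(1, p+1)])
a, *_ = np.linalg.lstsq(A, x[p:], rcond=None)

# frequencies from the angles of the prediction-polynomial roots
roots = np.roots(np.concatenate(([1.0], -a)))
f_est = sorted(set(round(abs(np.angle(r)) / (2*np.pi), 4) for r in roots))
print(f_est)
```

With noise the picture gets murkier, of course -- the exact recurrence no longer holds, and that is where model order and sensitivity questions come back in.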
Also, there are several methods to determine the order of the model such as
AIC, MDL, CAT, etc.
+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Stephen Oh         oh@csc.ti.com     |  Texas Instruments     |
|  Speech and Image Understanding Lab.  | Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+

ashok@atrp.mit.edu (Ashok C. Popat) (11/29/89)

In article <99691@ti-csl.csc.ti.com> Stephen Oh writes:

>In article <1989Nov22.170850.21777@athena.mit.edu> ashok@atrp.mit.edu (Ashok C. Popat) writes:
>>Suppose I gave you some data (say 10^6 samples) and told you that the
>>source was ergodic, but nothing else.  How would you estimate the
>>spectrum?  If you used an ARMA model, how would you decide what the
>>order of the model should be?  Wouldn't you have much more confidence
>>in an averaged-periodogram (i.e., DFT-based) estimate?  I would.
>
>Your assumption is too strong.  You have 10^6 samples with ergodicity?
>What if you have 10^6 samples that are only wide-sense stationary?
>What if you have 10^6 samples that are only partially w.s.s.?

I'm not exactly sure what you mean by "too strong" --- it's a "given"
in the problem.  Are you saying that in many applications, waveforms
cannot be usefully modeled as ergodic?  If so, I'll buy that.  I guess
I shouldn't have used "ergodic," since that lumps too many assumptions
together.  How about agreeing that any piecewise stationary process we
discuss is ergodic over each stationary piece (if I hadn't brought up
ergodicity in the first place, this would not have been worth
mentioning, since we'd have to assume piecewise ergodicity to infer
anything at all).  That leaves the issue of stationarity.  Now from
what I remember of stochastic processes, wide-sense and strict-sense
stationarity are the same if you're dealing exclusively with
second-order statistics (e.g., power spectrum).  I guess then I could
have described my hypothetical source as being WSS over 10^6 samples.  A
poor model for speech and images, but realistic in other applications.

>BTW, I said that parametric approaches are better than FFTs in terms of
>resolution. If we have only 100 samples and the separation of two frequencies
>is less than 0.01, there is no way to resolve the two frequencies using any
>FFT-based method. But AR or ARMA can. :-) :-)

Good point.  I thought about this and here's what I came up with.  The
duration-bandwidth uncertainty principle says (for continuous-time
waveforms) that
                 delta_t*delta_f >= 1/pi
where delta_t is the time window size and delta_f is the frequency
resolution (see William Siebert, _Circuits, Signals, and Systems_).
I'm sure a similar result applies in the discrete-time case, but I
don't have a reference off hand --- I'll assume it has the same form.
Now if you're starting with only 100 samples, the uncertainty
principle says that there's simply not enough information in the data
to get a high-resolution spectrum.  If you do manage to get a
high-resolution spectrum, the necessary added information must have
come from the model, not the data.  What do you think?

>Also, there are several methods to determine the order of the model such as
>AIC, MDL, CAT, etc.

Any recommended reading on these techniques?

Ashok Chhabedia Popat  MIT Rm 36-665  (617) 253-7302

rob@kaa.eng.ohio-state.edu (Rob Carriere) (11/29/89)

In article <1989Nov26.194904.1376@athena.mit.edu> ashok@atrp.mit.edu (Ashok C.
Popat) writes: 
>In article <3589@quanta.eng.ohio-state.edu> rob@kaa.eng.ohio-state.edu (Rob
Carriere) writes: 
>>In article <1989Nov22.170850.21777@athena.mit.edu>, ashok@atrp.mit.edu (Ashok
>>C. Popat) writes: 
>>> Unless you have a formal model that's *useful* for your application,
>>> parametric estimation is worthless.
>>> Suppose I gave you some data (say 10^6 samples) and told you that the
>>> source was ergodic, but nothing else.  
>>Not necessarily.  The DFT is quite good at some things, not at others.  If you
>>give me recorded data that I can play with for a while, I would probably run an
>>FFT, several different periodograms, ARMA or Prony models of several orders,
>>and whatever else the data made me feel like.  After doing all that, I'd feel
>>reasonably confident I could tell you something about your data.
>
>     Sounds reasonable on the surface --- try a few well known techniques,
>     then sort of mentally average the results to conclude something about
>     the data.  The problem is that there is absolutely no justification
>     for trying some of the techniques.  What you want is a consistent,
>     unbiased estimate of the spectrum of an (unknown) ergodic random
>     process, given a bunch of samples.  Averaging periodograms (e.g.,
>     Welch's method) gives you a consistent, asymptotically unbiased
>     estimate.  What a parametric technique gives you depends strongly on
>     the assumed model (which isn't given as part of the problem).

Well, that's good.  It seems the surface agrees with the inside here :-)
What I want is something that gives me a good idea of what is going on.
Depending on the circumstances, a consistent, unbiased estimate may or may not
cut it as a good idea.  The standard counterexample is to try ML on some data
with two closely spaced spectral peaks.  It is not at all hard to set the stage
so that ML will fail miserably to separate the peaks.  If my interest were
primarily in the number of spectral peaks present, as it is in some
applications, then it is going to be small consolation indeed to know that at
least the variance has been minimized.

If you are saying that we should have more knowledge of what parametric
methods do when the model doesn't fit reality, I entirely agree.  There is a
body of knowledge, but it is entirely empirical and ad hoc.

>>If averaged periodograms showed different behavior in different segments of
>>the data, that means you also want to look at parametric models over subsets
>>of the data.
>
>     Nope, ergodicity implies stationarity.  You'd have to attribute the
>     behavior to chance.

More probably, I'd attribute it to an unwarranted assumption of ergodicity.  I
don't know how these things are done elsewhere, but I've seen too many cases
where ergodicity or even stationarity was assumed just because.  If I saw
clear trends between segments of the data, I'd be _very_ unlikely to attribute
them to chance.

>>In short, if I knew that little, no ONE technique would make me happy.  And
>
>     Any consistent, unbiased, and efficient estimate should make you
>     happy.  An estimate based on unfounded assumptions should not.

If minimal variance is what I'm after, yes.  However, all these things tend to
have the word "asymptotically" before all the good stuff.  All too often that
means "whenever you have about 10 times more data."

Consider also that since speed of convergence typically depends on the
(unknown) characteristics of the data, there are "unfounded assumptions" no
matter where you turn.

>>finally, the fact that the DFT is non-parametric does not mean that you aren't
>>making assumptions about the data (in fact, you're assuming periodicity --
>>something that doesn't always make sense either)
>
>     You are making assumptions, but periodicity isn't one of them.
>     Remember, DFT-based spectral estimation *doesn't* mean simply
>     computing the DFT of the data.  In fact, it is well known that a

Yes.  I goofed.  Apologies for spreading disinformation.

SR

oh@m2.csc.ti.com (Stephen Oh) (11/30/89)

In article <1989Nov28.185555.4259@athena.mit.edu> ashok@atrp.mit.edu (Ashok C. Popat) writes:


>I'm not exactly sure what you mean by "too strong" --- it's a "given"
>in the problem.  Are you saying that in many applications, waveforms
>cannot be usefully modeled as ergodic?  

Yes.

>I guess then I could
>have described my hypothetical source as being WSS over 10^6 samples.  A
>poor model for speech and images, but realistic in other applications.

Sure you can do that.
But I still wonder whether 10^6 WSS samples are a good assumption,
since 10^6 samples is a lot of data.  Don't you think?
When I said partially w.s.s., I meant: for some portions of the data it is
w.s.s., but for the whole data it is not.


>Good point.  I thought about this and here's what I came up with.  The
>duration-bandwidth uncertainty principle says (for continuous-time
>waveforms) that
>                 delta_t*delta_f >= 1/pi
>where delta_t is the time window size and delta_f is the frequency
>resolution (see William Siebert, _Circuits, Signals, and Systems_).
>I'm sure a similar result applies in the discrete-time case, but I
>don't have a reference off hand --- I'll assume it has the same form.

The resolution is 1/N, where N is the number of samples.

>Now if you're starting with only 100 samples, the uncertainty
>principle says that there's simply not enough information in the data
>to get a high-resolution spectrum.  If you do manage to get a
>high-resolution spectrum, the necessary added information must have
>come from the model, not the data.  What do you think?

I don't know why you brought up the uncertainty principle, but is there
any measure showing that 100 samples are not enough to get a high-resolution
estimate? (This is not a flame; I just want to know.)
Your statement is true though: From Kay's book (ISBN 0-13-598582-X),
     "Windowing of data or ACF values makes the implicit assumption that
      the unobserved data or ACF values outside the window are zero, 
      which is normally an unrealistic assumption. A smeared spectral
      estimate is a consequence of the windowing. Often, we have more
      knowledge about the process from which the data samples are taken,
      or at least we are able to make a more reasonable assumption other
      than to assume the data or ACF values are zero outside the window."

And I agree with Kay.

>Any recommended reading on these techniques?

1. H. Akaike, "A New Look at the Statistical Model Identification," IEEE
	      Trans. Automat. Contr., Vol. AC-19, 1974.
2. E. J. Hannan, "The Estimation of the Order of an ARMA Process,"
	      Ann. Statist., Vol. 8, 1980.
3. R. L. Kashyap, "Optimal Choice of AR and MA Parts in Autoregressive
	      Moving Average Models," IEEE Trans. Pattern Anal. Mach.
	      Intell., Vol. PAMI-4, 1982.
4. L. Marple, "Digital Spectral Analysis with Applications,"
	      Prentice-Hall, 1987.
5. S. Kay, "Modern Spectral Estimation," Prentice-Hall, 1988.
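To give a flavor of how one of these order-selection criteria is applied (this sketch uses MDL; the simulated AR(2) process, the least-squares fits, and the candidate-order range are all hypothetical choices, not taken from the references above):

```python
import numpy as np

rng = np.random.default_rng(2)
N, burn = 2000, 200
a1, a2 = 1.5, -0.9                    # true process is AR(2)
e = rng.standard_normal(N + burn)
x = np.zeros(N + burn)
for t in range(2, N + burn):
    x[t] = a1*x[t-1] + a2*x[t-2] + e[t]
x = x[burn:]

def ar_resid_var(x, p):
    """Least-squares AR(p) fit; returns the residual variance."""
    A = np.column_stack([x[p-i:len(x)-i] for i in range(1, p+1)])
    a, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    r = x[p:] - A @ a
    return np.dot(r, r) / len(r)

# MDL(p) = N ln(sigma_p^2) + p ln(N); pick the order that minimizes it
mdl = {p: N*np.log(ar_resid_var(x, p)) + p*np.log(N) for p in range(1, 9)}
best = min(mdl, key=mdl.get)
print(best)
```

The residual variance drops sharply up to the true order and then flattens, so the penalty term takes over and the criterion settles on a low order.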

+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Stephen Oh         oh@csc.ti.com     |  Texas Instruments     |
|  Speech and Image Understanding Lab.  | Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+