[comp.dsp] Just one more AR/MA/ARMA/Marple-type question

kohli@gemed (Jim Kohli) (01/06/90)

Okay-- I've read the recommended chapters.  I still have
a question which I'm certain is easy for most of you using
AR/MA/ARMA or almost anything from Marple's book.  Stephen
Oh (and Marple) mention in passing that p (or sometimes, ip),
the dimension of the autocorrelation matrix, must be less
than N, the dimension of the 1D dataset.  In fact, Oh suggests
that p<=16 if N=64, and says that "it is known that" this is
true.  Marple says only that p<<N.  What's going on here?

When I learned about autocorrelations, the autocorrelation
vector was either considered to be the same dimension as the
dataset (i.e., p=N), or it was padded with zeroes (p>N).

Can someone point me to a reference which explains the rationale?
Or maybe brain me with a brief yet brilliant explanation?
Is this related to work with "infinite" (i.e., continuously acquired)
datasets?

Theo Smit: if you are reading this, I would appreciate it if you
could send me your EMail address-- the address which shows up in
your postings doesn't work very well-- I'd like to know a little more
about your experiences with ARMA and NMR spectroscopy (thanks!).

Thanks, one and all!

Jim Kohli
GE Medical Systems
We Bring Dog Things To Life
uucp: {your favorite path to uunet}!uunet!crdgw1!gemed!hal!kohli
internet: kohli@hal.med.ge.com

mariano@umigw.MIAMI.EDU (Arthur Mariano) (01/06/90)

In article <1819@mrsvr.UUCP>, kohli@gemed (Jim Kohli) writes:
...
> When I learned about autocorrelations, the autocorrelation
> vector was either considered to be the same dimension as the
> dataset (i.e., p=N), or it was padded with zeroes (p>N).

Dear Jim, this is wrong. Real data are noisy, so your estimated
correlations do not equal the true correlations. A good correlation
estimate uses the same data in the numerator as in the denominator, viz.
C(k)=sum x(s)*x(s+k)/(sum sqrt(x(s)*x(s+k))), where x is the detrended
(very important) data, k is the lag, * is multiplication, and the sum is
over all possible s. A rule of thumb: never calculate your correlation
function for lags greater than 1/4 the data length. The rationale is
that at large lags, very few data points (relative to zero and small
lags) go into the products needed for C(k); e.g., at lag N-1 only one
product can be calculated. Large lags therefore have high estimation
error, which will corrupt fits to, or transforms of, your ESTIMATED
correlation function. So keep p small to get the best results.
Cheers, Arthur
-- 
Arthur Mariano                     Inet: mariano@umigw.miami.edu [128.116.10.1]
SPAN: miami::arthur (host 3074::)      arthur%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!mariano               arthur%miami.span@vlsi.jpl.nasa.gov

mariano@umigw.MIAMI.EDU (Arthur Mariano) (01/08/90)

> ...C(k)=sum x(s)*x(s+k)/(sum sqrt(x(s)*x(s+k))),...  OOPS, the formula
should be C(k) = sum x(s)*x(s+k) / sqrt( (sum x(s)**2) * (sum x(s+k)**2) ).
Sorry.
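
A minimal numpy sketch of this corrected estimator (simple mean removal
stands in for full detrending, and the default lag cap follows the 1/4
rule of thumb above):

import numpy as np

def corr_est(x, max_lag=None):
    # Normalized autocorrelation estimate:
    #   C(k) = sum x(s)*x(s+k) / sqrt( sum x(s)**2 * sum x(s+k)**2 )
    # with each sum taken over all possible s for that lag.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()              # crude detrend: remove the mean
    N = len(x)
    if max_lag is None:
        max_lag = N // 4          # never go past 1/4 the data length
    C = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        num = np.dot(x[:N - k], x[k:])
        den = np.sqrt(np.dot(x[:N - k], x[:N - k]) * np.dot(x[k:], x[k:]))
        C[k] = num / den
    return C

Note C(0) = 1 by construction, and each C(k) is built from the same N-k
sample pairs in numerator and denominator, as described above.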
-- 
Arthur Mariano                     Inet: mariano@umigw.miami.edu [128.116.10.1]
SPAN: miami::arthur (host 3074::)      arthur%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!mariano               arthur%miami.span@vlsi.jpl.nasa.gov

oh@m2.csc.ti.com (Stephen Oh) (01/08/90)

In article <1420@umigw.MIAMI.EDU> mariano@umigw.MIAMI.EDU (Arthur Mariano) writes:
>In article <1819@mrsvr.UUCP>, kohli@gemed (Jim Kohli) writes:
>...
>> When I learned about autocorrelations, the autocorrelation
>> vector was either considered to be the same dimension as the
>> dataset (i.e., p=N), or it was padded with zeroes (p>N).
>
>Dear Jim, this is wrong. Real data are noisy. [...] A rule of thumb:
>never calculate your correlation function for lags greater than 1/4 the
>data length. [...] So keep p small to get the best results.

Yes, I agree with Arthur.

For AR/ARMA models, it is not always true that a longer AR model is
better than a shorter one.

Principle of parsimony (in the estimation sense):
	The larger the number of unknown parameters to be estimated for
	the same number of measurements, the lower the accuracy of the
	estimates.

But note that the shorter model is not always better than the longer
one, either. There is no solid theoretical ground for it, but it is
known that as long as the order (ip) of the AR/ARMA model is no greater
than 1/5 or 1/4 of the total number of observations (N), the longer
AR/ARMA model is generally the better one. I say "generally" since I
cannot promise this claim. In fact, for AR models, if you use the Burg,
Yule-Walker, or Modified Covariance methods, the residual-variance
estimate is smaller for the longer order. That does not mean the AR
coefficient estimates are more accurate, though, because of the
principle of parsimony.
Yes, yes, I know it is confusing, but like I said, the rule of thumb for
determining the order of an AR model is as follows (a code sketch
follows the list):

1. Let ip = (20 to 25%) * N.
2. Fit the AR model with any of the three methods: Burg, Yule-Walker,
   or Modified Covariance.
3. Compute information statistics such as AIC, MDL, CAT, etc. (see
   appendix).
4. Choose the order of the AR model that minimizes the information
   criterion values from step 3.
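
Here is one possible numpy sketch of steps 1-4 (the function names are
my own; the Burg recursion follows Marple's formulation, and only AIC
and MDL from the appendix are scored):

import numpy as np

def burg_ar(x, order):
    # Burg's method: returns AR coefficients a[1..order] (convention
    # x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] = e[n]) and rho,
    # the residual-variance estimate at that order.
    x = np.asarray(x, dtype=float)
    N = len(x)
    f = x.copy()                       # forward prediction errors
    b = x.copy()                       # backward prediction errors
    a = np.zeros(0)
    rho = np.dot(x, x) / N             # order-0 variance estimate
    for m in range(order):
        ef, eb = f[m + 1:], b[m:-1]
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        fn, bn = ef + k * eb, eb + k * ef   # update both error series
        f[m + 1:], b[m + 1:] = fn, bn
        a = np.concatenate([a + k * a[::-1], [k]])  # Levinson step
        rho *= 1.0 - k * k
    return a, rho

def pick_order(x, max_order=None):
    # Steps 1-4: sweep orders up to ~25% of N, score each with AIC and
    # MDL, and return the AIC-minimizing order.
    N = len(x)
    if max_order is None:
        max_order = N // 4
    aic = np.empty(max_order)
    mdl = np.empty(max_order)
    for k in range(1, max_order + 1):
        _, rho = burg_ar(x, k)
        aic[k - 1] = N * np.log(rho) + 2 * k
        mdl[k - 1] = N * np.log(rho) + k * np.log(N)
    return 1 + int(np.argmin(aic)), aic, mdl

For speed you would run the Burg recursion once up to max_order and
record rho at every stage; the refit per order here just keeps the
sketch simple.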

Comments? Questions? Post them!!


---------------------  Appendix -------------------------------------

Information statistics: for the background, see pp. 229-231 of Marple's
book, or pp. 234-237 of Kay's book. I am listing the six most famous
information criteria here. Throughout, p_hat(k) denotes the estimated
residual (driving-noise) variance of the order-k model:

1. FPE (Final Prediction Error)

	FPE(k) = p_hat(k) * (N + k + 1) / (N - k - 1)

2. AIC (Akaike's Information Criterion)

	AIC(k) = N ln( p_hat(k) ) + 2k

3. MDL (Minimum Description Length)

	MDL(k) = N ln( p_hat(k) ) + k ln(N)

4. CAT (Criterion Autoregressive Transfer function)

	CAT(k) = (1/N) * sum_{j=1..k} 1/p_bar(j)  -  1/p_bar(k)

   where p_bar(j) is Parzen's rescaled estimate,
   p_bar(j) = N * p_hat(j) / (N - j).

5. KIC (Kashyap's Information Criterion)

	KIC(k) = (N - k) ln( p_hat(k) ) + (k + 1) ln( N / (2*pi) )

6. HIC (Hannan's Information Criterion)

	HIC(k) = N ln( p_hat(k) ) + k ln( ln(N) )
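
With rho[k-1] holding p_hat(k) from an AR fit at order k, the six
criteria above might be tabulated as follows (a sketch; the p_bar
rescaling in CAT assumes Parzen's definition given above):

import numpy as np

def info_criteria(rho, N):
    # rho[k-1] = p_hat(k), the residual-variance estimate at order k.
    rho = np.asarray(rho, dtype=float)
    k = np.arange(1, len(rho) + 1)
    pbar = N * rho / (N - k)           # p_bar(k), assumed Parzen rescaling
    return {
        "FPE": rho * (N + k + 1) / (N - k - 1),
        "AIC": N * np.log(rho) + 2 * k,
        "MDL": N * np.log(rho) + k * np.log(N),
        "CAT": np.cumsum(1.0 / pbar) / N - 1.0 / pbar,
        "KIC": (N - k) * np.log(rho) + (k + 1) * np.log(N / (2 * np.pi)),
        "HIC": N * np.log(rho) + k * np.log(np.log(N)),
    }

Each entry is an array over k = 1..len(rho); pick the order k that
minimizes your preferred criterion.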

+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  Stephen Oh         oh@csc.ti.com     |  Texas Instruments     |
|  Speech and Image Understanding Lab.  | Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+