kohli@gemed (Jim Kohli) (01/06/90)
Okay-- I've read the recommended chapters.  I still have a question
which I'm certain is easy for most of you using AR/MA/ARMA or almost
anything from Marple's book.

Stephen Oh (and Marple) mention in passing that p (or sometimes ip),
the dimension of the autocorrelation matrix, must be less than N, the
dimension of the 1D dataset.  In fact, Oh suggests that p<=16 if N=64,
and says that "it is known that" this is true.  Marple says only that
p<<N.  What's going on here?  When I learned about autocorrelations,
the autocorrelation vector was either considered to be the same
dimension as the dataset (i.e., p=N), or it was padded with zeroes
(p>N).  Can someone point me to a reference which explains the
rationale?  Or maybe brain me with a brief yet brilliant explanation?
Is this related to work with "infinite" (i.e., continuously acquired)
datasets?

Theo Smit: if you are reading this, I would appreciate it if you could
send me your e-mail address-- the address which shows up in your
postings doesn't work very well-- I'd like to know a little more about
your experiences with ARMA and NMR spectroscopy (thanks!).

Thanks, one and all!
				Jim Kohli
				GE Medical Systems
				We Bring Dog Things To Life
uucp: {your favorite path to uunet}!uunet!crdgw1!gemed!hal!kohli
internet: kohli@hal.med.ge.com
mariano@umigw.MIAMI.EDU (Arthur Mariano) (01/06/90)
In article <1819@mrsvr.UUCP>, kohli@gemed (Jim Kohli) writes:
...
> When I learned about autocorrelations, the autocorrelation
> vector was either considered to be the same dimension as the
> dataset (i.e., p=N), or it was padded with zeroes (p>N).

Dear Jim,

This is wrong.  Real data is noisy, so your estimated correlations do
not equal the true correlations.  A good correlation estimate uses the
same data in the numerator as in the denominator, viz.

	C(k) = sum x(s)*x(s+k) / (sum sqrt(x(s)*x(s+k))),

where x is the detrended (very important) data, k is the lag, * is
multiplication, and the sum is over all possible s.

A rule of thumb is never to calculate your correlation function for
lags greater than 1/4 the data length.  The rationale is that for
large lags, very few data points (relative to zero and small lags) go
into the products needed for C(k); e.g., for lag N-1, only one product
can be calculated.  Thus large lags have high estimation error that
will corrupt fits to, or transforms of, your ESTIMATED correlation
function.  So keep p small to get the best results.

Cheers, Arthur
--
Arthur Mariano   Inet: mariano@umigw.miami.edu [128.116.10.1]
SPAN: miami::arthur (host 3074::)  arthur%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!mariano       arthur%miami.span@vlsi.jpl.nasa.gov
mariano@umigw.MIAMI.EDU (Arthur Mariano) (01/08/90)
> ...C(k)=sum x(s)*x(s+k)/(sum sqrt(x(s)*x(s+k))),...

OOPS, the formula should be

	C(k) = sum x(s)*x(s+k) / sqrt( sum(x(s)**2) * sum(x(s+k)**2) ).

Sorry.
--
Arthur Mariano Inet: mariano@umigw.miami.edu [128.116.10.1]
SPAN: miami::arthur (host 3074::) arthur%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!mariano arthur%miami.span@vlsi.jpl.nasa.gov
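[Editor's note: the corrected estimator and the lags-at-most-N/4 rule
of thumb can be sketched as follows.  This is a minimal NumPy sketch,
not code from the thread; the function name `autocorr_estimate` and
the AR(1) test signal are illustrative assumptions.]

```python
import numpy as np

def autocorr_estimate(x, max_lag=None):
    """Normalized autocorrelation estimate C(k), k = 0..max_lag.

    Uses the corrected formula from the post above:
        C(k) = sum_s x(s)*x(s+k) / sqrt( sum x(s)**2 * sum x(s+k)**2 )
    with x detrended (mean removed) and, per the rule of thumb,
    lags limited to 1/4 of the data length by default.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                 # detrending (very important)
    n = len(x)
    if max_lag is None:
        max_lag = n // 4             # rule of thumb: lags <= N/4
    c = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        a, b = x[:n - k], x[k:]      # overlapping segments x(s), x(s+k)
        c[k] = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
    return c

# Example: an AR(1)-like series has slowly decaying correlations.
rng = np.random.default_rng(0)
x = np.zeros(64)
for t in range(1, 64):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
c = autocorr_estimate(x)             # 17 values: lags 0..16 for N=64
```

Because the same samples appear in numerator and denominator, C(0) is
exactly 1 and |C(k)| <= 1 for every lag (Cauchy-Schwarz), which the
uncorrected formula in the first post does not guarantee.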
oh@m2.csc.ti.com (Stephen Oh) (01/08/90)
In article <1420@umigw.MIAMI.EDU> mariano@umigw.MIAMI.EDU (Arthur Mariano) writes:
>[...]
>Thus large lags have high estimation error that will corrupt fits to
>or transforms of your ESTIMATED correlation function. So keep p small
>to get best results.

Yea, I agree with Arthur.  For AR/ARMA models, it is not always true
that a longer-order AR model is better than a shorter one.

  Principle of Parsimony (in the estimation sense): the larger the
  number of unknown parameters to be estimated for the same number of
  measurements, the lower the accuracy of the estimates.

But also note that the shorter one is not better than the longer one,
either.
There is no solid theoretical ground, but it is known that if the
order (ip) of the AR/ARMA model is not greater than 1/5 or 1/4 of the
total number of observations (N), the longer AR/ARMA model is
generally the better.  I should say "generally" since I cannot promise
this claim.  In fact, for AR models, if you use Burg's, Yule-Walker's,
or the Modified Covariance method, the variance estimate of the
residual process is smaller for the longer order.  That does not mean
the AR coefficient estimates are more accurate, though, because of the
principle of parsimony.

Yea, yea, I know it is confusing, but like I said, the rule of thumb
to determine the order of an AR model is as follows:

  1. let ip = (20 or 25 %) * N
  2. employ any of the three methods: Burg, Y-W, M Cov.
  3. compute information statistics such as AIC, MDL, CAT, etc.
     (see appendix)
  4. determine the order of the AR model based on step 3: choose the
     order so as to minimize the information criterion value.

Comments? Questions? Post it!!

--------------------- Appendix -------------------------------------

Information Statistics: for the background, see pp. 229-231 of
Marple's book, or pp. 234-237 of Kay's book.  I am listing the six
most famous information criteria here.  Below, p^(k) denotes the
estimated white-noise (prediction-error) variance for model order k.

1. FPE (Final Prediction Error)

	FPE(k) = p^(k) * ( N + k + 1 ) / ( N - k - 1 )

2. AIC (Akaike's Information Criterion)

	AIC(k) = N ln( p^(k) ) + 2k

3. MDL (Minimum Description Length)

	MDL(k) = N ln( p^(k) ) + k ln(N)

4. CAT (Criterion Autoregressive Transfer)

	CAT(k) = ( (1/N) sum_{j=1}^{k} 1/p~(j) ) - 1/p~(k)

   where p~(j) = ( N / (N - j) ) * p^(j)

5. KIC (Kashyap's Information Criterion)

	KIC(k) = ( N - k ) ln( p^(k) ) + (k+1) ln( N / (2*pi) )

6. HIC (Hannan's Information Criterion)

	HIC(k) = N ln( p^(k) ) + k ln( ln N )

+----+----+----+----+----+----+----+----+----+----+----+----+----+
| Stephen Oh    oh@csc.ti.com          |  Texas Instruments      |
| Speech and Image Understanding Lab.  |  Computer Science Center|
+----+----+----+----+----+----+----+----+----+----+----+----+----+
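[Editor's note: the four-step rule of thumb above can be sketched in
Python.  This is an illustrative NumPy sketch, not code from the
thread: it uses only the Yule-Walker method (via the Levinson-Durbin
recursion) for step 2, and only AIC for step 3; the function name
`select_ar_order` and the AR(2) test signal are assumptions.]

```python
import numpy as np

def select_ar_order(x, max_order=None):
    """AR order selection per the rule of thumb:
    step 1: ip = 25% of N; step 2: Yule-Walker via Levinson-Durbin
    (Burg or Modified Covariance could be substituted); step 3:
    AIC(k) = N*ln(p^(k)) + 2k; step 4: pick the minimizing order.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # detrend
    n = len(x)
    if max_order is None:
        max_order = n // 4                    # step 1: ip = 25% of N
    # Biased autocorrelation estimates r(0)..r(ip)
    r = np.array([np.dot(x[:n - k], x[k:]) / n
                  for k in range(max_order + 1)])
    # Step 2: Levinson-Durbin recursion solving the Yule-Walker equations
    a = np.zeros(max_order + 1)
    a[0] = 1.0
    e = r[0]                                  # p^(0): lag-0 variance
    aic = [n * np.log(e)]                     # AIC(0)
    for k in range(1, max_order + 1):
        lam = -np.dot(a[:k], r[k:0:-1]) / e   # reflection coefficient
        a[:k + 1] = a[:k + 1] + lam * a[k::-1]
        e *= 1.0 - lam * lam                  # p^(k) never increases with k
        aic.append(n * np.log(e) + 2 * k)     # step 3: AIC(k)
    return int(np.argmin(aic))                # step 4

# Example on a synthetic stable AR(2) process, N = 256, ip = 64.
rng = np.random.default_rng(1)
x = np.zeros(256)
for t in range(2, 256):
    x[t] = 1.2 * x[t - 1] - 0.6 * x[t - 2] + rng.standard_normal()
best = select_ar_order(x)
```

The recursion illustrates Oh's point directly: the residual variance
p^(k) shrinks monotonically with k, so it cannot be used alone to pick
the order; the +2k penalty in AIC is what stops the order from growing
to the full ip = N/4.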