norm@bert.HYWLMSD.WA.COM (Norm Lehtomaki) (06/20/90)
It is well known that when doing power spectrum estimation the data set should be broken into a series of shorter segments, which are used to compute periodograms that are then averaged or spectrally smoothed with an appropriate window function to produce a consistent spectral estimate. If the data is not sectioned and averaged, the variance of the spectral estimate does not decrease as the number of data points becomes large.

So here's one of my questions. Is there an optimal partitioning of the data into segments for spectral analysis? Optimal in the sense of minimizing some quadratic-like criterion, with all the usual assumptions about additive white Gaussian noise. (The answer, I seem to recall, is that this is the classical tradeoff between resolution and stability, but something bothers me about that answer: isn't the tradeoff between resolution and stability already covered by the optimization criterion that was specified?)

This question was motivated by considering the simple problem of computing a maximum likelihood (ML) estimate of an unknown deterministic parameter a. Suppose we make measurements y where

    y = H a + n

with y, H and n being column vectors. The covariance of the noise n is a scalar multiple of the identity and n has a Gaussian distribution. This is just a sampled version of the equation

    y(t) = a * sin(w*t) + n(t)

or, in complex form,

    y(t) = a * exp(j*w*t) + n(t).

The ML estimate of a, ahat, is also the least squares solution for a, which is given by

           H'y
    ahat = --- .
           H'H

Now H'H = N is simply a constant equal to the number of components in y (I'll write N for the data length since n already denotes the noise). This makes ahat (in the complex case) essentially the DFT of y at frequency w, except for the scale factor of 1/N.

Now suppose I am trying to estimate the power in the sinusoid at frequency w, which is simply |a|^2 in the complex case or a*a/2 in the real case. The point is that the ML estimate of the power is ahat^2: if the density is parameterized as p(y;sqrt(b)) instead of p(y;a), where b = a^2, then the invariance of ML estimates requires that bhat = ahat^2. This says there is no sectioning of the data. All I need do is take one DFT at the proper frequency and square its amplitude. (A rough numerical sketch of the two estimators is appended at the end of this post.)

This seems to be a contradiction, because I know that unless I section the data the variance does not decrease as I take longer and longer DFTs. Is it because this estimate of the power is biased? Is it because I have assumed the frequency to be known? (I don't think so; if the frequency is unknown we simply use the ML estimate of the frequency in constructing H and use the same formula for ahat.) Is it because this problem is fundamentally different from power spectrum estimation? I feel like I'm overlooking something rather fundamental and basic, and will feel rather silly as well as grateful when someone corrects me.

Could it be that there is no optimal partitioning of the data? It seems like the ML approach says: give me a list of numbers and how they are related to the parameter you want to estimate, and I'll give you the optimal rule for computing the estimate. Shouldn't the sectioning of the data be included in that rule?

Thanks in advance,

Norm
--
Norm Lehtomaki                     ...!{uw-beaver!tikal,uunet}!nwnexus!bert!norm
Honeywell Marine Systems Division                     norm@bert.HYWLMSD.WA.COM
6500 Harbour Heights Parkway  M/S 4E13
Everett, WA 98204-8899                                    Phone (206)356-3904
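
In case it helps make the comparison concrete, here is a rough numerical sketch of the two estimators above (Python/NumPy; the record length, segment count, noise level, frequency, and variable names are just illustrative choices of mine, nothing canonical). Over a number of Monte Carlo trials it computes both the single long-DFT estimate |ahat|^2 and a Bartlett-style average of short-segment power estimates at the same (known) frequency:

    import numpy as np

    rng = np.random.default_rng(0)

    N      = 4096        # total number of samples in the record (illustrative)
    K      = 16          # number of segments for the averaged estimate
    L      = N // K      # samples per segment
    a      = 1.0         # true complex amplitude; true power |a|^2 = 1
    k_bin  = 32          # sinusoid frequency: w = 2*pi*k_bin/N (2 cycles/segment)
    sigma2 = 4.0         # noise variance per complex sample
    trials = 2000        # Monte Carlo trials

    t_idx = np.arange(N)
    H     = np.exp(2j * np.pi * k_bin * t_idx / N)   # the "H" column of y = H a + n
    w     = 2 * np.pi * k_bin / N

    ml_est  = np.empty(trials)   # |ahat|^2 from one length-N DFT (no sectioning)
    avg_est = np.empty(trials)   # average of K short-segment power estimates at w

    for i in range(trials):
        noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N)
                                       + 1j * rng.standard_normal(N))
        y = a * H + noise

        # ML / least-squares amplitude estimate: ahat = H'y / H'H  (H'H = N)
        ahat = np.vdot(H, y) / np.vdot(H, H)
        ml_est[i] = np.abs(ahat) ** 2

        # Bartlett-style sectioning: amplitude estimate from each length-L
        # segment at the same frequency, then average the K squared magnitudes
        segs = y.reshape(K, L)
        Hl   = np.exp(1j * w * np.arange(L))
        ahat_segs = segs @ np.conj(Hl) / L
        avg_est[i] = np.mean(np.abs(ahat_segs) ** 2)

    print("true power |a|^2        :", np.abs(a) ** 2)
    print("one long DFT  mean, var :", ml_est.mean(), ml_est.var())
    print("segment avg   mean, var :", avg_est.mean(), avg_est.var())

The sketch just computes the two numbers the text talks about; whether averaging the short-segment estimates actually buys anything over the one long DFT for this known-frequency sinusoid is exactly the question I'm asking.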