[sci.math] Power Spectrum Estimation: Optimal Data Sectioning?

norm@bert.HYWLMSD.WA.COM (Norm Lehtomaki) (06/20/90)

It is well known that when doing power spectrum estimation the data
set should be broken into a series of shorter segments; a periodogram
is computed for each segment, and the periodograms are then averaged
or spectrally smoothed with an appropriate window function to produce
a consistent spectral estimate.  If the data is not sectioned and
averaged, the variance of the spectral estimate does not decrease as
the number of data points becomes large.
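
As a concrete illustration, here is a minimal sketch (in numpy, with an
arbitrary record length and segment length chosen for illustration, not
taken from anything above) of Bartlett-style averaging of periodograms
over non-overlapping segments:

import numpy as np

def averaged_periodogram(x, seg_len):
    # Bartlett-style estimate: average the periodograms of
    # non-overlapping segments of length seg_len.
    n_seg = len(x) // seg_len
    psd = np.zeros(seg_len)
    for k in range(n_seg):
        seg = x[k * seg_len:(k + 1) * seg_len]
        psd += np.abs(np.fft.fft(seg)) ** 2 / seg_len
    return psd / n_seg

# For white Gaussian noise the true spectrum is flat, so the spread of
# the estimate across frequency bins reflects its variance.  Averaging
# 16 segments gives a visibly smaller spread than one long periodogram.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
print(averaged_periodogram(x, 256).std())
print((np.abs(np.fft.fft(x)) ** 2 / len(x)).std())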

So here's one of my questions.

Is there an optimal partitioning of data into segments to do
spectral analysis? 

Optimal in the sense of minimizing some quadratic-like criterion,
with all the usual assumptions about additive white Gaussian noise.
(I seem to recall that the answer is the classical tradeoff between
resolution and stability, but something bothers me about that answer:
isn't the tradeoff between resolution and stability already covered
by the optimization criterion specified?)

This question was motivated by considering the simple problem 
of computing a Maximum Likelihood estimate of an unknown deterministic 
parameter a.  Suppose we make measurements y where

y = H a + n

with y, H and n being column vectors. The covariance of the noise
n is a scalar multiple of the identity and n has a Gaussian 
distribution.  This is just a sampled version of the equation

y(t) = a * sin(w*t) + n(t)

or in complex form

y(t) = a * exp(j*w*t) + n(t)
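
For concreteness, here is a minimal numerical sketch of this sampled
model; the record length, frequency, amplitude, and noise level below
are arbitrary assumptions chosen for illustration:

import numpy as np

# Sampled version of y(t) = a*exp(j*w*t) + n(t).  All numbers here are
# illustrative assumptions, not values from the discussion above.
N = 1024                       # number of samples
t = np.arange(N)               # unit sample spacing
w = 2 * np.pi * 50 / N         # frequency chosen to land on DFT bin 50
a = 1.7                        # the unknown deterministic amplitude
rng = np.random.default_rng(1)
n = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

H = np.exp(1j * w * t)         # the column vector H
y = a * H + n                  # the measurement y = H a + n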

The ML estimate of a, ahat, is also the least squares solution
for a which is given by

        H'y 
ahat = -----  .
        H'H
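
Continuing the sketch above, with H' read as the conjugate transpose in
the complex case, the estimate is just an inner product divided by H'H:

# ML / least-squares estimate of a.  np.vdot conjugates its first
# argument, so this is H'y / H'H.
ahat = np.vdot(H, y) / np.vdot(H, H)
print(ahat)                    # close to a = 1.7 for large N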


Now H'H is simply a constant N equal to the number of components in y.
This makes ahat (in the complex case) essentially the DFT of y
at frequency w except for the scale factor of 1/N.
Now suppose I am trying to estimate the power in the sinusoid at
frequency w, which is simply |a|^2 in the complex case or
a*a/2 in the real case.
The point is that the ML estimate of the power is ahat^2, by the
invariance property of ML estimation: parameterizing the density as
p(y;sqrt(b)) with b = a^2 instead of p(y;a) requires that bhat = ahat^2.
This says there is no sectioning of the data.  All I need do
is take one DFT at the proper frequency and square its amplitude.
This seems to be a contradiction, because I know that unless I section
the data the variance does not decrease as I take longer and longer
DFT's.
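
Again continuing the sketch above (where w was assumed to fall exactly
on DFT bin 50), a quick numerical check of that reading:

# ahat is the DFT of y at frequency w, scaled by 1/N, and the
# single-shot power estimate is |ahat|^2.
Y = np.fft.fft(y)
print(np.allclose(ahat, Y[50] / N))   # True
print(np.abs(ahat) ** 2)              # estimate of the power |a|^2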

Is it because this estimate of the power is biased?
Is it because I have assumed the frequency to be known?
(I don't think so.)
If the frequency is unknown we simply use the ML estimate
of the frequency in constructing H and use the same formula for
ahat.  Is it because this problem is fundamentally different
from power spectrum estimation?  I feel like I'm overlooking
something rather fundamental and basic and will feel rather silly
as well as grateful when someone corrects me.
Could it be that there is no optimal partitioning of the data?
It seems like the ML approach says: give me a list of numbers
and how they are related to the parameter you want to estimate,
and I'll give you the optimal rule for computing the estimate.
Shouldn't the sectioning of the data be included in that rule?


Thanks in advance,

Norm


-- 
Norm Lehtomaki         ...!{uw-beaver!tikal,uunet}!nwnexus!bert!norm
Honeywell Marine Systems Division           norm@bert.HYWLMSD.WA.COM
6500 Harbour Heights Parkway M/S 4E13
Everett, WA  98204-8899                            Phone (206)356-3904