[comp.theory] Average case compl. = Worst case compl. under the universal distribution

paulv@cwi.NL (Paul Vitanyi) (07/18/90)

The following result on average case  complexity,  contained
in  a `learning' paper by Ming Li and myself, FOCS89, 34-39,
may be of interest to a wider audience.

Theorem:  Average  case   complexity   (like   time,   space
complexity)  under  the universal distribution on the inputs
equals THETA (worst case complexity).

Example: the average running time for  Quicksort  under  the
universal distribution is THETA (n^2).
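
(An illustration, not from the paper: the small Python  sketch
below assumes the deterministic first-element-pivot variant of
Quicksort and counts comparisons. The already sorted input  is
a  `simple'  object  in  the  sense explained in the Background
below, so it gets comparatively high m-probability,  and  it
already  forces  the  quadratic  behaviour;  a  typical random
permutation takes only about 2 n ln n comparisons.)

import random

def quicksort_comparisons(a):
    # Comparison count of first-element-pivot Quicksort on list a.
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

n = 200
simple_input = list(range(n))              # already sorted: describable in O(log n) bits
random_input = random.sample(range(n), n)  # typical permutation: about n log n bits to describe

print(quicksort_comparisons(simple_input))   # n(n-1)/2 = 19900 comparisons
print(quicksort_comparisons(random_input))   # roughly 2 n ln n, about 2100 comparisons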

Comments: For many algorithms the average case running  time
under  some  distributions  on  the  inputs is less than the
worst-case running  time.  For  Quicksort  and  the  uniform
distribution the former is n log n, while the latter is n^2.
Clearly, for each algorithm we can find  a  distribution  on
the  inputs  such  that the average running time is equal to
the worst case running time (concentrate the probability  on
the worst cases). The question arises whether, for some  fixed
natural  distribution,  there  are problems whose average time
equals their worst-case time.  L.A.  Levin  (1985)  has  shown
that there are NP-complete problems (like tiling) such  that
the   average   time   under  the  uniform  distribution  is
polynomial iff P = NP. The above theorem shows that there is
one  fixed  distribution,  the  universal one, such that the
average case time complexity  equals  the  worst  case  time
complexity (up to a constant factor) for *all* problems.
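
(To make the parenthetical "concentrate the probability on the
worst cases" concrete: for tiny input sizes one can brute-force
a worst-case input and put all the probability mass there;  the
average under that artificial, algorithm-dependent distribution
is  then  trivially  the  worst  case.  A  sketch,  reusing
quicksort_comparisons from the sketch above:)

from itertools import permutations

def concentrated_on_worst_case(n):
    # Degenerate distribution: all mass on a worst-case input of size n
    # (brute force over all n! permutations, only feasible for tiny n).
    worst = max(permutations(range(n)),
                key=lambda p: quicksort_comparisons(list(p)))
    return {worst: 1.0}

for n in range(2, 8):
    dist = concentrated_on_worst_case(n)
    average = sum(prob * quicksort_comparisons(list(x)) for x, prob in dist.items())
    print(n, average)   # equals the worst case n(n-1)/2 of first-pivot Quicksort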


Background:  The  universal  distribution  (R.J.  Solomonoff
1964,  L.A.  Levin 1970, 1974) is m(x) = 2^-K(x), where K(x)
is the prefix variant of Kolmogorov complexity. (The  length
of  the  shortest program from which the reference universal
TM computes x --- the set of programs of this machine  being
prefix-free.)  Thus, a string x = 11..1 of length n has K(x)
<= log n + 2 log log n and m(x) >= 1/(n log^2 n).  A   string
y  of  length  n  generated  by  fair  coin  tosses has, with
overwhelming probability, K(y)  >=  n  and  m(y)  <=  1/2^n.
(Everything  up to fixed constant terms/factors.) Thus, m(x)
embodies Occam's razor  by  assigning  high  probability  to
simple objects and low probability to complex objects.
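
(K(x) itself is not computable, but any real compressor yields
an upper bound on a version of  it.  As  a  rough  illustration
only, with zlib as an arbitrary stand-in and compressed  length
as a crude proxy for K, compare a `simple' string with a random
one in Python:)

import os, zlib

n = 10000
simple = b"1" * n            # analogue of x = 11..1: describable in O(log n) bits
random_str = os.urandom(n)   # incompressible with overwhelming probability

for name, s in [("simple", simple), ("random", random_str)]:
    c = len(zlib.compress(s, 9))   # compressed length in bytes, a crude upper bound proxy for K(s)
    print(name, n, "bytes ->", c, "bytes; proxy m-weight about 2^-%d" % (8 * c))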

Is the universal distribution natural to consider? One would
think  yes.   It  dominates  all computable distributions P:
m(x) >= c P(x) for a fixed constant c depending  only  on  P
(c = 2^-K(P)), by a result of Levin. This implies that, for any
computable distribution  P(x),  the  P-probability  of  P(x)
being close to m(x) is high:

sum { P(x) : k m(x) >= P(x) >= m(x)/k } >= 1 - 1/k,

for all k>=K(P). These statements actually hold for an  even
wider  class of distributions which can be approximated from
below by recursive functions: the enumerable distributions.
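
(This  is  essentially  the  domination  above  combined  with
Markov's inequality applied to  the  ratio  m(x)/P(x),  whose
P-expectation is at most 1. Since m itself is not  computable,
the toy Python check below uses a stand-in nu that dominates P
with constant 1/2; nu plays the role of m here and is entirely
my own construction:)

# Toy check of the concentration bound, with nu standing in for m.
N = 1000
weights = [1.0 / (x + 1) ** 2 for x in range(N)]
Z = sum(weights)
P = [w / Z for w in weights]                          # a computable distribution P
uniform = [1.0 / N] * N
nu = [0.5 * p + 0.5 * u for p, u in zip(P, uniform)]  # nu(x) >= P(x)/2 for every x

for k in [2, 4, 8, 16]:
    mass = sum(p for p, v in zip(P, nu) if v / k <= p <= k * v)
    print(k, round(mass, 4), ">=", 1 - 1 / k)   # P-mass where P and nu agree within factor k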

All distributions which have a name, or have  been  used  in
statistics, appear to be enumerable.  In the absence of any  a
priori knowledge of the actual distribution, therefore,  other
than that it is enumerable,  studying  the  average  behavior
under m may be more meaningful  than  studying  the  average
behavior under any other particular enumerable distribution.
By the above result this average is the worst case.

Copies of the full paper, mercifully short, can be  obtained
upon  request  at the above email address. Since K(x) is not
computable, the question arises of feasible analogues  to  the
above result. This has since been  studied  by  Peter  Bro
Miltersen, The complexity of difficult measures, Manuscript,
DAIMI, Univ. Aarhus, May 1990.