CALENDAR@IBM-SJ.ARPA (03/07/86)
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120-6099
CALENDAR
March 10 - 14, 1986
Computer STOCHASTIC COMPLEXITY AND THE MDL AND PMDL PRINCIPLES
Science J. J. Rissanen, IBM Almaden Research Center
Colloquium
Thurs., Mar. 13 There is no rational basis in traditional
3:00 P.M. statistics for the comparison of two models
Rear Audit. unless they have the same number of parameters.
Hence, for example, the important
selection-of-variables problem has a dozen or so
solutions, none of which can be
preferred over the others. Recently, inspired
by the algorithmic notion of complexity, we
introduced a new concept in statistics, the
Stochastic Complexity of the observed data,
relative to a class of proposed probabilistic
models. In broad terms, it is defined as the
least number of binary digits with which the
data can be encoded by use of the selected
models. The stochastic complexity also
represents the smallest prediction errors which
result when the data are predicted by use of the
models. Accordingly, the associated optimal
model represents all the statistical information
in the data that can be extracted with the
proposed models, and for this reason its
computation, which we call the MDL (Minimum
Description Length) principle, may be taken to
be the fundamental problem in statistics. In
this talk, we describe a special form of the MDL
principle, which amounts to the minimization of
squared "honest" prediction errors, and we apply
it to two examples of polynomial curve fitting
as well as to contingency tables. In the first
example, which calls for the prediction of
weight growth of mice, the degree of the MDL
polynomial agrees with the optimal degree,
determined in retrospect after the predicted
weights were seen. The associated predictions
also far surpass those made with the best
traditional statistical techniques. A
fundamental theorem is given, which permits
comparison of models in the spirit of the
Cramer-Rao inequality, except that the models
need not have the same number of parameters. It
also settles the issue of how the
selection-of-variables problem is to be solved.
Host: R. Arps
(Refreshments at 2:45 P.M.)
[...]