LOHMAN%ibm-sj.csnet@csnet-relay.arpa (03/28/84)
From: Guy M. Lohman <LOHMAN%ibm-sj.csnet@csnet-relay.arpa>

[Forwarded from the SRI-AI bboard by Laws@SRI-AI.]

IBM San Jose Research Lab
5600 Cottle Road
San Jose, CA 95193

Thurs., April 5   Computer Science Colloquium
3:00 P.M.         MINIMUM DESCRIPTION LENGTH PRINCIPLE IN MODELING
Auditorium

Traditionally, statistical estimation and modeling involve, besides
certain well-established procedures such as the celebrated maximum
likelihood technique, a substantial amount of judgment. The latter is
typically needed in deciding upon the right model complexity. In this
talk we present a recently developed principle for modeling and
statistical inference, which to a considerable extent allows reduction
of the judgment portion in estimation. This so-called MDL principle is
based on a purely information-theoretic idea. It selects that model in
a parametric class which permits the shortest coding of the data. The
coding, of which we only need the length in terms of, say, binary
digits, must, however, be self-contained in the sense that the
description of the parameters themselves needed in the imagined
encoding is included. For this reason, the optimum model cannot
possibly be very complex unless the data sample is very large. A
fundamental theorem gives an asymptotically valid formula for the
shortest possible code length as well as for the optimum model
complexity in a large class of models. For short samples no simple
formula exists, but the optimum complexity can be estimated numerically
and taken advantage of. Finally, the principle is generalized so as to
allow any measure for a model's performance, such as its ability to
predict.

J. Rissanen, San Jose Research
Host: P. Mantey

Fri., April 6     Computer Science Seminars
Auditorium        KNOWLEDGE AND DATABASES (11:15)

We define a knowledge-based approach to database problems. Using a
classification of applications from the enterprise to the system level,
we can give examples of the variety of knowledge which can be used.
Most of the examples are drawn from work at the KBMS Project at
Stanford. The objective of the presentation is to illustrate both the
power and the high payoff of quite straightforward artificial
intelligence applications in databases. Implementation choices will
also be evaluated.

G. Wiederhold, Stanford University
Host: J. Halpern

---------------------------------------------------------------

Visitors, please arrive 15 mins. early. IBM is located on U.S. 101,
7 miles south of Interstate 280. Exit at Ford Road and follow the signs
for Cottle Road. The Research Laboratory is IBM Building 028. For more
detailed directions, please phone the Research Lab receptionist at
(408) 256-3028. For further information on individual talks, please
phone the host listed above.

IBM San Jose Research mails out both the complete research calendar and
a computer science subset calendar. Send requests for inclusion in
either mailing list to CALENDAR.IBM-SJ at RAND-RELAY.
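
A note on the MDL colloquium abstract above: the idea of picking the
model that permits the shortest self-contained coding of the data can be
sketched for a simple case. The following is a minimal illustration, not
the speaker's method. It assumes a 0/1 sequence modeled by a Markov chain
of some order, and it assumes the commonly used asymptotic cost of
(k/2) log2(n) bits for describing k fitted parameters from n samples;
whether this is the formula referred to in the talk is an assumption. The
function names mdl_code_length and best_order are hypothetical.

```python
import math


def mdl_code_length(bits, order):
    """Two-part code length, in bits, of a non-empty 0/1 sequence under a
    Markov model of the given order (illustrative sketch only)."""
    n = len(bits)
    # Count how often each symbol follows each context of `order` symbols.
    counts = {}
    for i in range(order, n):
        context = tuple(bits[i - order:i])
        counts.setdefault(context, [0, 0])[bits[i]] += 1
    # The first `order` symbols have no full context: charge 1 bit each.
    data_bits = float(order)
    # Remaining symbols: code length under maximum likelihood estimates.
    for zeros, ones in counts.values():
        total = zeros + ones
        for c in (zeros, ones):
            if c:
                data_bits -= c * math.log2(c / total)
    # Assumed parameter cost: one free parameter per possible context,
    # each charged (1/2) * log2(n) bits.
    num_params = 2 ** order
    param_bits = 0.5 * num_params * math.log2(n)
    return data_bits + param_bits


def best_order(bits, max_order=4):
    """Return the Markov order with the shortest total code length."""
    return min(range(max_order + 1),
               key=lambda order: mdl_code_length(bits, order))
```

For a strictly alternating sequence such as [0, 1] * 100, best_order
selects order 1: one symbol of context makes the data cost almost
nothing, while each higher order doubles the parameter cost without
shortening the data part. For a constant sequence, order 0 wins for the
same reason.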