PHW@OZ.AI.MIT.EDU ("Patrick H. Winston") (04/23/87)
**************** OPPORTUNITY FOR PARTICIPATION ****************

WORKSHOP ON THE MATRIX OF BIOLOGICAL INFORMATION

ARTIFICIAL INTELLIGENCE, DATA BANK MANAGEMENT, COMPUTER ANALYSIS OF MACROMOLECULES --- APPLIED TO CELLULAR BIOLOGY TO DEVELOP AN APPROACH TO GENERALIZATIONS AND OTHER THEORETICAL INSIGHTS IN BIOLOGICAL SCIENCE.

We have today a unique opportunity to merge research at the forefront of Artificial Intelligence with efforts to provide a new conceptual framework for the laws, models, empirical generalizations and physical foundations of the modern biological sciences.

The Matrix of Biological Knowledge is an attempt to use advanced computer methods to organize the immense and growing body of experimental data in the biological sciences, in the expectation that there are a significant number of as yet undiscovered ordering relations, new laws and predictive relations embedded in the mass of existing information. Workshop participants will attempt to define the interrelations of the matrix of biological knowledge, and to demonstrate its feasibility by applying the modern tools of computer science to a small set of case studies.

This is an outgrowth of a report from the Natl. Academy of Sciences, "Models for Biomedical Research: a New Perspective," produced in response to a request by the Natl. Institutes of Health (NIH). A brief summary and description appears in "An Omnifarious Data Bank for Biology?," SCIENCE 228(4706), 21 June 1985.

The workshop is intended to introduce a number of young scientists to the matrix concept and to explore with these investigators the possibilities of new theoretical developments and conceptual frameworks.

The workshop will run July 13 - August 14 at St. John's College in Santa Fe, in the Sangre de Cristo mountains of northern New Mexico (AAAI attendees may miss the first week). Participants will be supported with housing, meals and travel as necessary.
Thirty participants (graduate students, post-doctoral fellows, and working scientists) are expected to be selected by application from throughout the United States. Eight groups will be directed by senior scientists:

    "Artificial Intelligence," Patrick Winston, A.I. Laboratory, MIT;
    "Management of Large Scale Data Bases," Robert Goldstein, U. Brit. Columbia;
    "Computers Applied to Macromolecules," Peter Kollman, U. Cal. San Francisco;
    "The Organization of Biological Knowledge," Harold Morowitz, Yale University;
    "Cell-Cell Interactions," Hans Bode, U. of Calif., Irvine;
    "Toxicology," Robert Rubin, Johns Hopkins University;
    "Information Flow from DNA to Cells," Richard Dickerson, UCLA, Harvey Hershman, UCLA, and Temple Smith, Harvard University;
    "Peptides and Signalling Molecules," Christian Burks, Los Alamos Natl. Lab., and Derek LeRoith, NIH.

A brief description of background and desire to participate, together with two letters of recommendation, should be sent to

    Santa Fe Institute
    attn. Ginger Richardson
    P.O. Box 9020
    Santa Fe, New Mexico, 87504-9020
    (phone (505) 984-8800)

(Applicants should first review the NAS report or the SCIENCE article, above, available in most science libraries.)

The workshop has been previously announced in other forums and the formal application deadline is 1 May 1987. Applicants who will have difficulty meeting that deadline should telephone Ginger Richardson and notify her of their intent to submit an application, as few if any positions will be available after that date. Applicants are strongly encouraged to apply expeditiously so that an early decision about participation may be reached.

Some representative connections between Artificial Intelligence and the Matrix Workshop follow, but the list is suggestive only.

NATURAL LANGUAGE: What constraints on form and content must be met for a scientific Abstract to be machine-readable? It is generally a single paragraph in a very restricted form of declarative prose.
If tolerable constraints could be found they would probably be widely adopted.

KNOWLEDGE REPRESENTATION: How much of what knowledge must be captured, and how, to enable scientific reasoning? Is a single unified representation scheme possible or must each sub-field have a specialized representation to support a specialized vocabulary and ontology? ``In the Knowledge lies the Power.'' How can we organize this tremendous amount of knowledge to extract the power everyone believes is there?

ANALOGICAL MAPPING: How can we notice when analogous biological functions are implemented by analogous structures? Can we discover and validate analogical animal models of human systems? Can we explain an unknown response in an organism by analogy to a better-understood system? Given an experimental system, description or outcome, could we index and retrieve analogous situations and/or literature references?

MACHINE LEARNING: How can we re-structure the large existing databases to automate induction from data? Can we use more knowledge-intensive forms of learning in this knowledge-intensive domain? Can existing learning paradigms be extended to cope with the noisy data that any real application must face?

RULE-BASED EXPERT SYSTEMS: How much of the expert scientist's knowledge can be formalized explicitly as rules? Could we produce an expert system which, given a problem or request for information, could infer which database contained the answer? Could expert knowledge, say of toxicology, be used to produce a Toxicology Advisor which knew how to access databases to find answers to questions not covered by its rules? Could we create expert systems which continually scanned new additions to databases to update their rules, or at least flag areas where the new addition conflicts with or supplants an existing rule?

TRUTH MAINTENANCE: Suppose an Abstract always contained an explicit statement of the proposition(s) argued for or against by the paper.
Could this be entered into a dependency network, with the paper as justification? Could we then query the TMS to determine, for some proposition, whether it is generally believed, disbelieved, or controversial; and pick out the relevant literature citations? If a new paper supports or contradicts a result from a neighboring field, can this be detected reliably?

QUALITATIVE PROCESS THEORY: Can an organism be modeled as a cooperating system of processes? Can we organize this so as to find similar process systems shared by different organisms? Can we reliably predict the effects of perturbing an organism's processes, e.g. in the study of toxicology or medicine?

SCIENTIFIC REASONING AND DISCOVERY: We have the opportunity to structure a large, continuously-updated body of real-world scientific knowledge. What form of Knowledge Base would best facilitate discovering the unexpected regularities in the data? Could a program (possibly using a dependency network of experimental results) suggest crucial experiments and reason about implications of possible outcomes?

SCHEMA COMPLETION: Can an experiment be understood in terms of a setting which instantiates an ``experiment schema''? Can we use this to group results that are ``schematically close'', even if they occur in different biological models or in related but distinct sub-fields? Can we fill in the default assumptions underlying a description of the experiment and results?

DISCOURSE/STORY UNDERSTANDING: Could a scientific article be analyzed as a narrative describing an experimental setting, a group of observations, and some conclusions? Given a new story (experiment), could we retrieve closely related or similar stories we've heard before? Could a highly abridged summary of the story be produced? Could several stories be automatically merged, and an overall summary produced?

This list is obviously indicative, not exhaustive.

-------