WALKER@SUMEX-AIM.STANFORD.EDU (Michael Walker) (03/29/88)
The ARTISYST Workshop (ARTificial Intelligence for SYSTematics)

Uses of Artificial Intelligence and modern computer methods for systematic studies in biology.

Contact: Renaud Fortuner at (916) 445-4521 or through BITNET or ARPANET at rfortuner@ucdavis.

An organizing committee consisting of R. Fortuner, J. Sorensen (Calif. Dept. of Food & Agriculture), J. Milton, J. Diederich (UC Davis), M. Walker (Stanford), J. Woolley and N. Stone (Texas A&M) is planning to study the uses of artificial intelligence and modern computer methods for systematic studies in biology. This study was suggested by the Systematic Biology panel of the National Science Foundation. The committee will recruit a group of about twenty systematists and about a dozen computer scientists interested in the possible application of modern computing techniques to a new domain.

The collaborators will meet twice, in early January 1989 and in March 1989, for two three-day workshops at the University of California, Davis. During the first workshop the participants will list the questions and problems in systematics that might be solved by modern computing techniques such as applications of artificial intelligence in expert systems, computer vision, databases, graphics, etc. After this initial workshop, small workgroups of specialists from both fields will collaborate to characterize the options in terms of computing techniques, and to define the most promising approaches to their solutions. During the second workshop, the participants will discuss the importance of each of the problems studied for systematic biology, and the current, short-term, and long-term availability of relevant computer techniques.

A final report will serve as guidelines for NSF funding for future applications of computer techniques to systematic biology. The proceedings of both workshops will be published to serve as a review of state-of-the-art computer methods that may be of use in systematics.
Systematic biology is the science that studies the relationships among organisms, and that classifies these organisms according to these relationships. The National Science Foundation is interested in supporting the implementation of expert systems and modern computing techniques for systematic biology.

Currently, systematics relies heavily on statistics and algorithmic computer programming. However, different computer methods may be used to solve some of its problems. Expert systems can help with diagnostic identification, correct application of the rules of nomenclature, etc. Capture of the data requires computer vision and image analysis. Museum curation and retrieval of published information could be helped by intelligent access to large databases. Computer graphics can be used for identification and teaching. Finally, the systematic studies and the definition of a classification may be helped by intelligent access to databases and relevant statistics.

The domain of systematic biology is vast and its boundaries are ill defined. However, it is possible to define subdomains that may be studied separately.

First, the organisms to be studied must be recovered and measured. In this preliminary phase, subdomains include:

- capture of the data: observation of shapes, measurement of lengths, angles, areas, position of one feature in relation to another, etc.
- museum curation: finding specimens relevant to the study in museum collections by an intelligent search of the collection records.
- identification of the specimens during field surveys: it is necessary to know before studying a group of organisms whether an organism found in the field belongs to this group.
- information retrieval from a variety of published sources.

A second phase is represented by taxonomic analyses of the relationships existing between organisms and their characteristics.
It includes:

- ontogeny: the study of the development of the embryo, and the appearance of ancestral features in the embryo that are then used to support hypotheses on its phylogeny.
- biogeography: geographical distribution of groups of organisms.
- fossil record: the study of ancestral states of characteristics and their evolution along a fossil sequence.
- comparative anatomy: comparison of the aspects taken by a feature in related organisms.
- DNA and gene analysis.
- definition of apomorphies: search for evolved characters present in all the members of a group.
- resolution of homoplasies: search for characteristics that appear similar in two groups, but that are the result of parallel or convergent evolution rather than originating in a common ancestor.
- weighting the characteristics: giving more importance to characteristics that supposedly are strong indicators of phylogenetic relationships.
- transformation of the raw data: for example, a log transformation may restore normality.

These analyses result in the ordering of the organisms studied into groups arranged into some sort of relational network (tree). Taxonomic analyses and the construction of trees are in fact integrated processes, and they may have to be treated as a whole in the ARTISYST Project.

Several competing methods exist for defining the best classification tree: parsimony (the shortest tree is the best), maximum likelihood (each taxon is added in turn on the tree where it fits best), ordination (relies on multivariate analyses), etc. All approaches make extensive use of statistics and algorithmic computer programming, but it has been said that most systematic problems cannot be solved by any algorithm. Availability of expert systems may suggest other, non-algorithmic, approaches.
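As an illustration of the parsimony criterion mentioned above (the shortest tree is the best), here is a minimal sketch in Python. It is not part of the workshop plan. It assumes rooted binary trees written as nested pairs of taxon names and uses Fitch's counting method to find the minimum number of character-state changes a tree requires. The taxa A to D, the three binary characters, and the function names fitch and tree_length are all hypothetical, chosen only for this example.

```python
def fitch(tree, states):
    """Fitch's method for one character on a rooted binary tree.

    tree   -- a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states -- dict mapping each taxon name to its observed state
    Returns (possible states at this node, minimum changes below it).
    """
    if isinstance(tree, str):              # leaf: state is observed
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                             # children agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagreement costs one change


def tree_length(tree, characters):
    """Total number of changes over all characters (the tree 'length')."""
    return sum(fitch(tree, ch)[1] for ch in characters)


# Hypothetical data: three binary characters scored for four taxa.
characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 0, "D": 1},
    {"A": 1, "B": 1, "C": 0, "D": 0},
]

tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(tree_length(tree1, characters))  # 4
print(tree_length(tree2, characters))  # 5
```

Under the parsimony criterion, tree1 would be preferred here because it needs fewer changes. The sketch only scores trees that are given to it; searching the space of all possible trees is the harder part of the problem.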
Once a tree has been accepted as a working hypothesis, the various taxa in the tree are named according to the rules of the International Code of Zoological Nomenclature (or its botanical counterpart) and the jurisprudence established over the years by the rulings of the International Commissions of Zoological or Botanical Nomenclature. It may be possible to include rules and jurisprudence in an expert system similar to the legal systems currently under development.

Diagnostic identification is the process through which an unknown specimen is assigned to its correct place in an existing classification. This phase is the most promising for the application of current expert system technology.

Finally, any new method must have a very friendly man-machine interface to have a chance of being accepted by most systematists.

Each topic will be studied by a small workgroup including one or several systematists and one or several computer scientists.

Computer scientists interested in participating in this project should contact Renaud Fortuner at (916) 445-4521 or through BITNET at rfortuner@ucdavis.