[comp.ai.digest] Conference - ARTISYST: AI for Systematics

WALKER@SUMEX-AIM.STANFORD.EDU (Michael Walker) (03/29/88)
                     The ARTISYST Workshop 

          (ARTificial Intelligence for SYSTematics)           
 

Uses of Artificial Intelligence and modern computer methods for
                 systematic studies in biology.

    Contact: Renaud Fortuner at (916) 445-4521 or through
BITNET or ARPANET at rfortuner@ucdavis.

An organizing committee of R. Fortuner, J. Sorensen (Calif. Dept.
of Food & Agriculture), J. Milton, J. Diederich (UC Davis), M. Walker
(Stanford), J. Woolley and N. Stone (Texas A&M) is planning to study
the uses of artificial intelligence and modern computer methods for
systematic studies in biology. This study was suggested by the
Systematic Biology panel of the National Science Foundation.

The committee will recruit a group of about twenty systematists and
about a dozen computer scientists interested in the possible
application of modern computing techniques to a new domain. The
collaborators will meet twice, in early January 1989 and in March 1989,
for two three-day workshops at the University of California, Davis.

During the first workshop the participants will make a list of
questions and problems in systematics that might be solved by modern
computing techniques such as applications of artificial intelligence
in expert systems, computer vision, databases, graphics, etc. After
this initial workshop, small workgroups of specialists from both
fields will collaborate to characterize the options for each problem
in terms of computing techniques, and to define the most promising
approaches to its solution.
 
During the second workshop, the participants will discuss the
importance for systematic biology of each of the problems studied,
and the current, short-term, and long-term availability of relevant
computer techniques. A final report will serve as guidelines for NSF
funding for future applications of computer techniques to
systematic biology. The proceedings of both workshops will be
published to serve as a review of state-of-the-art computer
methods that may be of use in systematics.

Systematic biology is the science that studies the relationships among
organisms, and that classifies these organisms according to these
relationships.  The National Science Foundation is interested in
supporting the implementation of expert systems and modern computing
techniques for systematic biology. Currently, systematics relies
heavily on statistics and algorithmic computer programming. However,
different computer methods may be used to solve some of its problems.
Expert systems can help with diagnostic identification, correct
application of the rules of nomenclature, etc. Capture of the data
requires computer vision and image analysis. Museum curation and
retrieval of published information could be helped by intelligent
access to large databases. Computer graphics can be used for
identification and teaching. Finally, systematic studies and the
definition of a classification may be helped by intelligent access to
databases and relevant statistics.

The domain of systematic biology is vast and its boundaries are
ill-defined. However, it is possible to define subdomains that may be
studied separately. First, the organisms to be studied must be
recovered and measured. In this preliminary phase, subdomains
include:

     - capture of the data: observation of shapes, measurement of
lengths, angles, areas, position of one feature in relation to
another, etc.

     - museum curation: finding specimens relevant to the study
in museum collections by an intelligent search of the collection
records.

     - identification of the specimens during field surveys: it
is necessary to know before studying a group of organisms
whether an organism found in the field belongs to this group.

     - information retrieval from a variety of published sources.
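The measurements named under "capture of the data" can be illustrated
with a minimal sketch. It assumes that landmark coordinates have
already been digitized from an image as (x, y) pairs; the function
names and the sample outline are hypothetical and are not part of the
ARTISYST proposal.

```python
import math

def length(p, q):
    """Euclidean distance between two landmarks given as (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle_deg(a, vertex, b):
    """Angle at `vertex` between the rays vertex->a and vertex->b, in degrees."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

def polygon_area(points):
    """Area of a simple polygon (shoelace formula); points in outline order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical outline of a feature, in arbitrary units.
outline = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(length(outline[0], outline[2]))                 # diagonal of the outline
print(angle_deg(outline[1], outline[0], outline[3]))  # angle at the first landmark
print(polygon_area(outline))                          # enclosed area
```

Locating the landmarks themselves is the computer vision problem the
announcement refers to; the sketch only covers the arithmetic that
follows once coordinates are available.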

A second phase is represented by taxonomic analyses of the
relationships existing between organisms and their characteristics.
It includes:

     - ontogeny: the study of the development of the embryo, and
the appearance of ancestral features in the embryo that are then
used to support hypotheses on its phylogeny.

     - biogeography: geographical distribution of groups of
organisms.

     - fossil record: the study of ancestral states of characteristics
and their evolution along a fossil sequence.

     - comparative anatomy: comparison of the aspects taken by a
feature in related organisms.

     - DNA and gene analysis.

     - definition of apomorphies: search for evolved characters
present in all the members of a group.

     - resolution of homoplasies: search for characteristics that
appear similar in two groups, but that are the result of parallel
or convergent evolution rather than originating in a common
ancestor.

     - weighting the characteristics: giving more importance to
characteristics that supposedly are strong indicators of
phylogenetic relationships.

     - transformation of the raw data: for example, a log
transformation may restore normality.
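The log transformation mentioned in the last item can be sketched as
follows. The measurements are made-up values chosen to be
right-skewed, and the moment-based skewness coefficient is only one
possible check; a reduced skew does not by itself establish normality.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (all strictly positive).
raw = [1, 2, 2, 3, 4, 6, 9, 15, 30]
logged = [math.log(v) for v in raw]

print("skewness before:", round(skewness(raw), 2))
print("skewness after: ", round(skewness(logged), 2))
```

For data of this kind, the skewness of the logged values is much
closer to zero than that of the raw values, which is the sense in
which the transformation "may restore normality".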

These analyses result in the ordering of the organisms studied into
groups arranged into some sort of relational networks (trees).
Taxonomic analyses and the construction of trees are in fact
integrated processes, and they may have to be treated as a whole in
the ARTISYST Project. Several competing methods have been proposed
for defining the best classification tree: parsimony (the shortest
tree is the best), maximum likelihood (each taxon is added in turn on
the tree where it fits best), ordination (relies on multivariate
analyses), etc. All approaches make extensive use of statistics and
algorithmic computer programming, but it has been said that most
systematic problems cannot be solved by any algorithm. Availability
of expert systems may suggest other, non-algorithmic, approaches.
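To make the parsimony criterion concrete, here is a minimal sketch
that counts the changes a fixed tree requires, using Fitch's counting
method for unordered characters. The announcement does not name a
particular algorithm, so this choice is an assumption, and the four
taxa, the character matrix, and the function names are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for one character.

    `tree` is a taxon name (leaf) or a pair (left, right) of subtrees.
    `states` maps each taxon name to its state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: at least one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, matrix):
    """Total changes over all characters; matrix maps taxon -> state string."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Hypothetical data: four taxa scored for three binary characters.
matrix = {"A": "110", "B": "111", "C": "001", "D": "000"}
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
lengths = {name: tree_length(t, matrix) for name, t in trees.items()}
shortest = min(lengths, key=lengths.get)
print(lengths, "shortest:", shortest)
```

Scoring a given tree is the easy part. The number of possible trees
grows very quickly with the number of taxa, so exhaustive comparison
as done here is feasible only for very small examples.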

Once a tree has been accepted as a working hypothesis, the various
taxa in the tree are named according to the rules of the International
Code of Zoological Nomenclature (or its botanical counterpart) and
the jurisprudence established over the years by the rulings of the
International Commissions of Zoological or Botanical Nomenclature. It
may be possible to incorporate these rules and jurisprudence into an
expert system similar to legal systems currently under development.

Diagnostic identification is the process through which an unknown
specimen is allotted to its correct place in an existing
classification. This phase is the most promising for the application
of current expert system technology.
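A minimal sketch of rule-based identification is given below, assuming
a small table of character states per taxon. The taxa, characters, and
function names are hypothetical placeholders, and a real expert system
would also have to handle uncertain or variable characters.

```python
# Hypothetical knowledge base: taxon -> expected character states.
KNOWLEDGE_BASE = {
    "Taxon A": {"stylet": "present", "tail": "conical", "lip_region": "offset"},
    "Taxon B": {"stylet": "present", "tail": "rounded", "lip_region": "continuous"},
    "Taxon C": {"stylet": "absent", "tail": "conical", "lip_region": "continuous"},
}

def candidates(observed, kb=KNOWLEDGE_BASE):
    """Taxa whose recorded states match every observed character state."""
    return sorted(
        taxon for taxon, traits in kb.items()
        if all(traits.get(ch) == st for ch, st in observed.items())
    )

def next_character(observed, kb=KNOWLEDGE_BASE):
    """Suggest an unobserved character that still separates the candidates.

    Returns None when no remaining character distinguishes them.
    """
    remaining = candidates(observed, kb)
    unobserved = sorted({ch for t in remaining for ch in kb[t]} - set(observed))
    best, best_count = None, 1
    for ch in unobserved:
        count = len({kb[t].get(ch) for t in remaining})
        if count > best_count:
            best, best_count = ch, count
    return best

specimen = {"stylet": "present"}
print(candidates(specimen))        # taxa still consistent with the specimen
print(next_character(specimen))    # character worth observing next
specimen["tail"] = "rounded"
print(candidates(specimen))
```

Unlike a fixed dichotomous key, this approach lets the user enter
characters in any order, which is useful when a specimen is damaged
or a character cannot be observed.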

Finally, any new method must offer a very friendly man-machine
interface if it is to be accepted by most systematists.

Each topic will be studied by a small workgroup including one or
several systematists and one or several computer scientists.
Computer scientists interested in participating in this project
should contact Renaud Fortuner at (916) 445-4521 or through
BITNET at rfortuner@ucdavis.
