Similarity Terminology

GARAVELL@gunbrf.bitnet (02/02/91)
The controversy over the use of the term "homology", recently discussed here,
goes back somewhat further than 1987.

> The author should probably consult these commentaries on the subject:
> Reeck et al., "Homology in proteins and nucleic acids: A terminology
> muddle and a way out of it," Cell 50: 667 (1987); Lewin, "When does
> homology mean something else?" Science 237: 1570 (1987).
> Chris T. Amemiya
> Lawrence Livermore National Laboratory
> fish@amoeba.llnl.gov

The biological term "homologous", denoting similar entities related by
divergent evolution, and the chemical term "homologous", denoting entities
with similar functional groups but slight structural differences, originated
at about the same time, in the mid-nineteenth century (J.A.H. Murray,
editor, A New English Dictionary, Oxford, U.K., 1901).  In 1959 C.B. Anfinsen
(as cited by M. Florkin, "Concepts of Molecular Biosemiotics and of Molecular
Evolution" in Comprehensive Biochemistry, vol. 29, part A, p. 61, Elsevier,
Amsterdam, 1974) discussed the chemical homology of biological molecules.
The conflict of meanings eventually led to a polemic between H. Neurath
and E. Margoliash (Neurath et al., Science, 158, 1638-1644, 1967; Nolan and
Margoliash, Annu. Rev. Biochem., 37, 727-790, 1968; Winter et al., Science,
162, 1433, 1968; Margoliash, Science, 163, 127, 1969).  To help clarify
the muddled meanings, M. Florkin (op. cit.) proposed retaining the biological
terms "homologous" and "analogous", the latter denoting entities related by
convergence, and adding "isologous", denoting chemically similar entities,
and "orthologous" and "paralogous", denoting distinct types of biochemical
homology.

An additional problem arises when authors cite "the percentage of identical
residues" in biopolymer sequences.  Quoting such a figure without giving the
total number of residues aligned implies that a given percentage carries the
same significance regardless of alignment length.  In fact, the statistical
significance of a given percentage of identical residues is a non-linear
function of the length of the sequences being compared.  At a significance
level of 0.1%, two 500-residue amino acid sequences aligned with 13% identical
residues would be significantly related, while two 100-residue sequences with
13% identical residues would not be.  (This is based on the frequency of
occurrence of amino acid residues in an early compilation of peptide
sequences.  It would be preferable to use statistics based on the specific
sequence compositions.)
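The length dependence can be illustrated with a simple model.  This is an
assumption for illustration, not the calculation behind the figures above.
Treat each of the n aligned positions of an ungapped alignment as an
independent trial that is identical with probability p0, the chance that two
residues drawn from the sequences' compositions match.  The number of
identities between unrelated sequences is then binomially distributed, and
the significance of observing k identities is the upper tail P(X >= k).
Estimating p0 from the two sequences' own compositions follows the
preference stated in the parenthetical above.  The function names and the
background value p0 = 0.06 are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log


def expected_identity(seq_a: str, seq_b: str) -> float:
    """Probability that two residues, drawn independently from the
    compositions of seq_a and seq_b, are identical: sum_x f_a(x) * f_b(x)."""
    freq_a, freq_b = Counter(seq_a), Counter(seq_b)
    n_a, n_b = len(seq_a), len(seq_b)
    # Counter returns 0 for residues absent from seq_b, so they add nothing.
    return sum((freq_a[x] / n_a) * (freq_b[x] / n_b) for x in freq_a)


def identity_tail_probability(n_aligned: int, n_identical: int, p0: float) -> float:
    """P(X >= n_identical) for X ~ Binomial(n_aligned, p0).

    The chance that an unrelated, ungapped alignment of n_aligned positions
    shows at least n_identical identities by chance alone.  Terms are
    computed in log space so long alignments do not overflow.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    log_p, log_q = log(p0), log(1.0 - p0)
    total = 0.0
    for k in range(n_identical, n_aligned + 1):
        log_pmf = (lgamma(n_aligned + 1) - lgamma(k + 1) - lgamma(n_aligned - k + 1)
                   + k * log_p + (n_aligned - k) * log_q)
        total += exp(log_pmf)
    return total


if __name__ == "__main__":
    P0 = 0.06      # assumed background identity (illustrative, not from the post)
    ALPHA = 0.001  # the 0.1% significance level used in the text
    for length in (100, 500):
        identical = round(0.13 * length)  # 13% identical residues
        p = identity_tail_probability(length, identical, P0)
        verdict = "significant" if p < ALPHA else "not significant"
        print(f"{length} residues, {identical} identical: P = {p:.2g} ({verdict})")
```

Under these assumptions the 500-residue case falls far below 0.001 while the
100-residue case does not, which matches the direction of the example in the
post.  Gapped alignments chosen to maximize identity raise the level of
chance identity, so for such alignments this binomial tail understates the
true chance probability.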
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 GARAVELLI@GUNBRF.BITNET