GARAVELL@gunbrf.bitnet (02/02/91)
The recent discussion here on the use of the term "homology" extends back somewhat further than 1987.

> The author should probably consult these commentaries on the subject:
> Reeck et al., "Homology in proteins and nucleic acids: A terminology
> muddle and a way out of it," Cell 50: 667 (1987); Lewin, "When does
> homology mean something else?" Science 237: 1570 (1987).
> Chris T. Amemiya
> Lawrence Livermore National Laboratory
> fish@amoeba.llnl.gov

The biological term "homologous", denoting similar entities related by divergent evolution, and the chemical term "homologous", denoting entities with similar functional groups but slight structural differences, had contemporaneous origins in the middle of the nineteenth century (J.A.H. Murray, editor, A New English Dictionary, Oxford, U.K., 1901). In 1959 C.B. Anfinsen (as cited by M. Florkin, "Concepts of Molecular Biosemiotics and of Molecular Evolution" in Comprehensive Biochemistry, vol. 29, part A, p. 61, Elsevier, Amsterdam, 1974) discussed the chemical homology of biological molecules. The conflict of meanings eventually led to a polemic between H. Neurath and E. Margoliash (Neurath et al., Science, 158, 1638-1644, 1967; Nolan and Margoliash, Annu. Rev. Biochem., 37, 727-790, 1968; Winter et al., Science, 162, 1433, 1968; Margoliash, Science, 163, 127, 1969).

Hoping to help clarify the completely muddled meanings, M. Florkin (op. cit.) proposed maintaining the biological terms "homologous" and "analogous", the latter denoting entities related by convergence, and adding the terms "isologous", denoting chemically similar entities, and "orthologous" and "paralogous", denoting distinct types of biochemical homology.

An additional problem arises when authors cite "the percent identical residues" in biopolymer sequences. Using such a figure without providing the total number of residues aligned implies that the same significance applies to a given percentage regardless of the alignment length.
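A rough way to see why the aligned length matters is to ask how often a given number of identities would turn up by chance. The Python sketch below is only an illustration under stated assumptions, not the calculation used in this post: it treats every aligned position as an independent trial and takes the chance of a match to be 1/20 (twenty equally frequent amino acids). The function names and the `p_match` parameter are hypothetical.

```python
import math


def log_binomial_pmf(k, n, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p), computed via lgamma
    so that long alignments do not overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def chance_identity_tail(n_aligned, n_identical, p_match=1 / 20):
    """P(at least n_identical matches among n_aligned positions) when each
    position matches independently with probability p_match.

    p_match = 1/20 assumes 20 equally frequent amino acids (an assumption,
    not a composition-based statistic)."""
    return sum(
        math.exp(log_binomial_pmf(k, n_aligned, p_match))
        for k in range(n_identical, n_aligned + 1)
    )


# The same 13% identity at two alignment lengths.
for n_aligned in (100, 500):
    n_identical = round(0.13 * n_aligned)
    tail = chance_identity_tail(n_aligned, n_identical)
    print(f"{n_aligned} residues, {n_identical} identical: P = {tail:.2e}")
```

Under these assumptions the chance probability for the 100-residue case stays above 0.001, while for the 500-residue case it falls far below it. The sketch ignores gaps and the search over alternative alignments, both of which raise the level of identity expected by chance.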
In fact, the statistical significance of a given percentage of identical residues is a non-linear function of the length of the sequences being compared; with a significance limit of 0.1%, two 500-residue amino acid sequences aligned with 13% identical residues would be significantly related, while two 100-residue sequences with 13% identical residues would not be. (This is based on the frequency of occurrence of amino acid residues from an early compilation of peptide sequences. It would be preferable to use statistics based on specific sequence compositions.)

------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Identification Resource
National Biomedical Research Foundation
Washington, DC 20007
GARAVELLI@GUNBRF.BITNET