steffen@mbir.bcm.tmc.edu (David Steffen) (01/31/91)
I am again struggling with the proper use of the words "homology", "similarity", and "identity" in comparing sequences. Specifically, we have cloned and sequenced (a bit of) the rat homologue of the _lck_ gene. The sequences of the mouse and human _lck_ genes are known. How do we know what we have is the rat homologue? Because when we compare our sequence to the published sequence, most of the nucleotides can be made to match up with minimal futzing. So how do I say that? At present, we are saying: "In all four cases, the inserts were found to contain sequences homologous to human and mouse lck..." but one of my grad students points out that the word homologous is incorrect, since it represents an inference about evolution rather than a statement of fact. My objection to replacing the word "homologous" with the word "similar" is that it gives the impression that the sequences don't match all that well. My objection to replacing the word "homologous" with the word "identical" is that the sequences are not identical. My objection to replacing the word "homologous" with the words "##% identical" is that I would need four different numbers for the four different tumors, making the sentence practically unreadable. I guess if "similar" is the only correct word in this context, I could live with that. However, since I believe that we are dealing with homologous sequences, is the word "homologous" really incorrect? (I understand that "##% homologous" is always wrong; sequences are either homologous or they are not.) Email me if you wish, but I suspect others may wonder about this as well and that a discussion might be a "good thing". -- David Steffen Department of Cell Biology, Baylor College of Medicine, Houston TX 77030 Telephone = (713) 798-6655, FAX = (713) 790-0545 Internet = steffen@mbir.bcm.tmc.edu
boycet%frodob.dnet@ASMUS1.GENETICS.UGA.EDU (01/31/91)
Why don't you simply say that "based on the XX % similarity found among these sequences, we conclude that we have cloned the gene homologous to the human _whatever_ gene"? I personally feel that it's preferable to be as explicit as possible when making hypotheses of homology among genes, structures, etc. Good luck. T. M. Boyce
wrp@biochsn.acc.Virginia.EDU (William R. Pearson) (01/31/91)
I would feel very uncomfortable calling some cloned inserts from an insect "homologous" to mammalian genes. I would say that they were "X - Y% identical", or that they were "very similar." The point you want to make, of course, is that you now know the sequence of the insect homologues of some mammalian genes, but it is the insect "genes" that are homologous. I would put the cloned inserts into a different category. Bill Pearson
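The "X - Y% identical" wording suggested above can be produced mechanically once each insert has been aligned to the published sequence. What follows is only a minimal Python sketch under stated assumptions: the sequences are already aligned to equal length with "-" marking gaps, gapped columns are ignored, and the sequences shown are made-up placeholders, not real lck data. The function and variable names are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of ungapped aligned columns in which both sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column in which either sequence has a gap character.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def identity_range(reference: str, inserts: dict) -> tuple:
    """Lowest and highest percent identity of the inserts to the reference."""
    values = [percent_identity(reference, seq) for seq in inserts.values()]
    return min(values), max(values)


# Hypothetical placeholder data (NOT real lck sequence), pre-aligned.
reference = "ATGGGCTGTGTCTGCAGCTC"
inserts = {
    "tumor1": "ATGGGCTGTGTCTGCAGCTC",  # no differences
    "tumor2": "ATGGGCTGCGTCTGCAGCTC",  # one mismatch
    "tumor3": "ATGGGTTGTGTCTGTAGCTC",  # two mismatches
    "tumor4": "ATGGGCTGTGTC-GCAGCTC",  # one gapped column, otherwise identical
}

low, high = identity_range(reference, inserts)
print(f"The inserts were {low:.0f}-{high:.0f}% identical to the reference.")
```

With these placeholder inputs the sentence reports a single range (90-100%) instead of four separate numbers, which keeps the statement factual without committing to an inference of homology.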
beckfdp@pallas.network.com (D. Pat Beckfield) (01/31/91)
In article <3824@gazette.bcm.tmc.edu> steffen@mbir.bcm.tmc.edu (David Steffen) writes: > > I am again struggling with the proper use of the words "homology", >"similarity", and "identity" in comparing sequences. Specifically, we >have cloned and sequenced (a bit of) the rat homologue of the _lck_ >gene. The sequence of the mouse and human _lck_ genes is known. How >do we know what we have is the rat homologue? Because when we compare our >sequence to the published sequence, most of the nucleotides can be made >to match up with minimal futzing. So how do I say that? > [discussion of rationale deleted] > >-- >David Steffen >Department of Cell Biology, Baylor College of Medicine, Houston TX 77030 >Telephone = (713) 798-6655, FAX = (713) 790-0545 >Internet = steffen@mbir.bcm.tmc.edu As you're discussing semantics, it seems appropriate that I (a writer and BS in zoology) respond. The first question is does the rat sequence you're working with execute the same function as the similar sequences for mice and humans? If you don't know, is there a way to test it? Until you know this, what you have is a homomorphic string -- similar structure, but not necessarily having the same function. If they do carry out the same functions, but you're still concerned by evolutionary relations, you can call them "analogous" -- having the same function, but not necessarily the same origin. I hope this helps. -- D. Patrick Beckfield pat.beckfield@network.com 7600 Boone Ave N (612) 424-4888 Network Systems Corporation
ronald@uhunix1.uhcc.Hawaii.Edu (Ronald A. Amundson) (02/01/91)
In article <3824@gazette.bcm.tmc.edu> steffen@mbir.bcm.tmc.edu (David Steffen) writes:
>
> I am again struggling with the proper use of the words "homology",
>"similarity", and "identity" in comparing sequences. ...

I agree with some of the followup already given, and I'm not a great expert on molecular genetics. But there is an interesting problem here. The term "homology" clearly is being used differently in molecular genetics from its usage in traditional evolutionary biology. Steve Gould comments on the issue in his Natural History column for Feb. 1988, BTW, wishing that the molecular biologists would talk more like macro-biologists.

The problem with calling identical molecular sequences "homologies" is not _just_ that it implies a common source for the two sequences. One of the commentators is correct that _any_ evolutionary use of "homology" infers a common source on less-than-certain evidence. The problem is that the criteria by which the common source is identified are different in the molecular and "macroscopic" inferences of homology. I can think of two differences -- forgive my ignorance if I've got facts wrong.

1) Good macroscopic evolutionary inferences of homology are based on "shared derived" characteristics. The nests of other sets of traits disallow certain similarities to count as homologies. Mere similarity alone can never be used to judge two traits as homologous. (Unless I'm wrong) the "mere similarity" (i.e. molecular identity or similarity, in the absence of evidence provided by other hierarchies of traits) of molecular sequences is used as a sufficient criterion for the term "homology" in molecular genetics.

2) It seems to me (insert disclaimer again) that when molecular biologists call sequences homologous, they mean that the two were copied from a similar ancestral _molecular sequence_. But the processes of copying molecular sequences are not identical to the processes of reproducing organisms.
As I understand it, sequences can be copied within a genome, and with manipulation (and maybe some kinds of viral infection and other exotic stuff) between genomes. So the genealogical tree connecting up similar sequences with their molecular ancestors will not be isomorphic with the genealogical tree connecting organisms with their ancestors. So it looks as if the molecular use of "homology" is a _different_ use from the normal evolutionary use of the same term. Whether that is a tragedy or not depends on how confused we get by it. But I think it is worth noting that different concepts are being used. There are _lots_ of cases in the history of biology where different uses of the same word led to long futile disputes (e.g. the term "mutation" at the beginning of the century). Ron Amundson Dept. of Philosophy University of Hawaii at Hilo Hilo, HI 96720-4091 ronald@uhunix.bitnet
owhite@nmsu.edu (smouldering dog) (02/01/91)
In article <3824@gazette.bcm.tmc.edu> steffen@mbir.bcm.tmc.edu (David Steffen) writes: > I am again struggling with the proper use of the words "homology", > "similarity", and "identity" in comparing sequences. Specifically, we > have cloned and sequenced (a bit of) the rat homologue of the _lck_ > etc...... this is a topic worth discussing. my understanding is that when you are referring to a sequence that has _some_ nucleotides in a close approximation to another sequence you should say: sequence A has __% similarity to sequence B. alternatively you can say: sequence A has __% identity to sequence B. to refer to two sequences being homologous means they are a _strict_ (nucleotide for nucleotide) match. as in: the cDNA sequence to gene A is homologous to region X of the genomic clone of gene A. alternatively you can say: the cDNA sequence to gene A is identical to region X of the genomic clone of gene A. however, I am curious if the rest of the community agrees with the above usages of identity, similarity and homology. owen white -- owen white (owhite@nmsu.edu) -=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=- got my head on a pole (for better reception) -=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=-
fish@AMOEBA.LLNL.GOV (02/01/91)
This is in response to a recent query by David Steffen regarding the use of the term "homology." This controversy has received much attention in recent years, as most of the subscribers of this bulletin board will attest. The author should probably consult these commentaries on the subject: Reeck et al., "Homology in proteins and nucleic acids: A terminology muddle and a way out of it," Cell 50: 667 (1987); Lewin, "When does homology mean something else?" Science 237: 1570 (1987). I tend to agree with William Pearson's suggestions, i.e., without prior knowledge of whether given genes are "orthologous" it is probably a good idea not to say there are homologies between them. For the past few years, I have been researching the evolution of the immunoglobulin multigene family and have encountered terminology problems similar to those described by Dr. Steffen. Chris T. Amemiya Lawrence Livermore National Laboratory fish@amoeba.llnl.gov
owhite@nmsu.edu (smouldering dog) (02/01/91)
In article <1991Jan31.155713.27154@ns.network.com> beckfdp@pallas.network.com (D. Pat Beckfield) writes: > As you're discussing semantics, it seems appropriate that I (a writer and > BS in zoology) respond. > discussion deleted... > If they do carry out the same functions, but you're still concerned by > evolutionary relations, you can call them "analogous" -- having the > same function, but not necessarily the same origin. > > D. Patrick Beckfield pat.beckfield@network.com in the literature, two similar sequences are RARELY referred to as "analogous" -- owen white (owhite@nmsu.edu) -=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=- got my head on a pole (for better reception) -=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-=-*-=-*-=-=-*-=-
wrp@biochsn.acc.Virginia.EDU (William R. Pearson) (02/02/91)
Dr. Amundson from Hawaii is precisely correct: molecular biologists use the term "homology" to denote "common ancestry." In talking to evolutionary biologists, it is my understanding that their use of the term is precisely the same. However, evolutionary biologists often refer to two distinct types of homology:

orthology - where the two sequences encode the same protein, e.g. a myoglobin in a human and a myoglobin in a whale

paralogy - where the relationship between the two sequences is not consistent with the phylogeny, e.g. myoglobin in a human and beta-globin in a human (here, the ancestor of myoglobin and hemoglobin is much older than recent ancestors of humans).

Evolutionary biologists discuss a similarity relationship that is the converse of homology - analogy, or similarity due to convergent evolution. It is my opinion there are no good examples of convergent evolution based on protein or DNA sequence.

The main point, of course, is to distinguish between the supposition - homology - and the fact - similarity or percent identity.

Bill Pearson
wrp@biochsn.acc.Virginia.EDU (William R. Pearson) (02/02/91)
In article <11223@uhccux.uhcc.Hawaii.Edu> ronald@uhunix1.uhcc.Hawaii.Edu (Ronald A. Amundson) writes:
>In article <3824@gazette.bcm.tmc.edu> steffen@mbir.bcm.tmc.edu (David Steffen) writes:
>>
>> I am again struggling with the proper use of the words "homology",
>>"similarity", and "identity" in comparing sequences. ...
>
>problem here. The term "homology" clearly is being used differently
>in molecular genetics from its usage in traditional evolutionary
>biology. Steve Gould comments on the issue in his Natural History
>column for Feb. 1988, BTW, wishing that the molecular biologists would
>talk more like macro-biologists.

I do not believe that the use of the term is different.

>The problem with calling identical molecular sequences "homologies" is
>not _just_ that it implies a common source for the two sequences.

My understanding is that homology implies common ancestry (source) - nothing more or less.

>1) Good macroscopic evolutionary inferences of homology are based on
>"shared derived" characteristics. The nests of other sets of traits
>disallow certain similarities to count as homologies. Mere similarity
>alone can never be used to judge two traits as homologous.

I would argue that while "mere" similarity is insufficient, there are levels of similarity that allow one to infer homology and never be mistaken. Everyone accepts that two sequences that are 100% identical are homologous, and clearly one should not feel too uncomfortable with two sequences that are 90% identical (over their entire length). The issue arises when two protein sequences share less than 20 - 25% identity. But there is a series of tests, based on sequence similarity alone, that make it very unlikely that the inference of homology is incorrect.

> (Unless
>I'm wrong) the "mere similarity" (i.e. molecular identity or
>similarity, in the absence of evidence provided by other hierarchies
>of traits) of molecular sequences is used as a sufficient criterion
>for the term "homology" in molecular genetics.

>2) It seems to me (insert disclaimer again) that when molecular
>biologists call sequences homologous, they mean that the two were
>copied from a similar ancestral _molecular sequence_. But the
>processes of copying molecular sequences are not identical to the
>processes of reproducing organisms. As I understand it, sequences can
>be copied within a genome, and with manipulation (and maybe some kinds
>of viral infection and other exotic stuff) between genomes. So the
>genealogical tree connecting up similar sequences with their molecular
>ancestors will not be isomorphic with the genealogical tree connecting
>organisms with their ancestors.

Here, the terms "homologous" and "orthologous" are being confused. Two sequences are homologous if they share a common ancestor, no matter how complex and exotic the evolutionary path between that ancestor and the present.

>So it looks as if the molecular use of "homology" is a _different_ use
>from the normal evolutionary use of the same term.

This is not true. We all agree on common ancestry. We may disagree on the amount of evidence required to support the assertion of common ancestry, but we all mean the same thing.

Bill Pearson
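One simple way to ask whether an observed similarity could be explained by residue composition alone is a shuffle (permutation) test. Whether this corresponds to the tests referred to above is an assumption; the Python sketch below only illustrates the general idea. It compares two equal-length, ungapped sequences by fraction of identical positions, whereas real tools align with gaps and score with substitution matrices. The peptide strings and function names are hypothetical.

```python
import random


def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shuffle_p_value(a: str, b: str, trials: int = 200, seed: int = 0) -> float:
    """Estimate how often a shuffled copy of b matches a at least as well as b.

    Shuffling keeps the composition of b but destroys its order, so a small
    value suggests the similarity is not due to composition alone.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = fraction_identical(a, b)
    letters = list(b)
    as_good = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if fraction_identical(a, "".join(letters)) >= observed:
            as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_good + 1) / (trials + 1)


# Hypothetical peptides differing at a single position.
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRNISFVKSHFSRQ"
p = shuffle_p_value(seq_a, seq_b)
print(f"identity = {fraction_identical(seq_a, seq_b):.2f}, shuffle p ~ {p:.3f}")
```

A small value here supports only the statement that the similarity is unlikely to be a chance effect of composition; calling the sequences homologous remains a further inference.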
frist@ccu.umanitoba.ca (02/02/91)
I think it may be useful, at this point in the discussion, to construct a chart relating the terms under discussion. Comments, anyone?

                               IDENTICAL
          When two structures share common substructures, those
          common substructures are said to be identical.
                                   |
                                SIMILAR
          Similarity is the degree to which two structures share
          common substructures.
                                   |
             +---------------------+---------------------+
             |                                           |
         HOMOLOGOUS                                  ANALOGOUS
   Similarity due to common              Similarity due to convergent
   ancestry                              evolution
             |
  +---------------------+
  |                     |
ORTHOLOGOUS             PARALOGOUS
Homologous structure    Homologous structure
with conserved          with divergent
function                function

===============================================================================
Brian Fristensky
Dept. of Plant Science
University of Manitoba
Winnipeg, MB R3T 2N2 CANADA
frist@ccu.umanitoba.ca
Office phone: 204-474-6085
FAX: 204-275-5128

"The important thing is to develop the capacity to see one kernel that is different, and make that understandable. If it doesn't fit, there's a reason, and you find out what it is."
Barbara McClintock, from A FEELING FOR THE ORGANISM by Evelyn Fox Keller
===============================================================================
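The chart above can be read as a small decision procedure. The Python sketch below encodes it as written, including its function-based split between orthologous and paralogous, which other posters in this thread define differently. The function name and the use of None for "not known" are illustrative assumptions, not part of the chart.

```python
from typing import Optional


def classify(common_ancestry: Optional[bool],
             conserved_function: Optional[bool] = None) -> str:
    """Name the relationship between two *similar* structures, per the chart.

    None means "not known"; the answer then stays at the most general term
    that the available evidence supports.
    """
    if common_ancestry is None:
        return "similar"        # observation only, no claim about origin
    if not common_ancestry:
        return "analogous"      # similarity due to convergent evolution
    if conserved_function is None:
        return "homologous"     # common ancestry, function not assessed
    return "orthologous" if conserved_function else "paralogous"


print(classify(None))         # similar
print(classify(False))        # analogous
print(classify(True))         # homologous
print(classify(True, True))   # orthologous
print(classify(True, False))  # paralogous
```

Returning "similar" when ancestry is unknown mirrors the distinction drawn earlier in the thread between the observed fact (similarity) and the inference (homology).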
Doug_Eernisse@UB.CC.UMICH.EDU (02/02/91)
This is another response to the recent query by David Steffen regarding the use of the term "homology." I thought I might as well throw my two cents worth in.

Many molecular biologists commonly use informal arbitrary criteria as support for statements of the "homology" of two genes. For example, they might suggest that if two peptide sequences in the same organism were highly similar (e.g., 85 percent) then one could be confident that the proteins were "homologous", due to a gene duplication event, as opposed to similarity due to parallel evolution for similar function.

It seems to me that hypotheses of homology are only relevant to phylogenetic inference at the level they are proposed to be synapomorphic (shared derived similarities) on a cladogram. Therefore, it is hopeless to try to provide evidence for homology by the comparison of two taxa or sequences. The only interesting evidence one can bring to bear on the issue of common ancestry is shared "special" similarity relative to one or more outgroup taxa or sequences. This issue is, I think, distinct from the issue of whether homology is used as many of us use the term synapomorphy, as a proposal of homology, or as the actual similarity due to common ancestry which is ultimately impossible to prove.

With sequence data, there are also problems of specifying the level of homology. For example, Michael Ghiselin (Syst. Zool. 18: 148-149 (1969)) uses the following hypothetical example:

A   Asp-Val-Glu-Met-Ala
B   Asp-Pro-Glu-Met-Ala
C   Asp-Pro-Thr-Met-Ala
D   Gly-Pro-Thr-Met-Ala
E   Gly-Pro-Thr-Tyr-Ala
F   Gly-Pro-Thr-Tyr-Ser

Ghiselin argues that similarity is a relation between the peptides as wholes, which decreases from A to F, while homology is a relation between the parts. He argues, for example, that Asp is hypothesized to be homologous to Gly in A and F, respectively, given this alignment of the sequences. He also argues that the peptide sequence A could be homologous to F even though they are completely dissimilar.
One can speak of the correspondence between nucleotides or amino acids in terms of their position in a sequence which is hypothesized to be homologous. Although Ghiselin doesn't consider this use of homology, one more normally may also speak of the shared similarity of D, E and F at site 1, relative to A, B and C, which could be a synapomorphy (hypothesis of homology), depending on the outgroup(s) one selects which in turn determines the cladogram topology. One can also hypothesize that peptide F is homologous to peptide A, or more precisely, hypothesize that the shared ancestor of A and F had a single protein-coding gene which is traceable, by descent, to the genes in A and F which produced these peptides. Confusing, isn't it? Doug Eernisse usergdef@ub.cc.umich.edu usergdef@umichub.bitnet Museum of Zoology and Dept. of Biology University of Michigan
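Ghiselin's six hypothetical peptides, as quoted above, make the distinction between whole-sequence similarity and site-by-site grouping easy to check by hand or by machine. The Python sketch below counts shared positions against peptide A and groups the peptides by residue at a chosen site. The peptide data come from the example itself; the function names are illustrative, and nothing here decides which grouping is a synapomorphy, since that depends on the outgroup.

```python
# Ghiselin's hypothetical peptides, as quoted in the post above.
peptides = {
    "A": "Asp-Val-Glu-Met-Ala",
    "B": "Asp-Pro-Glu-Met-Ala",
    "C": "Asp-Pro-Thr-Met-Ala",
    "D": "Gly-Pro-Thr-Met-Ala",
    "E": "Gly-Pro-Thr-Tyr-Ala",
    "F": "Gly-Pro-Thr-Tyr-Ser",
}
residues = {name: seq.split("-") for name, seq in peptides.items()}


def shared_positions(x: str, y: str) -> int:
    """Number of aligned sites at which peptides x and y carry the same residue."""
    return sum(a == b for a, b in zip(residues[x], residues[y]))


def groups_at_site(site: int) -> dict:
    """Group peptide names by the residue found at a 1-based site."""
    groups = {}
    for name, res in residues.items():
        groups.setdefault(res[site - 1], []).append(name)
    return groups


# Similarity to A falls step by step from A to F.
matches_to_a = {name: shared_positions("A", name) for name in residues}
print(matches_to_a)        # {'A': 5, 'B': 4, 'C': 3, 'D': 2, 'E': 1, 'F': 0}

# Site 1 splits the peptides into A, B, C (Asp) and D, E, F (Gly).
print(groups_at_site(1))   # {'Asp': ['A', 'B', 'C'], 'Gly': ['D', 'E', 'F']}
```

A and F share no residues at all, yet each neighbouring pair differs at only one site, which is the sense in which A could be homologous to F despite being completely dissimilar.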
beckfdp@pallas.network.com (D. Pat Beckfield) (02/02/91)
In article <OWHITE.91Feb1084317@haywire.nmsu.edu> owhite@nmsu.edu (smouldering dog) writes:
>In article <1991Jan31.155713.27154@ns.network.com> beckfdp@pallas.network.com (D. Pat Beckfield) writes:
>> If they do carry out the same functions, but you're still concerned by
>> evolutionary relations, you can call them "analogous" -- having the
>> same function, but not necessarily the same origin.
>>
>> D. Patrick Beckfield pat.beckfield@network.com
>
>in the literature, two similar sequences are RARELY referred to as
>"analogous"
>--
>
> owen white (owhite@nmsu.edu)
>

I must point out that this is a failing of the authors of the literature, not the English language. If the author is not confident enough of the origins of the material, but is confident that it is homomorphic with and carries out the same functions as other material, then the correct word in the English language is "analogous". The word is ideally suited to the situation as described. I can state this emphatically as a professional technical writer, condescension notwithstanding.

Respectfully, D.P.B.
--
D. Patrick Beckfield pat.beckfield@network.com 7600 Boone Ave N (612) 424-4888 Network Systems Corporation Minneapolis, MN 55428-1099
joe@evolution.u.washington.edu (Joe Felsenstein) (02/03/91)
In article <11223@uhccux.uhcc.Hawaii.Edu> ronald@uhunix1.uhcc.Hawaii.Edu (Ronald A. Amundson) writes: > The >problem is that the criteria by which the common source is identified >is different in the molecular and "macroscopic" inferences of >homology. I can think of two differences -- forgive my ignorance if >I've got facts wrong. > >1) Good macroscopic evolutionary inferences of homology are based on >"shared derived" characteristics. The nests of other sets of traits >disallow certain similarities to count as homologies. Mere similarity >alone can never be used to judge two traits as homologous. (Unless >I'm wrong) the "mere similarity" (i.e. molecular identity or >similarity, in the absence of evidence provided by other hierarchies >of traits) of molecular sequences is used as a sufficient criterion >for the term "homology" in molecular genetics. >Ron Amundson >Dept. of Philosophy >University of Hawaii at Hilo >Hilo, HI 96720-4091 >ronald@uhunix.bitnet As far as I can see "homology" as used by morphological systematists is the same thing. Many studies of morphology don't actually base themselves on characters where ancestral and derived states can be predetermined. Instead they toss the data into a computer, get a tree by (say) Wagner parsimony, and use an outgroup criterion to root the tree, and in the process determine after the fact which states are ancestral and where the synapomorphies are. That's the same thing molecular evolutionists do. They will often use more than one sequence and judge "homology" by where the sequence fits in on a phylogeny of the sequences, where they might use (say) Wagner parsimony with outgroup-rooting. If this is done, then I see no real difference between the two processes. ----- Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195 Internet: joe@genetics.washington.edu (IP No. 128.95.12.41) Bitnet/EARN: felsenst@uwavm UUCP: ... uw-beaver!evolution.genetics!joe
ahouse@BINAH.CC.BRANDEIS.EDU (02/04/91)
It seems to me that if you believe that these sequences are identical at many positions because they derive from a common ancestor then you can feel justified in using "homology" and do this knowing that you are implying an inference. You may in this case wish to distinguish paralogous from orthologous (see Patterson's introduction to Molecules and Morphology in Evolution: Conflict or Compromise). Briefly, these terms distinguish between identities that exist because of a shared ancestor of 2 species (orthologous) and identities that come from a gene duplication event within a lineage (think of all of the proteins that have Ig-like domains). You seem to feel that you have orthologous sequences that share many identities. As an aside, molecular biologists seem much more willing to use homology to mean "it looks the same (= same sequences)" and morphologists prefer to insist that homology indicates a stronger statement of the reason for the similarity (homology -> derived from a common ancestor). Jeremy Ahouse Brandeis University - Biophysics
ahouse@BINAH.CC.BRANDEIS.EDU (02/07/91)
In article <OWHITE.91Jan31142027@haywire.nmsu.edu>, owhite@nmsu.edu (smouldering dog) writes:
>
>to refer to two sequences being homologous means they are a _strict_
>(nucleotide for nucleotide) match.

Homology is often used with the goal of stating an inference about relatedness. You either have sequence identity or, if you've looked at it, amino acid identity. You may wish to indicate that some bases or amino acids are switched but have the same function... So all this means that you should be very explicit about what you intend. Your reading of homology as a "strict" match replaces a statement about inferred relationship with one about pattern and seems too restrictive.

Jeremy
ahouse@BINAH.CC.BRANDEIS.EDU (02/07/91)
> orthology - where the two sequences encode the same protein,
> e.g. a myoglobin in a human and a myoglobin in a whale
>
> paralogy - where the relationship between the two sequences
> is not consistent with the phylogeny, e.g. myoglobin in a human
> and beta-globin in a human (here, the ancestor of myoglobin
> and hemoglobin is much older than recent ancestors of humans).
>

I think that you might want to distinguish these terms not with respect to phylogenetic consistency but rather in terms of genetic events. A gene duplication (that gives rise to a pair of paralogous genes) may happen before a lineage split. If the 2 genes become functionally different in the 2 lineages you may be fooled into imagining a homology that is due to a shared derived gene when in fact you don't have orthologous genes because of the paralogy in the ancestor.

Jeremy Ahouse Brandei