mbishop@crc.ac.uk (Martin Bishop) (01/26/91)
WHAT TO DO NEXT IN MOLECULAR PHYLOGENY RESEARCH

As another contributor has already pointed out, it is artificial to separate the alignment step from the phylogeny program step (e.g. dnapars) when making phylogenetic reconstructions. We attempted to make some contribution in this direction a few years ago:

M.J. Bishop, A.E. Friday and E.A. Thompson. Inference of evolutionary relationships. In M.J. Bishop and C.J. Rawlings (eds.), 1987. Nucleic Acid and Protein Sequence Analysis: A Practical Approach. IRL Press, Oxford.

I don't think it is referenced in the Phylip documentation, which may be why people are unaware of it.

Even more fundamental is the question of whether the simplistic models of molecular evolution that these programs use can be justified in the face of suspected processes of molecular evolution such as gene conversion (a proven process in some organisms, e.g. fungi). We need to worry more about whether the models are appropriate.

I would suggest turning the problem on its head. Instead of trying to estimate trees and times from sequence data, take a group for which a plausible tree and times can be written down. Now write a program to tell us about the most likely pathways of sequence change, and relate these to functional constraints on a variety of groups of macromolecules. I think a grant committee should be prepared to fund some work along these lines, at least to see whether there is any mileage in it. Regrettably, I am too busy doing other things to have a go at it myself.

Worry more about the processes by which molecular sequences change and less about getting the most parsimonious (but incorrect) tree.

Martin Bishop.
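As a minimal sketch of the "fix the tree first, then study the changes" idea, the Python below uses Fitch's small-parsimony algorithm to count the minimum number of substitutions at each alignment column on a tree that is taken as given. Fitch's algorithm is swapped in here purely for illustration; it is not the method of the chapter cited above, and it gives minimum counts, not the most likely pathways (that would need a likelihood-based reconstruction). The nested-tuple tree format, the function names and the example sequences are all hypothetical, and gap characters are treated as ordinary states.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical tree format: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one column: (candidate root states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: one substitution is forced on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def changes_per_site(tree: Tree, alignment: Dict[str, str]) -> List[int]:
    """Minimum number of substitutions in each column, on a fixed tree."""
    length = len(next(iter(alignment.values())))
    counts = []
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        _, cost = fitch_site(tree, column)
        counts.append(cost)
    return counts


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    aln = {"A": "ACGT", "B": "ACGA", "C": "ATGA", "D": "TTGA"}
    print(changes_per_site(tree, aln))  # → [1, 1, 0, 1]
```

Per-site counts like these are the kind of raw material one might then compare across sites of known function; the tree is an input here, not an output.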
jhgillespie@ucdavis.edu (01/26/91)
In article <4413.9101251737@crc.ac.uk> mbishop@crc.ac.uk (Martin Bishop) writes:
>WHAT TO DO NEXT IN MOLECULAR PHYLOGENY RESEARCH
>
>Worry more about the processes by which molecular sequences change and
>less about getting the most parsimonious (but incorrect tree).

Hear, hear!! We have known since '71 (Ohta and Kimura) that rates of substitution vary. We also know that the frequencies of the four nucleotides vary through time. It is hard to imagine a characterization of the substitution process that is farther from those assumed by most tree-construction algorithms.

John Gillespie
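One rough way to look for the compositional variation mentioned above is to compare base counts across the sequences. The Python sketch below (function names hypothetical) computes a Pearson chi-square statistic for homogeneity of base composition across taxa, which would be compared against a chi-square distribution on (taxa - 1) x 3 degrees of freedom. It treats sites as independent and ignores the shared ancestry of the sequences, so it is only a screen, not a model of how composition changes through time.

```python
from typing import Dict, List

BASES = "ACGT"


def base_counts(seq: str) -> List[int]:
    """Counts of A, C, G, T; gaps and ambiguity codes are simply ignored."""
    seq = seq.upper()
    return [seq.count(b) for b in BASES]


def composition_chi_square(alignment: Dict[str, str]) -> float:
    """Pearson chi-square for a taxa x bases table of counts.

    Expected counts assume every taxon shares the pooled base frequencies.
    """
    table = [base_counts(seq) for seq in alignment.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    if grand == 0:
        return 0.0
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # a base absent from every taxon contributes nothing
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def degrees_of_freedom(n_taxa: int) -> int:
    """(rows - 1) x (columns - 1) for a taxa x 4 table."""
    return (n_taxa - 1) * (len(BASES) - 1)


if __name__ == "__main__":
    print(composition_chi_square({"x": "AAAA", "y": "TTTT"}))  # → 8.0
    print(degrees_of_freedom(2))  # → 3
```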
joe@GENETICS.WASHINGTON.EDU (Joe Felsenstein) (02/05/91)
John Gillespie wrote (regarding what to do next in molecular phylogeny):
> Hear, hear!! We have known since '71 (Ohta and Kimura) that rates of
> substitution vary. We also know that the frequencies of the four nucleotides vary
> through time. It is hard to imagine a characterization of the substitution
> process that is farther from those assumed by most tree-construction
> algorithms.

Well, I can imagine LOTS of models that are even farther! Seriously, though:

(1) Variation of the rate of evolution with time (lack of clockness) is definitely allowed in most methods of inferring phylogenies (distance methods, ML, parsimony, invariants/evolutionary parsimony).

(2) Variation of nucleotide frequencies is not allowed in most programs, but:
  (a) if one is willing to accept the admittedly questionable independence of different sites, resampling methods such as the bootstrap allow one to investigate the empirical variability of inferences made with imperfect models;
  (b) check out Barry and Hartigan's 1987 paper in Statistical Science, which puts forward (among others) a model in which the transition probability matrix varies arbitrarily from branch to branch, and they can do maximum likelihood for it (in fact, it's easier than my ML). This would allow varying base frequencies in different parts of the tree;
  (c) we've got to do _something_, so we do what we know how to do. If John uses his considerable powers to formulate a model that is more realistic and still computationally tractable, we will all be quite interested in it. A better model would specify the distribution of possible equilibrium base frequencies and how quickly they can change as one moves along the tree.

(3) Variation of nucleotide composition is real, but I think a much more serious departure from reality in the models used for ML and distance methods is the assumption of equal rates of substitution at all sites.
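Point (2)(a) rests on resampling alignment columns. Below is a minimal Python sketch of a nonparametric bootstrap that assumes sites are independent, exactly as the caveat says: each pseudo-replicate draws columns with replacement up to the original length, and an estimator is re-run on every replicate. The p-distance estimator and the two-taxon alignment are hypothetical stand-ins for a real inference program such as dnapars; with trees one would instead record how often each group appears among the replicate trees.

```python
import random
from typing import Callable, Dict, List

Alignment = Dict[str, str]


def resample_columns(alignment: Alignment, rng: random.Random) -> Alignment:
    """One pseudo-replicate: columns drawn with replacement, same length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def bootstrap(alignment: Alignment,
              estimate: Callable[[Alignment], float],
              replicates: int = 200,
              seed: int = 0) -> List[float]:
    """Re-run `estimate` on each pseudo-replicate and collect the results."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [estimate(resample_columns(alignment, rng)) for _ in range(replicates)]


def p_distance(alignment: Alignment, a: str, b: str) -> float:
    """Toy estimator: proportion of columns where taxa a and b differ."""
    x, y = alignment[a], alignment[b]
    return sum(1 for u, v in zip(x, y) if u != v) / len(x)


if __name__ == "__main__":
    aln = {"A": "ACGTACGTAC", "B": "ACGTTCGTAA"}
    values = sorted(bootstrap(aln, lambda al: p_distance(al, "A", "B")))
    # Rough 95% percentile interval from 200 replicates.
    print(values[5], values[194])
```

The spread of the replicate values reflects sampling variability of the sites only; it says nothing about whether the model behind the estimator is right, which is the limitation the thread is arguing about.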
My current ML programs have some ways for you to specify unequal rates, and I am working on ways for the method to infer them instead of requiring you to specify them.
-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
Internet: joe@genetics.washington.edu (IP No. 128.95.12.41)
Bitnet/EARN: felsenst@uwavm
UUCP: ... uw-beaver!evolution.genetics!joe
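Both the per-branch model in (2)(b) and the unequal site rates in (3) fit into the pruning algorithm for computing likelihoods, which only needs a transition probability matrix for each branch. The Python sketch below is an illustration under stated assumptions, not Felsenstein's or Barry and Hartigan's code: each branch carries its own arbitrary row-stochastic 4 x 4 matrix, so nothing forces a shared equilibrium composition, and per-site rates are user-specified multipliers on Jukes-Cantor branch lengths. It only evaluates likelihoods; estimating the matrices or rates by maximizing the likelihood is not shown. The star tree, branch lengths and matrix values are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, Union

BASES = "ACGT"
Matrix = List[List[float]]  # Matrix[i][j] = P(child base j | parent base i)
# Hypothetical format: a leaf is a taxon name; an internal node is a list of
# (child, branch matrix) pairs, so every branch can have its own matrix.
Node = Union[str, List[Tuple["Node", Matrix]]]


def partials(node: Node, site: Dict[str, str]) -> List[float]:
    """Pruning: entry i is P(observed bases below node | node has base i)."""
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, P in node:
        below = partials(child, site)
        for i in range(4):
            result[i] *= sum(P[i][j] * below[j] for j in range(4))
    return result


def site_likelihood(root: Node, site: Dict[str, str],
                    root_freqs: Sequence[float]) -> float:
    """Probability of one column, summed over the possible root bases."""
    return sum(f * l for f, l in zip(root_freqs, partials(root, site)))


def jc69(t: float, rate: float = 1.0) -> Matrix:
    """Jukes-Cantor matrix for branch length t multiplied by a site rate."""
    e = math.exp(-4.0 * rate * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def jc_tree(rate: float) -> Node:
    """Hypothetical three-taxon star tree, branch lengths 0.1, 0.2, 0.3."""
    return [("A", jc69(0.1, rate)), ("B", jc69(0.2, rate)), ("C", jc69(0.3, rate))]


def log_likelihood(alignment: Dict[str, str], rates: Sequence[float],
                   build_tree: Callable[[float], Node],
                   root_freqs: Sequence[float]) -> float:
    """Sum of per-site log-likelihoods; column k uses the tree built at rates[k]."""
    taxa = list(alignment)
    total = 0.0
    for k, rate in enumerate(rates):
        site = {t: alignment[t][k] for t in taxa}
        total += math.log(site_likelihood(build_tree(rate), site, root_freqs))
    return total


if __name__ == "__main__":
    freqs = [0.25] * 4
    aln = {"A": "AC", "B": "AC", "C": "AT"}
    # The variable second column is more probable when given a higher rate.
    print(log_likelihood(aln, [0.1, 0.1], jc_tree, freqs))
    print(log_likelihood(aln, [0.1, 3.0], jc_tree, freqs))

    # Made-up, non-Jukes-Cantor matrix on one branch: G and C are retained
    # more often than A and T.
    biased = [[0.70, 0.10, 0.10, 0.10],
              [0.05, 0.85, 0.05, 0.05],
              [0.05, 0.05, 0.85, 0.05],
              [0.10, 0.10, 0.10, 0.70]]
    mixed = [("A", biased), ("B", jc69(0.2)), ("C", jc69(0.3))]
    print(site_likelihood(mixed, {"A": "G", "B": "G", "C": "G"}, freqs))
```

Because the pruning step never assumes the matrices share a rate matrix or a stationary distribution, swapping in branch-specific matrices changes only the inputs, not the algorithm.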