[bionet.molbio.evolution] Some thoughts on what to do

mbishop@crc.ac.uk (Martin Bishop) (01/26/91)

WHAT TO DO NEXT IN MOLECULAR PHYLOGENY RESEARCH

It is artificial to separate an alignment step and a phylogeny
program (e.g. dnapars) step in making phylogenetic reconstructions,
as has already been pointed out by another contributor.

We attempted to make some contribution in this direction a few
years ago:
M.J.Bishop, A.E.Friday and E.A.Thompson
Inference of evolutionary relationships.
In M.J.Bishop and C.J.Rawlings 1987.
Nucleic acid and protein sequence analysis
a practical approach. IRL Press, Oxford.

I don't think it is referenced in the Phylip documentation,
so that is why people may be unaware of it.

Even more fundamental is the question of whether the simplistic
models of molecular evolution which these programs use can
be justified in the face of suspected processes of molecular
evolution such as gene conversion (a proven process in some
organisms, e.g. fungi).  More worrying needs to be done about the
appropriateness of the models.

I would suggest turning the problem on its head. Instead of trying
to estimate trees and times from sequence data, take a group
for which a plausible tree and times can be written down.
Now write a program to tell us about the most likely pathways of
sequence change and relate these to functional constraints on a
variety of groups of macromolecules.
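
As a rough illustration of what such a program might start from, here is a
minimal sketch that takes a tree as given and counts the fewest changes each
site needs on it. It uses Fitch's parsimony count as a simple stand-in; the
post asks for most likely pathways, which would need a likelihood model
instead. The tree, taxon names and sequences are invented for the example,
and the function names are hypothetical.

```python
# Sketch only: the tree is assumed known, written as nested tuples of taxon names.
TREE = (("human", "chimp"), ("mouse", "rat"))

# Toy aligned sequences, invented for illustration.
ALIGNMENT = {
    "human": "ACGT",
    "chimp": "ACGA",
    "mouse": "ATGC",
    "rat": "ATGG",
}


def fitch_site(tree, states):
    """Return (candidate ancestral states, minimum changes) for one site.

    `tree` is a taxon name (leaf) or a pair of subtrees (binary node);
    `states` maps each taxon name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # No shared state: at least one change on the branches below this node.
    return left_set | right_set, left_cost + right_cost + 1


def changes_per_site(tree, alignment):
    """Minimum number of changes at each alignment column on the fixed tree."""
    length = len(next(iter(alignment.values())))
    return [
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    ]


print(changes_per_site(TREE, ALIGNMENT))
```

Columns that need few changes on the accepted tree are candidates for strong
functional constraint; columns that need many are candidates for weak
constraint. Comparing such profiles across groups of macromolecules is one
way the question above could be approached.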

I think a grant committee should be prepared to fund some work along
these lines, at least to see if there is any mileage in it.
Regrettably, I am too busy doing other things to have a go at it myself.

Worry more about the processes by which molecular sequences change and
less about getting the most parsimonious (but incorrect) tree.

Martin Bishop.

jhgillespie@ucdavis.edu (01/26/91)

In article <4413.9101251737@crc.ac.uk> mbishop@crc.ac.uk (Martin Bishop)
writes:
>WHAT TO DO NEXT IN MOLECULAR PHYLOGENY RESEARCH
>
>Worry more about the processes by which molecular sequences change and
>less about getting the most parsimonious (but incorrect) tree.

Hear, hear!! We have known since '71 (Ohta and Kimura) that rates of
substitution vary.  We also know that the frequencies of the four nucleotides vary
through time.  It is hard to imagine a characterization of the substitution
process that is farther from those assumed by most tree-construction
algorithms.
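
A quick way to see how far a data set is from constant base composition is to
tabulate the four frequencies per sequence. The sketch below does only that;
the function names and the toy sequences are invented for illustration, and a
large gap between sequences is a hint, not a test, that composition has
shifted along the tree.

```python
from collections import Counter


def base_frequencies(seq):
    """Frequencies of A, C, G, T in one sequence, ignoring other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def max_composition_gap(alignment):
    """Largest difference, over the four bases, between the highest and
    lowest frequency of that base among the sequences."""
    freqs = [base_frequencies(seq) for seq in alignment.values()]
    return max(
        max(f[b] for f in freqs) - min(f[b] for f in freqs) for b in "ACGT"
    )


# Toy sequences, invented for illustration.
example = {"x": "AAAA", "y": "ACGT"}
print(base_frequencies(example["y"]))
print(max_composition_gap(example))
```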

John Gillespie

joe@GENETICS.WASHINGTON.EDU (Joe Felsenstein) (02/05/91)

John Gillespie wrote (relative to what to do next in molecular phylogeny):

> Hear, hear!! We have known since '71 (Ohta and Kimura) that rates of
> substitution vary.  We also know that the frequencies of the four nucleotides vary
> through time.  It is hard to imagine a characterization of the substitution
> process that is farther from those assumed by most tree-construction
> algorithms.

Well, I can imagine LOTS of models that are even farther!  Seriously, though,
(1) variation of rate of evolution with time (lack of clockness) is definitely
   allowed in most methods of inferring phylogenies (Distance methods, ML,
   parsimony, invariants/evolutionary-parsimony),
(2) variation of frequencies of nucleotides is not allowed in most programs but
 (a) if one is willing to accept the admittedly questionable independence
   of different sites, resampling methods such as the bootstrap allow one to
   investigate the empirical variability of inferences made with imperfect
   models,
 (b) check out Barry and Hartigan's 1987 paper in Statistical Science, which
   puts forward (among others) a model where the transition probability matrix
   varies arbitrarily from branch to branch and they can do maximum likelihood
   for it (in fact, it's easier than my ML).  This would allow varying
   base frequencies in different parts of the tree.
 (c) we've got to do _something_, so we do what we know how.  If John uses his
   considerable powers to formulate a model that is more realistic and
   continues to be computationally tractable, we will all be quite interested
   in it.  A better model would have some specification of the distribution of
   possible equilibrium base frequencies and how quickly they can change as
   one moves along the tree,
(3) variation of nucleotide composition is real but I think a much more serious
   departure from reality in the models used for ML and distance methods is
   the assumption of equal rates of substitution at all sites.  I have some ways one can
   specify unequal rates in my current ML programs and am working on ways
   the method can infer them instead of you having to specify rates.
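
The resampling idea in (2a) can be sketched as follows: draw alignment columns
with replacement, rerun the inference on each pseudo-replicate, and see how
often each answer comes back. This is a minimal sketch under the same
independence-of-sites assumption noted above. The `closest_pair` estimator is
a toy stand-in invented for the example (it is not a Phylip program); any
function that maps an alignment to a hashable result could be passed instead.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, estimator, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each estimator result occurs."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(replicates):
        result = estimator(bootstrap_alignment(alignment, rng))
        counts[result] = counts.get(result, 0) + 1
    return {result: n / replicates for result, n in counts.items()}


def closest_pair(alignment):
    """Toy estimator (hypothetical): the pair of taxa with the fewest differences."""
    def differences(a, b):
        return sum(x != y for x, y in zip(a, b))

    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: differences(alignment[pair[0]], alignment[pair[1]]),
    )


# Toy sequences, invented for illustration.
example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTGTACCA"}
print(bootstrap_support(example, closest_pair, replicates=50, seed=1))
```

The spread of results across replicates gives an empirical picture of how
stable an inference is, even when the model behind the estimator is known to
be imperfect.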

-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 Internet:         joe@genetics.washington.edu     (IP No. 128.95.12.41)
 Bitnet/EARN:      felsenst@uwavm
 UUCP:             ... uw-beaver!evolution.genetics!joe