[bionet.molbio.evolution] Distance methods

joe@uw-evolution.UUCP (Joe Felsenstein) (11/22/88)

In reply to Dan Davison's statement that he constructs trees from
distance matrices by applying a clustering algorith such as UPGMA
to the matrix, and prefers this to the approach I described, of
trying to fit the observed table of distances by one generated from
the tree:

An interesting property of UPGMA clustering is that, for the given
topology that it comes up with, the clustering levels (in effect the
branch lengths of the tree) are assigned by least squares, that is, 
they achieve a least squares fit of observed to expected distances.  
Thus what Dan has been doing and what I envisaged are very closely
similar, except that if one conceives the task more generally as 
achieving a least squares fit to the observed distances, one can
then think of trying other tree topologies as well.  Thus I think
the least squares approach is the more general and more or less sort
of includes UPGMA.  It will often but not always give the same tree.

If one does not restrict the search to trees that have all tips
level with each other (ultrametrics), then one can also fit cases
that have unequal rates of evolution in different lineages.  That
involves even fewer assumptions than the "minimum number" Dan
asserts his approach makes.

Methods such as UPGMA are simply defined as the result of applying a
certain algorithm.  Methods such as least squares optimize an objective
function.  It is easier to investigate their properties.  Fortunately
in this case the two are closely related.

Joe Felsenstein
Dept. of Genetics SK-50, Univ. of Washington, Seattle WA 98195
 BITNET:    FELSENST@UWALOCKE
 INTERNET:  joe@evolution.ms.washington.edu
       or:  uw-evolution!joe@entropy.ms.washington.edu
 UUCP:      ... uw-beaver!uw-entropy!uw-evolution!joe