[bionet.genome.arabidopsis] Arabidopsis evolution

BIOCUKM@osucc.bitnet (04/30/91)

     Arabidopsis molecular biologists and geneticists should look at the
recent paper by Kemmerer, Lei and Wu (J. Mol. Evol. 32:227-237, 1991).
The paper reports the isolation and nucleotide sequencing of the
cytochrome c gene of Arabidopsis.  The unusual aspect of the paper is
that the predicted amino acid sequence is much more closely related to
those of the cytochromes c of Neurospora and yeasts than it is to the
cytochrome c sequences of another member of the crucifer family,
cauliflower, and, indeed, of all flowering plants whose cytochrome c
sequences are known.

     When I first saw the abstract, I thought that this would be another
example of grand conclusions from uncertain trees.  Calculated
phylogenetic trees have uncertainties associated with the placement of
branches.  Those uncertainties are rarely included in diagrams and are
often ignored when drawing conclusions.  On reading the paper, I found
the authors well aware of these difficulties.  They analyzed the
sequence data in many different ways.  Each way produced the same highly
significant conclusion.  A visual examination of the amino acid
sequences leads to the same conclusion, that the cytochrome c of the
sequenced gene is more similar to Neurospora and yeast sequences than
it is to those of higher plants.
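
     For anyone who wants to put a number on that visual comparison, a
minimal sketch of a pairwise percent-identity calculation follows.  The
sequences are invented placeholders, not the real cytochrome c
sequences, and the name percent_identity is an assumption made for
illustration only.  It assumes the sequences are already aligned.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of compared positions at which two aligned sequences match.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Invented placeholder fragments (hypothetical, NOT real cytochrome c data).
aligned = {
    "query":  "GDVEKGKKIF",
    "seq_1":  "GDAEKGKKIF",
    "seq_2":  "GSAKKGATLF",
}

for name in ("seq_1", "seq_2"):
    value = percent_identity(aligned["query"], aligned[name])
    print(f"query vs {name}: {value:.1f}% identity")
```

Identity alone says nothing about how a tree would place the sequence;
it only makes the eyeball comparison explicit.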

     What's going on?  Several explanations come to mind.
1.  The sequenced gene could encode a minor cytochrome c of plants, one
not detected by protein sequencing.  A S. cerevisiae probe was used to
screen an Arabidopsis library.  All 4 hybridizing library clones were
the same.  Thus, the gene is the only Arabidopsis cytochrome c gene
recognized by the yeast probe.  Perhaps the genes for the major
cytochromes c of plants are not recognized by the S. cerevisiae probe.
A difficulty with this and other technical explanations is that similar
conclusions about ancestry were reached with histone H3 comparisons
(though these are less certain).

2.  The sequenced gene was from a contaminant in the DNA preparation.
This is unlikely since a fragment of the cloned gene recognizes a
fragment of the same size in blots of Arabidopsis DNA.

3.  Lateral transfer of the cytochrome c gene (either from yeast to
Arabidopsis or, I suppose, vice versa).  This possibility, raised in the
paper, seems unlikely since the transferred gene would have to have
completely replaced an ancestral gene.

4.  My off-the-cuff favorite hypothesis is that in studying molecular
evolution over such large distances we are really not studying the
evolution of the molecules themselves.  We are really studying the
evolution of the fitness landscape.  Each of the cytochromes c has
evolved to be optimally suited for its particular ecological niche.
Perhaps the relevant features of the Arabidopsis cytochrome c ecological
niche are smallness and rapid growth.  Those properties seem more like
those of the yeasts than of cotton, mung bean, cauliflower, sesame,
sunflower, wheat, buckwheat and Ginkgo.  Against this view is the
presence of Chlamydomonas on the higher plant branch.

5.  A synthesis of explanations 3 and 4 would be that somewhere in
Arabidopsis' ancestry it was in close association (symbiosis?) with a
fungus.  The association was so close that heterokaryons formed.  With
time one of the copies of genes that were present in both the plant and
the fungal progenitor was lost.  Selection may have favored the fungal
cytochrome c.

     What do the rest of you think?  This is a question of economic, as
well as scientific, importance.  The paper makes the suggestion that
Arabidopsis may not be a good model for higher plants.  If this view
takes hold, there goes the funding!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!       Ulrich Melcher                     !!!
!!!       Department of Biochemistry         !!!
!!!       Oklahoma State University          !!!
!!!       Stillwater OK 74078  USA           !!!
!!!  BIOCUKM@OSUCC (Bitnet)    405-744-6210  !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

KELLOGG@FRODO.MGH.HARVARD.EDU (05/01/91)

    I'm not sure I'd be all that worried about the weird placement of
Arabidopsis on the cytochrome c tree, and I certainly wouldn't question 
the utility of Arabidopsis as a model system.  There are several reasons:  
    1.  Trees based on cytochrome c tend to be peculiar in general.  For 
example, if you check the trees published by Syvanen et al. (JME, 1989, 
ref. in the Kemmerer paper) you can find all sorts of oddities.  In their 
fig. 5 they show a minimal eukaryotic cytochrome c tree - the two fish 
don't come out together; on the vertebrate clade, the frog is the most 
basal group, more primitive even than the carp; sesame (a dicot) comes out 
with rice (a monocot); the grasses (wheat, maize and rice) form a 
paraphyletic rather than monophyletic group.  
    2.  The tree published by Kemmerer et al. is not particularly robust.  
The numbers beside the branches are bootstrap replicates, not mutations.  
Thus the only groupings that appear in at least half of the bootstrap 
samples are the fungus + Arabidopsis clade and the higher plant clade.  
Those who choose to interpret bootstrap samples in terms of statistical 
significance would say those groupings are significant at the 50% level - 
i.e. not very significant.  If you want to use a 95% level, then only the 
Neurospora + Arabidopsis grouping is significant and the rest of the tree 
is phylogenetically meaningless.  The statistical interpretation of 
bootstraps is under heavy fire at the moment, so I wouldn't push it too 
far; however, if the animal grouping only appears in 19 out of 50 
replicates, it doesn't seem overly convincing.
    3.  Based on the phenograms, the Arabidopsis sequence is not 
particularly similar to that of Neurospora - the similarity is much less 
than that among the higher plants.  So it may not be much like a higher 
plant cytochrome, but it isn't all that much like a Neurospora cytochrome 
either.  Arabidopsis is on a long branch, and it has been extensively 
documented that "long branches attract" in phylogenetic analyses (the 
so-called Felsenstein zone).  The best way to correct that problem is to 
increase the sampling density around the problematic taxon.
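
    As a rough illustration of what the bootstrap numbers in point 2 
measure, here is a minimal Python sketch.  It is not the procedure used 
by Kemmerer et al.: instead of rebuilding a tree for each replicate, it 
simply asks whether one taxon is the unique nearest neighbour of another 
by Hamming distance.  The alignment, the taxon names and the function 
names are all invented for the example.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def bootstrap_support(alignment, target, partner, replicates=50, seed=0):
    """Fraction of replicates in which `partner` is the unique nearest
    neighbour of `target` (a crude stand-in for a tree grouping)."""
    rng = random.Random(seed)
    others = [name for name in alignment if name != target]
    length = len(alignment[target])
    hits = 0
    for _ in range(replicates):
        # Bootstrap step: resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        sample = {name: [seq[c] for c in cols]
                  for name, seq in alignment.items()}
        dists = {name: hamming(sample[target], sample[name])
                 for name in others}
        best = min(dists.values())
        nearest = [name for name in others if dists[name] == best]
        if nearest == [partner]:
            hits += 1
    return hits / replicates

# Invented toy alignment (hypothetical taxa and residues, not real data).
toy = {
    "taxon_A": "GDVEKGKKIF",
    "taxon_B": "GDAEKGKKIF",
    "taxon_C": "GSAKKGATLF",
    "taxon_D": "GNPKAGEKIF",
}

support = bootstrap_support(toy, "taxon_A", "taxon_B")
print(f"taxon_A + taxon_B appears in {support:.0%} of replicates")
```

    On this scale, a grouping that appears in 19 of 50 replicates has 
a support of 38%, well short of either the 50% or the 95% threshold 
mentioned above.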
    
    The implication of 1-3 is that cytochrome c may not be much use as 
an indicator of relationship.  On the other hand, the fact that similarities 
in the molecules do not appear to be determined primarily by phylogeny 
means that there must be some really interesting molecular biology going 
on - this gets back to the fitness landscape described by Ulrich Melcher.  
Possibly selection has been strong enough to effectively wipe out much of 
the phylogenetic information in the molecule.
    The other possibility is one mentioned by Kemmerer et al. in their 
companion paper in Mol. Biol. and Evol. in which they suggest that the 
problem is comparison of paralogous genes.  If there are several copies of 
the cytochrome c gene, then each copy will have its own phylogeny.  If the 
sequence for Arabidopsis is from one copy of the gene, and at least some 
of the other plant sequences are from another copy (or, worse yet, from 
several other copies), then you are comparing apples and oranges - you 
get a funny tree; the tree won't correspond either to a gene phylogeny or 
to an organismic phylogeny.  I saw a tree a few days ago on sequences of 
glucanases in which that was clearly what had happened. 
    Given all this, I suppose that lateral gene transfer can't be ruled 
out, but I'm not sure it's the most compelling explanation of the pattern.

    My conclusion is that Arabidopsis is probably a fine model for higher 
plants - just that attempts to generalize results from Arabidopsis need to 
be checked.  This is hardly a radical suggestion.  The stronger conclusion 
though is that cytochrome c is probably not a good molecule for exploring 
organismic phylogeny.  Other people have reached this conclusion before.  
Maybe what we need now is some exploration of why it isn't much use.

Elizabeth Kellogg, Dept. of Molecular Biology, Mass. General Hospital, 
Boston, MA; and Arnold Arboretum of Harvard University, 22 Divinity Ave., 
Cambridge, MA 02138   kellogg@frodo.mgh.harvard.edu