Last week I noted that there has been recent activity concerning the "placental root" problem, in which different genetic datasets support different phylogenetic trees for the root of the placental mammal clade (Conflicting placental roots: network or tree?). There are two articles (by Morgan et al. and Romiguier et al.) in the current issue of Molecular Biology & Evolution that address this problem with genomic data, and find two different well-supported trees.
This is an issue that I also addressed in a much earlier post (EDA or post-optimality analysis of phylogenetic data?), based on the genomic dataset of Meredith et al., in which I concluded:
It is not immediately obvious that a tree-building analysis is going to be of much use for this dataset. There is certainly some "power of building phylogenies from large densely sampled datasets", but this does not automatically mean that those phylogenies will be tree-like. Evolution involves a more diverse process than that.
In all of these cases, sophisticated substitution models (nucleotide or amino acid) were used as the basis for building a phylogenetic tree, whereas the network analysis of Hallström & Janke suggests that mammalian evolution may not be strictly bifurcating.
My interest in this blog post is in investigating the relative roles of the data and the substitution models in producing the phylogenetic trees. I use splits graphs of the recent data (using the SplitsTree program) as an exploratory data analysis, to visualize the signals in the datasets and which trees they might support under different circumstances.
Any phylogenetic analysis depends on the quality of the data, in terms of the sampling of both taxa and characters. Both Morgan et al. and Romiguier et al. used the protein-coding sequences for most of the 40 currently available mammalian genomes.
However, it is worth noting at the outset that the sampling of the root taxa is rather poor. The root involves the relative relationships of the Xenarthra and the Afrotheria, and yet there are only two sampled Xenarthra species and three sampled Afrotheria (the remaining taxa are split between the Laurasiatheria and Euarchontoglires). Perhaps we are asking too much in expecting these data to resolve the root at all.
We can start the investigation with the data of Morgan et al., based on the concatenated amino acid sequences. The first NeighborNet analysis uses the simplest model possible, the Hamming distance (which is simply the number of alignment differences between the taxa). I have colour-coded the four taxonomic groups, for convenience.
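For readers unfamiliar with it, the Hamming distance can be sketched in a few lines of Python. This is only an illustration, not the SplitsTree implementation: the function name, the toy sequences, and the choice to skip columns containing a gap or unknown residue are all my assumptions.

```python
def hamming_distance(seq_a, seq_b, skip_chars="-?X"):
    """Return (differences, proportion) for two aligned sequences.

    Columns where either sequence has a character in skip_chars are
    ignored; that handling of gaps is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in skip_chars or b in skip_chars:
            continue  # uninformative column
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences, differences / compared


# Hypothetical toy amino-acid sequences, not taken from any real dataset.
toy_a = "MKTAYIAK"
toy_b = "MKSAYLAK"
count, proportion = hamming_distance(toy_a, toy_b)
```

No substitution model is involved here: every observed difference counts equally, and multiple substitutions at the same site are not corrected for.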
Note that all four taxonomic groups appear to be monophyletic (i.e. they are each supported by a unique split), as also is the Xenarthra+Afrotheria group. However, the raw data attach the outgroup to the placental group away from both the Xenarthra and the Afrotheria. Indeed, the data suggest that the Insectivora (Sorex+Erinaceus) are candidates as the sister to the rest of the placentals.
The effect of the substitution model on the data analysis can be evaluated by including a more sophisticated genetic distance. I have chosen the JTT amino-acid model, with the inclusion of a proportion of invariant sites (estimated by SplitsTree to be 30%). The corresponding NeighborNet is shown in the second graph.
This network attaches the outgroup near the "expected" taxa (Xenarthra, Afrotheria), although the location of Sorex is rather problematic. However, the split supporting the group Xenarthra+Afrotheria as the sister to the rest of the placentals is still very small, being ranked only 28th of the 82 non-trivial splits that involve at least one placental species. So, even this simple model does not provide strong support for the root location. However, it seems obvious that the root location is being determined as much by the substitution model as by the data, suggesting that the data cannot provide convincing evidence alone.
We can now proceed to study the data of Romiguier et al., based on the maximum-likelihood gene trees (GTR+GAMMA model) from the 560 genes, rather than the original alignment data. Here I have used a Consensus Network that displays all of those splits occurring in at least 24% of the trees. This percentage is the smallest that produces only a single reticulation in the network.
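The thresholding step behind a Consensus Network can be sketched as follows. This is a minimal illustration, not the SplitsTree code: it assumes each gene tree has already been reduced to its set of non-trivial splits, and the function names, the four placeholder taxon labels, and the tree counts are all hypothetical.

```python
from collections import Counter


def canonical_split(side, taxa):
    """Represent a split by the side that lacks a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)  # arbitrary but consistent choice
    return frozenset(taxa) - side if reference in side else side


def consensus_splits(trees, taxa, threshold):
    """Return {split: frequency} for splits in at least `threshold` of trees."""
    counts = Counter()
    for tree_splits in trees:
        # Count each split at most once per tree.
        counts.update({canonical_split(s, taxa) for s in tree_splits})
    n_trees = len(trees)
    return {s: c / n_trees for s, c in counts.items() if c / n_trees >= threshold}


# Hypothetical toy input: 10 unrooted four-taxon "gene trees", each with one
# non-trivial split. The counts are invented for illustration only.
taxa = {"Out", "Xen", "Afr", "Bor"}
trees = (
    [[{"Out", "Xen"}]] * 4
    + [[{"Out", "Afr"}]] * 3
    + [[{"Out", "Bor"}]] * 3
)
kept_low = consensus_splits(trees, taxa, threshold=0.24)
kept_high = consensus_splits(trees, taxa, threshold=0.35)
```

With the lower threshold all three mutually incompatible splits survive, so the network shows a reticulation. Raising the threshold prunes the less frequent splits until only a tree remains.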
So, the most ambiguous part of the set of trees (i.e. where there is most conflict among the trees) turns out to be where the outgroup attaches to the placental group. This is hardly surprising. What is more interesting is that the split support for each of the three alternative attachment points is very similar.
So, the gene-tree data do not favour any one of the three alternative placental roots.
It is clear from these exploratory analyses that the genomic data do not, on their own, provide conclusive evidence regarding the root of the placental clade. The approach of Morgan et al. and Romiguier et al. has been to use a tree model based on sophisticated substitution models, thus arriving at conclusions that depend as much on their models as on the data. They used different models and got different trees, based on roughly the same data.
This is one approach to phylogenetics, to use more sophisticated models; but an alternative is to recognize that evolution itself is sophisticated, and therefore does not necessarily produce a dichotomous tree. In this case, it seems more likely that the conflicting signals at the placental root reflect non-tree-like processes (such as hybridization), so that tree-based analyses are inappropriate, no matter how fancy the models are.
Hallström, Janke (2010) Mammalian evolution may not be strictly bifurcating. Molecular Biology & Evolution 27: 2804-2816.
Meredith et al. (2011) Impacts of the Cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science 334(6055): 521-524.
Morgan et al. (2013) Heterogeneous models place the root of the placental mammal phylogeny. Molecular Biology & Evolution 30: 2145-2156.
Romiguier et al. (2013) Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Molecular Biology & Evolution 30: 2134-2144.