Monday, September 17, 2018

Getting the wrong tree when reticulations are ignored


One issue that has long intrigued me is what happens when someone constructs a phylogenetic tree even though the actual (i.e. true) phylogeny contains reticulate evolutionary events. That is, a network is required to represent the phylogeny accurately, but a tree is used as the model instead. How accurate is the tree?

By this I mean: if the phylogeny can be thought of as a "tree with reticulations", do we simply get that tree but miss the reticulations, or do we get a different (i.e. wrong) tree?


Sometimes, people refer to this situation as having a "backbone tree" — the phylogeny is basically tree-like, but there are a few extra branches, perhaps representing occasional hybridizations or horizontal gene transfers. The phylogenetic tree can then be treated as a close approximation to the true phylogeny, representing the diversification events but not the (rarer) reticulation events.

I have argued against this approach (2014. Systematic Biology 63: 628-638.). Instead of seeing a network as a generalization of a tree, we should see a tree as a simplification of a network. If we do this, then we would construct a network every time; and sometimes that network would be a tree, because there are no reticulation events in the phylogeny. It cannot work the other way around, because we can never get a network if all we ask for is a tree!

Presumably, if there are no reticulations then we should get the same answer (phylogenetic tree) irrespective of whether we simply construct a tree or instead construct a network that turns out to be a tree. But what about the "backbone tree" situation? Here, it has always seemed to me to be possible that we do not get the same tree. If this is so, then constructing a tree and then adding a few reticulations to it (as is often done in the literature) would not work — we would be adding reticulations to the wrong backbone tree.

There are two possible ways in which we can get the wrong backbone tree: the topology might be incorrect, or the branch lengths might be incorrect (or both). For example, if there are true reticulations and yet we do not include them in our model, I have argued that the branches will be too short (2014. Systematic Biology 63: 847-849.). Two taxa will be genetically similar because of the reticulation events, but the tree-building algorithm can only make them similar on the tree by shortening the branches (not by adding a reticulation).
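The branch-shortening argument can be illustrated with a toy calculation. The sketch below is not taken from any of the papers discussed here; it assumes a strict molecular clock, ignores coalescent variation among loci, and uses hypothetical names (`gamma` for the fraction of loci inherited through the reticulation, `t_retic` for the time of the reticulation, `t_split` for the true divergence time on the backbone tree). Under those assumptions, a tree-only model that has to explain the averaged distance between two taxa ends up with a divergence time younger than the true one.

```python
def expected_pairwise_distance(gamma, t_retic, t_split, rate=1.0):
    """Expected distance between two taxa, averaged over loci.

    A fraction `gamma` of loci passed through the reticulation and
    separate the taxa only since `t_retic`; the remaining loci follow
    the backbone tree and separate them since `t_split`.
    Assumes a strict clock with substitution rate `rate` (an assumption
    of this sketch, not of the cited work).
    """
    d_retic = 2.0 * rate * t_retic  # path length up and down via the reticulation
    d_tree = 2.0 * rate * t_split   # path length up and down via the backbone tree
    return gamma * d_retic + (1.0 - gamma) * d_tree


def apparent_split_time(gamma, t_retic, t_split, rate=1.0):
    """Divergence time a single tree would need to reproduce the averaged distance."""
    return expected_pairwise_distance(gamma, t_retic, t_split, rate) / (2.0 * rate)


if __name__ == "__main__":
    true_split = 5.0  # hypothetical true divergence time
    retic_time = 1.0  # hypothetical (more recent) reticulation time
    for gamma in (0.0, 0.1, 0.3, 0.5):
        t = apparent_split_time(gamma, retic_time, true_split)
        print(f"gamma={gamma:.1f}  apparent split time={t:.2f}  (true={true_split})")
```

With `gamma = 0` the apparent split time equals the true one, and it shrinks as more of the genome is inherited through the reticulation. This only addresses the branch lengths; it says nothing about whether the topology is also wrong.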

Fortunately, for at least one tree-building model, Luay Nakhleh and his group have now done some simulations to answer my questions. You may not yet have noticed their results, because they are not necessarily in the most obvious place; so I will highlight them here. The analyses involve the Multispecies Coalescent (MSC) model, which accounts for incomplete lineage sorting during the tree-like part of evolution, as compared to the Multispecies Network Coalescent (MSNC), which adds reticulations (e.g. hybridization) to the model.

1.
Dingqiao Wen, Yun Yu, Matthew W. Hahn, Luay Nakhleh (2016) Reticulate evolutionary history and extensive introgression in mosquito species revealed by phylogenetic network analysis. Molecular Ecology 25: 2361-2372.

This paper compares a tree-based analysis (construct a tree first then add reticulations) with a network-based analysis (construct a network) for an empirical genomic dataset. The two results differ.

2.
Dingqiao Wen, Luay Nakhleh (2018) Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Systematic Biology 67: 439-457.

Tucked away in the Supplementary Information are the results of a set of simulations comparing the MSC (using *Beast) and the MSNC (using PhyloNet), with (section 3) and without (section 2) reticulations. The basic conclusion is that, in the presence of reticulation, tree-based methods either get the tree completely wrong, or they get the tree topology right but the branch lengths are "forced" to be very short. A summary of the latter result is shown in the figure above. In the absence of reticulation, both methods produce the same tree.

3.
R.A. Leo Elworth, Huw A. Ogilvie, Jiafan Zhu, and Luay Nakhleh (ms.) Advances in computational methods for phylogenetic networks in the presence of hybridization. (chapter for a forthcoming book)

A summary of the group's work to date. Section 6.3 summarizes the results from paper 2.