Showing posts with label Limitation. Show all posts

Friday, March 2, 2012

Can networks have multiple roots?


The biological model behind most phylogenetic networks is the same as the one behind most phylogenetic trees, in which there is a series of branches ramifying from a single base, with the additional feature that branches can fuse with each other.



In this model, attention has focussed on the osculations ("kissing") between branches. However, I wish to draw your attention to the base of the tree, where in some biological models multiple stems appear. These stems represent multiple origins for the organisms being modelled.


The idea is, simply, that life is not monophyletic, and nor are some of the commonly recognized taxonomic groups. This model appears most famously in the paper by Doolittle (1999), but its basic premise has been repeated a number of times (e.g. Doolittle 2000a, from which the above figures are taken; Wells 2002).


Doolittle (2000b) credits the biological idea to Woese & Fox (1977), as further developed by Woese (1987, 1998), so the idea is not a particularly recent one. The premise is that "... the three contemporary domains of life arose not from a single cell, but from a population of very different cellular entities ('progenotes') ... such a population [could] give rise to two (and then three) discrete cellular domains without passing through a bottleneck represented by a single cellular universal ancestor" (Doolittle 2000b).

There is, of course, a biological precedent for this multiple tree model: the "Husband and Wife tree" or "Marriage tree", which is formed from two trees that have branches conjoined by the process known as self-grafting (or inosculation). Here, there literally are two trunks and two sets of roots, since the conjoined structure starts as two separate trees.

Inosculated (self-grafted) crab apple trees, Lynncraigs farm, Scotland

My question, though, is this: Can the mathematics of phylogenetic networks handle multiple roots? All current definitions of phylogenetic networks that I have seen specify a single root node with indegree 0. However, I have seen no discussion in the literature of whether this imposed mathematical constraint is actually necessary.
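To make the constraint concrete, here is a minimal Python sketch of the single-root check. The arc-list representation, the function name `roots` and the toy "marriage tree" graph (with its node names) are all assumptions made for illustration; they do not come from any particular network package.

```python
def roots(arcs):
    """Return every node with indegree 0, given arcs as (tail, head) pairs."""
    nodes = {node for arc in arcs for node in arc}
    heads = {head for _, head in arcs}
    return sorted(nodes - heads)


# Hypothetical "marriage tree": two separate stems whose branches x and y
# fuse at node f. Taxa A, B and C hang off x, f and y respectively.
marriage_tree = [
    ("root1", "x"), ("root2", "y"),   # two independent origins
    ("x", "f"), ("y", "f"),           # the fusion of the two branches
    ("x", "A"), ("f", "B"), ("y", "C"),
]

found = roots(marriage_tree)
print(found)            # the indegree-0 nodes
print(len(found) == 1)  # the usual definition requires this to be True
```

Under the usual definition this graph would be rejected, because it has two nodes of indegree 0, even though it is otherwise a perfectly ordinary directed acyclic graph.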

References

Doolittle W.F. (1999) Phylogenetic classification and the universal tree. Science 284: 2124-2128.

Doolittle W.F. (2000a) Uprooting the tree of life. Scientific American 282(2): 90-95.

Doolittle W.F. (2000b) The nature of the universal ancestor and the evolution of the proteome. Current Opinion in Structural Biology 10: 355-358.

Wells J. (2002) Icons of Evolution: Science or Myth? Regnery Publishing, Washington DC.

Woese C.R. (1987) Bacterial evolution. Microbiological Reviews 51: 221-271.

Woese C.R. (1998) The universal ancestor. Proceedings of the National Academy of Sciences of the USA 95: 6854-6859.

Woese C.R., Fox G.E. (1977) The concept of cellular evolution. Journal of Molecular Evolution 10: 1-6.

Monday, February 27, 2012

A fundamental limitation of hybridization networks?


In a "hybridization" network, reticulation cycles with three or fewer outgoing arcs are not uniquely defined with respect to trees, clusters or triplets. This point was first noted by Gambette and Huber (2009), although this work will not be formally published until later this year (Gambette and Huber 2012). This seems to be a fundamental mathematical limitation of such networks, which thereby limits what biologists can expect to achieve by performing a network analysis. It is thus a very important point for biologists to understand, as overlooking it can lead to incorrect interpretation of phylogenetic networks.


The figure shows two incompatible inputs and the three networks resulting from a hybridization model. The inputs are shown in the figure as trees, triplets and clusters, since in this example these three are identical. There are three taxa (labeled A, B, C), which form two triplets (labeled 1, 2), as shown. (The third possible triplet is not part of this discussion.) Obviously, these triplets also represent two trees, and those trees have two non-trivial clusters.

The figure also shows the three networks (labeled a, b, c) that are encoded (uniquely described) by these triplets / trees / clusters. The relevant arcs of the networks that must be deleted to induce each triplet / tree / cluster are labeled (i.e. deleting arc 1 induces triplet / tree / cluster 1, and similarly for arc 2).

These three networks each have a single reticulation cycle with a single reticulation node (i.e. they are level-1 networks) and three outgoing arcs. Note that the three networks differ only in the direction of two of their arcs. Note, also, that the fourth possible combination of these two arcs produces a graph with two roots, which is invalid as a phylogenetic network.

So, these three networks are all associated with the same trees, clusters and triplets. In practice, this means that any one of taxa A, B or C can be attached to the reticulation node. Any network containing such a cycle is not unique – we cannot mathematically distinguish between the three different cycle topologies.
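The ambiguity can be checked by brute force. The Python sketch below builds all four orientations of the two cycle arcs, counts the roots of each, and lists the non-trivial clusters of the trees obtained by deleting one incoming arc of the reticulation node. The node names (`r`, `u`, `h`, `v`) and the placement of taxa A, B and C on the cycle are my assumptions, chosen to match the description above; the function names are hypothetical.

```python
from itertools import product

TAXA = frozenset("ABC")


def build(u_to_h, v_to_h):
    """Arcs (tail, head) for one orientation of the two cycle arcs u-h and v-h."""
    arcs = [("r", "u"), ("r", "v"), ("u", "A"), ("h", "B"), ("v", "C")]
    arcs.append(("u", "h") if u_to_h else ("h", "u"))
    arcs.append(("v", "h") if v_to_h else ("h", "v"))
    return arcs


def roots(arcs):
    """Nodes with indegree 0."""
    nodes = {node for arc in arcs for node in arc}
    return sorted(nodes - {head for _, head in arcs})


def reticulations(arcs):
    """Nodes with indegree 2 or more."""
    heads = [head for _, head in arcs]
    return sorted({node for node in heads if heads.count(node) > 1})


def leaves_below(arcs, node):
    """Set of leaves reachable from node (the node itself if it has no children)."""
    children = [head for tail, head in arcs if tail == node]
    if not children:
        return frozenset([node])
    return frozenset().union(*(leaves_below(arcs, c) for c in children))


def displayed_clusters(arcs):
    """Non-trivial clusters of every tree obtained by keeping exactly one
    incoming arc per reticulation node."""
    rets = reticulations(arcs)
    incoming = [[arc for arc in arcs if arc[1] == x] for x in rets]
    clusters = set()
    for kept in product(*incoming):
        tree = [arc for arc in arcs if arc[1] not in rets or arc in kept]
        for node in {n for arc in tree for n in arc}:
            below = leaves_below(tree, node)
            if 1 < len(below) < len(TAXA):
                clusters.add(below)
    return clusters


for u_to_h, v_to_h in product([True, False], repeat=2):
    arcs = build(u_to_h, v_to_h)
    if len(roots(arcs)) != 1:
        print("invalid, roots:", roots(arcs))
        continue
    shown = sorted("".join(sorted(c)) for c in displayed_clusters(arcs))
    print("reticulation node:", reticulations(arcs), "clusters:", shown)
```

With this labeling, each of the three single-rooted orientations has a different reticulation node but yields the same two clusters, {A, B} and {B, C}, while the fourth orientation leaves both `r` and `h` with indegree 0.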

In one sense, this indistinguishability is a mathematically "trivial" ambiguous case. However, this should not make it an under-valued point, because it is likely to have enormous impact on the biological interpretation of networks. After all, every hybridization or horizontal gene transfer potentially creates a reticulation cycle with three outgoing arcs. For example, hybridization between sister taxa will create this situation, although hybridization between non-sister taxa may not (as shown below). When this situation does occur, it will be difficult for us to identify the affected taxa from the network topology alone. This is one fundamental mathematical limitation of using trees (or their subsets such as triplets and clusters) to construct networks.


What is even worse, current computer implementations usually output only one network solution (see Albrecht et al. 2012). If a computer program outputs only a single one of a set of optimal networks, then this may be very misleading. In the case discussed here there are three optimal networks, and biologists might identify the wrong taxon as being the hybrid, depending on which of the three equally optimal networks the program chooses to output. This is an unacceptable situation: each algorithm must produce the set of all optimal networks.

Finally, we may need other (biological) criteria for determining the reticulation taxon. For example, the three networks above represent three different biological scenarios. In scenarios "b" and "c", a daughter taxon apparently hybridizes with its parent taxon, whereas in scenario "a" two daughters hybridize. In other words, temporal order may be deemed to be violated in "b" and "c", thus potentially eliminating them as candidate scenarios. We need, however, to be careful about using this type of argument, as it has not previously been necessary in phylogenetics.
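The temporal argument can also be sketched in Python. This assumes one particular formalization that is not spelled out in the post: a hybrid must be contemporaneous with both of its parents, while every other arc must run strictly forward in time. The node names and taxon placement are the same kind of assumption as before (root `r`, cycle nodes `u`, `h`, `v` carrying A, B, C), and `is_time_consistent` is a hypothetical helper, not a library function.

```python
def build(u_to_h, v_to_h):
    """Arcs (tail, head) for one orientation of the two cycle arcs u-h and v-h."""
    arcs = [("r", "u"), ("r", "v"), ("u", "A"), ("h", "B"), ("v", "C")]
    arcs.append(("u", "h") if u_to_h else ("h", "u"))
    arcs.append(("v", "h") if v_to_h else ("h", "v"))
    return arcs


def is_time_consistent(arcs):
    """True if times can be assigned so that reticulation arcs join
    contemporaneous nodes and all other arcs go forward in time."""
    nodes = {node for arc in arcs for node in arc}
    heads = [head for _, head in arcs]
    rets = {node for node in heads if heads.count(node) > 1}

    # Merge each hybrid with its parents: they must share one time point.
    group = {node: node for node in nodes}

    def find(x):
        while group[x] != x:
            group[x] = group[group[x]]
            x = group[x]
        return x

    for tail, head in arcs:
        if head in rets:
            group[find(tail)] = find(head)

    # The remaining (tree) arcs must form an acyclic graph on the merged groups.
    succ = {}
    for tail, head in arcs:
        if head not in rets:
            a, b = find(tail), find(head)
            if a == b:
                return False
            succ.setdefault(a, set()).add(b)

    state = {}  # 1 = on the current DFS path, 2 = finished

    def has_cycle(x):
        state[x] = 1
        for y in succ.get(x, ()):
            if state.get(y) == 1 or (state.get(y) is None and has_cycle(y)):
                return True
        state[x] = 2
        return False

    return not any(state.get(g) is None and has_cycle(g)
                   for g in {find(n) for n in nodes})


print(is_time_consistent(build(True, True)))    # two daughters hybridize
print(is_time_consistent(build(False, True)))   # daughter hybridizes with parent
print(is_time_consistent(build(True, False)))   # daughter hybridizes with parent
```

Under this assumption only the orientation in which both cycle arcs point into `h` (two daughters hybridizing) passes the check. In the other two, one parent of the hybrid is an ancestor of the other parent, so the two cannot coexist.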

References

Albrecht B., Scornavacca C., Cenci A., Huson D.H. (2012) Fast computation of minimum hybridization networks. Bioinformatics 28: 191-197.

Gambette P., Huber K.T. (2009) A note on encodings of phylogenetic networks of bounded level. Unpublished ms at: arXiv:0906.4324v1. Tue 23 Jun 2009.

Gambette P., Huber K.T. (2012) On encodings of phylogenetic networks of bounded level. Journal of Mathematical Biology [in press].