Friday, March 2, 2012

Can networks have multiple roots?

The biological model behind most phylogenetic networks is the same as the one behind most phylogenetic trees, in which there is a series of branches ramifying from a single base, with the additional feature that branches can fuse with each other.

In this model, attention has focussed on the osculations ("kissing") between branches. However, I wish to draw your attention to the base of the tree, where in some biological models multiple stems appear. These stems represent multiple origins for the organisms being modelled.

The idea is, simply, that life is not monophyletic, and nor are some of the commonly recognized taxonomic groups. This model appears most famously in the paper by Doolittle (1999), but its basic premise has been repeated a number of times (e.g. Doolittle 2000a, from which the above figures are taken; Wells 2002).

Doolittle (2000b) credits the biological idea to Woese & Fox (1977), as further developed by Woese (1987, 1998), so the idea is not a particularly recent one. The premise is that "... the three contemporary domains of life arose not from a single cell, but from a population of very different cellular entities ('progenotes') ... such a population [could] give rise to two (and then three) discrete cellular domains without passing through a bottleneck represented by a single cellular universal ancestor" (Doolittle 2000b).

There is, of course, a biological precedent for this multiple-tree model: the "Husband and Wife tree" or "Marriage tree", which is formed from two trees that have branches conjoined by the process known as self-grafting (or inosculation). Here, there literally are two trunks and two sets of roots, since the conjoined structure starts as two separate trees.

Inosculated (self-grafted) crab apple trees, Lynncraigs farm, Scotland

My question, though, is this: Can the mathematics of phylogenetic networks handle multiple roots? All current definitions that I have seen of phylogenetic networks specify a single root node with indegree 0. However, I have seen no discussion in the literature of whether this imposed mathematical constraint is actually necessary.
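To make the constraint concrete, here is a minimal Python sketch of what "a single root node with indegree 0" means for a network stored as a list of directed (parent, child) edges. The function name `find_roots` and the toy network are hypothetical illustrations, not taken from any phylogenetics package.

```python
from collections import defaultdict


def find_roots(edges):
    """Return the sorted list of nodes with indegree 0 in a directed graph.

    `edges` is a list of (parent, child) pairs. Hypothetical helper,
    written only to illustrate the definition discussed above.
    """
    indegree = defaultdict(int)
    for parent, child in edges:
        indegree[parent] += 0  # register the parent without changing its count
        indegree[child] += 1   # each incoming edge raises the child's indegree
    return sorted(node for node, degree in indegree.items() if degree == 0)


# Toy network (an assumption for illustration): two independent origins,
# r1 and r2, whose lineages fuse at the reticulation node h.
two_stem_network = [
    ("r1", "a"), ("r2", "b"),
    ("a", "h"), ("b", "h"),   # h has indegree 2: a fusion of two branches
    ("a", "x"), ("b", "y"), ("h", "z"),
]

roots = find_roots(two_stem_network)
single_rooted = len(roots) == 1  # the usual definition requires this to be True
```

In this toy example `find_roots` reports two nodes, so the network fails the single-root requirement even though it is otherwise an ordinary directed acyclic graph with one reticulation.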


Doolittle W.F. (1999) Phylogenetic classification and the universal tree. Science 284: 2124-2128.

Doolittle W.F. (2000a) Uprooting the tree of life. Scientific American 282(2): 90-95.

Doolittle W.F. (2000b) The nature of the universal ancestor and the evolution of the proteome. Current Opinion in Structural Biology 10: 355-358.

Wells J. (2002) Icons of Evolution: Science or Myth? Regnery Publishing, Washington DC.

Woese C.R. (1987) Bacterial evolution. Microbiological Reviews 51: 221-271.

Woese C.R. (1998) The universal ancestor. Proceedings of the National Academy of Sciences of the USA 95: 6854-6859.

Woese C.R., Fox G.E. (1977) The concept of cellular evolution. Journal of Molecular Evolution 10: 1-6.


  1. David, this is a very good question and one that it seems not many people have tackled. What do you think? Do species arise from multiple origins?

  2. Mathematically it is an interesting question too. My first instinct is that the multiple-root situation can probably be reduced to the single-root situation by having some kind of high-degree (i.e. unrefined) artificial root which is the parent of the real roots. Although there will almost certainly be technical complications...

  3. My current answer to the biological question is probably somewhat similar to Steven's comment on the mathematical one. That is, it is in some ways a technical question about whether our analyses can be made to still work even if there are multiple roots. For example, hybrid species can be seen as having multiple origins at one level, because they have two parents whose common ancestor may be a long way back in time. This is even more so for the products of HGT. However, we can also treat both examples as having a single origin, because they do (presumably) share a common ancestor somewhere back in time. So, perhaps it does not actually matter whether we think species or any other taxa have multiple origins in the proximate sense, because we only have to assume that there is a single common ancestor in the ultimate sense. This provides a biological match to Steven's artificial mathematical root, allowing the analysis to conceptually involve a monophyletic group, as in a traditional tree-based analysis.

    This begs the interesting biological question about the possibility of multiple origins, of course, except at the origin of life. If I try to answer the direct question "Do species arise from multiple origins?", then I guess my answer would be: "why not?" Most models of speciation, of course, assume that this possibility is very rare. However, it seems to me that biology always turns out to be more complicated than we initially think it is, so it might be naive to ignore the possibility. I suspect that this is one reason for the recent interest in haplotype networks, which look for reticulate relationships at the species level rather than simple hierarchical ones.

  4. There is no mathematical constraint. Mathematically, the start node does not have to be unique. A start node also does not have to be at a particular level (alien genesis in 2015 would work). Also, two or more fathers or mothers can exist. However, inferences that assume exactly two parents will no longer work, and analyses and algorithms that require a single start node will not work either. The math just gets a little more complicated, is all.
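
The reduction suggested in comment 2, an artificial high-degree root placed above the real roots, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is a list of (parent, child) edges, and the names `roots_of`, `add_artificial_root` and the label `ARTIFICIAL_ROOT` are hypothetical. It says nothing about the technical complications that comment anticipates.

```python
def roots_of(edges):
    """Nodes that appear as a parent but never as a child (indegree 0)."""
    parents = {parent for parent, _ in edges}
    children = {child for _, child in edges}
    return sorted(parents - children)


def add_artificial_root(edges, label="ARTIFICIAL_ROOT"):
    """Return a new edge list in which one artificial node is the parent
    of every real root. Single-rooted input is returned unchanged.

    Hypothetical helper: `label` is an arbitrary placeholder name.
    """
    nodes = {node for edge in edges for node in edge}
    if label in nodes:
        raise ValueError(f"label {label!r} is already used in the network")
    real_roots = roots_of(edges)
    if len(real_roots) <= 1:
        return list(edges)
    # One new edge per real root; the artificial root is unrefined,
    # so its outdegree equals the number of real roots.
    return [(label, root) for root in real_roots] + list(edges)


# Toy network (an assumption for illustration) with two independent origins.
two_stem_network = [
    ("r1", "a"), ("r2", "b"),
    ("a", "h"), ("b", "h"),
    ("a", "x"), ("b", "y"), ("h", "z"),
]

rooted_network = add_artificial_root(two_stem_network)
```

After the call, `roots_of(rooted_network)` contains only the artificial node, so an algorithm that requires a single node of indegree 0 could in principle be run on the result. Whether the added edges are biologically meaningful is a separate question.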