## Wednesday, September 26, 2012

### Networks and most recent common ancestors

The interpretation of an evolutionary network is confounded by the fact that descendants of reticulation nodes have complex ancestry. Therefore, the concept of a Most Recent Common Ancestor (MRCA) is not as straightforward as it is for a tree, as there may be multiple paths from any one descendant back to its ancestors. This creates several possible interpretations of what we might mean by an MRCA.

Figure 1 illustrates the calculation of the MRCA in a tree of five taxa (A-E), showing the MRCA of taxa C and D. We simply trace each of the descendant taxa backward along the branches towards the root, and the ancestral node where all of these traces first intersect is the MRCA of those taxa.

 Figure 1.
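The tracing procedure for a tree can be sketched in a few lines of Python. This is only an illustration: the child-to-parent map, the internal node names (`ab`, `cd`, `cde`, `root`) and the topology are assumptions made up for the example, not a transcription of Figure 1.

```python
def tree_mrca(parent, taxa):
    """Return the MRCA of `taxa` in a rooted tree given as a child -> parent map."""
    def path_to_root(node):
        # Walk upward until we reach the node that has no parent (the root).
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    paths = [path_to_root(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # The first shared node met while walking up from a taxon is the MRCA.
    return next(n for n in paths[0] if n in shared)


# Hypothetical five-taxon tree ((A,B),((C,D),E)); an assumed topology.
parent = {
    "A": "ab", "B": "ab",
    "C": "cd", "D": "cd",
    "cd": "cde", "E": "cde",
    "ab": "root", "cde": "root",
}

print(tree_mrca(parent, ["C", "D"]))  # -> cd
print(tree_mrca(parent, ["A", "D"]))  # -> root
```

Because every node in a tree has exactly one parent, each taxon has a single path to the root, and that is what makes the answer unique.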

Figure 2 illustrates a more complex history, involving two hybridization events. The incoming branches to the reticulation nodes have arrows, for emphasis. The figure also recognizes several possible interpretations of the MRCA of taxa C and D (see Huson and Rupp 2008; Fischer and Huson 2010).

A conservative definition of the MRCA (or a stable MRCA) is the intersection of all paths from the descendants to the root, so that any reticulation pushes the MRCA back towards the root. In this example it pushes the MRCA all the way to the root. Alternatively, we could define the Lowest Common Ancestor (or the minimal common ancestor) as the shared ancestor that is furthest from the root along any path. That is, the LCA is not an ancestor of any other common ancestor of the taxa concerned.

 Figure 2.
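To make the two definitions concrete, here is a minimal Python sketch. The network is hypothetical: it has two hybridization nodes (`h` and `C`), but the node names and topology are my own assumptions and should not be read as a reconstruction of Figure 2. The function names are also invented for the illustration, and the path enumeration is exponential in the worst case, so it only suits small examples.

```python
# Hypothetical network as a child -> parents map; two parents = reticulation.
PARENTS = {
    "a": ["root"], "f": ["root"],
    "h": ["a", "f"],            # first hybridization
    "l": ["f"],
    "A": ["a"], "B": ["h"], "D": ["l"], "E": ["f"],
    "C": ["l", "h"],            # second hybridization
}

def ancestors(node):
    """All strict ancestors of `node`, following every parent edge."""
    seen, stack = set(), list(PARENTS.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(PARENTS.get(n, []))
    return seen

def lowest_common_ancestors(taxa):
    """Common ancestors that are not ancestors of any other common ancestor."""
    common = set.intersection(*(ancestors(t) for t in taxa))
    above = set().union(*(ancestors(c) for c in common))
    return common - above

def paths_to_root(node):
    """Every upward path from `node` to the root, as lists of nodes."""
    parents = PARENTS.get(node, [])
    if not parents:
        return [[node]]
    return [[node] + rest for p in parents for rest in paths_to_root(p)]

def conservative_mrca(taxa):
    """Lowest node lying on every path from every taxon to the root."""
    paths = [p for t in taxa for p in paths_to_root(t)]
    on_all = set(paths[0]).intersection(*paths[1:])
    # Nodes on all paths form a chain, so the first one met going up is lowest.
    return next(n for n in paths[0] if n in on_all)

print(lowest_common_ancestors(["C", "D"]))  # -> {'l'}
print(conservative_mrca(["C", "D"]))        # -> root
```

In this assumed network, taxon C can reach the root by a route that avoids both `l` and `f`, so the only node on all paths is the root, while the LCA stays close to the taxa.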

In the mathematical terminology of lattices, which can have an algebraic or order theoretic definition, the Conservative MRCA is called the Least Lower Bound (LLB) and the LCA is called the Greatest Lower Bound (GLB).

We could also have a biological compromise between these two mathematical concepts and recognize a Fuzzy MRCA, in which only a specified proportion of the paths (representing some proportion of the genomes) needs to be accommodated by the MRCA, thus keeping the MRCA close to the main collection of descendants (Fischer and Huson 2010). In this example, the Fuzzy MRCA represents 75% of the genome of taxon C and 100% of the genome of taxon D. (The Conservative MRCA represents 100% for both taxa, by definition; and in this example the LCA represents 50% of the genome of taxon C and 100% of the genome of taxon D.)
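The proportions quoted above can be computed mechanically once we decide how much of the genome travels along each incoming branch. The sketch below assumes an equal split among the parents of each reticulation node (50:50 for a hybrid), which is my simplification, and it uses a hypothetical network chosen so that the numbers come out as 50%, 75% and 100%; it is not a reconstruction of Figure 2. The thresholded search is one possible reading of a Fuzzy MRCA, not necessarily the algorithm of Fischer and Huson (2010).

```python
# Hypothetical network as a child -> parents map; two parents = reticulation.
PARENTS = {
    "a": ["root"], "f": ["root"],
    "h": ["a", "f"],            # first hybridization
    "l": ["f"],
    "A": ["a"], "B": ["h"], "D": ["l"], "E": ["f"],
    "C": ["l", "h"],            # second hybridization
}

def share_through(node, target):
    """Fraction of `node`'s genome that traces back through `target`,
    assuming each parent contributes an equal share (an assumption)."""
    if node == target:
        return 1.0
    parents = PARENTS.get(node, [])
    if not parents:
        return 0.0
    return sum(share_through(p, target) for p in parents) / len(parents)

def is_ancestor(anc, node):
    """True if `anc` is a strict ancestor of `node`."""
    return any(p == anc or is_ancestor(anc, p) for p in PARENTS.get(node, []))

def fuzzy_mrca(taxa, threshold):
    """Lowest node(s) through which every taxon traces at least
    `threshold` of its genome."""
    nodes = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}
    ok = {n for n in nodes
          if all(share_through(t, n) >= threshold for t in taxa)}
    return {n for n in ok if not any(is_ancestor(n, m) for m in ok)}

print(share_through("C", "l"), share_through("D", "l"))  # -> 0.5 1.0
print(share_through("C", "f"), share_through("D", "f"))  # -> 0.75 1.0
print(fuzzy_mrca(["C", "D"], 0.75))                      # -> {'f'}
```

Setting the threshold to 1.0 recovers the Conservative MRCA (`root` here), and lowering it to 0.5 recovers the LCA (`l`), which is the sense in which the Fuzzy MRCA sits between the two.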

 Figure 3.

However, neither the Fuzzy MRCA nor the LCA is necessarily unique, although the Conservative MRCA will always be unique. Figure 3 shows an example where there are two independent LCAs of taxa C and D. Neither of these LCAs is an ancestor of the other, as required by the definition, and so they are both equal candidates as LCA. Each one represents 50% of the genome for both taxa C and D.
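A small hypothetical network shows how two LCAs can arise. In the sketch below, two lineages (`x` to `x2`, and `y` to `y2`) hybridize twice, once to give taxon C and later to give taxon D. The topology, the node names and the equal-split assumption for genome shares are all mine, chosen to mimic the situation described for Figure 3 rather than to reproduce the figure.

```python
# Hypothetical network: lineages x and y hybridize twice (giving C, then D).
PARENTS = {
    "x": ["root"], "y": ["root"],
    "x2": ["x"], "y2": ["y"],
    "A": ["x2"], "E": ["y2"],
    "C": ["x", "y"],      # earlier hybridization
    "D": ["x2", "y2"],    # later hybridization between the same lineages
}

def ancestors(node):
    """All strict ancestors of `node`."""
    seen, stack = set(), list(PARENTS.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(PARENTS.get(n, []))
    return seen

def lowest_common_ancestors(taxa):
    """Common ancestors that are not ancestors of any other common ancestor."""
    common = set.intersection(*(ancestors(t) for t in taxa))
    above = set().union(*(ancestors(c) for c in common))
    return common - above

def share_through(node, target):
    """Fraction of `node`'s genome tracing through `target`,
    assuming equal contributions from each parent (an assumption)."""
    if node == target:
        return 1.0
    parents = PARENTS.get(node, [])
    if not parents:
        return 0.0
    return sum(share_through(p, target) for p in parents) / len(parents)

lcas = sorted(lowest_common_ancestors(["C", "D"]))
print(lcas)  # -> ['x', 'y']
for lca in lcas:
    print(lca, share_through("C", lca), share_through("D", lca))
```

Neither `x` nor `y` is an ancestor of the other, so both survive the LCA filter, and each carries half of the genome of both taxa under the equal-split assumption.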

In terms of a lattice, Figure 2 is called a lower semi-lattice (or meet semi-lattice), because every pair of nodes has only one GLB, whereas Figure 3 is not a semi-lattice, because at least one node pair has more than one GLB.
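The semi-lattice condition can be tested by brute force on small networks. The sketch below orders nodes so that an ancestor is "below" its descendants (the root is the minimum), which makes common ancestors lower bounds and the LCA a greatest lower bound; a node counts as its own ancestor so that pairs such as (parent, child) are handled. Both networks are hypothetical stand-ins of my own devising, not reconstructions of Figures 2 and 3.

```python
from itertools import combinations

def ancestors_or_self(parents, node):
    """`node` together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen

def glbs(parents, u, v):
    """Greatest lower bounds of u and v: lowest common ancestors (or self)."""
    common = ancestors_or_self(parents, u) & ancestors_or_self(parents, v)
    return {c for c in common
            if not any(c != d and c in ancestors_or_self(parents, d)
                       for d in common)}

def is_meet_semilattice(parents):
    """True if every pair of nodes has exactly one GLB."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    return all(len(glbs(parents, u, v)) == 1
               for u, v in combinations(sorted(nodes), 2))

# Hypothetical network with two hybridizations and a unique LCA everywhere.
NET_UNIQUE = {
    "a": ["root"], "f": ["root"], "h": ["a", "f"], "l": ["f"],
    "A": ["a"], "B": ["h"], "D": ["l"], "E": ["f"], "C": ["l", "h"],
}
# Hypothetical network where two lineages hybridize twice.
NET_DOUBLE = {
    "x": ["root"], "y": ["root"], "x2": ["x"], "y2": ["y"],
    "A": ["x2"], "E": ["y2"], "C": ["x", "y"], "D": ["x2", "y2"],
}

print(is_meet_semilattice(NET_UNIQUE))  # -> True
print(is_meet_semilattice(NET_DOUBLE))  # -> False
```

A check of this kind could, in principle, be used to decide whether a given network falls into the class for which the LCA is always unique.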

This leads to the biological question of how best to interpret the MRCA in situations such as that represented by Figure 3. This is a question that does not yet seem to have been addressed by biologists. Figure 3 does not represent an impossible evolutionary history, although it may be an unusual one because one lineage hybridizes with another lineage twice, presumably at different times.

The lack of a unique LCA is clearly problematic, as it almost defeats the purpose of the concept of an MRCA. It would certainly make life easier if we could restrict evolutionary networks to the class of lower semi-lattices.

An alternative is to restrict the MRCA concept to the Conservative MRCA. However, it is easy to imagine situations where this pushes the MRCA so far towards the root of the network as to be uninformative, especially in cases involving horizontal gene transfer, which can occur between widely separated evolutionary groups. If we insist that a eukaryote MRCA represent 100% of the genome, and we include non-nuclear genomes in the calculation, then the Conservative MRCA creates an extreme theoretical problem: the organelle genomes (mitochondria and chloroplasts) are of bacterial origin, so the MRCA is pushed back to an ancestor shared with bacteria.

A Fuzzy MRCA may be the best compromise between these two extremes, although there are obvious practical issues for obtaining agreement on how much of the genome history is to be discounted from the MRCA.

References

Fischer J., Huson D.H. (2010) New common ancestor problems in trees and directed acyclic graphs. Information Processing Letters 110: 331–335.

Huson D.H., Rupp R. (2008) Summarizing multiple gene trees using cluster networks. Lecture Notes in Bioinformatics 5251: 296–305.

1. For some reason I can't load Figs. 2 & 3. May be a problem on my end.

At any rate, I discussed this sort of issue here: http://dx.doi.org/10.1111/j.1463-6409.2007.00302.x

In that paper I coined the term "cladogenetic set" for any taxon (set of organisms) wherein no members are ancestral to each other and all members share at least one common successor (where an organism's "successor" is itself or any descendant). Clade ancestors (including MRCAs) are cladogenetic sets.

Later I modified the term to "cladogen" and defined it in terms of taxonomic units, rather than individual organisms: http://namesonnodes.org/ns/math/2009/index.html#section-Cladogens

2. Well, now the images are showing up. According to my treatment, the MRCA in Fig. 2 is the LCA and the MRCA in Fig. 3 is the union of both LCAs.

3. Thanks for your input, Mike. The issue of an MRCA in a directed network has, indeed, been discussed by yourself, from the computational point of view, along with a number of mathematicians, from a more abstract perspective. My interest is in how to get biologists to think about what they are going to do about this in practice. That is, what do they really want an MRCA to be? They need to start thinking about it a bit more than they have been.

Your Names on Nodes project is an interesting, and necessary, step towards addressing the nomenclatural aspects of phylogenetic networks. I have also sometimes thought about drawing attention to your Phun Phylogenies blog post!

4. Thanks, David! Certainly feel free to post on "Phun Phylogenies" -- I really enjoyed putting that one together.

"That is, what do they really want an MRCA to be? They need to start thinking about it a bit more than they have been."

On the second part, I definitely agree. The question is interesting, although I have to say my first inclination is to say, "Who cares what they want? Science is about what is."

Although I should modify that a little bit -- the maximal common ancestor is dependent on what the taxonomic units are, and what is considered "ancestry", and determining either of those is always at least a little bit subjective (if often almost imperceptibly so). So the personal judgment of the biologist is going to enter into it at some stage -- I'd just rather keep that to a minimum.