Wednesday, April 30, 2014
Reconstructing ancestors in a splits network?
A splits graph is an unrooted phylogenetic network (see How to interpret splits graphs). However, splits graphs are sometimes treated as rooted networks, and under these circumstances it is assumed that they therefore represent a phylogeny. Nevertheless, it is important to recognize that a rooted splits graph does not explicitly represent a phylogeny, because reticulations in the graph represent uncertainty, not genealogy (see How do we interpret a rooted haplotype network?).
A corollary to this is that reconstructing "ancestors" in a splits graph is problematic. The nodes do not necessarily represent inferred ancestors, because their actual role is to support the corners of the parallelograms formed by intersecting sets of parallel edges in the graph. Some of the nodes may, indeed, represent ancestors, but there is no way to determine this from the network itself.
Let's look at a specific example, taken from the paper by J. Miguel Díaz-Báñez, Giovanna Farigu, Francisco Gómez, David Rappaport & Godfried T. Toussaint (2004) El Compás flamenco: a phylogenetic analysis. Proceedings of BRIDGES Conference: Mathematical Connections in Art, Music and Science, pp. 61-70.
The authors provide an analysis of the hand-clapping patterns of the flamenco music of Andalucia, in southern Spain. There are four recognized patterns, plus the fandango pattern, and the authors use two different distance measures to assess their rhythmic similarities. They produce unrooted phylogenetic networks based on each of these distances, which turn out (on reanalysis of their data) to be NeighborNets (the authors refer to them as "SplitsTrees").
The authors ignore the fact that "it is well established that the fountain of flamenco music is the fandango", which would make the fandango the outgroup for rooting if we did wish to treat the networks as rooted. Instead, they try to "reconstruct the 'ancestral' rhythms corresponding to the nodes" by using mid-point rooting. This procedure can easily be applied to unrooted phylogenetic trees, but its application to networks is problematic because there are multiple paths through the graph, and there may thus be several points that qualify as the mid-point.
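To make the multiple-paths problem concrete, here is a minimal sketch in Python. The toy graph, its node names and its edge weights are all hypothetical (they are not taken from the authors' networks); the graph simply contains one parallelogram, as a splits graph would. The sketch finds the most distant pair of leaves, lists every shortest path between them, and reports the half-way point along each path.

```python
from itertools import combinations

# Hypothetical toy splits graph: one parallelogram (u-v-x-w) whose
# opposite sides have equal weight, plus three leaves A, B and C.
EDGES = {
    ("A", "u"): 1.0,
    ("u", "v"): 2.0,
    ("w", "x"): 2.0,
    ("u", "w"): 1.0,
    ("v", "x"): 1.0,
    ("x", "B"): 1.0,
    ("v", "C"): 0.5,
}
LEAVES = ["A", "B", "C"]


def neighbours(node):
    """Yield (neighbour, edge weight) for every edge touching node."""
    for (a, b), w in EDGES.items():
        if a == node:
            yield b, w
        elif b == node:
            yield a, w


def simple_paths(src, dst, path=None, length=0.0):
    """Yield every cycle-free path from src to dst with its length."""
    path = path or [src]
    if src == dst:
        yield path, length
        return
    for nxt, w in neighbours(src):
        if nxt not in path:
            yield from simple_paths(nxt, dst, path + [nxt], length + w)


def shortest_paths(src, dst):
    """Return all paths that tie for the minimum length."""
    paths = list(simple_paths(src, dst))
    best = min(length for _, length in paths)
    return [(p, l) for p, l in paths if abs(l - best) < 1e-9]


def midpoint(path, length):
    """Return (node_a, node_b, offset from node_a) of the half-way point."""
    half, walked = length / 2.0, 0.0
    for a, b in zip(path, path[1:]):
        w = EDGES.get((a, b), EDGES.get((b, a)))
        if walked + w >= half:
            return (a, b, half - walked)
        walked += w


def midpoints_of_longest_pair():
    """Pick the most distant leaf pair and return one mid-point per shortest path."""
    pair_paths = {pair: shortest_paths(*pair) for pair in combinations(LEAVES, 2)}
    pair = max(pair_paths, key=lambda p: pair_paths[p][0][1])
    return pair, [midpoint(p, l) for p, l in pair_paths[pair]]


pair, mids = midpoints_of_longest_pair()
print(pair, mids)
```

In this toy graph the two routes around the parallelogram tie for shortest, and their half-way points fall on different edges. On a tree there is only one path between any two leaves, so this ambiguity cannot arise.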
For one of their networks, shown in the first graph, the authors identify the single mid-point, and then try to reconstruct "the ancestral rhythm closest to the 'center'", based on the node closest to the mid-point. They do this by "trial and error", based on the distances from the identified "ancestral" node to all of the leaves. That is, they find the hand-clap pattern that has the required distance to all of the leaves.
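The "trial and error" step can be pictured as a brute-force search, sketched below in Python. This is not the authors' procedure. It assumes 12-pulse binary clap patterns, uses Hamming distance as a stand-in for their two distance measures, and the leaf patterns (leaf_a, leaf_b, leaf_c) are made up for illustration. The target distances are generated from a hidden pattern, so that at least one solution is known to exist.

```python
from itertools import product


def hamming(p, q):
    """Number of pulses at which two equal-length patterns differ (stand-in distance)."""
    return sum(a != b for a, b in zip(p, q))


def candidates(leaves, targets, pulses=12, tol=0):
    """Return every binary pattern whose distance to each leaf matches its target."""
    hits = []
    for pattern in product((0, 1), repeat=pulses):
        if all(abs(hamming(pattern, leaf) - targets[name]) <= tol
               for name, leaf in leaves.items()):
            hits.append(pattern)
    return hits


# Hypothetical leaf patterns (1 = clap, 0 = silence); not real flamenco rhythms.
leaves = {
    "leaf_a": (1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0),
    "leaf_b": (1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0),
    "leaf_c": (0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1),
}

# A hidden "ancestral" pattern supplies the target distances for the demo.
hidden = (1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0)
targets = {name: hamming(hidden, leaf) for name, leaf in leaves.items()}

hits = candidates(leaves, targets)
print(len(hits), hidden in hits)
```

Note that such a search may return more than one pattern at the required distances, in which case the distances alone do not single out a unique "ancestral" rhythm.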
The authors do not tackle this procedure for their other network. In this case, as shown in the next graph, there are two mid-point locations. While there is a node that is equally close to these two locations, which might therefore qualify as "the ancestral rhythm closest to the center", it is difficult to reconstruct its actual rhythm.
Finally, if we try to identify the "ancestral rhythm" using the node identified by the fandango outgroup, then the result is dramatically different to that produced by the mid-point method, for both networks.