The Genealogical World of Phylogenetic Networks: Open questions about evolutionary networks, part 3

There are a number of issues that have been of interest to the phylogenetics community with regard to the construction of evolutionary trees that have not yet been addressed for evolutionary networks. These can be considered to be "open questions" — ones that need widespread discussion at some stage, either by biologists or by computational scientists (or both). This blog post finishes my list of some of these topics (see Part 1 and Part 2).

Robustness of branch/reticulation estimates

It is de rigueur in the world of phylogenetic tree building to pepper the tree branches with bootstrap values or posterior probabilities, or frequently both, especially if these estimates are >50%. On the other hand, these values are almost never seen in the world of phylogenetic networks.

If there is a direct link between the network and some character-state data, then bootstrap values can be calculated for a network in the same manner as for a tree: one simply builds many networks from the re-sampled character data. However, this procedure may not be computationally feasible if the network method itself has an impractical running time, because that cost is incurred once for every bootstrap replicate.
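As a rough illustration of the re-sampling idea, here is a minimal Python sketch. It assumes the data are an aligned matrix of binary characters, and it uses a deliberately simple stand-in for a "network method": every character that divides the taxa into two groups of at least two becomes a split. The function names (`bootstrap_columns`, `splits_from_characters`, `bootstrap_support`) and the toy data are hypothetical, invented for this example; a real analysis would substitute an actual network-building method for `build`.

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """Resample the characters (columns) with replacement, keeping the taxa fixed."""
    n_chars = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def splits_from_characters(alignment):
    """Hypothetical stand-in for a network method.

    Each character with exactly two states, where both groups of taxa
    contain at least two members, contributes one split (bipartition).
    Conflicting splits are all retained, which is what makes the result
    network-like rather than tree-like.
    """
    taxa = sorted(alignment)
    n_chars = len(alignment[taxa[0]])
    splits = set()
    for i in range(n_chars):
        groups = {}
        for taxon in taxa:
            groups.setdefault(alignment[taxon][i], set()).add(taxon)
        if len(groups) == 2:
            side_a, side_b = groups.values()
            if len(side_a) >= 2 and len(side_b) >= 2:
                splits.add(frozenset([frozenset(side_a), frozenset(side_b)]))
    return splits


def bootstrap_support(alignment, build, n_reps=200, seed=0):
    """Fraction of bootstrap replicates in which each original split reappears."""
    rng = random.Random(seed)
    reference = build(alignment)
    counts = Counter()
    for _ in range(n_reps):
        replicate = build(bootstrap_columns(alignment, rng))
        counts.update(reference & replicate)
    return {split: counts[split] / n_reps for split in reference}


# Toy data (invented): four characters support AB|CD, two support AC|BD.
toy = {
    "A": "0011010",
    "B": "0011100",
    "C": "1100010",
    "D": "1100100",
}

for split, value in bootstrap_support(toy, splits_from_characters).items():
    print(sorted(sorted(side) for side in split), round(value, 2))
```

The expensive step is the call to `build` inside the loop, which runs once per replicate. That is why a slow network method makes the bootstrap impractical, even though the re-sampling itself is trivial.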

Moreover, this procedure is not necessarily straightforward for other types of data from which we might build a network. For example, if we are building a network by minimizing the number of reticulations needed to reconcile a set of conflicting trees, the application of the bootstrap has not yet been evaluated. The computational focus to date has been on the optimization problem, not on the re-sampling problem. And, of course, in the absence of a likelihood model for reticulation events, posterior probabilities cannot be calculated at all.

So, this is another area where the lack of methods commonly associated with tree building seems to be a handicap for the widespread acceptance of network-based methodology.

Can biologists correctly interpret networks?

I have used this quote in an earlier blog post, but it is relevant again here. Baum and Smith (2012, Tree Thinking: An Introduction to Phylogenetic Biology) have noted the following:

"We do not know why it should be so, but we have learned from working with thousands of students that, without contrary training, people tend to have a one-dimensional and progressive view of evolution. We tend to tell evolution as a story with a beginning, a middle, and an end. Against that backdrop, phylogenetic trees are challenging; they are not linear but branching and fractal, with one beginning and many equally valid ends. Tree thinking is, in short, counterintuitive."

This is a well-studied problem. For example, there have been a number of studies of students taking introductory biology courses at tertiary institutions (mostly in the U.S.A.), aimed at identifying the "major misconceptions" entertained by these students. Certain basic problems are discussed by almost all of the authors concerned (both inside and outside the U.S.A.). I have written more extensively on this topic in a post at the Scientopia blog (Ambiguity in phylogenies), which you can read if you are unfamiliar with the current state of affairs. That blog post lists most of the important issues as well as the available literature.

It is also well known that evolution professionals often suffer from the same sort of problem. I have written more extensively on this topic in a previous post on this blog (Evolutionary trees: old wine in new bottles?). That post also lists the relevant literature.

What is worse, some professional organizations apparently know no better. For example, the Federation of American Societies for Experimental Biology (FASEB), which describes itself as "the policy voice of biological and biomedical researchers" in the U.S.A., has this Advocacy Card on their web site:

FASEB was also giving away similar bumper stickers at the recent 20th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB — July 2012, in Long Beach, CA), as discussed at the Byte Size Biology blog. Clearly, this image confounds linear evolution with tree-based evolution — this distinction is crucial to phylogenetic analysis, and yet confusion about these two things is rampant.

This leads me to an obvious question: if people have so much trouble going from a linear view of evolution to a tree-based view, are they going to have even more trouble going to a network-based view?

I cannot answer this question (yet). At one extreme, maybe the big conceptual leap is going from a chain to a tree, and a network is just a complicated tree, so that the conceptual leap is not great. Alternatively, maybe a tree is difficult because it is a set of linked and overlapping chains, and therefore a network is very difficult because it is a set of linked and overlapping trees. Maybe reality will turn out to be somewhere in between these two extremes.

There are at least two issues that are likely to be of importance here, in addition to those concerned with trees:

it is difficult to recognize monophyletic groups (clades) in a network, because the ancestry of any one taxon may be complicated (e.g. what is a Most Recent Common Ancestor in a reticulated network? See this blog post);
it is difficult to distinguish the different possible causes of reticulations (recombination, hybridization, horizontal gene transfer [HGT]).
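The first of these issues can be made concrete with a small sketch. It assumes one possible formalization of "Most Recent Common Ancestor", namely a common ancestor that is not itself an ancestor of any other common ancestor (a "lowest" common ancestor in a directed acyclic graph). That definition, the function names and the toy graphs are all assumptions made for illustration; other definitions are possible, which is part of the problem.

```python
def ancestors(parents, node):
    """All ancestors of `node`, including itself, in a graph given as child -> list of parents."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lowest_common_ancestors(parents, a, b):
    """Common ancestors of a and b that are not ancestors of any other common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# A tree: a and b share the single parent p, which descends from the root r.
tree = {"a": ["p"], "b": ["p"], "p": ["r"]}

# A network (invented): a and b are both hybrids of the same two lineages p and q.
network = {"a": ["p", "q"], "b": ["p", "q"], "p": ["r"], "q": ["r"]}

print(lowest_common_ancestors(tree, "a", "b"))     # a single node: p
print(lowest_common_ancestors(network, "a", "b"))  # two nodes: p and q
```

In the tree the answer is unique. In the network, p and q are equally "recent", so under this definition there is no single Most Recent Common Ancestor for a and b, and the clade defined by that ancestor is correspondingly ambiguous.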

We will presumably find out how difficult things are after we have developed a set of widely used methods for constructing evolutionary networks.