Wednesday, September 4, 2013

Mis-interpreting phylogenetic trees


I have noted before that biologists have used various metaphors or models for phylogenetic relationships, including a chain, a tree, and a network. Others and I have also noted that interpreting the relationships shown by these structures is not always easy for novices, and sometimes not even for experts (see Ambiguity in phylogenies).

A chain is 1-dimensional, so interpreting its relationships is usually straightforward. A tree, however, is not a simple linear structure, as it consists of a set of inter-linked chains. It is clear from the literature that many people find it easy to misread the relationships shown by a tree. Here, I illustrate this with an example.

The example is taken from this paper:
L. Luca Cavalli-Sforza (1997) Genes, peoples, and languages. Proceedings of the National Academy of Sciences of the U.S.A. 94: 7719-7724.
The first illustration is Figure 1 from that paper.


In the text, the author also notes this about the figure:
"The most important difference is in the position of Europe, which with neighbor joining branches out first after the splitting of Africans and non-Africans and with maximum likelihood [sic!] is the last but one."
This interpretation is incorrect, because it is the position of Oceania that differs between the two trees (not Europe), as shown in the illustration below.


In the original figure, tree (a) is rooted while tree (b) is unrooted. To compare them directly, we need to root the tree in (b), as shown in the first row of the illustration. Note that I have re-ordered the areas in tree (a), but I have not changed the relationships shown by the tree. One of the most common mis-interpretations of trees is to think that the linear order of the tips (top to bottom) has some meaning (see Ambiguity in phylogenies), but it does not.
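The point about linear order can be checked mechanically. Here is a minimal Python sketch, assuming a rooted tree is written as nested tuples of leaf names. The function name `canonical` and the leaf labels A to D are placeholders of mine, not the areas or topologies from the paper's figure.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    A leaf is a string; an internal node is a tuple of subtrees.
    Each node becomes a frozenset, so the order in which the
    branches happen to be drawn is discarded.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


# The same relationships drawn with two different top-to-bottom orders.
drawn_one_way = ("A", ("B", ("C", "D")))
drawn_another_way = ((("D", "C"), "B"), "A")

# A tree with genuinely different relationships (B and C swap places).
different_tree = ("A", ("C", ("B", "D")))

same_relationships = canonical(drawn_one_way) == canonical(drawn_another_way)
changed_relationships = canonical(drawn_one_way) != canonical(different_tree)
```

Only the nesting matters in this comparison, which is the sense in which re-ordering the tips leaves a tree unchanged.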

Then, in order to identify the difference between the pair of rooted trees, we simply delete each of the areas in turn, which is shown in rows 2 to 5 of my figure. Only the deletion of Oceania makes the two rooted trees identical (row 3). The deletion of Europe (row 2) does not do this, and so the position of Europe should not be identified as a key difference between the two trees.
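The deletion procedure can also be sketched in Python. This is an illustration under stated assumptions, not the method used to draw my figure: trees are nested tuples, the helper names (`canonical`, `prune`, `leaves`, `reconciling_deletions`) are hypothetical, and the two five-leaf trees with placeholder labels A to E are toy topologies invented for the example, not transcribed from the paper.

```python
def canonical(tree):
    """Order-independent form of a rooted tree (leaves are strings)."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def prune(tree, leaf):
    """Delete one leaf and collapse any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    kept = [p for p in (prune(child, leaf) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def leaves(tree):
    """List the leaf names of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def reconciling_deletions(tree_1, tree_2):
    """Leaves whose deletion makes the two rooted trees identical."""
    return [
        leaf
        for leaf in sorted(leaves(tree_1))
        if canonical(prune(tree_1, leaf)) == canonical(prune(tree_2, leaf))
    ]


# Toy trees (assumed for illustration): they differ only in where D attaches.
toy_tree_1 = ("A", ("B", ("C", ("D", "E"))))
toy_tree_2 = ("A", ("D", ("B", ("C", "E"))))

culprits = reconciling_deletions(toy_tree_1, toy_tree_2)
print(culprits)  # ['D']
```

In these toy trees B sits at a different depth in each tree, yet deleting B does not reconcile them; only deleting D does. That is the same pattern as Europe and Oceania in the example above.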

The literature is replete with this sort of simple interpretational mistake concerning trees. The concern for those of us involved is:
What will it be like for people to interpret networks, which are sets of inter-linked trees?
