Monday, January 21, 2019

A question about coalescent-based species phylogenies


This may be a naive question; but as I am now semi-retired, I can ask it without professional embarrassment.

It is common when constructing species phylogenies (either trees or networks) to use a model that takes into consideration multiple replacements of characters through evolutionary time. If the states of any given character have been modified multiple times, then the currently observed differences in that character between taxa will not accurately reflect their evolutionary history.

For example, we "correct for multiple substitutions" when using DNA/RNA sequence data. We do this because, with only four character states, the probability that undetectable multiple substitutions have occurred increases considerably through evolutionary time. So, we have developed any number of sophisticated models for addressing this issue, such as JC (Jukes-Cantor) and GTR (general time-reversible); and it is unusual to see a published paper with a species phylogeny that does not use one of them.
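As a rough illustration of what such a correction does, here is a minimal sketch of the Jukes-Cantor (JC) distance, which converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function names and the toy sequences are my own, chosen only for illustration; real analyses would use an established phylogenetics package.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jc_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    The formula is only defined for p < 0.75, which is the saturation
    point for four equally likely character states.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences, for illustration only.
observed = p_distance("ACGTACGTAC", "ACGTTCGAAC")
corrected = jc_distance(observed)
```

The corrected distance is always at least as large as the observed one, and the gap widens as p approaches 0.75; that widening gap is the estimated contribution of the undetectable multiple substitutions.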

This leads to a question about population phylogenies. In this case, the use of the coalescent model is prevalent. It allows the calculation of various population parameters, based on viewing phylogenies backwards through time. For the purpose of phylogenetics, the key calculation is the coalescence time of each pair of lineages, although population size is also of some interest.
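To make the coalescence-time calculation concrete, here is a small sketch under the standard textbook coalescent for a single, constant-size, randomly mating diploid population of N individuals. Under those assumptions, while k lineages remain, the waiting time to the next coalescence is exponentially distributed with mean 4N / (k(k - 1)) generations, so a pair of lineages coalesces after 2N generations on average. The function names are my own, and this is an illustration of the model rather than a substitute for coalescent software.

```python
import random


def expected_pairwise_coalescence(pop_size):
    """Expected coalescence time (generations) for two lineages, diploid size N."""
    return 2.0 * pop_size


def expected_tmrca(n_lineages, pop_size):
    """Expected time to the most recent common ancestor of n sampled lineages."""
    return 4.0 * pop_size * (1.0 - 1.0 / n_lineages)


def simulate_tmrca(n_lineages, pop_size, rng):
    """Draw one time to the most recent common ancestor, viewed backwards in time."""
    total_time = 0.0
    k = n_lineages
    while k > 1:
        # Coalescence rate with k lineages: k(k-1)/2 pairs, each at rate 1/(2N).
        rate = k * (k - 1) / (4.0 * pop_size)
        total_time += rng.expovariate(rate)
        k -= 1  # two lineages merge into one
    return total_time


rng = random.Random(2019)  # fixed seed so the sketch is repeatable
draws = [simulate_tmrca(2, 1000, rng) for _ in range(20000)]
mean_pairwise = sum(draws) / len(draws)
```

With enough draws, the simulated mean for two lineages should sit close to the analytical value returned by `expected_pairwise_coalescence`.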

The coalescent model is based on a set of assumptions, of course. Indeed, it is based on the Wright-Fisher model of population genetics. This is an infinite-sites model, meaning that it assumes that multiple replacements of characters at the same site do not occur during the evolution of the populations. That is, if the genetic sequences are infinitely long then the probability of multiple substitutions at any one site is 1 / infinity = zero.
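The "1 / infinity" argument can be sketched numerically. Assuming, purely for illustration, that m mutations each land on one of L sites independently and with equal probability (my simplification, not part of the model as stated above), the probability that no site is hit more than once is the product of (L - i) / L for i = 0 to m - 1. This tends to 1 as L grows, and falls well below 1 when L is small relative to m.

```python
def prob_no_repeated_site(n_mutations, n_sites):
    """Probability that n_mutations uniformly placed mutations hit distinct sites."""
    if n_mutations > n_sites:
        return 0.0  # more mutations than sites: some site must be hit twice
    prob = 1.0
    for i in range(n_mutations):
        prob *= (n_sites - i) / n_sites
    return prob


# Hypothetical sequence lengths, holding the number of mutations fixed at 50.
for length in (100, 1_000, 1_000_000):
    print(length, prob_no_repeated_site(50, length))
```

For a fixed number of mutations, the printed probability climbs towards 1 as the sequence length increases, which is the sense in which multiple hits vanish in the infinite-sites limit.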

This, then, is my question: Can we really assume in one part of the analysis that multiple substitutions never occur, and assume in another part of the same analysis that they are so common that we need to adjust for them?

I have not found this issue addressed either in the published literature or on the internet. Indeed, most people I have spoken to did not even realize that the coalescent is ultimately based on an infinite-sites model. So, for me at least, this is an interesting question.