Monday, November 19, 2018

The curiously converted logic of phylogenetics


Phylogenetic analysis involves describing patterns, not studying processes. That is, we cannot conduct a manipulative experiment to study evolutionary history. All we can do is collect naturally occurring data, and then try to detect relevant patterns in it. Thus, in a descriptive study we investigate processes by examining the patterns they produce, not by manipulating the processes themselves, which is what we would do in an experimental study.

Obviously, one of the limitations of this procedure is that the patterns we need may not be in the data we have at hand. It is this limitation that leads some scientists to claim that descriptive studies are not part of science. However, this is not the majority view.


Equally important, descriptive studies also have a logical limitation, which I have rarely seen mentioned. In the world of logic, a proposition cannot, in general, be validly converted; and yet converting propositions is exactly what is done by all descriptive analyses. [The four terms used in logic are defined at the bottom of this post.]

Our initial logic works from process to pattern (if p, then q), but we interpret it the other way around, that a specified pattern must be created by a particular process (if q, then p). Thus:
  • we expect this specific process to produce that particular pattern
  • therefore, when we see that particular pattern we can infer this specific process.
The problem here is the second statement, which is the logical converse of the first statement (the proposition). The inference is illogical, because other processes might also create the same pattern, in which case our inference can be wrong.

The Monty Python comedy team had a go at this in their Logician skit on "The Holy Grail" album (but not in the movie of the same name). Their example concerned a 1950s-60s singer called Alma Cogan, who died in 1966. Their inference was:
  • all of Alma Cogan is dead
  • therefore, all dead people are Alma Cogan.
This is illogical, because there is more to being dead than simply being Alma Cogan — logical propositions can be only partially converted.

The same logical fallacy has also been pointed out in the application of statistics to ecology. Stuart Hurlbert (1990. Spatial distribution of the Montane Unicorn. Oikos 58: 257-271) assessed the use of the Poisson probability distribution as evidence for random spatial distributions of organisms. The inference is:
  • for a Poisson distribution, the variance equals the mean
  • therefore, if the variance equals the mean we can infer a Poisson distribution.
His paper points out many real datasets where the variance equals the mean but the data do not fit a Poisson distribution. He concluded: "Each population showed a different pattern of aggregation and none corresponded to a Poisson distribution. The variance:mean ratio is useless as a measure of departure from randomness, though it is widely recommended as such."
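
The failure of this converse is easy to reproduce. The following is a minimal Python sketch, using a made-up distribution (not Hurlbert's data) in which half of the sample plots contain 0 individuals and half contain 2. The function names are hypothetical, chosen only for illustration. The variance equals the mean, yet the distribution is plainly not Poisson.

```python
from fractions import Fraction
from math import exp, factorial

# Hypothetical counts-per-plot distribution (an assumption for illustration):
# half the plots hold 0 individuals, half hold 2, and none hold exactly 1.
pmf = {0: Fraction(1, 2), 2: Fraction(1, 2)}

def mean_and_variance(pmf):
    """Exact mean and variance of a discrete distribution given as {value: prob}."""
    mean = sum(k * p for k, p in pmf.items())
    variance = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, variance

def poisson_pmf(k, lam):
    """Probability of observing k under a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

mean, variance = mean_and_variance(pmf)
ratio = variance / mean  # the variance:mean ratio

print("mean =", mean, "variance =", variance, "ratio =", ratio)
# A Poisson distribution with the same mean puts substantial mass on k = 1,
# whereas the made-up distribution puts none there.
print("Poisson P(1) =", round(poisson_pmf(1, float(mean)), 4), "vs observed", pmf.get(1, 0))
```

A variance:mean ratio of 1 is what a Poisson distribution produces, but observing a ratio of 1 does not take us back to the Poisson distribution.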

These are simply examples of a general problem: we cannot convert a proposition and expect to be right all of the time, or even most of the time. The issue applies to all phylogenetic analyses, whether they involve the assessment of homology or the construction of trees and networks. In each case, we are inferring particular evolutionary processes from the observation of particular patterns in our data. For example, our model of the process of speciation implies a tree model of evolution, and therefore every time we get a "well-supported tree" we treat it as the true phylogeny. This will not work if other processes are occurring, such as hybridization.

I will finish with one specific example from network analysis. The D-statistic is used in the so-called ABBA-BABA test for detecting introgression among taxa (see Networks of admixture or introgression). The logic works from process to pattern (introgression would create a particular gene-tree pattern), but we interpret it the other way around: we see the specified gene-tree pattern and we thereby infer the presence of introgression.
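
For readers unfamiliar with the statistic, here is a minimal Python sketch of how D is usually computed from site-pattern counts, D = (nABBA - nBABA) / (nABBA + nBABA), for taxa ordered P1, P2, P3 and an outgroup. The function names and the toy sequences are invented for illustration, and real implementations work with genome-scale data and add a significance test. The point to notice is that the code only summarizes a pattern; nothing in it identifies which process produced that pattern.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns in four aligned sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only biallelic sites
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived state
    return abba, baba

def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA); undefined with no informative sites."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / total

# Toy aligned sequences, made up for this example.
p1 = "AAGTCT"
p2 = "ATGTGA"
p3 = "ATCTGT"
og = "AAGTCA"

abba, baba = count_abba_baba(p1, p2, p3, og)
d = d_statistic(abba, baba)
print(abba, baba, d)
```

With these toy sequences there are two ABBA sites and one BABA site, so D is positive. Reading that excess as introgression is the converse step discussed above.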

This issue of illogic is definitely a limitation of phylogenetic analysis.



The terms of logical analysis:
  • Proposition: if p, then q
  • Inverse: if not p, then not q
  • Converse: if q, then p
  • Contrapositive: if not q, then not p
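
The relationships among these four forms can be checked by brute force. The following is a small Python sketch (the helper names are my own) that builds the truth table for each form over all combinations of p and q. It shows that the proposition is equivalent to its contrapositive, and the converse to the inverse, but the proposition is not equivalent to its converse.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a, then b' is false only when a is true and b is false."""
    return (not a) or b

forms = {
    "proposition": lambda p, q: implies(p, q),
    "inverse": lambda p, q: implies(not p, not q),
    "converse": lambda p, q: implies(q, p),
    "contrapositive": lambda p, q: implies(not q, not p),
}

# Truth values for (p, q) in the order (T, T), (T, F), (F, T), (F, F).
table = {
    name: [form(p, q) for p, q in product([True, False], repeat=2)]
    for name, form in forms.items()
}

for name, values in table.items():
    print(f"{name:15s}", values)

print("proposition == contrapositive:", table["proposition"] == table["contrapositive"])
print("proposition == converse:      ", table["proposition"] == table["converse"])
```

The proposition and its converse disagree when exactly one of p and q is true. The case of q true and p false corresponds to seeing the pattern when the expected process was not the one operating.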