Monday, September 23, 2019

Where are we, 60 years after Hennig?

Phylogenetic analysis is common in the modern study of evolutionary biology, and yet it often seems to be a poorly understood tool. Indeed, it seems to often be seen as nothing more than a tool, and one for which one does not need much expertise.

For example, we do not need to spend much time on Twitter to realize that many evolutionary biologists do not understand even the most basic things about the difference between taxa and characters. Taxa are often referred to as "primitive", particularly by people studying the so-called Origin of Life. However, taxa themselves cannot be either primitive or derived; instead, they are composed of mixtures of primitive and derived characters — they have derived characters relative to their ancestors and primitive ones compared to their descendants.

The logical relationship between common ancestors and monophyletic / paraphyletic groups is also apparently unknown to many evolutionary biologists. There is endless debate about whether the Last Universal Common Ancestor was a Bacterium or an Archaean when, of course, it cannot be either. That is, we sample contemporary organisms for analysis, which come from particular taxonomic groupings, and from these data we infer hypothetical ancestors. However, those ancestors cannot be part of the same taxonomic group as their descendants unless that taxonomic group is monophyletic.

This is all basic stuff, first expounded in the 1950s by Willi Hennig. So, why do so many people apparently still not know any of this 60 years later? I suspect that somewhere along the line the molecular geneticists got the idea that Hennig was part of Parsimony Analysis, and since they adopted Likelihood Analysis, instead, he is thus irrelevant.

However, Hennigian Logic underlies all phylogenetic analyses, of whatever mathematical ilk. All such analyses are based on the search for unique shared derived characters, which is the only basis on which we can objectively produce a rooted phylogenetic tree or network.

In the molecular world, many analysis techniques are based on analyzing the similarity of the taxa. However, similarity is only relevant if it is based on shared derived characters — if it is based on shared primitive characters then it cannot reliably detect phylogenetic history. This was Hennig's basic insight, and it is as true today as it was 60 years ago.

The confusing thing here is that most similarity among taxa will be based on both primitive and derived characters. This means that some of the analysis output reflects phylogenetic history and some does not. The further we go back in evolutionary time, the more likely it is that similarity reflects shared primitive characters rather than shared derived characters. This simple limitation seems to be poorly understood by evolutionary biologists.

Perhaps it would be a good idea if university courses in molecular evolutionary biology actually taught phylogenetics as a topic of its own, rather than as an incidental tool for studying evolution. After all, there is more to getting a scientific answer than feeding data into a computer program.

Obviously, I may be wrong in painting my picture with such a broad brush. If so, then it must be that the people I have described seem to have gathered on Twitter, like birds of a feather.

And yet, I see the same thing in the literature, as well. Consider this recent paper:
A polyploid admixed origin of beer yeasts derived from European and Asian wine populations. Justin C. Fay, Ping Liu, Giang T. Ong, Maitreya J. Dunham, Gareth A. Cromie, Eric W. Jeffery, Catherine L. Ludlow, Aimée M. Dudley. 2019. PLoS Biology 17(3): e3000147.
This seems to be quite an interesting study of a reticulate evolutionary history involving budding yeasts, from which the authors conclude that:
The four beer populations are most closely related to the Europe/wine population. However, the admixture graph also showed strong support for two episodes of gene flow into the beer lineages resulting in 40% to 42% admixture with the Asia/sake population.

However, they then undo all of their good work with this sentence:
The inferred admixture graph grouped the four beer populations together, with the lager and two ale populations being derived from the lineage leading to the Beer/baking population.
Nonsense! Neither lineage derives from the other, but instead they both derive from a common ancestor. This is like saying that I derive from the lineage leading to my younger brother, when in fact we both derive from the same parents. I doubt that the authors believe the latter idea, so why do they apparently believe the former?

That is a little test that you can all use when writing about phylogenetics. If your words don't make sense for a family history, then they don't make sense for phylogenetics either.

No comments:

Post a Comment