Monday, January 6, 2020

Why we may want to map trait evolution on networks, pt. 1 – Introduction

One of the more interesting aspects of studying evolution is to trace the evolution of the traits possessed by the organisms, whether those traits are physical or not (such as languages). That is, we usually infer phylogenetic trees and networks to see how things evolve, including both the organisms and their characteristics. However, this can easily lead to circular reasoning, as I will discuss here.


A phylogenetic tree may be enough to work out who is sister to whom. However, when thinking about evolution itself, we actually want to find out who comes from whom, instead. This may be the reason why Charles Darwin did not title his 'abstract' A Natural Order of Species but used instead The Origin of Species.

The tricky bit is this: in order to find the origin, we first need to establish ancestor-descendant relationships, so that we can then see how things like fossils fit in (i.e. whether they represent extinct sister lineages or precursors of the modern-day taxa). When taxon B is a derivation of A (i.e. B evolved from A), the character suite of A is not only primitive but also the original set. Now, let's assume that we have a third, fossil taxon C, which is clearly related to A and B. As evolutionary biologists, we cannot be content merely with inferring sister relationships between A, B, and C; instead, we need to decide whether or not C is also descended from A.

Ironically (from a modern cladistic viewpoint, which focuses on establishing sister-clade relationships), Willi Hennig provided us with some tools for doing this:
  • We all know that apomorphies are derived traits, either unique (aut-) or uniquely shared by a group with inclusive common origin (syn-). Aut- and syn-apomorphies define (Hennig's) monophyletic groups. According to Hennig, a synapomorphy is a necessary criterion for recognizing a monophyletic group, and also a sufficient one (although the latter easily falls prey to circular reasoning). More importantly, they tell us that the ancestor(s) of the group and its (potentially lost) sister lineages lacked this trait!
  • Sym-plesiomorphies are traits that are primitive within a certain lineage. They define paraphyletic groups, which are groups of exclusive common origin. Following Hennig, they need to be discarded for systematics. From an evolutionary viewpoint, however, symplesiomorphies have double information content: (a) they provide us with traits that, at some point back in time, were synapomorphies; and (b) any member of a lineage not carrying the symplesiomorphic trait shows a derived one.
Farris' cladistics is still the basis for systematics, and is widely applied in phylo-paleontology. The initial flaw of this approach is to assume that we can use morphological traits to infer a tree (with parsimony), and then map the same traits onto the inferred tree in order to classify them in Hennig's terms. How can this not be circular reasoning? We are mapping the traits onto a tree derived from those traits in the first place, so the tree-building and the mapping are not independent.
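
To make the mapping step concrete, here is a minimal sketch of Fitch parsimony for a single unordered character on a fixed, rooted, binary tree. The tree shapes and tip states are made up for illustration (they are not the Wilf et al. matrix), and the function name is my own; real programs such as Mesquite or PAUP* do considerably more than this.

```python
def fitch(tree, tip_states):
    """Return (state set at the root, number of steps) for one unordered character.

    tree: nested 2-tuples of tip names, e.g. (("A", "B"), "C")
    tip_states: dict mapping tip name -> set of observed states
                (a set with more than one state stands for a polymorphic tip)
    """
    if isinstance(tree, str):
        # A tip: its state set is what was scored, and no change is needed.
        return set(tip_states[tree]), 0
    left, right = tree
    left_set, left_steps = fitch(left, tip_states)
    right_set, right_steps = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Both descendants can share a state: no extra step.
        return shared, left_steps + right_steps
    # No shared state: one change is required somewhere below this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical toy data: four taxa, one binary character.
toy_states = {"A": {1}, "B": {1}, "C": {0}, "D": {0}}
toy_tree = ((("A", "B"), "C"), "D")   # A and B are sisters
alt_tree = ((("A", "C"), "B"), "D")   # A and C are sisters

print(fitch(toy_tree, toy_states))
print(fitch(alt_tree, toy_states))
```

The same character needs one step on the first tree and two on the second. If the tree was chosen because it minimises exactly these step counts, the mapping can hardly disagree with it, which is the circularity described above.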

A simple (?) real-world example

For the purpose of this exercise, we will take the minute (seven-character) matrix of Wilf et al. (2019) from this previous blog post (characters 5 and 7 corrected, and missing Fagaceae added; see also Denk et al. 2019, Science, doi:10.1126/science.aaz2189).

Wilf et al. found an Eocene fossil in South America, and argued that it must be a member of the modern genus Castanopsis, based on a parsimony DNA-scaffold approach (without actually using a DNA partition). Being a member of a modern genus, the fossil should have some aut-/synapomorphies, or at least symplesiomorphies or homoiologies, characterizing its sublineage of the Fagaceae, the paraphyletic Castanoideae.

Based on the morphology, we can infer this tree:

Fig. 1 – Adams consensus tree of 3 most-parsimonious trees (11 steps, CI = 0.84, RI = 0.88). Traits are mapped using Mesquite's default parsimony model. Castanopsis rothwellii is the Eocene fossil found by Wilf et al.

Two characters qualify as near-synapomorphies (effectively there is only one: hemispheric indehiscent cupules) that define a crown-clade including Lithocarpus, Notholithocarpus, Castanopsis (as part of intrageneric variation) and Quercus. Most other putatively derived traits within the Fagaceae subclades are symplesiomorphies; two are potential homoiologies, one defining the Castanea-Chrysolepis clade. [Note the staircase-like tree topology, a common feature of parsimony trees dealing with extinct lineages.] The fossil's character suite is relatively derived; its states for characters 6 (shared only with some Castanopsis) and 7 (a reversal, as in Quercus) could be interpreted as evidence for an extinct side lineage of the (paraphyletic) Castanoideae.

This is not a bad analysis for seven characters, but it is likely to be quite wrong.

Fagaceae still exist today, and their DNA can be sampled. Below is a maximum-likelihood tree, based on a 2012 NCBI GenBank oligogene data harvest I did for a talk in Bordeaux. The alignment is 19,242 basepairs long, has 2,985 distinct alignment patterns and a gappiness of 35.8%. Each genus and major intra-generic lineage is represented by a strict consensus sequence based on all available data (checked for mislabeled or pseudogene accessions). [Oaks started to radiate > 50 Ma, Grímsson et al. 2015, Hipp et al. 2019; beeches about the same time, Denk & Grimm 2009, Renner et al. 2016.]
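
For readers unfamiliar with the terms, the following sketch shows one plausible reading of "strict consensus sequence" and "gappiness". Both readings are assumptions on my part and are not taken from the original data files: a consensus base is kept only where all sequences of a group agree, any disagreement is coded with the matching IUPAC ambiguity code, and gappiness is the proportion of alignment cells that are gaps or undetermined. The function names and the three short sequences are invented for illustration.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def strict_consensus(aligned_seqs):
    """Collapse aligned, upper-case sequences of one group into one sequence.

    Assumption: gaps and undetermined cells are ignored when at least one
    sequence has a base at that column; a column without any base stays a gap.
    """
    consensus = []
    for column in zip(*aligned_seqs):
        bases = frozenset(b for b in column if b in "ACGT")
        consensus.append(IUPAC[bases] if bases else "-")
    return "".join(consensus)


def gappiness(aligned_seqs):
    """Proportion of alignment cells that are gaps or undetermined (-, ?, N)."""
    cells = "".join(aligned_seqs)
    return sum(c in "-?N" for c in cells) / len(cells)


# Hypothetical accessions of a single genus, already aligned.
genus = ["ACGT-", "ACAT-", "ATGTN"]
print(strict_consensus(genus))
print(gappiness(genus))
```

Coding disagreement as ambiguity rather than picking the majority base keeps intra-generic variation visible in the placeholder sequence instead of hiding it.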

Fig. 2 – An ML tree based on strict genus/intrageneric consensus sequences (see also Oh & Manos, 2008, fig. 4, based only on data from the CRABS CLAW gene, CRC; fig. 5 in the same paper shows a combined CRC + ITS tree)

According to this analysis, Chrysolepis and Castanea are not sisters; Castanea, but not Lithocarpus, is a close relative of the oaks. The (monophyletic) Trigonobalanoideae should form a clade (Fig. 2) not a grade (Fig. 1).

The analysis is no longer circular when we infer the tree from data that are, as far as we know, independent of the data we want to map onto it. With the invention of stochastic mapping methods, we also avoid the possible limitations of parsimony when it comes to character mapping: morphological evolution is often not parsimonious, at least for the traits we can observe back in time or study in detail today.

Fig. 3 – ML trait mapping on the tree in Fig. 2 (i.e. considering molecular branch lengths). Note that the reconstruction of character states for the all-ancestor is ambiguous due to the extreme genetic distance between Fagus and the remainder of the Fagaceae. The situation in the scored fossils (Wilf et al. 2019, Denk et al. 2019) is shown for comparison.

For the ML mapping above, I scored intra-generic variations as additional states (ML ancestral-state reconstruction as implemented in the Mesquite program needs defined tips) and applied Mesquite's default model — this is essentially Lewis' Mk model for multi-state standard characters: one substitution category for any possible mutation. We can now compare the two mappings.
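
As a rough illustration of what "one substitution category for any possible mutation" means, the sketch below computes the transition probabilities of a k-state Mk model along a branch of length t. The function name and the example values are mine, and I assume the rate is given as the instantaneous rate of change from one state to each other state; Mesquite may parameterise and estimate the rate differently.

```python
import math


def mk_transition_probs(k, rate, t):
    """Return (P_same, P_diff) for a k-state Mk model after branch length t.

    P_same: probability of ending in the starting state.
    P_diff: probability of ending in one particular other state.
    rate:   assumed instantaneous rate from a state to each other state.
    """
    decay = math.exp(-k * rate * t)
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff


# Hypothetical values: a three-state character on a short and a long branch.
for branch_length in (0.05, 5.0):
    p_same, p_diff = mk_transition_probs(k=3, rate=1.0, t=branch_length)
    print(branch_length, round(p_same, 3), round(p_diff, 3))
```

On a short branch the starting state is almost certainly retained; on a very long branch every state approaches probability 1/k, so the tips say next to nothing about the ancestor. This is why a long branch, such as the one separating Fagus from the remaining Fagaceae, leaves the reconstruction at the deepest node ambiguous.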

What our morphology-based tree recognized as derived was actually partly primitive. The near-synapomorphy (hemispheric indehiscent cupules) is in fact a symplesiomorphy of all Castanoideae + Quercus. Traits shared by Castanea-Castanopsis (pro parte, i.e. some species show the ancestral, others the derived state) and Quercus are primitive, while those unique to (or part of intra-generic variation in) one or several Castanoideae are derived.

Note that the alleged crown-group but old fossil Castanopsis rothwellii would fit at the base of the (core Fagaceae) tree (zero conflict) as well as close to its leaves (at least one conflicting character). Six of the seven traits can be pinpointed for the core Fagaceae ancestor. According to the reconstruction, it had three styles, scaly cupule appendages, hemispheric indehiscent cupules (vs. valvate in C. rothwellii), one flower per cupule, no valve dehiscence ("partial" in C. rothwellii), and inflorescences were unisexual and mixed (Wilf et al. state the Eocene fossils were unisexual, although the difference can only be assessed when investigating all inflorescences on a tree, see Denk et al.'s comment). The reconstruction is ambiguous regarding whether female flowers were clustered or solitary.

However, there is one implicit assumption held in common by all of the methods, including DNA-scaffolding, probabilistic and stochastic character mapping, total evidence dating, evolutionary placement algorithm (EPA) as implemented in the RAxML program, etc. That is: the inferred molecular tree is the true tree. This is the second fundamental flaw of cladistic approaches to evolution, as I will show in Part 2.

Data information

The morphological data used here are based on an emended version of the Wilf et al. matrix provided by my former colleague and co-author Thomas Denk (see also Denk et al. 2019, table 1). Together with the molecular data matrix used here, they can be accessed via figshare.


Denk T, Grimm GW. (2009) The biogeographic history of beech trees. Review of Palaeobotany and Palynology 158: 83–100.

Denk T, Hill RS, Simeone MC, Cannon C, Dettmann ME, Manos PS. (2019) Comment on “Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests”. Science 366: eaaz2189.

Grímsson F, Zetter R, Grimm GW, Krarup Pedersen G, Pedersen AK, Denk T. (2015) Fagaceae pollen from the early Cenozoic of West Greenland: revisiting Engler's and Chaney's Arcto-Tertiary hypotheses. Plant Systematics and Evolution 301: 809–832.

Hipp AL, Manos PS, Hahn M, et al. (2019) Genomic landscape of the global oak phylogeny. New Phytologist doi:10.1111/nph.16162.

Oh S-H, Manos PS. (2008) Molecular phylogenetics and cupule evolution in Fagaceae as inferred from nuclear CRABS CLAW sequences. Taxon 57: 434–451.

Renner SS, Grimm GW, Kapli P, Denk T. (2016) Species relationships and divergence times in beeches: New insights from the inclusion of 53 young and old fossils in a birth-death clock model. Philosophical Transactions of the Royal Society B doi:10.1098/rstb.2015.0135.

Wilf P, Nixon KC, Gandolfo MA, Cúneo NR. (2019) Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests. Science 364: eaaw5139.

For more literature, see the post:
Ockham's Razor applied but not used: can we make a DNA-scaffolding with seven characters?
