Monday, June 11, 2018

Want to place a fossil in a minute? Just use Neighbour-nets


Palaeontological phylogenetic researchers typically put a lot of effort into inferring trees. It has been argued (and occasionally pointed out during manuscript reviews) that only by placing a fossil in an explicitly phylogenetic framework can we assess what it represents. I sympathize with this notion, but in most cases we don't need any elaborate analysis to do it — a quick network-based analysis will do the trick.

In this post, I'll demonstrate my point using the most recent matrix presented by two eminent plant morphology veterans. In a fresh-off-the-press paper, James Doyle & Peter Endress provide a "Phylogenetic analysis of Cretaceous fossils related to Chloranthaceae and their evolutionary implications" (Botanical Review) using their morphology matrix focussing on early diverging angiosperm lineages, which was originally used for a paper by Saarela et al. (Nature 446: 312–315, 2007) and has been continually updated.

Like all morphological matrices that aim to cover as much as possible, Doyle & Endress' matrix does not provide any strong tree-like signal, and hence it has little use for inferring phylogenetic trees. Doyle & Endress deal with this issue by using a (more or less molecular-based) backbone tree enforcing several clades for the modern taxa, and then trying to find the most parsimonious placement of the fossil(s). This approach works to some degree but has two problems: one theoretical and one practical.

First, the backbone tree, or any molecular-informed topology, is usually some steps longer than the most-parsimonious trees that could be inferred from their matrix. In other words, morphological evolution in plants doesn't strictly obey Ockham's Razor. Why, then, should we expect the fossils to obey it?

Second, moving a fossil through the branches to find the best placement takes some time, and will lead to many equally parsimonious solutions. Not infrequently, the fossil can be placed on quite distant branches, producing trees that are only a few steps longer.

A graph I made depicting the 'parsimoniousness' of placing a fossil, Monetianthus (a Cretaceous water lily), within a given topology using an earlier version of Doyle & Endress' matrix (fig. 7 in Friis et al., Int. J. Plant Sci., 2009). The number of additional steps was estimated by moving the target taxon, the fossil, to the accordingly coloured branch of the tree. (PS To show that the fossil is a water lily, a member of the Nymphaeaceae, we used a Neighbour-net)
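For readers who want to see what "moving a fossil through the branches" amounts to, here is a minimal Python sketch. It is not the procedure used for the figure above or by Doyle & Endress. It assumes unordered characters, Fitch parsimony on a rooted binary backbone written as nested tuples, and missing data ("?" or "-") treated as compatible with any state. The taxon names and scores are made up for illustration.

```python
MISSING = {"?", "-"}

def fitch_steps(tree, matrix):
    """Fitch (unordered parsimony) length of a rooted binary tree of nested tuples."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        # All states observed for this character; used as a wildcard for missing data.
        all_states = frozenset(row[i] for row in matrix.values()) - MISSING
        if not all_states:
            continue  # character not scored in any taxon

        def down(node):
            if isinstance(node, str):  # a leaf: look up its scored state
                s = matrix[node][i]
                return (all_states if s in MISSING else frozenset(s)), 0
            (ls, lc), (rs, rc) = down(node[0]), down(node[1])
            common = ls & rs
            if common:
                return common, lc + rc
            return ls | rs, lc + rc + 1  # empty intersection costs one step

        total += down(tree)[1]
    return total

def placements(tree, fossil):
    """Yield (sister_group, new_tree) for grafting the fossil next to every node."""
    yield tree, (tree, fossil)
    if not isinstance(tree, str):
        left, right = tree
        for sister, new_left in placements(left, fossil):
            yield sister, (new_left, right)
        for sister, new_right in placements(right, fossil):
            yield sister, (left, new_right)

def extra_steps(backbone, matrix, fossil):
    """Additional steps of each placement relative to the best one."""
    lengths = [(sister, fitch_steps(t, matrix)) for sister, t in placements(backbone, fossil)]
    best = min(length for _, length in lengths)
    return [(sister, length - best) for sister, length in lengths]

# Hypothetical toy data: four modern taxa on a fixed backbone and one fossil.
toy = {
    "taxon_A": "11001",
    "taxon_B": "11000",
    "taxon_C": "00110",
    "taxon_D": "00110",
    "fossil_X": "11?01",
}
backbone = (("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"))

for sister, extra in extra_steps(backbone, toy, "fossil_X"):
    print(sister, extra)
```

Even in this tiny example several placements differ by only a single step, which is the practical problem described above in miniature.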

For Doyle & Endress' papers this is no big problem, because they just show the best placements as well as those a few steps longer. For example:
Placing the Chloranthistemon species on the stem lineage of Sarcandra and Chloranthus is four steps less parsimonious than placing them on the stem of Chloranthus. For perspective, only two steps are added if the Asteropollis plant is moved to the stem of the whole family. If a four-step parsimony debt is accepted in moving Chloranthistemon to a morphologically less favored position, one may ask why the Asteropollis plant is considered a reliable minimum age constraint for the family.
But given that morphological evolution is not necessarily parsimonious, and that even the modern taxa can show variable root-to-tip path lengths, I always remain skeptical of this approach.


A Bayesian-inferred angiosperm tree based on a total-evidence matrix, built from a curated version of Soltis et al.'s 2011 matrix and including the 2010 version of Doyle & Endress' matrix as the morphological partition (provided as open data @ figshare). Note that many fossils (Cretaceous, ~100 Ma) have longer terminal branches than their surviving relatives, which made Bayesian total-evidence dating impossible.

Aside from this, the matrix signal is pretty straightforward when it comes to deciding on the potential position of the fossil in the angiosperm part of the Tree of Life. And the analysis takes (literally) moments.

You just take the matrix, calculate mean pairwise morphological distances (done in a blink), export the distance matrix as a NEXUS-formatted file and load it into SplitsTree, which will give you a Neighbour-net (in another blink).
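The distance step can be sketched in a few lines of Python. This is only an illustration, not the script used for this post. It assumes that the "mean pairwise morphological distance" is the proportion of differing characters among those scored in both taxa, that "?" and "-" mark missing or inapplicable states, and that polymorphic scorings are absent. The taxon names and scores are made up, and the DISTANCES block layout follows the general NEXUS convention, so it may need adjusting for a given SplitsTree version.

```python
from itertools import combinations

MISSING = {"?", "-"}

def mean_pairwise_distance(a, b):
    """Proportion of differing characters among those scored in both taxa."""
    compared = [(x, y) for x, y in zip(a, b) if x not in MISSING and y not in MISSING]
    if not compared:
        raise ValueError("no character is scored in both taxa")
    return sum(x != y for x, y in compared) / len(compared)

def distance_matrix(matrix):
    """Return the taxon order and a symmetric dict-of-dicts of pairwise distances."""
    taxa = list(matrix)
    dist = {t: {u: 0.0 for u in taxa} for t in taxa}
    for t, u in combinations(taxa, 2):
        dist[t][u] = dist[u][t] = mean_pairwise_distance(matrix[t], matrix[u])
    return taxa, dist

def to_nexus(taxa, dist):
    """Format the distances as a NEXUS file (taxon names must not contain spaces)."""
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(taxa)};",
        "  TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "BEGIN DISTANCES;",
        f"  DIMENSIONS NTAX={len(taxa)};",
        "  FORMAT TRIANGLE=BOTH DIAGONAL LABELS;",
        "  MATRIX",
    ]
    for t in taxa:
        row = " ".join(f"{dist[t][u]:.4f}" for u in taxa)
        lines.append(f"    {t} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)

# Hypothetical toy matrix: one string of character states per taxon.
toy = {
    "taxon_A": "0110?1",
    "taxon_B": "0100-1",
    "fossil_X": "01?0?1",
}
taxa, dist = distance_matrix(toy)
print(to_nexus(taxa, dist))
```

Writing the returned string to a file gives something that can be opened in SplitsTree; the Neighbour-net itself is computed there, not in this sketch.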

A Neighbour-net based on Doyle & Endress' 2018 matrix including only the modern-day taxa.

Most members of the well-established clades, the main angiosperm lineages, cluster in the Neighbour-net (bracketed names point to somewhat scattered clades). If the signal from a fossil is trivial, it will be nested within the respective cluster. A signal is trivial when a fossil has a character suite that indicates it is much more similar to one of the clusters than to any other, which usually means that it is part of the same evolutionary lineage. Convergences may be common, and characters homoplasious, but evolving the exact same suite of characters while not sharing common ancestry is quite unlikely.

The Neighbour-net including the matrix' fossils. Note that the relative position of the Eudicots has changed and Circaeaster and Euptelea are placed closer to the other Ranunculales, although the pairwise distances between all modern taxa have not changed. The re-arrangement is solely a fossil-inclusion effect. By adding fossils attracted to Ceratophyllum, this enigmatic and isolated genus is drawn away from the most basal eudicots.

Two of the fossils appear to have unique character suites, placing them intermediate between two phylogenetically isolated plants, the unique Amborella (still considered the earliest branching modern-day angiosperm; one species on New Caledonia) and Ceratophyllum, an equally enigmatic water plant. The remainder are clearly members of the Chloranthales.


Having identified the phylogenetic neighbourhood of the fossils, we can then focus on this neighbourhood (in SplitsTree: select the OTUs it comprises; then go to menu Data > Keep only selected taxa).

The Neighbour-net after all non-neighbourhood OTUs have been removed.

From this graph we can directly conclude:
  • The Asteropollis plant is a close relative, likely a sister or early representative, of Hedyosmum.
  • Couperites represents an early and substantially diverged lineage within the Chloranthales; its closest living relative appears to be Ascarina, which is, next to Hedyosmum, the most derived living member of the Chloranthales (here, one would need to see whether there is any shared character suite).
  • Zlatkocarpus is an ancestral form of the Chloranthales core clade, comprising Ascarina and the sister taxa Chloranthus and Sarcandra (one would need to check the possibility that this could be a missing data artefact).
  • Canrightia and Canrightiopsis are sister lineages or precursors of Chloranthus and Sarcandra (see the open access tree-and-network-based paper by Friis et al. 2015, Grana 54: 184–212)
  • The Pennipollis plant is an ancient isolated Chloranthales lineage, with no living relative.
  • Appomattoxia and Pseudoasterophyllites may be (very) distant relatives of Ceratophyllum, the latter an isolated genus that has long been and still is a problem for molecular phylogenies. Alternatively, they may represent early and extinct angiosperm lineages (or the same lineage) with no modern counterparts.
To say anything beyond this based on the current data set quickly leaves the grounds of objectivity, and requires a priori assumptions about the importance of certain morphological traits being shared or not (i.e. not expressed, not just missing due to poor preservation).

[For those interested in a formal discussion of these results, see Doyle & Endress, 2018, pp. 7–25.]

To refine the analysis, we can just reduce the character set to the characters scored for the fossils.

A Neighbour-net based on distances computed using a character subset generated by excluding all invariable characters (in PAUP*: Exclude constant) for the taxa included in the network (66 characters, including eight not defined for any fossil taxon). Grey depicts a (molecular-data backed) tree hypothesis that could explain the observed differentiation pattern.
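The character filtering can also be done outside PAUP*. The following Python sketch is only an approximation of that step. It assumes a character counts as variable when at least two different states are scored among the retained taxa, with "?" and "-" ignored, which is how I would read "Exclude constant" here. It also shows how one could flag the characters scored in at least one fossil. All names and scores are made up.

```python
MISSING = {"?", "-"}

def keep_taxa(matrix, selected):
    """Restrict the matrix to the taxa of the chosen neighbourhood."""
    return {t: matrix[t] for t in selected}

def variable_columns(matrix):
    """Indices of characters showing at least two scored states among the taxa."""
    n_chars = len(next(iter(matrix.values())))
    cols = []
    for i in range(n_chars):
        states = {row[i] for row in matrix.values()} - MISSING
        if len(states) > 1:
            cols.append(i)
    return cols

def scored_in_any(matrix, taxa, cols):
    """Subset of cols that is scored (not missing) in at least one of the given taxa."""
    return [i for i in cols if any(matrix[t][i] not in MISSING for t in taxa)]

def subset_characters(matrix, cols):
    """Reduce every taxon's character string to the chosen columns."""
    return {t: "".join(row[i] for i in cols) for t, row in matrix.items()}

# Hypothetical toy matrix; taxon_D lies outside the neighbourhood of interest.
toy = {
    "taxon_A": "00110",
    "taxon_B": "01110",
    "taxon_C": "01010",
    "fossil_X": "0?1?0",
    "taxon_D": "11001",
}
neighbourhood = keep_taxa(toy, ["taxon_A", "taxon_B", "taxon_C", "fossil_X"])
variable = variable_columns(neighbourhood)
in_fossils = scored_in_any(neighbourhood, ["fossil_X"], variable)
reduced = subset_characters(neighbourhood, variable)
print(variable, in_fossils, reduced)
```

Note that a character can be variable among the retained taxa without being scored in any fossil, which is why the two lists are kept separate.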

To take the next step, morphological matrices alone will have no practical use, because they will not allow us to identify fast- vs. slow-evolving traits and lineages. A lineage that goes through bottlenecks or colonizes new niches will be genetically, and morphologically, more distinct than one that remained in calm waters. One could map the preserved traits found in each fossil onto a molecular tree, and include the information about actual branch lengths in that tree to put forward hypotheses about the ancestral state (e.g. Mendes et al., Grana 53: 283–301 [open access] for Lardizabalaceae), and possibly even time the divergences. This would enable one to compare the situation in the fossils with the top-down hypotheses about morphological evolution for the very same time period.

But that would be mostly tree-based analyses, and thus nothing for a Genealogical World of Phylogenetic Networks post.

Data — In case you are interested in the primary data matrix, a ready-to-use NEXUS version and the raw Split-NEXUS files have been uploaded to figshare.
