Monday, December 10, 2018

Please stop using cladograms!

I really like the journal PeerJ, not only because it is open access and publishes the peer review process, but also because it's one of the few that adhere to strict policies when it comes to data documentation. In my last (on my own) 2-piece post (part 1, part 2), I showed what networks could have offered for historical and more recent studies in Cladistics, the journal of the Willi Hennig Society. In this one, I'll illustrate why paleontology in general needs to stop using cladograms.

An example

In a recent article, Atterholt et al. (PeerJ 6: e5910, 2018) describe and discuss "the most complete enantiornithine from North America and a phylogenetic analysis of the Avisauridae". I'm not a paleozoologist and "stuff of legend", but their first 17 figures seem to make a good point about the beauty of the fossil and its relevance; and it is interesting to read about it. This makes me envy paleozoologists a bit — the reason I exchanged chemistry for paleontology was my childhood love for the thunder lizards; I specialized in zoology not botany for graduate biology courses, and I fell in love with social insects, especially bees; but then more general circumstances pushed me into plant phylogenetics.

The result of Atterholt et al.'s phylogenetic analysis is presented in their figure 18, as shown here.

Figure 18 of Atterholt et al. (2018): "A cladogram depicting the hypothetical phylogenetic position of Mirarce eatoni." [the beautiful fossil is highlighted in bold font]
This looks very familiar — graphs like this can be seen in many paleontological studies, not only those in Cladistics. However, this is a phylogeneticist's "nightmare" (but a cladist's "dream").

First, phylogenetic trees, especially those that were weighted post-analysis several times to get a more or less resolved tree, should be depicted as phylograms — trees with branch lengths. Phylogenetic hypotheses are not only about clades, and what is sister to what, but about the amount of (inferred) evolutionary change between the hypothetical ancestors, the internal nodes, and their descendants, the labelled tips. For example, we may want to know how long is the root of the clade (Avisauridae, Avisaurus s.l.) comprising the focus taxon compared to the lengths of the terminal branches within the clade. Prominent roots and short terminals are a good sign for monophyly (inclusive common origin), or at least a fossil well placed, whereas short roots and long terminals are not.

The above tree as phylogram (using PAUP*'s AccTran optimization). The beauty of cladistic classification is that the new specimen could have just been described as another species of Avisaurus (but read the author's discussion).

In this example, we seem to be on the safe side, although one may question the general taxonomic concept for extinct birds. Are the differences enough to erect a new genus for every specimen? This is hard to decide based on this matrix.

Second, a tree without branch support is just a naked line graph, telling us nothing about the quality (strengths and weaknesses) of the backing data. Neontologists are not allowed to publish naked trees. In molecular phylogenetics, we are not uncommonly asked by reviewers to drop all branches (internodes) below an arbitrary threshold: a bootstrap (BS) support value < 70 and posterior probability (PP) < 0.95. In palaentology, it has become widely accepted to not show support values at all. The reason is simple: the branch support is always low, because of data gaps and homoplasy. This is a problem the authors are well aware of:
The modified matrix consists of 43 taxa (26 enantiornithines, 10 ornithuromorphs) scored across 252 morphological characters [the provided matrix lists 253], which we analyzed using TNT (Goloboff, Farris & Nixon, 2008a). Early avian evolution is extremely homoplastic (O’Connor, Chiappe & Bell, 2011; Xu, 2018) thus we utilized implied weighting (without implied weights Pygostylia was resolved as a polytomy due to the placement of Mystiornis) (Goloboff et al., 2008b); we explored k values from one to 25 (see Supplemental Information) and found that the tree stabilized at k values higher than 12. In the presented analysis we conducted a heuristic search using tree-bisection reconnection retaining the single shortest tree from every 1,000 replications with a k-value of 13. This produced six most parsimonious trees with a score of 25.1. These trees differed only in the relative placement of five enantiornithines closely related to the Avisauridae, forming a polytomy with this clade in the strict consensus tree (Consistency Index = 0.453; Retention Index = 0.650; Fig. 18).
I've seen much worse CI and RI values in the paleophylogenetic literature (some of them are plotted in this post). For a phylogenetic inference, homoplasy equals internally incompatible signals — many characters show different, partly or fully conflicting, taxon bipartitions; or, in other words, they prefer different trees. The signal in the matrix is thus not tree-like — it doesn't fit a single tree. That's why we have to choose one using TNT's iterated reweighting procedures. (Note: an alternative "phenetic" Neighbor-joining tree has a computation time < 1s, and produces the same tree for the Ornithumorpha and the root-proximal, 'basal' part of the tree, except that Jeholornis is moved two nodes up; but it shuffles a lot in the Longirostravis–Avisauridae clade.)

Another point is that the more homoplasy we have, then the higher must have been the rate of change (here: visible anatomical mutation). The higher the rate of change, the higher the statistical inconsistency of parsimony.

In short, paleontologists (Atterholt et al. just follow the standard in paleophylogenetic publications) use data with tree-unlike signal to infer trees (see also David's last post on illogicality in phylogenetics) under a possibly invalid optimality criterion, which are then used to downweight characters (eliminate noise due to homoplasy) to infer less noisy, "better" trees.

The basic signal

We can't change the data, but we can explore and show its signal. And the basic signal from the unfiltered matrix is best visualized using a Neighbor-net splits graph.

Neighbor-net based on mean pairwise taxon distances. Thick edges correspond to branches in the published tree.

Some differentiation patterns that explain the clades in the tree can be traced, but it becomes difficult in the group that is of most interest: the (inferred) clade(s) comprising the newly described fossil. In the Neighbor-net this is placed close to another member of the Avisauridae, but not all. The matrix is not optimal for the task at hand.

The data properties

The matrix is a multistate matrix with up to six states in the definition line (although only five are used, as state "5" is not present). The taxa have variable gappyness (i.e. the proportion of completely undetermined cells), between 2% (extant birds: Anas and Gallus) and 94% (Intiornis, an Avisauridae) — the median is 56%, and the average close to it (54%). The "hypothetically" placed fossil Mirarce eatoni (in the matrix it is under its old designation: "Kaiparowits") lacks a bit more of the scored characters (61%). That may strike one as a lot, but note that the matrix has 253 characters! However, we may well ask: if I want to place a fossil for which I can score 99 characters, why bother to include another ~150 that tell me nothing about its affinity? (Note: paleobotanists struggle hard even to get such numbers, we usually have at best 50 characters.)

Its closest putative relatives, the Avisaurus s.l., lack 90% of the characters; leaving us with max. 25 characters supporting the relevant clade (assuming that the 10% are all found in Mirarce as well). Coverage is not much better in the next-closest relatives (phylogenetically speaking).

Data coverage in the phylogenetic neighborhood of Mirarce eatoni

The missing data percentage may have mislead the Neighbor-net a bit, because we will have fed it with unrepresentative or highly ambiguous pairwise distances. In the the network, the focus fossil comes close to Neuquenornis, the only other Avisauridae with some data coverage. Looking at the heat map below, we see that missing data is indeed a problem in this matrix — we have zero distances between several pairs that show different distances to the better-covered taxa.

The distance matrix drawn as a heat map: green = similar, red = dissimilar (values range between 0 and 0.8). Red arrows: taxa with too many (and ambiguous) zero pairwise distances.

The closest relative of Mirarce is, indeed, Avisaurus/Gettya gloriae, but the latter has zero distances to various other poorly covered taxa from the phylogenetic neighborhood, in contrast to the much better-covered Mirarce. Neighbor-nets are very good at getting the obvious out of a morphological matrix, but they don't perform miracles. However, why should we include poorly known taxa at all during phylogenetic inference? Wouldn't it be better to infer a backbone tree (or network showing the alternative hypotheses) based on a less gappy matrix, and then find the optimal position of the poorly known taxa within that tree (network)?

Estimating the actual character support

Some characters cover just 10–20% of the taxa, whereas others are scored for most of them — more than half of the characters are missing for more than half of the taxa. Using TNT's iterative weight-to-fit option means that we infer a tree, ideally one fitting the well-covered data (taxon- and character-wise), and then downweight all conflicting characters elsewhere to fit this tree. We then end up with a tree where we have no idea about actual character support. Since the matrix is a Swiss cheese, we only can re-affirm the first-inferred tree.

Let's check the raw character support, using non-parametric bootstrapping and maximum likelihood as the optimality criterion (corrected for ascertainment bias, as implemented in RAxML).

ML-BS Consensus Network (using Lewis' 2-parameter Mk+G model). Edge lengths are proportional to the BS support values of taxon bipartitions (= phylogenetic splits, internodes, branches in phylogenetic trees). Only splits are shown that occurred in at least 10% of 900 BS pseudoreplicates (number of necessary BS replicates determined by the Extended Majority Rule Bootstrap criterion), trivial splits collapsed. Thick edges correspond with branches in Atterholt et al.'s iterative parsimony tree; coloring as before.

The ML bootstrap Consensus Network bears not a few similarities to the distance-based Neighbor-net. The characters do not support the Avisauridae subtree, as depicted in the published TNT tree, but there are faint signals associating some of them to each other, despite the missing data. Keep in mind: a BS support of 20 for one alternative and < 10 for all others means (ideally) one fifth of the characters support the split, and the rest have no (coherent) information. Some sister pairs have quite high support (for this kind of data set), and Gettya gloriae is resolved as sister of Mirarce (unambiguously, with a BS support = 67). But, the matrix hardly has the capacity to resolve deeper relationships within the group of interest, the Enantiornithes — the polytomy with the next relatives seen in the tree and the corresponding clade dissolve. This confirms what we saw in the Neighbor-Net (despite missing data distortion).

The matrix and the tree show something that could have been deduced directly from the distance matrix: the poorly known Gettya (Avisaurus) gloriae is (literally) the closest relative of the enigmatic new genus / species Mirarce (morphological distance of 0.08 compared to 0.1–0.64 for all other taxa). But is this overall similarity enough to conclude Avisaurus, Gettya and Mirarce are a monophyletic group within the Avisauridae?

What the authors (and all paleontologists doing phylogenetics) should have done

(I would have skipped all trees, naturally, but peer reviewers and most readers probably need to see them.)

  • Trimmed the matrix to include only those characters preserved in the fossil of interest, in order to minimize missing data artefacts during inference.
  • Shown the Neighbor-net to visualize the primary signal situation, including and excluding poorly covered taxa. From the Neighbor-net it is already obvious that the fossil is an Enantiornithes, so any subsequent optimization / inference could have focussed on this group alone.
  • Then inferred a backbone tree excluding poorly covered taxa, and shown the resulting phylogram. In case one needs to test the Enantiornithes root (the Neighbor-net gives us two alternatives for the Enantiornithes root: Pengornis + Eopengornis or Protopteryx + Iberomesornis), there is no point in including the poorly covered Enantiornithes or the worst-covered taxa outside this clade.
  • Then optimized the position of the poorly covered taxa in the backbone tree. I recommend using RAxML's evolutionary placement algorithm (EPA) for this, but you can also do this in a parsimony framework if you wish. (EPA can also be used to test outgroup roots: here, one would search the branch at which all non-Enantiornithes fit best.)
  • Shown the resulting phylogram including all taxa — that is, read in the topology to the analysis, and then re-optimize branch lengths.
  • Shown a Support Consensus Network to illustrate the support for the branches in the preferred tree and their competing alternatives. (There may be one or more, as there are many options to estimate branch support.) How sure can we be about relationships within the Avisauridae and their relationships to other Enantiornithes?

Postscriptum. For those who are curious about how the ML tree would look like, here it is:

I have no idea about birds, but from a methodological point of view this is an equally (if not more, because unforced) valid hypothesis for the data set. And demonstrating its limitations: note the relatively long branches with very low support making up the backbone of the Enantiornithes clade. This is typical for matrices lacking coherent discriminatory signal and/or struggling with internal conflict.