Monday, December 10, 2018

Please stop using cladograms!


I really like the journal PeerJ, not only because it is open access and publishes the peer review process, but also because it's one of the few that adhere to strict policies when it comes to data documentation. In my last (on my own) 2-piece post (part 1, part 2), I showed what networks could have offered for historical and more recent studies in Cladistics, the journal of the Willi Hennig Society. In this one, I'll illustrate why paleontology in general needs to stop using cladograms.

An example

In a recent article, Atterholt et al. (PeerJ 6: e5910, 2018) describe and discuss "the most complete enantiornithine from North America and a phylogenetic analysis of the Avisauridae". I'm not a paleozoologist and "stuff of legend", but their first 17 figures seem to make a good point about the beauty of the fossil and its relevance; and it is interesting to read about it. This makes me envy paleozoologists a bit — the reason I exchanged chemistry for paleontology was my childhood love for the thunder lizards; I specialized in zoology not botany for graduate biology courses, and I fell in love with social insects, especially bees; but then more general circumstances pushed me into plant phylogenetics.

The result of Atterholt et al.'s phylogenetic analysis is presented in their figure 18, as shown here.

Figure 18 of Atterholt et al. (2018): "A cladogram depicting the hypothetical phylogenetic position of Mirarce eatoni." [the beautiful fossil is highlighted in bold font]
This looks very familiar — graphs like this can be seen in many paleontological studies, not only those in Cladistics. However, this is a phylogeneticist's "nightmare" (but a cladist's "dream").

First, phylogenetic trees, especially those that were weighted post-analysis several times to get a more or less resolved tree, should be depicted as phylograms — trees with branch lengths. Phylogenetic hypotheses are not only about clades, and what is sister to what, but about the amount of (inferred) evolutionary change between the hypothetical ancestors, the internal nodes, and their descendants, the labelled tips. For example, we may want to know how long is the root of the clade (Avisauridae, Avisaurus s.l.) comprising the focus taxon compared to the lengths of the terminal branches within the clade. Prominent roots and short terminals are a good sign for monophyly (inclusive common origin), or at least a fossil well placed, whereas short roots and long terminals are not.

The above tree as phylogram (using PAUP*'s AccTran optimization). The beauty of cladistic classification is that the new specimen could have just been described as another species of Avisaurus (but read the author's discussion).

In this example, we seem to be on the safe side, although one may question the general taxonomic concept for extinct birds. Are the differences enough to erect a new genus for every specimen? This is hard to decide based on this matrix.

Second, a tree without branch support is just a naked line graph, telling us nothing about the quality (strengths and weaknesses) of the backing data. Neontologists are not allowed to publish naked trees. In molecular phylogenetics, we are not uncommonly asked by reviewers to drop all branches (internodes) below an arbitrary threshold: a bootstrap (BS) support value < 70 and posterior probability (PP) < 0.95. In palaentology, it has become widely accepted to not show support values at all. The reason is simple: the branch support is always low, because of data gaps and homoplasy. This is a problem the authors are well aware of:
The modified matrix consists of 43 taxa (26 enantiornithines, 10 ornithuromorphs) scored across 252 morphological characters [the provided matrix lists 253], which we analyzed using TNT (Goloboff, Farris & Nixon, 2008a). Early avian evolution is extremely homoplastic (O’Connor, Chiappe & Bell, 2011; Xu, 2018) thus we utilized implied weighting (without implied weights Pygostylia was resolved as a polytomy due to the placement of Mystiornis) (Goloboff et al., 2008b); we explored k values from one to 25 (see Supplemental Information) and found that the tree stabilized at k values higher than 12. In the presented analysis we conducted a heuristic search using tree-bisection reconnection retaining the single shortest tree from every 1,000 replications with a k-value of 13. This produced six most parsimonious trees with a score of 25.1. These trees differed only in the relative placement of five enantiornithines closely related to the Avisauridae, forming a polytomy with this clade in the strict consensus tree (Consistency Index = 0.453; Retention Index = 0.650; Fig. 18).
I've seen much worse CI and RI values in the paleophylogenetic literature (some of them are plotted in this post). For a phylogenetic inference, homoplasy equals internally incompatible signals — many characters show different, partly or fully conflicting, taxon bipartitions; or, in other words, they prefer different trees. The signal in the matrix is thus not tree-like — it doesn't fit a single tree. That's why we have to choose one using TNT's iterated reweighting procedures. (Note: an alternative "phenetic" Neighbor-joining tree has a computation time < 1s, and produces the same tree for the Ornithumorpha and the root-proximal, 'basal' part of the tree, except that Jeholornis is moved two nodes up; but it shuffles a lot in the Longirostravis–Avisauridae clade.)

Another point is that the more homoplasy we have, then the higher must have been the rate of change (here: visible anatomical mutation). The higher the rate of change, the higher the statistical inconsistency of parsimony.

In short, paleontologists (Atterholt et al. just follow the standard in paleophylogenetic publications) use data with tree-unlike signal to infer trees (see also David's last post on illogicality in phylogenetics) under a possibly invalid optimality criterion, which are then used to downweight characters (eliminate noise due to homoplasy) to infer less noisy, "better" trees.

The basic signal

We can't change the data, but we can explore and show its signal. And the basic signal from the unfiltered matrix is best visualized using a Neighbor-net splits graph.

Neighbor-net based on mean pairwise taxon distances. Thick edges correspond to branches in the published tree.

Some differentiation patterns that explain the clades in the tree can be traced, but it becomes difficult in the group that is of most interest: the (inferred) clade(s) comprising the newly described fossil. In the Neighbor-net this is placed close to another member of the Avisauridae, but not all. The matrix is not optimal for the task at hand.

The data properties

The matrix is a multistate matrix with up to six states in the definition line (although only five are used, as state "5" is not present). The taxa have variable gappyness (i.e. the proportion of completely undetermined cells), between 2% (extant birds: Anas and Gallus) and 94% (Intiornis, an Avisauridae) — the median is 56%, and the average close to it (54%). The "hypothetically" placed fossil Mirarce eatoni (in the matrix it is under its old designation: "Kaiparowits") lacks a bit more of the scored characters (61%). That may strike one as a lot, but note that the matrix has 253 characters! However, we may well ask: if I want to place a fossil for which I can score 99 characters, why bother to include another ~150 that tell me nothing about its affinity? (Note: paleobotanists struggle hard even to get such numbers, we usually have at best 50 characters.)

Its closest putative relatives, the Avisaurus s.l., lack 90% of the characters; leaving us with max. 25 characters supporting the relevant clade (assuming that the 10% are all found in Mirarce as well). Coverage is not much better in the next-closest relatives (phylogenetically speaking).

Data coverage in the phylogenetic neighborhood of Mirarce eatoni

The missing data percentage may have mislead the Neighbor-net a bit, because we will have fed it with unrepresentative or highly ambiguous pairwise distances. In the the network, the focus fossil comes close to Neuquenornis, the only other Avisauridae with some data coverage. Looking at the heat map below, we see that missing data is indeed a problem in this matrix — we have zero distances between several pairs that show different distances to the better-covered taxa.

The distance matrix drawn as a heat map: green = similar, red = dissimilar (values range between 0 and 0.8). Red arrows: taxa with too many (and ambiguous) zero pairwise distances.

The closest relative of Mirarce is, indeed, Avisaurus/Gettya gloriae, but the latter has zero distances to various other poorly covered taxa from the phylogenetic neighborhood, in contrast to the much better-covered Mirarce. Neighbor-nets are very good at getting the obvious out of a morphological matrix, but they don't perform miracles. However, why should we include poorly known taxa at all during phylogenetic inference? Wouldn't it be better to infer a backbone tree (or network showing the alternative hypotheses) based on a less gappy matrix, and then find the optimal position of the poorly known taxa within that tree (network)?

Estimating the actual character support

Some characters cover just 10–20% of the taxa, whereas others are scored for most of them — more than half of the characters are missing for more than half of the taxa. Using TNT's iterative weight-to-fit option means that we infer a tree, ideally one fitting the well-covered data (taxon- and character-wise), and then downweight all conflicting characters elsewhere to fit this tree. We then end up with a tree where we have no idea about actual character support. Since the matrix is a Swiss cheese, we only can re-affirm the first-inferred tree.

Let's check the raw character support, using non-parametric bootstrapping and maximum likelihood as the optimality criterion (corrected for ascertainment bias, as implemented in RAxML).

ML-BS Consensus Network (using Lewis' 2-parameter Mk+G model). Edge lengths are proportional to the BS support values of taxon bipartitions (= phylogenetic splits, internodes, branches in phylogenetic trees). Only splits are shown that occurred in at least 10% of 900 BS pseudoreplicates (number of necessary BS replicates determined by the Extended Majority Rule Bootstrap criterion), trivial splits collapsed. Thick edges correspond with branches in Atterholt et al.'s iterative parsimony tree; coloring as before.

The ML bootstrap Consensus Network bears not a few similarities to the distance-based Neighbor-net. The characters do not support the Avisauridae subtree, as depicted in the published TNT tree, but there are faint signals associating some of them to each other, despite the missing data. Keep in mind: a BS support of 20 for one alternative and < 10 for all others means (ideally) one fifth of the characters support the split, and the rest have no (coherent) information. Some sister pairs have quite high support (for this kind of data set), and Gettya gloriae is resolved as sister of Mirarce (unambiguously, with a BS support = 67). But, the matrix hardly has the capacity to resolve deeper relationships within the group of interest, the Enantiornithes — the polytomy with the next relatives seen in the tree and the corresponding clade dissolve. This confirms what we saw in the Neighbor-Net (despite missing data distortion).

The matrix and the tree show something that could have been deduced directly from the distance matrix: the poorly known Gettya (Avisaurus) gloriae is (literally) the closest relative of the enigmatic new genus / species Mirarce (morphological distance of 0.08 compared to 0.1–0.64 for all other taxa). But is this overall similarity enough to conclude Avisaurus, Gettya and Mirarce are a monophyletic group within the Avisauridae?

What the authors (and all paleontologists doing phylogenetics) should have done

(I would have skipped all trees, naturally, but peer reviewers and most readers probably need to see them.)

  • Trimmed the matrix to include only those characters preserved in the fossil of interest, in order to minimize missing data artefacts during inference.
  • Shown the Neighbor-net to visualize the primary signal situation, including and excluding poorly covered taxa. From the Neighbor-net it is already obvious that the fossil is an Enantiornithes, so any subsequent optimization / inference could have focussed on this group alone.
  • Then inferred a backbone tree excluding poorly covered taxa, and shown the resulting phylogram. In case one needs to test the Enantiornithes root (the Neighbor-net gives us two alternatives for the Enantiornithes root: Pengornis + Eopengornis or Protopteryx + Iberomesornis), there is no point in including the poorly covered Enantiornithes or the worst-covered taxa outside this clade.
  • Then optimized the position of the poorly covered taxa in the backbone tree. I recommend using RAxML's evolutionary placement algorithm (EPA) for this, but you can also do this in a parsimony framework if you wish. (EPA can also be used to test outgroup roots: here, one would search the branch at which all non-Enantiornithes fit best.)
  • Shown the resulting phylogram including all taxa — that is, read in the topology to the analysis, and then re-optimize branch lengths.
  • Shown a Support Consensus Network to illustrate the support for the branches in the preferred tree and their competing alternatives. (There may be one or more, as there are many options to estimate branch support.) How sure can we be about relationships within the Avisauridae and their relationships to other Enantiornithes?



Postscriptum. For those who are curious about how the ML tree would look like, here it is:


I have no idea about birds, but from a methodological point of view this is an equally (if not more, because unforced) valid hypothesis for the data set. And demonstrating its limitations: note the relatively long branches with very low support making up the backbone of the Enantiornithes clade. This is typical for matrices lacking coherent discriminatory signal and/or struggling with internal conflict.

7 comments:

  1. Part 1
    (Limit of 4096 characters.)

    First, phylogenetic trees, especially those that were weighted post-analysis several times to get a more or less resolved tree, should be depicted as phylograms — trees with branch lengths. Phylogenetic hypotheses are not only about clades, and what is sister to what, but about the amount of (inferred) evolutionary change between the hypothetical ancestors, the internal nodes, and their descendants, the labelled tips.

    For me and many others, the foremost question is in fact what is sister to what.

    Branch lengths depend really heavily on the character sample, and terminal branches are automatically too short in a parsimony analysis because nobody includes all their parsimony-informative autapomorphies in a matrix. These two reasons seem to be why branch lengths are practically never given for MPTs, unless of course the tree is mapped on a timescale and the branch lengths represent time rather than amount of change.

    In this example, we seem to be on the safe side, although one may question the general taxonomic concept for extinct birds. Are the differences enough to erect a new genus for every specimen? This is hard to decide based on this matrix.

    Uh, this is impossible to decide based on anything. It's not like "genus" has a definition, or if there were any kind of consistency in actual usage.

    In palaentology, it has become widely accepted to not show support values at all.

    Bootstrap analyses are still pretty common, but they're extremely time-consuming (at least for realistic matrices with plenty of missing data and lots of homoplasy).

    However, the authors' decision to use implied weighting because there's so much homoplasy strikes me as potentially problematic. Implied weighting assumes that homoplasy is distributed on an exponential curve so that homoplasy-free characters are more common than characters with any other particular amount of homoplasy – and in many matrices that is just not the case. I've written more about this in the discussion section of the last preprint of my paper with Michel Laurin that is currently in production at PeerJ.

    I've seen much worse CI and RI values in the paleophylogenetic literature

    They're so high because of the implied weighting. Without that, the indices are much worse for matrices of this size (unless the authors have cherry-picked their data to an extent that doesn't seem to have happened since the late 1990s).

    The signal in the matrix is thus not tree-like — it doesn't fit a single tree.

    But we have no reason to think that introgression, incomplete lineage sorting or any such thing happened in this part of the tree to a noticeable extent. Thus, it is reasonable to assume a tree with lots of homoplasy, which is what we get.

    Another point is that the more homoplasy we have, then the higher must have been the rate of change (here: visible anatomical mutation). The higher the rate of change, the higher the statistical inconsistency of parsimony.

    That does not matter unless we get to long-branch attraction (which exists in morphological phylogenetics, but is not common) or have redundant characters in the matrix (on which see my preprint).

    ReplyDelete
    Replies
    1. Dear David,
      first thanks for the long comment, you raise some very good points.

      Re: Branch-lengths, I agree that the omission of autapomorphies is an important issue. But keeping in mind that morphodata are (usually) cleaned for unique features, the long root-short tip/short root-long tips pattern works: Long tips are then directly due to (stochastically distributed) homoplasy. Thus, indicating that a substantial amount of the inferred change was incompatible with the tree.

      Re: Bootstrap analysis, used to be slow, it's now very fast. NJ-BS is per se very fast, and the ML-BS+tree analysis for this set took less than half an hour. Haven't uses TNT for some time, but I think it also includes quick-BS options. With PAUP*, you just set the number high (e.g. 10000) and deactivate MulTrees (Müller KF. 2005. The efficiency of different search strategies for estimating parsimony, jackknife, bootstrap, and Bremer support. BMC Evolutionary Biology 5:58), and you get pretty fitting (regarding the actual data situation) parsimony BS values.

      Re: Weighting and homoplasy. I agree.

      Re: Tree or not. The whole point about exploratory data analysis is (and why I consider it the most crucial thing to do): even if there was no reticulation during the evolution of a lineage (I would not disgard it totally here, why should the first birds be more picky than contemporary ones or other reptiles?), the signal in the data may be not tree-like. Any ancestral form, any poorly understood form, any misinterpretation, any preferable (positively selected) trait, will inflict topological ambiguity masking the true tree.
      This is not a problem restricted to morphology, why phylogenomics (the issues are a bit different, but the problem is similar) go away from inferring a single tree to inferring a coalescent.
      With morphodata we can't do this. And retreat on things like implied weighting to do Cinderella's trick: separate the good from the bad ones.

      Delete
  2. Part 2
    Still running up against the limit.

    However, we may well ask: if I want to place a fossil for which I can score 99 characters, why bother to include another ~150 that tell me nothing about its affinity?

    No. No. No! We must include as many parsimony-informative, independent characters as possible in any parsimony analysis. That is because we're inferring the whole tree at once. We do not place one new taxon into an already existing tree.

    Every terminal taxon, including the new one, has an influence on the position of every other taxon in the tree, and both quality and quantity of this influence are unpredictable from just looking at the matrix and the character list. A few references with examples are in my preprint, which offers further examples.

    After all, there are no downsides! Missing data is nothing to be afraid of in parsimony analyses.*

    * Model-based methods, on the other hand, can do really strange things when confronted with missing data, like finding strong support for completely spurious clades composed of terminal taxa that have no scored data in common at all. (References in my preprint.) And distance methods, as you point out, can't really deal with missing data at all.

    However, why should we include poorly known taxa at all during phylogenetic inference?

    Because they, too, influence the topology of the result.

    Since the matrix is a Swiss cheese, we only can re-affirm the first-inferred tree.

    Well, one of them: implied weighting tries to find out which of several equally most parsimonious trees* is best supported by the least homoplastic characters. But this logically circular method is only known to work when its assumptions are met, and often they're not.

    * Because morphological matrices are smaller than molecular ones, it is highly unusual to find just one.

    But, the matrix hardly has the capacity to resolve deeper relationships within the group of interest, the Enantiornithes — the polytomy with the next relatives seen in the tree and the corresponding clade dissolve. This confirms what we saw in the Neighbor-Net (despite missing data distortion).

    (It also confirms what we see in the unweighted parsimony analyses of all predecessors of this matrix, which is, like, the 20th successive version of the matrix that lab has been using. Perhaps there still aren't enough characters in it; for other potential problems that are common in morphological matrices, see my preprint.)

    ReplyDelete
    Replies
    1. Re: Characters to include. I know you love parsimony, but saving it, is not my game. If I want to place something, I need discriminating data. If the data, I have, doesn't do the trick, I can't place.
      By adding more and more missing data taxa, poorly covered characters, you don't get a "better" (or shall we say "truer" tree), you just water a potential discriminating signal and invite branching artefacts because of what you pointed out: "Every terminal taxon, including the new one, has an influence on the position of every other taxon in the tree, and both quality and quantity of this influence are unpredictable from just looking at the matrix and the character list".
      At least this is my experience with plant morphological data.
      A related problem are outgroup-ingroup branching artefacts. The more outgroups I add that lack signal discriminating in the ingroup, the higher the risk of getting a wrong root (luckily, plesiomorphies prevent this often in case of morphological data, but it's not uncommon for molecular data, where we usually don't have primitive taxa in our matrix, only derived ones.)

      Delete
  3. Part 3

    Gettya (Avisaurus) gloriae

    No, Avisaurus is not a subgenus of Gettya.

    Trimmed the matrix to include only those characters preserved in the fossil of interest, in order to minimize missing data artefacts during inference.

    No. There are no missing-data artefacts in parsimony.

    Also, the fossil of interest is not the only topic of the analysis! The content and relationships of Avisauridae, for instance, are another.

    Besides, where would you draw the line? Completeness is a continuum.

    From the Neighbor-net it is already obvious that the fossil is an Enantiornithes,

    That's already obvious from just looking at it. :-) That bit was not the point of the analysis!

    so any subsequent optimization / inference could have focussed on this group alone.

    No. First, again, every taxon has an unpredictable influence on the position of every other taxon in a phylogenetic analysis. Second, outgroup sampling in particular can have a great influence. For clades whose internal phylogeny is as unclear as Enantiornithes, it is important to sample the outgroup* densely so the last common ancestor of the ingroup* will be reconstructed correctly. In such cases, a focussed analysis is not possible. This is discussed at length in my preprint on the example of Temnospondyli.

    * Defined relative to the authors' interest, not to the actual analysis, for which most of the "outgroup" is of course part of the ingroup.

    How sure can we be about relationships within the Avisauridae and their relationships to other Enantiornithes?

    Not much; that's been clear for a long time, and the present analysis is only an incremental improvement. We're going to need yet more characters and yet more new fossils – on which that same lab is working as we speak. And while I haven't studied this matrix, problems like typos or redundant characters are very common in morphological matrices and have considerable impact.

    ReplyDelete
    Replies
    1. Also run into the word limit problem. So here's the short version.

      Re: "(Avisaurus)" In brackets is just the old generic epithet (in the matrix, the taxon is labelled as "A. ...").

      Re:. Parsimony has the same issues with missing data as all other inference methods. It may not affect the optimised topology so much, but it limits the decision capacity and increases the calculation time, without providing anything useful to the goal here: which is placing a single fossil.

      Re: Unpredictable influence and taxon sampling. A taxon that has an unpredictable, distorting influence on an already well-established tree is much less an issue for my workflow than for traditional tree-inference. The networks I showed can directly depict rogues, i.e. taxa jumping in the tree. They also show, which taxa inflict what topological alternatives, and how incongruent those alternatives are.

      If removing and adding a single taxon changes massively your tree, then your tree (and matrix) is useless for phylogenetics. At least, very vague.

      Re: Outgroup sampling. Same here. Adding a taxon should not change ingroup relationships. What I miss is a test against outgroup-ingroup branching artefacts. We know from molecular data that comprehensive outgroups sampling can save for outgroup-ingroup long branch attraction, but in morphology the basic situation is somewhat different (as you discuss in your preprint). Also, as I show in this post, even molecular data may not escape the trap, not even with a most-comprehensive outgroup.

      The test is simple and uses the same logic as above. If your tree reflects the true tree, removing or adding a taxon should not change anything else. Any outgroup that changes the ingroup topology is to be treated with caution. Otherwise you run into the same problem than with implied weighting: the outgroups you add propagate information to constrain the ingroup topology, possibly outcompeting conflicting ingroup-differentiation signal. When you're outgroups are thoroughly primitive, this will be no problem because you draw the primitive taxa towards the root (where they should be, altough you may get monophyletic grades and paraphyletic clades), but how can you be sure?

      And, if you have all that knowledge about trait evolution over time, why bothering inferring trees at all? You can just map these evolutionary pieces on time and draw the evolutionary tree.

      Delete
    2. Here's a test I propose you do on the bird matrix (or any other matrix you fancy) regarding taxon sampling instability.

      Remove each taxon, re-optimise the tree (with the method of your preference), and compare the topologies e.g. using RAxML's in-built SH test or RF distances if you prefer, or sum them up using the super-network approach in SplitsTree (You can't do a consensus network, because it needs trees with exactly the same amount of tips). I haven't done this for the birds, but my guess is you may get quite a range for the RF scores because of the ambiguity in the focus clade, but when you compare those trees via SH tests, many will not be significantly worse than the preferred tree based on the complete taxon set.

      Now do the same with the Neighbour-net, easy (much easier and quicker) to do in SplitsTree: click on a taxon, go to menu Data > Exlude selected taxon. You won't see any change in the better structured (distance-wise) parts when removing one taxon here, but the fossil to be placed jumps around a bit in its poorly resolved Enantiornithes neighbourhood, especially, when removing the ones with indiscriminate distances (which are the same that trigger the polytomy in the author's preferred tree). Keeping these taxa doesn't give you vital information, they just mess around. However, you will directly see what the matrix provides as potential placement of the fossil given all odds.

      My message here: If you notice substantial instability in your tree(s) because of adding or removing taxa, networks will tell you why this is the case, and allow quick path to elaborate.

      Your parsimony-tree-based approach (adding more data and taxa to get better trees) just invites more problems, and while it may solve some existing ones, it will also enforce others. There's a lesson to learn here from phylogenomics: more data can be more biased (and wrong) than fewer, better understood and better to be understood data. Long known (and often ignored) in multigene analyses.
      Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of live. Nature Reviews Genetics 6:361-375. [PDF]

      Delete