In several earlier posts (a non-comprehensive link list can be found at the end of the post), I outlined how networks, whether based on tree samples (Consensus networks, SuperNetworks) or on distances (Neighbor-nets), may be of practical help, especially when we study phylogenetic relationships of extinct organisms.
In this post, I will further explore this by looking at a matrix for Aetosauria (Parker 2016, PeerJ) that provides an overall (relatively) strong and unambiguous signal. [NB: The reason I prefer to use PeerJ papers as examples is that PeerJ is one of the very few journals that are open access and have a strict open-data policy: to publish there, authors have to give access to the data used.]
In the abstract of the original paper, we read the following:
Nonetheless, aetosaur phylogenetic relationships are still poorly understood, owing to an overreliance on osteoderm characters, which are often poorly constructed and suspected to be highly homoplastic. A new phylogenetic analysis of the Aetosauria, comprising 27 taxa and 83 characters, includes more than 40 new characters that focus on better sampling the cranial and endoskeletal regions, and represents the most comprehensive phylogeny of the clade to date. Parsimony analysis recovered three most parsimonious trees; the strict consensus of these trees finds an Aetosauria that is divided into two main clades: Desmatosuchia, which includes the Desmatosuchinae and the Stagonolepidinae, and Aetosaurinae, which includes the Typothoracinae.
Parker's (2016) fig. 6 shows the results of the "initial analysis" (colored annotations added by me).
|Systematic groups based on clades are abbreviated (see next graph for full names).|
A is a "Strict component consensus" of the 30 inferred MPTs (most parsimonious trees), B the Adams consensus, C the Majority rule consensus (branch labels give percentages for branches not found in all MPTs), and D a "Maximum agreement subtree" after a priori pruning of one taxon (black star) within the upper clade.
Parker's (2016) fig. 7 then shows the preferred result: a "reduced strict consensus of 3 MPTs" with the red star taxon removed, and (rarely seen in dinosaur phylogeny papers) branch-support — including Bootstrap support values below 70, which are very rarely reported in the literature (from my own experience it seems that editors of systematic biology journals don't like them).
Removal of one rogue taxon (called a "wildcard" in paleozoology), Aetobarbakinoides brasiliensis, substantially reduced the number of MPTs. Nonetheless, many branches have low support, and hence also the clades (used here as synonym for monophyla) derived from them – Parker uses branch-based ("stem"-based, brackets on his tree), and node-based taxa (dots).
Low branch support may or may not matter
There are two possible reasons for low branch-support:
- non-discriminatory signal: any alternative branching pattern receives diminishing support
- internal signal conflict: two (or more) alternatives receive similar support.
|Support consensus network based on 10,000 parsimony BS pseudoreplicates. Trivial splits collapsed; only splits that occurred in at least 20% of the BS replicates are shown.|
The decreased/low BS support within the most terminal (root-distant) subtrees, the Des'ini and Par'ini, relates to conflicting alternatives involving one or two OTUs. In the case of Des'ini, it is the affinity of Lucasuchus and NCSM 21723, while in the case of Par'ini an alternative (recognizing Tecovasuchus as sister to the remainder) is found in one out of three BS pseudoreplicate trees. The diminishing support for basal relationships (root-proximal branches/edges) is due to the general lack of discriminatory signal (BS < 25 for any alternative). However, there are very few situations in which the best-supported alternative differs much from that in the preferred tree. For instance, any alternative to a Stag'inae sister relationship has a BS below 24 (BS = 27 in Parker's "reduced" tree).
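To make the distinction between internal conflict and non-discriminatory signal concrete, here is a minimal Python sketch. It tallies split (bipartition) frequencies over a set of bootstrap trees and, for a chosen split, reports the best-supported incompatible alternative. The five OTU labels and the ten replicate trees are invented for illustration and have nothing to do with Parker's matrix; the function names are likewise hypothetical. Each tree is given simply as its list of non-trivial splits.

```python
from collections import Counter

TAXA = "ABCDE"  # hypothetical OTU labels, not taxa from Parker's matrix

def normalize(side, taxa=TAXA):
    """Name a split by the side that excludes the first taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side

def compatible(s1, s2, taxa=TAXA):
    """Two splits fit on one tree if any of the four side intersections is empty."""
    everything = frozenset(taxa)
    return any(not (a & b)
               for a in (s1, everything - s1)
               for b in (s2, everything - s2))

def split_support(replicates, taxa=TAXA):
    """Percentage of replicate trees in which each non-trivial split occurs."""
    counts = Counter()
    for tree_splits in replicates:
        counts.update({normalize(s, taxa) for s in tree_splits})
    return {s: 100.0 * c / len(replicates) for s, c in counts.items()}

def best_rival(split, support, taxa=TAXA):
    """Best-supported split that conflicts with the given one."""
    rivals = {s: v for s, v in support.items() if not compatible(s, split, taxa)}
    if not rivals:
        return None, 0.0
    top = max(rivals, key=rivals.get)
    return top, rivals[top]

# Toy bootstrap sample: six trees show D+E, four trees show C+D instead.
replicates = [["AB", "DE"]] * 6 + [["AB", "CD"]] * 4
support = split_support(replicates)
rival, rival_bs = best_rival(frozenset("DE"), support)
print(support[frozenset("DE")], sorted(rival), rival_bs)  # 60.0 ['C', 'D'] 40.0
```

In this toy case the moderate support for D+E is due to conflict: a single competing split collects the remaining 40%. If instead the best rival were itself very poorly supported, the low value would point to a lack of discriminatory signal.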
Our rogue, however, is not really a 'wildcard'. The scored characters simply put it much closer to the outgroup than is any other ingroup taxon. A simple explanation could be that it is the most primitive (least derived) member of the Aetosauria. Another possibility is that it lacks any critical trait needed to place it within the ingroup. Since the deep splits within the Aetosauria rely on very few character changes, we can put it in different positions down there and the tree will still have the same number of inferred changes.
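The last point can be illustrated with a small Fitch-parsimony count in Python. This is only a sketch under stated assumptions: the six-taxon, three-character matrix is invented (O = outgroup, R = a rogue-like taxon with one unscored character), the characters are binary and unordered, and the function names are hypothetical. It compares the tree length for three alternative placements of R.

```python
def fitch(node, column, states):
    """Return (possible state set, number of changes) for one character."""
    if isinstance(node, str):                       # leaf: look up its score
        s = column[node]
        return (set(states) if s == "?" else {s}), 0
    left, cost_l = fitch(node[0], column, states)
    right, cost_r = fitch(node[1], column, states)
    common = left & right
    if common:                                      # no change needed here
        return common, cost_l + cost_r
    return left | right, cost_l + cost_r + 1        # one additional change

def tree_length(tree, matrix, states="01"):
    """Sum of Fitch changes over all characters of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[i] for t, row in matrix.items()}, states)[1]
               for i in range(n_chars))

# Toy matrix: char 1 unites A+B, char 2 unites C+D, char 3 unites the ingroup.
matrix = {"O": "000", "R": "00?", "A": "101", "B": "101", "C": "011", "D": "011"}

placements = [
    ("O", ("R", (("A", "B"), ("C", "D")))),   # R sister to the rest of the ingroup
    ("O", (("R", ("A", "B")), ("C", "D"))),   # R sister to A+B
    ("O", (("A", "B"), ("R", ("C", "D")))),   # R sister to C+D
]
print([tree_length(t, matrix) for t in placements])   # [3, 3, 3]

# If char 3 were preserved in R and scored as primitive, only one placement stays optimal.
scored = dict(matrix, R="000")
print([tree_length(t, scored) for t in placements])   # [3, 4, 4]
```

With the unscored character, all three placements need the same three changes, so a parsimony search returns all of them as equally optimal. Once the character is scored as primitive, the position next to the outgroup becomes the only shortest solution.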
Trivial and non-trivial taxa
The cladograms typically shown provide limited information about the signal in the underlying matrix, its strengths and weaknesses, even when they are not "naked" but annotated with branch-support values. Given that there are no severe overlap gaps in the data, a very quick alternative is the Neighbor-net (a necessary addition, in my opinion).
|Bold edges correspond to branches (hence: clades) in Parker's preferred tree.|
Using this, we can directly depict which groups, potential clades, draw substantial (partly trivial) character support.
For instance, according to Parker's tree and following cladistic classification, Stagonolepis is an invalid taxon: one species (St. robertsoni) is part of the Stag'inae clade, the other (St. olenkae) is part of the Des'inae clade. Character support is, however, nearly non-existent (Bremer value = 1 and BS = 7 in the original analysis; BS ≤ 20 for any competing alternative in our re-analysis). The distance network shows us why: indeed, both species are closest to each other; but, while St. robertsoni shares a critical Stag'inae character suite and, consequently, shows the highest similarity to Polesinesuchus, St. olenkae does not share this (note the lack of a corresponding neighborhood). Furthermore, any alternative placement fits even less. Parker's tree only resolved it as sister to all other Des'inae because it didn't fit into any of the well-supported, terminal clades (prominent edge-bundles).
We can also see where we may have to deal with internal signal conflict, and how this may affect the tree inference and lead to ambiguous branch support. Take, for instance, the NCSM 21723 individual (= Gorgetosuchus pekinensis). It's clearly a Des'inae. The reason we have ambiguous branch support for this staircase-like subtree is that NCSM 21723 is substantially more similar to the distant, equally evolved sister lineage, the Par'ini (purple edge bundle). Hence, it must be placed as sister to all other Des'inae, although it appears to represent a more derived form than Longosuchus, representing the next step towards the most-derived crown-taxon Desmatosuchus. Tecovasuchus is the source of topological conflict within the Par'ini because it is the least-derived taxon. Its primitiveness will be expressed by placing it as sister to all other Par'ini, while a few shared, non-exclusive apomorphies are behind its position in the preferred tree (Bremer value = 1, BS = 48 in Parker's fig. 7).
While it is obvious that the matrix has no clear tree-like signal for resolving any OTU that is not part of the terminal Des'ini and Typ'inae lineages, our 'wildcard' (Aetobarbakinoides) is particularly close to the outgroup while showing no affinity to anything else. If it is part of the ingroup, it represents the ancestral form, i.e. it shows a character suite that is primitive (derived traits may be missing because they are simply not preserved: see description of the taxon in Parker 2016). This is the reason why it acted rogue-ish in tree inferences even though its favored phylogenetic position is clear.
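A Neighbor-net takes a pairwise distance matrix as input. As a rough Python sketch of how such distances can be obtained from a morphological matrix, the following computes the mean character difference over the characters scored in both taxa, and also returns that overlap so that pairs with too few shared characters can be spotted. The choice of distance measure is an assumption made here for illustration (polymorphisms and ordered characters are ignored), and the four taxa and eight characters are invented.

```python
MISSING = "?-"  # symbols treated as unscored or inapplicable

def mean_distance(row_a, row_b):
    """Share of differing states among characters scored in both taxa,
    together with the number of such shared characters (the overlap)."""
    shared = [(a, b) for a, b in zip(row_a, row_b)
              if a not in MISSING and b not in MISSING]
    if not shared:
        return None, 0                   # no overlap: distance undefined
    return sum(a != b for a, b in shared) / len(shared), len(shared)

matrix = {                               # hypothetical scores
    "outgroup": "00000000",
    "rogue":    "000?0???",
    "taxon_1":  "11100100",
    "taxon_2":  "1110?101",
}

def nearest(name):
    """Closest other taxon by mean character difference."""
    candidates = [(mean_distance(matrix[name], row)[0], other)
                  for other, row in matrix.items() if other != name]
    return min((d, o) for d, o in candidates if d is not None)

print(nearest("rogue"))                  # (0.0, 'outgroup')
print(mean_distance(matrix["rogue"], matrix["taxon_2"]))   # (1.0, 3)
```

In the toy data, the poorly scored taxon is identical to the outgroup in everything that is scored, which is the kind of pattern that places a taxon next to the outgroup in a distance network. The resulting distances would still have to be passed to a Neighbor-net implementation, which is not reproduced here.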
Parker's original matrix can be found in the supplement to the paper. An annotated ready-to-use NEXUS-formatted version (including my standard codelines for parsimony and distance bootstrapping) and the inference results used here can be found in this figshare submission, which I generated for a technical Q&A.
Here is the promised list of previous posts dealing with fossils and networks.
- Should we try to infer trees on tree-unlikely matrices? July 2017; the signal in phylogenetic matrices of major groups of extinct and extant seed plants.
- More non-treelike data forced into trees: a glimpse into the dinosaurs, Aug. 2017; why also paleozoologists should start with network-based EDA — exploratory data analysis.
- Networks, not trees, identify "weak spots" in phylogenetic trees, Oct. 2017; how Consensus networks can be used to visualize topological conflict among MPTs.
- Summarizing non-trivial Bayes tree samples for dating? Just use support consensus networks, Jan. 2018; Bayesian Consensus networks based on mixed data matrices.
- The curious case(s) of tree-like matrices with no synapomorphies, joint post with David, Apr. 2019; looking at CI, RI values and treelikeness.
- Networks for matrices used in Cladistics studies, part 1 (historical matrices), part 2 (recent matrices), Nov. 2018; a collection of networks inferred from matrices used to infer parsimony trees.
- Phylogenetic ambiguity: data gaps, indifference and internal conflict, Jan. 2019; an example (squids) for why consensus networks should be obligatory when facing ambiguous branch support.
- Why the emperor has no clothes on – a thicket of trees, Nov. 2019; gene tree incongruence in plant plastomes and why it probably has little to do with decoupled gene histories.
- Large morphomatrices – trivial signal, Feb. 2020; about the principal signal in a bird-dinosaur supermatrix.
- Supernetworks and gene tree incongruence, May 2020; about mtDNA and splits in early land plants.
- Fossils and Networks 2: Deleting (and adding) a tip, Aug. 2020; studying the effect of removing a single taxon from the tree inference using the best-sampled taxa of the bird-dinosaur supermatrix.