Monday, October 26, 2020

Just try it for your data – a last first-of-its-kind Neighbor-net using FTIR data


This is likely to be my last post for this blog.

Some thoughts

When I joined the Genealogical World of Phylogenetic Networks three years ago, I didn't know how much fun it is to blog about science. Blogging, or writing essays, has several advantages over the traditional way to get a researcher's ideas out into the world: writing a scientific paper. The most important one is that one can just try out something without having to worry how this would get past the peer-reviewers and editors (or as I like to call them: the Mighty Beasts lurking in the Forest of Reviews). When I was still a (sort-of) career scientist (i.e. paid by tax-payers to do science), I had my share of discouraging experiences whenever we tried to leave the beaten (and worn out) paths to try something new; to look into the dark places and not right under the street-lights.


Before we submitted papers, we hence put considerable effort into them, pondering what our peers might criticize, or what might alienate them (being likely unfamiliar with our methodological and philosophical approaches), to minimize the chance that all our work would be for nothing. In a couple of cases, where we expected fierce resistance, we opted for low-impact journals with no manuscript length restrictions and more welcoming editors and peers, to be able to put in everything that we had. Some of my best bits are buried in journals where you'd never expect them!

But it was increasingly annoying, nevertheless. It was no fun anymore to formally publish research, and so I let my career run out as smoothly in the 2010s as it had started in the Zeroes.

David's encouragement to write blog-posts, just after I retired early, thus revitalized my interest in science, to "boldly go where no-one has gone before". The amount of effort is typically much lower, although some of my posts do involve the same work that I put into the papers that I co-authored. More importantly, there are no beasts in the World Wide Web that can bite you from the shadows; they have to do it in the open. It's an ideal way to get an idea out, without having to think about the consequences. None of the work I put into a post has been in vain. What a difference: before, for every graph / analysis result published, two ended in the bin, many devoured by the Mighty Beasts.

And maybe somebody will find the work interesting enough to try it out, and eventually my idea will find a place in the sanctioned, peer-reviewed scientific world anyway. Since I'm out of business, I can afford not to cash in the credit (no-one formally cites a blog post).

My last Neighbor-net for the Genealogical World

For Neighbor-nets (NNets) and me, it was love at first sight (this was, in my case, ~2005, when my boss Vera Hemleben, a geneticist, sent me over to the new professor in our bioinformatics department, named Daniel Huson, who had just released a new software package, SplitsTree). These networks are...
  • ... most versatile: any kind of data can be transformed into a distance matrix;
  • ... quick-and-easy to infer.
And even if they are not phylogenetic networks in the strict sense – NNets are unrooted and their edge-bundles do not necessarily reflect evolutionary pathways – they more often than not point towards common origins and down-scale ± complex phylogenetic relationships more comprehensively than any phylogenetic tree (coalescent or not) that we could infer. The Genealogical World is full of examples, and the writers of this blog such as David [homepage], Mattis [GoogleScholar/ homepage], myself [GoogleScholar/ homepage], and like-minded researchers have published quite a few of them (in high- and low-impact journals). For a comprehensive, permanently updated list see Philippe Gambette's Who's Who in Phylogenetic Networks page.

For my final post, I decided on a fascinating new data source in paleobiology: Fourier-transform infrared (FTIR) spectra of fossil cuticles.

The cuticle is a plant's skin, and its composition and structure show a lot of variation, down to species level. Thus, the morphological-anatomical features of cuticles have long been used as taxonomic markers to identify fossil material. Using infrared spectroscopy, one can look at the chemical composition of cuticles. Like any other spectrum, an FTIR spectrum can be broken down into sets of qualitative (discrete, binned) or quantitative (continuous) characters; and one can then create a dissimilarity matrix for the investigated material. This is what Vajda, Pucetaite et al. (Nature Ecol. Evol. 1: 1093–1099, 2017) did for long-dead (Mesozoic) but enigmatic seed plants and their equally enigmatic modern counterparts.
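The step from spectra to a dissimilarity matrix can be illustrated with a minimal Python sketch. It is not the procedure used by Vajda et al.; the bin-averaging, the Euclidean distance and the toy sample names and values are all assumptions made for illustration only.

```python
import math

def bin_spectrum(absorbances, n_bins):
    """Average consecutive readings into n_bins equal-width bins.
    Trailing readings that do not fill a whole bin are ignored."""
    size = len(absorbances) // n_bins
    return [sum(absorbances[i * size:(i + 1) * size]) / size
            for i in range(n_bins)]

def euclidean(a, b):
    """Euclidean distance between two equally long binned spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(spectra, n_bins):
    """Return sorted sample names and their pairwise distance matrix."""
    names = sorted(spectra)
    binned = {name: bin_spectrum(spectra[name], n_bins) for name in names}
    matrix = [[euclidean(binned[a], binned[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical toy spectra (six absorbance readings each), not real FTIR data.
spectra = {
    "sample_A": [0.1, 0.1, 0.5, 0.5, 0.9, 0.9],
    "sample_B": [0.1, 0.1, 0.5, 0.5, 0.1, 0.1],
    "sample_C": [0.9, 0.9, 0.5, 0.5, 0.1, 0.1],
}
names, dm = distance_matrix(spectra, n_bins=3)
for name, row in zip(names, dm):
    print(name, [round(d, 3) for d in row])
```

A matrix of this kind, exported in a format that SplitsTree reads, is all that is needed to infer a Neighbor-net.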

A UPGMA dendrogram based on FTIR data of fossil taxa (Vajda et al. 2017, fig. 4). Brackets to the right give the topology of the UPGMA dendrogram including extant material and data (Vajda et al. 2017, fig. 3).
PCA plots of the first and second (a), and first and third (b) coordinates, with the main seed plant lineages indicated (modified after Vajda et al. 2017, suppl.-fig. 4)

PCA and UPGMA are not phylogenetic inference methods, but there is obviously some phylogenetic signal encoded in these FTIR spectra, as shown above.

When I first saw the paper, I contacted the authors (including former colleagues of mine at Naturhistoriska riksmuseet in Stockholm), and the first author gave the second author, Milda Pucetaite (a Ph.D. student), a green light to share and convert her FTIR data into a simple distance matrix for me to run a NNet, as shown below.

Neighbor-net based on the combined distance matrix provided by Milda (pers. comm. July 2017).


Note that this NNet is a partly impossible graph, phylogenetically. The chemical composition naturally changes after the foliage (in this case) gets buried in sediment, and its cuticle is then conserved for millions of years by various taphonomic and diagenetic processes. As pointed out by the experienced biochemist among the authors during our correspondence: it is hence pointless to combine the data from extant and extinct taxa.

Well, since this is a post and not a paper, I combined them anyway. I find the result quite compelling, supporting the paper's conclusions including more speculative follow-up ones. The NNet reflects every aspect that this kind of data can provide for phylogenetic and systematic purposes.

The prominent central edge bundle reflects the taphonomic-diagenetic change separating the living from fossil samples. The basic sequence within the subgraphs is the same: ginkgoes are closest to cycads, and cycads bridge to Araucariaceae, which is a relict lineage of the "needle" trees, the conifers (many of which don't have needles but leaves). Bennettitales and Nilssoniales are extinct groups of seed plants, which are here resolved as a distinct lineage. The Bennettitales, especially, have long puzzled scientists. They may represent a third major lineage of seed plants that are neither angiosperms (flowering plants) nor gymnosperms (ginkgoes, cycads, conifers, gnetids), or perhaps an early side lineage of either one (or lineages, as their two main groups are quite different).

As for pretty much any kind of data, just try it out for yourself. This is exploratory data analysis (EDA), particularly useful to get a first, fast impression of the primary signal in your data. This is true even if you keep it to yourself, having to watch out for the Mighty Beasts of the Forest of Reviews (especially the ones that call themselves "cladists"), who are quick to tell you what you can't do, but not so forthcoming when it comes to pointing you to other options for analyzing your data.



My dive-in list for some more (im-)possible NNets
With David retiring, the Genealogical World of Phylogenetic Networks will fall dormant; the next and final post will be a farewell from David. Like Mattis (Von Wörtern und Bäumen), I will keep on science-blogging (in spite of the new buggy Blogger-editing interface forcing me to draft directly in HTML) for a little while (and irregularly) on my Res.I.P. blog, which also includes a tag for "phylo-networks" for any future NNets and the like.

Monday, October 19, 2020

Xenoplasy

A major obstacle in studying morphological evolution is homoplasy. This occurs when the same (or similar) traits are evolved independently in different lineages (convergences), and are positively selected for or incompletely sorted within a lineage (parallelisms, homoiologies). Traits that do not sort following the true tree create incompatible signal patterns, and, eventually, topological ambiguity. No matter which inference method we use, we end up with several alternative trees that combine aspects of the true tree with artificial branching patterns.

Homoplasy is the rule, while trait sorting is the exception. Consequently, we have to expect that any morphology-based tree will have more wrong branches than correct ones.

For extant groups of organisms, a simple solution to the problem is to analyse morphological traits in the framework of a molecular phylogeny. The genetic data provide us with an independent, best-possible tree. By mapping the morphological traits on this tree, we can evaluate their potency as phylogenetic markers.

But what if our group of organisms is not the product of a simple repeated dichotomous splitting pattern? What if there were anastomoses as well? That is, what if the morphological traits are not the product of mere (incomplete) lineage or incomplete gene sorting (the latter is called "hemiplasy") but of the fusion of traits from different lineages, so that a tree is not enough to explain the genetic data? What does this imply for the morphological differentiation we observe?

Take the London Plane (Platanus x acerifolia or P. x hispanica), for example, which is a tree that many of us are familiar with. In case you don't know the name: they are the large trees with a patterned bark and deeply lobed leaves and fluffy fruiting bodies found in abundance in parks and alleys throughout the world. It's a cultivation-hybrid (17th—18th century) of the North American plane tree, Platanus occidentalis, and its distant eastern Mediterranean relative, P. orientalis. These are genetically and morphologically distinct species. Their history is summarized in the following doodle (Grimm & Denk 2010).

Each line represents a semi-sorted nuclear gene region. The split between proto-PNA-E (SW. U.S., NW. Mexico, E. Mediterranean) and proto-ANA (Atlantic-facing Central America, E. U.S.) must have been > 12 myrs ago (last Platanus of Iceland). The minimum air distance between the sister species P. orientalis and P. racemosa is ~11,500 km (via the Arctic). Interestingly, fossils from that time and later (including western Eurasia) have more ANA-clade morphologies: P. orientalis- and P. racemosa-types pop up ~5 Ma. Both the ANA and PNA-E clades have distinct morphologies. With respect to the individual gene trees, those exclusively shared by P. palmeri and P. rzedowskii with P. occidentalis s.str. and P. mexicana of the ANA clade could be addressed as "hemiplasies".

If you look at the leaves and fruits of London Planes, you can find everything in between the two endpoints; and the same holds for their genetics. The London Plane is much hardier than Europe's own P. orientalis and more drought-resistant than its hardier North American parent. With climate change going on, the hybrid will eventually meld with the European species entirely. And, thanks to what we call "hybrid vigor", given a few million years, it might consume its other parent, too. London Planes have been re-introduced into the Americas; and P. orientalis has become an invasive species in California, where it has started to hybridize with its local sister species P. racemosa. Now imagine a future researcher of Platanus evolution having to deal with a highly complex accumulation of Platanus fossils in the Northern Hemisphere, while being able to study only the left-over complex genetics of a single species that replaced two.

This is where a recently coined new concept comes in: xenoplasy.
Yaxuan Wang, Zhen Cao, Huw A. Ogilvie, Luay Nakhleh (2020). Phylogenomic assessment of the role of hybridization and introgression in trait evolution. bioRxiv doi: 10.1101/2020.09.16.300343
Xenoplasies are traits that originate from hybridization and subsequent introgression. In standard phylogenetics, they would act like any homoplasious character, but their distinction is that they are not independently evolved. They are captured via lineage crossing, and reflect a common ancestry.

Example of a trait incongruent with the species tree, representing a xenoplasy obtained by introgression of the I1-A lineage, which evolved the trait, into the I3-B lineage, part of the I2 clade. Depending on how far they are affected by incomplete lineage sorting (ILS) and introgression, individual genes may result in any of the three possible genealogies. Modified after Wang et al. (2020), fig. 1.
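How often each of the three genealogies is expected from ILS alone can be sketched with the standard three-taxon result of the multispecies coalescent: a gene tree matches the species tree with probability 1 − (2/3)·e^(−t), where t is the length of the internal branch in coalescent units. The Python snippet below is only this baseline, not the analytical framework of Wang et al., and it ignores introgression entirely; the function name is made up for illustration.

```python
import math

def gene_tree_probabilities(t):
    """Three-taxon gene-tree probabilities under ILS only.

    t is the internal branch length of the species tree in coalescent
    units. Introgression is NOT modelled here (baseline sketch only)."""
    discordant = math.exp(-t) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant,
        "discordant_1": discordant,
        "discordant_2": discordant,
    }

# Short internal branches make all three genealogies nearly equally likely.
for t in (0.0, 0.5, 1.0, 3.0):
    probs = gene_tree_probabilities(t)
    print(t, {k: round(v, 3) for k, v in probs.items()})
```

Under ILS alone the two discordant genealogies are equally frequent, so a clear excess of one of them is the kind of asymmetry that hints at introgression.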

As such, their phylogenetic weight (information content) equals that of the anyhow rare classic autapomorphies or synapomorphies (fide Hennig), and this weight is higher than that of the more common homoiologies, shared apomorphies or symplesiomorphies. Note, in the palaeozoological cladistic literature, sorted versions of the latter three are often called synapomorphies – any lineage-specific, derived trait ("synapomorphy") may be lost / modified in some sublineage(s), or rarely pop-up outside the lineage.

Wang et al. provide an analytical framework for identifying a trait as xenoplasy, and assessing the probability for it ("xenoplasy risk factor"). If you're interested in the mechanics, check out the pre-print. The mathematical part of my brain has been dormant for most of the last two decades (when I exchanged chemistry for geology-biology), so I'm more into possible applications to explore this new concept.

Where to look next

The Wang et al. real-world example (Jaltomata) is, however, not very appealing. The problem is that, to look for xenoplasy, we need data that require us to infer an explicit phylogenetic network (in the strict sense) to start with. In addition, we could use a morphological partition: scored morphological traits; which is usually absent. Last, identifying xenoplasies would make most sense for traits that can be traced in the fossil record, not only to identify potential products of past reticulation but also to get a better grip on placing critical fossils. Often overlooked by neontologists, fossils are the only physical proof that a lineage was at a certain place at a certain point in time. So, here are two examples: beeches and bears.

Beeches are a small genus of extra-tropical angiosperm trees with a pretty well understood fossil record. Morphologically, their differentiation is very hard to put into a tree, as shown here.

A morphology-based Neighbor-net of fossil (open circles) and extant beech (closed circles) taxa. Coloration gives the (paleo-)geographic distribution (abbreviated as three letters). For more background and information see my Res.I.P. post: The challenging and puzzling ordinary beech – a (hi)story

Mapping species-discriminating traits on a tree would be of little help here, because the modern species are the product of recurrent phases of mixing and incomplete sorting. I have summarized this in the following doodle, depicting the diversification and propagation of 5S-IGS variants (a non-transcribed, poly-copy, multi-array intergenic nuclear spacer) in a still very small sample.

A doodle summarizing differentiation patterns in a sample of 686 "representative" 5S-IGS variants obtained using high-throughput sequencing of six beech populations of western Eurasia and Japan (Simone Cardoni et al., to be submitted in the near future; see Piredda et al. 2020 for a similar analytical set-up).

The people involved in researching this project (drawn by passion rather than resources) don't have the resources to generate the NGS data needed to construct a species network for all of the species of beech, like Wang et al.'s Jaltomata data. But given that there are only 9–10 species, it would be easy prey for a well-funded research group. If you are interested, but don't know how to get the material and are unfamiliar with beeches, feel free to contact the senior author of Piredda et al. 2020, Marco Simeone — new beech-enthusiasts are always welcomed by this group.

Bears are one of the best-studied extant mammal predators, and they also have a decent fossil record. This is probably the reason that Heath et al. (2014) used bears as the case study when introducing their new molecular dating approach: the fossilized-birth-death dating.

A fossilized birth-death dated tree of bears (modified from Heath et al. 2014, fig. 4). The numbers in brackets give the number of fossil taxa (extinct genera, Ursus spp.) listed on Wikipedia.

As nice as it looks (and as well done as it is), their analysis is pretty flawed from an evolutionary point of view. Their dated tree only reflects a single aspect of bear evolution and may involve branch-length artifacts. Heath et al. relied on complete mitochondrial genomes, which they combined with a single nuclear protein-coding gene. Mitochondrial genes reflect only the maternal lineage; they did not date a species tree but a mitochondrial genealogy. Paternally and biparentally inherited gene markers (which include nuclear genes) tell very different stories about species relationships (this is why we also used the bears as example data for Schliep et al. 2017).

Strict, branch-length ignorant Consensus network of three trees inferred using species-consensus sequences generated from three sets of data: biparentally inherited nuclear-encoded autosomal introns (ncAI), paternally inherited Y-chromosomes (YCh) and maternally inherited mitochondrial genes (complete set; mtG). This is clearly not the product of a strictly dichotomous evolution. Thick lines: edges found in Heath et al.'s chronogram (= mitochondrial genealogy).

And while it may be that morphology reflects the maternal more than the paternal side, this has never been tested. Neither has how well morphology fits the coalescent species tree, which would be a network, as shown below.

Gene flow in bears within the last 5 myrs (estimate; from Kumar et al. 2017).

How Heath et al. linked the fossils to clades might have been just as wrong as it was right (note that FBD dating is much less biased by misplaced or suboptimally placed fossils than traditional node dating). Hemi- and xenoplasy must be considered here. In addition to the highly incongruent paternal and maternal genealogies, we know that even the morphologically most distinct sister species (grizzlies, a special form of Brown Bear, and polar bears) can produce viable offspring ("Grolar") with morphological traits from either side of the family (usually, the Grizzly-side dominates).

Wildlife services usually kill these hybrids as they are considered to speed up the decline of polar bears (they are food competitors). However, with the (possibly inevitable) melting of the polar caps, these hybrids could be instrumental in the survival of a bit of Polar Bear legacy, in the form of genetic diversity not found in brown bears, and xenoplasies. If two highly distinct bear species hybridize today in the wild due to (in this case: human-induced) environmental pressure, their ancestors probably have done so in the past in reaction to shifting habitats and migration patterns.

Given how long bears have intrigued researchers, there are plenty of classic morphological studies involving fossils; and, in the light of the vast amount of molecular data (including ancient DNA!) that have been collected for bears, it should be pretty easy to apply Wang et al.'s new approach to bears. For example, is the Cave Bear a dead-end side lineage, an introgressed lineage, or a hybrid dead-end? Mitochondrially, Cave bears are placed as sister to Brown and Polar bears, but that's just because of their provenance. Like chloroplast genealogies in plants, mitochondrial genealogies in animals typically show a strong geographic correlation. Especially in bears, the mothers and daughters don't migrate as much as the fathers and sons.

Mitochondrial genealogy of bears including Cave bears (Kumar et al. 2017, fig. 3), the famous European bears of the Ice Ages. ABC bears are insular brown bears living on the subarctic Admiralty, Baranof and Chichagof islands of the Alexander archipelago, known as a natural example of gene flow between Brown and Polar bears (Kumar et al. 2017, fig. 1, provides a map of the current distribution of bears).

Postscriptum

Birds are another animal group that likes to diversify into many species, some of which love to transgress recently established species barriers, forming hybrid swarms. These are actually dinosaurs, a group exclusively studied using cladistic analyses of morphological traits providing non-tree-like signals: mostly homoplasies, a lot of not-really-synapomorphies (a good deal are probably homoiologies), and, it wouldn't surprise me, one or another xenoplasy. Or can we assume they were much too advanced to hybridize and introgress?

Cited literature
  • Grimm GW, Denk T. 2010. The reticulate origin of modern plane trees (Platanus, Platanaceae) - a nuclear marker puzzle. Taxon 59:134–147.
  • Heath TA, Huelsenbeck JP, Stadler T. 2014. The fossilized birth–death process for coherent calibration of divergence-time estimates. PNAS 111:E2957–E2966.
  • Kumar V, Lammers F, Bidon T, Pfenninger M, Kolter L, Nilsson MA, Janke A. 2017. The evolutionary history of bears is characterized by gene flow across species. Scientific Reports 7:46487 [e-pub].
  • Piredda R, Grimm GW, Schulze E-D, Denk T, Simeone MC. 2020. High-throughput sequencing of 5S-IGS in oaks: Exploring intragenomic variation and algorithms to recognize target species in pure and mixed samples. Molecular Ecology Resources doi:10.1111/1755-0998.13264.
  • Schliep K, Potts AJ, Morrison DA, Grimm GW. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution 8:1212–1220.

Monday, October 12, 2020

Tattoo Monday XXI


There are a number of tattoo designs that incorporate the concept of a Tree of Life with the concept of DNA. A selection of these was included in the previous post, Tattoo Monday XX. Here are a few more.


Monday, October 5, 2020

Rogue dinosaurs, an example from the Aetosauria


In several earlier posts (a non-comprehensive link list can be found at the end of the post), I outlined how networks, tree-sample (Consensus networks, SuperNetworks) or distance-based (Neighbor-nets) may be of practical help, especially when we study phylogenetic relationships of extinct organisms.

In this post, I will further explore this by looking at a matrix for Aetosauria (Parker 2016, PeerJ) that provides an overall (relatively) strong and unambiguous signal. [NB: The reason I prefer to use PeerJ papers as examples is that it is one of the very few journals that is open access and has a strict open data policy: to publish there, authors have to give access to the data used.]

In the abstract of the original paper, we read the following:
Nonetheless, aetosaur phylogenetic relationships are still poorly understood, owing to an overreliance on osteoderm characters, which are often poorly constructed and suspected to be highly homoplastic. A new phylogenetic analysis of the Aetosauria, comprising 27 taxa and 83 characters, includes more than 40 new characters that focus on better sampling the cranial and endoskeletal regions, and represents the most comprehensive phylogeny of the clade to date. Parsimony analysis recovered three most parsimonious trees; the strict consensus of these trees finds an Aetosauria that is divided into two main clades: Desmatosuchia, which includes the Desmatosuchinae and the Stagonolepidinae, and Aetosaurinae, which includes the Typothoracinae.
Parker's (2016) fig. 6 shows the results of the "initial analysis" (click to enlarge, colored annotations added by me).

Systematic groups based on clades are abbreviated (see next graph for full names).

A is a "Strict component consensus" of the 30 inferred MPTs (most parsimonious trees); B, the Adams consensus; C, the Majority rule consensus, where branch labels give percentages for branches not found in all MPTs; D, a "Maximum agreement subtree" after a priori pruning of one taxon (black star) within the upper clade.

Parker's (2016) fig. 7 then shows the preferred result: a "reduced strict consensus of 3 MPTs" with the red star taxon removed, and (rarely seen in dinosaur phylogeny papers) branch-support — including Bootstrap support values below 70, which are very rarely reported in the literature (from my own experience it seems that editors of systematic biology journals don't like them).


Removal of one rogue taxon (called a "wildcard" in paleozoology), Aetobarbakinoides brasiliensis, substantially reduced the number of MPTs. Nonetheless, many branches have low support, and hence also the clades (used here as synonym for monophyla) derived from them – Parker uses branch-based ("stem"-based, brackets on his tree), and node-based taxa (dots).

Low branch support may or may not matter

There are two possible reasons for low branch-support:
  • non-discriminatory signal: any alternative branching pattern receives diminishing support
  • internal signal conflict: two (or more) alternatives receive similar support.
Mapping the support on the preferred (inferred) optimal tree cannot tell us whether it's the one or the other — only Support consensus networks can visualize this. Since we are interested in the rogue, I re-ran the parsimony BS analysis (10,000 quick-and-dirty replicates, following Müller 2005, BMC Evol. Biol. 5:58) including Aetobarbakinoides brasiliensis.
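The difference between the two cases can be sketched in a few lines of Python. This is only an illustration, not the procedure behind the analysis described here: it assumes that bootstrap split frequencies are already tabulated, that each split is stored by the same side throughout, and that a rival counts as "conflict" when it reaches half the support of the split in question (an arbitrary, hypothetical cut-off). The taxon labels and frequencies are invented.

```python
def compatible(a, b, taxa):
    """Two splits are compatible if at least one of the four
    intersections of their sides is empty."""
    a_c, b_c = taxa - a, taxa - b
    return any(not s for s in (a & b, a & b_c, a_c & b, a_c & b_c))

def diagnose(split, freqs, taxa, conflict_ratio=0.5):
    """Classify a weakly supported split by its best incompatible rival.

    conflict_ratio is a hypothetical cut-off chosen for illustration."""
    support = freqs.get(split, 0.0)
    rivals = [f for s, f in freqs.items()
              if s != split and not compatible(split, s, taxa)]
    best_rival = max(rivals, default=0.0)
    if best_rival > 0 and best_rival >= conflict_ratio * support:
        label = "internal conflict"
    else:
        label = "non-discriminatory"
    return label, support, best_rival

# Invented example: five taxa and three tabulated bootstrap splits.
taxa = frozenset("ABCDE")
AB, BC, DE = frozenset("AB"), frozenset("BC"), frozenset("DE")
freqs = {AB: 0.45, BC: 0.40, DE: 0.30}

print(diagnose(AB, freqs, taxa))  # competes with the incompatible split BC
print(diagnose(DE, freqs, taxa))  # low support, but no incompatible rival
```

A tree with mapped support values shows only the first number of each result; the rival is what a Support consensus network makes visible.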

Support consensus network based on 10,000 parsimony BS pseudoreplicates. Trivial splits collapsed; only splits that occurred in at least 20% of the BS replicates are shown.
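The filtering described in this caption can be sketched as follows, in Python. The sketch assumes each bootstrap replicate tree has already been reduced to a set of splits, with every split consistently stored by the same side; the replicate data are invented, and the example uses a 30% cut-off on five toy replicates rather than the 20% applied to the real analysis.

```python
from collections import Counter

def split_frequencies(replicates, n_taxa, min_freq=0.2):
    """Tabulate how often each non-trivial split occurs among replicates.

    replicates: list of sets of frozensets (one set of splits per tree).
    Trivial splits (one taxon versus the rest) are dropped, as are
    splits occurring in fewer than min_freq of the replicates."""
    counts = Counter(s for tree in replicates for s in set(tree))
    n = len(replicates)
    return {s: c / n for s, c in counts.items()
            if c / n >= min_freq and 1 < len(s) < n_taxa - 1}

# Invented toy replicates for five taxa A-E.
AB, BC, CD, DE = (frozenset(x) for x in ("AB", "BC", "CD", "DE"))
replicates = [
    {AB, DE},
    {AB, DE},
    {AB, CD},
    {BC, DE},
    {AB, frozenset("A")},  # contains a trivial split, which is ignored
]
kept = split_frequencies(replicates, n_taxa=5, min_freq=0.3)
for split, freq in sorted(kept.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(split)), freq)
```

The retained splits, weighted by their frequencies, are what a program such as SplitsTree draws as a network.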

The decreased/low BS support within the most terminal (root-distant) subtrees, the Des'ini and Par'ini, relates to conflicting alternatives involving one or two OTUs. In the case of Des'ini, it is the affinity of Lucasuchus and NCSM 21723, while in the case of Par'ini an alternative (recognizing Tecovasuchus as sister to the remainder) is found in one out of three BS pseudoreplicate trees. The diminishing support for basal relationships (root-proximal branches/edges) is due to the general lack of discriminatory signal (BS < 25 for any alternative). However, there are very few situations in which the best-supported alternative differs much from that in the preferred tree. For instance, any alternative to a Stag'inae sister relationship has even less than BS = 24 (BS = 27 in Parker's "reduced" tree).

Our rogue, however, is not really a 'wildcard'. The scored characters simply put it much closer to the outgroup than is any other ingroup taxon. A simple explanation could be that it is a most primitive (least derived) member of the Aetosauria. Another possibility is that it lacks any critical trait needed to place it within the ingroup. Since the deep splits within the Aetosauria rely on very few character changes, we can put it in different positions down here and the tree will still have the same number of inferred changes.

Trivial and non-trivial taxa

The cladograms typically shown provide limited information about the signal in the underlying matrix, its strength and weaknesses, even when not "naked" but annotated using branch-support values. Given that there are no severe overlap gaps in the data, a very quick alternative is the Neighbor-net (a necessary addition, in my opinion).
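What feeds a Neighbor-net is a pairwise distance matrix computed from the character matrix. A minimal Python sketch, assuming the mean pairwise character difference (the share of differing states among characters scored in both taxa) as the distance; which distance was used for the network here is not stated, and the toy taxa and scores are invented. It also shows why overlap gaps matter: without any characters scored in both taxa, no distance can be calculated.

```python
def mean_character_distance(row_a, row_b, missing="?"):
    """Share of differing states among characters scored in both taxa.

    Returns None when the two taxa share no scored character (an
    overlap gap), because the distance is then undefined."""
    shared = [(x, y) for x, y in zip(row_a, row_b)
              if x != missing and y != missing]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)

# Invented toy matrix: three taxa, five characters, "?" = not scored.
matrix = {
    "taxon_1": "0101?",
    "taxon_2": "0111?",
    "taxon_3": "????1",
}
names = sorted(matrix)
for a in names:
    print(a, [mean_character_distance(matrix[a], matrix[b]) for b in names])
```

Taxon pairs that return None are the ones that would have to be dealt with (pruned or otherwise handled) before a distance-based network can be inferred.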

Bold edges correspond to branches (hence: clades) in Parker's preferred tree.

Using this, we can directly depict which groups, potential clades, draw substantial (partly trivial) character support.

For instance, according to Parker's tree and following cladistic classification, Stagonolepis is an invalid taxon: one species (St. robertsoni) is part of the Stag'inae clade, the other (St. olenkae) is part of the Des'inae clade. Character support is, however, nearly non-existent (Bremer value = 1 and BS = 7 in the original analysis; BS ≤ 20 for any competing alternative in our re-analysis). The distance network shows us why: indeed, both species are closest to each other; but, while St. robertsoni shares a critical Stag'inae character suite and, consequently, shows the highest similarity to Polesinesuchus, St. olenkae does not share this (note the lack of a corresponding neighborhood). Furthermore, any alternative placement fits even less. Parker's tree only resolved it as sister to all other Des'inae because it didn't fit into any of the well-supported, terminal clades (prominent edge-bundles).

We can also see where we may have to deal with internal signal conflict, and how this may affect the tree inference and lead to ambiguous branch support. Take, for instance, the NCSM 21723 individual (= Gorgetosuchus pekinensis). It's clearly a Des'inae. The reason we have ambiguous branch support for this staircase-like subtree is that NCSM 21723 is substantially more similar to the distant, equally evolved sister lineage, the Par'ini (purple edge bundle). Hence, it must be placed as sister to all other Des'inae, although it appears to represent a more derived form than Longosuchus, representing the next step towards the most-derived crown-taxon Desmatosuchus. Tecovasuchus is the source of topological conflict within the Par'ini because it is the least-derived taxon. Its primitiveness will be expressed by placing it as sister to all other Par'ini, while few shared, non-exclusive apomorphies are behind its position in the preferred tree (Bremer value = 1, BS = 48 in Parker's fig. 7).

While it is obvious that the matrix has no clear tree-like signal for resolving any OTU that is not part of the terminal Des'ini and Typ'inae lineages, our 'wildcard' (Aetobarbakinoides) is particularly close to the outgroup while showing no affinity to anything else. If it is part of the ingroup, it represents the ancestral form, i.e. shows a character suite that is primitive (derived traits may be missing because they are simply not preserved: see description of the taxon in Parker 2016). This is the reason why it acted rogue-ish in tree inferences even though its favored phylogenetic position is clear.

Data

Parker's original matrix can be found in the supplement to the paper. An annotated ready-to-use NEXUS-formatted version (including my standard codelines for parsimony and distance bootstrapping) and the inference results used here can be found in this figshare submission, which I generated for a technical Q&A.



Here is the promised list of previous posts dealing with fossils and networks.