This is likely to be my last post for this blog.
When I joined the Genealogical World of Phylogenetic Networks three years ago, I didn't know how much fun it is to blog about science. Blogging, or writing essays, has several advantages over the traditional way to get a researcher's ideas out into the world: writing a scientific paper. The most important one is that one can just try something out without having to worry how it would get past the peer-reviewers and editors (or as I like to call them: the Mighty Beasts lurking in the Forest of Reviews). When I was still a (sort-of) career scientist (i.e. paid by tax-payers to do science), I had my share of discouraging experiences whenever we tried to leave the beaten (and worn-out) paths to try something new; to look into the dark places and not right under the street-lights.
Before we submitted papers, we therefore put considerable effort into them, pondering what our peers might criticize, or what might alienate them (being likely unfamiliar with our methodological and philosophical approaches), to minimize the chance that all our work would be for nothing. In a couple of cases, where we expected fierce resistance, we opted for low-impact journals with no manuscript length restrictions and more welcoming editors and peers, to be able to put in everything that we had. Some of my best bits are buried in journals where you'd never expect them!
But it was increasingly annoying, nevertheless. It was no fun anymore to formally publish research, and so I let my career run out as smoothly in the 2010s as it had started in the Zeroes.
David's encouragement to write blog posts, just after I retired early, thus revitalized my interest in science, to "boldly go where no-one has gone before". The amount of effort is typically much lower, although some of my posts do involve the same work that I put into the papers that I co-authored. More importantly, there are no beasts in the World Wide Web that can bite you from the shadows; they have to do it in the open. It's an ideal way to get an idea out, without having to think about the consequences. None of the work I put into a post has been in vain. What a difference: before, for every graph or analysis result published, two ended up in the bin, many devoured by the Mighty Beasts.
And maybe somebody will find the work interesting enough to try it out, and eventually my idea will find a place in the sanctioned, peer-reviewed scientific world anyway. Since I'm out of business, I can afford not to cash in the credit (no-one formally cites a blog post).
My last Neighbor-net for the Genealogical World
With Neighbor-nets (NNets), it was love at first sight (this was, in my case, ~2005, when my boss Vera Hemleben, a geneticist, sent me over to the new professor in our bioinformatics department, named Daniel Huson, who had just released a new software package, SplitsTree). These networks are...
- ... most versatile: any kind of data can be transformed into a distance matrix;
- ... quick-and-easy to infer.
For my final post, I decided on a fascinating new data source in paleobiology: Fourier-transform infrared (FTIR) spectra of fossil cuticles.
The cuticle is a plant's skin, and its composition and structure show a lot of variation, down to species level. Thus, the morphological-anatomical features of cuticles have long been used as taxonomic markers to identify fossil material. Using infrared spectroscopy, one can look at the chemical composition of cuticles. Like any other spectrum, an FTIR spectrum can be broken down into sets of qualitative (discrete, binned) or quantitative (continuous) characters; and one can then create a dissimilarity matrix for the investigated material. This is what Vajda, Pucetaite et al. (Nature Ecol. Evol. 1: 1093–1099, 2017) did for long-dead (Mesozoic) but enigmatic seed plants and their equally enigmatic modern counterparts.
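For readers who want to try the spectrum-to-matrix step themselves, here is a minimal sketch of the general idea. It is not the procedure used in the paper: the sample names and absorbance values are made up, and the bin count, the sum-to-one normalisation and the Euclidean distance are all assumptions chosen for simplicity.

```python
from math import sqrt


def bin_spectrum(absorbances, n_bins):
    """Average a spectrum (a list of absorbance values) into n_bins bins."""
    n = len(absorbances)
    if not 0 < n_bins <= n:
        raise ValueError("n_bins must be between 1 and the spectrum length")
    binned = []
    for i in range(n_bins):
        # Integer arithmetic keeps the bin boundaries exact.
        start, stop = i * n // n_bins, (i + 1) * n // n_bins
        chunk = absorbances[start:stop]
        binned.append(sum(chunk) / len(chunk))
    return binned


def normalise(values):
    """Scale values to sum to 1, so overall signal strength is ignored."""
    total = sum(values)
    return [v / total for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(spectra, n_bins):
    """Return sorted sample labels and their pairwise distance matrix."""
    labels = sorted(spectra)
    profiles = {
        name: normalise(bin_spectrum(spectra[name], n_bins)) for name in labels
    }
    matrix = [
        [euclidean(profiles[a], profiles[b]) for b in labels] for a in labels
    ]
    return labels, matrix


# Hypothetical toy spectra (invented values, NOT real FTIR data).
spectra = {
    "sample_A": [0.1, 0.2, 0.8, 0.9, 0.3, 0.2, 0.1, 0.1],
    "sample_B": [0.1, 0.3, 0.7, 0.9, 0.3, 0.2, 0.1, 0.1],
    "sample_C": [0.6, 0.7, 0.2, 0.1, 0.1, 0.2, 0.8, 0.9],
}

labels, matrix = distance_matrix(spectra, n_bins=4)
for name, row in zip(labels, matrix):
    print(name, " ".join(f"{d:.3f}" for d in row))
```

The resulting square matrix is the kind of input a Neighbor-net needs; how it is best exported for a program such as SplitsTree depends on the version you use, so check its documentation for the expected file format.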
PCA and UPGMA are not phylogenetic inference methods, but there is obviously some phylogenetic signal encoded in these FTIR spectra, as shown above.
When I first saw the paper, I contacted the authors (including former colleagues of mine at Naturhistoriska riksmuseet in Stockholm), and the first author gave the second author, Milda Pucetaite (a Ph.D. student), a green light to share and convert her FTIR data into a simple distance matrix for me to run a NNet, as shown below.
Neighbor-net based on the combined distance matrix provided by Milda (pers. comm. July 2017).
Note that this NNet is a partly impossible graph, phylogenetically. The chemical composition naturally changes after the foliage (in this case) gets buried in sediment, and its cuticle is then conserved for millions of years by various taphonomic and diagenetic processes. As pointed out by the experienced biochemist among the authors during our correspondence: it is hence pointless to combine the data from extant and extinct taxa.
Well, since this is a post and not a paper, I combined them anyway. I find the result quite compelling: it supports the paper's conclusions, including the more speculative follow-up ones. The NNet reflects every aspect that this kind of data can provide for phylogenetic and systematic purposes.
The prominent central edge bundle reflects the taphonomic-diagenetic change separating the living from the fossil samples. The basic sequence within the subgraphs is the same: ginkgoes are closest to cycads, and cycads bridge to Araucariaceae, which is a relict lineage of the "needle" trees, the conifers (many of which don't have needles but leaves). Bennettitales and Nilssoniales are extinct groups of seed plants, which are here resolved as a distinct lineage. The Bennettitales, in particular, have long puzzled scientists. They may represent a third major lineage of seed plants that is neither angiosperm (flowering plants) nor gymnosperm (ginkgoes, cycads, conifers, gnetids), or perhaps an early side lineage of either one (or lineages, as their two main groups are quite different).
As for pretty much any kind of data, just try it out for yourself. This is exploratory data analysis (EDA), particularly useful for getting a first, fast impression of the primary signal in your data. This is true even if you keep it to yourself, having to watch out for the Mighty Beasts of the Forest of Reviews (especially the ones that call themselves "cladists"). They are quick to tell you what you can't do, but less forthcoming when it comes to pointing you to other options for analyzing your data.
My dive-in list for some more (im-)possible NNets
- "Man gave name to all those animals": cats and dogs — a joint post with Mattis, where we just mapped the names on a NNet of world languages. Cormac Anderson joined for the second part dealing with goats and sheep.
- Should we try to infer trees on tree-unlikely matrices — my first post for the Genealogical World of Phylogenetic Networks about a NNet that, at the time I made it, was impossible to publish.
- Stacking networks based on sign language manual alphabets — NNets based on hand-shapes historically used for sign letters.
- To boldly go where no one has gone before – networks of moons — a joint post with Timothy Holt, who scored celestial bodies of our solar system for classification.
- Visualizing U.S. gun laws — inferring NNets based on serious issues can be fun. Related post: The 2nd Amendment does more than keep King George away.