Monday, January 27, 2020

From words to deeds?

If you want to annoy a linguist, there are three easy ways to do so: ask them how many languages they speak; ask them for their opinion regarding the German spelling reform; or ask them whether it is true that the Eskimo language has 50 words for snow. What those three questions have in common is that they all touch upon some big issues in linguistics, which are so big that they give us a headache whenever we are reminded of them.

For the first question, asking about a linguist's linguistic talent touches upon the conviction of quite a few linguists that in order to practice linguistics, one does not need to study many languages. One language is usually enough; and even if that language is only English, this may also be sufficient (at least according to some fanatics who practice syntax). To put it in different words: knowing only one language does not prevent a linguist from making claims about the evolution of whole language families. Knowing how to describe a language, or how to compare several languages, does not necessarily require anyone to be able to speak them. After all, mathematicians also pride themselves on not being able to calculate.

The second question, regarding the German spelling reform, marks the last time when German linguists failed royally in proving the importance of their studies to the broader public. The problem was that the German spelling reform, the first after some 100 years of linguistic peace, was mostly done without any linguistic input. Those who commented on it were, instead, novelists, poets and journalists, usually a bit older in age, who felt that the reform was proposed mainly in order to annoy them personally. At the same time, and this was maybe no coincidence, more and more institutes for comparative linguistics disappeared from German universities. The reason was again that the field had not succeeded in explaining its importance to the public. However, historical language comparison can, indeed, be important when discussing the reform of a writing system that is being used by millions of people, specifically because the investigation of historically evolving linguistic systems is one of the specialties of historical-comparative linguistics. This was completely ignored back then.

The last question concerns the almost ancient debate about the hypothesis commonly attributed to Edward Sapir (1884-1939) and Benjamin Lee Whorf (1897-1941). This says, in its strong form (Whorf 1950), that speaking influences thinking to such an extent that we might, for example, develop a different kind of Relativity Theory in physics if we started to practice our science in languages different from English, French, and German. Given that Eskimo languages are said to have some 50 different words for snow (as people keep repeating), it should be clear enough that those speaking an Eskimo language must think completely differently from those who start to forget what snow is after all.

The latter concept leads to an interesting use of networks, which I will discuss here.

Words versus deeds

The hypothesis by Sapir and Whorf annoys many linguists (including myself), because it has long since been disproved, at least in its strong, naive form. It was disproved by linguistic data, not by arguments; and the data were the very data used by Whorf in order to prove his point in the first place. However, although there is little evidence for the hypothesis in its strong form, people keep repeating it, especially in non-linguistic circles, where it is often instrumentalized.

Whether we can find evidence for a weak form of the hypothesis — which would say that we can find some influence of speaking on thinking — is another question; which is, however, difficult to answer. It may well be possible that our thoughts are channeled to some degree by the material we use in order to express them. When distinguishing color shades, for example, such as light blue and dark blue, by distinct words, such as goluboj and sin'ij in Russian or celeste and azul in Spanish, it may be that we develop different thoughts when somebody talks about blue cheese, which is called dark blue cheese in Spanish (queso azul).

But this does not mean that somebody who speaks English would never know that there is some difference between light and dark blue, just because the language does not primarily make the distinction between the two color tones. It is possible that the stricter distinction in Russian and Spanish triggers an increased attention among speakers, but we do not know how large the underlying effect is in the end, and how many people would be affected by it.

Particular languages are thus neither a template nor a mirror of human thinking — they do not necessarily channel our thoughts, and may only provide small hints as to how we perceive things around us. For example, if a language expresses different concepts, such as "arm" and "hand" with the same word, this may be a hint that "arm" and "hand" are not that different from each other, or that they belong together functionally in some sense, which is why we may perceive them as a unit. This is the case in Russian, where we find only one expression ruka for both concepts. In daily conversations, this works pretty well, and there are rarely any situations where Russian speakers would not understand each other due to ambiguities, since most of the time the context in which people speak disambiguates all they want to express well enough.

Colexification network with the central concept "MIND" and the geographical distribution of languages colexifying "MIND" and "BRAIN"

These colexifications, as we now call the phenomenon (François 2008), occur frequently in the languages of the world. This is due to the polysemy of many of the words we use, since hardly any word denotes only one concept; most words denote several similar concepts at the same time. On the other hand, we encounter identical word forms in the same language which express completely different things, resulting from coincidental processes by which originally different pronunciations came to sound alike (called convergence, in biology). Those colexifications that are not coincidental but result from polysemy are the most interesting ones for linguists, not least because the words are related by network graphs, not trees (as shown above). When assembled in large enough numbers, across a sufficiently large sample of languages, they may allow us some interesting insights into human cognition.

The procedure to mine these insights from cross-linguistic data has already been discussed in a previous blog post, from 2018. The main idea is to collect colexifications for as many concepts and languages as possible, in order to construct a colexification network, in which each concept is represented by a node, and weighted links between the nodes represent how often each colexification between the linked concepts occurs; that is, they represent how often we find a language that expresses the two linked concepts with the same word.
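The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the CLICS workflow itself: the word list, the language name "LanguageB", and the function name `colexification_network` are all made up for the example, and no attempt is made to separate polysemy from coincidental homophony.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy word list of (language, word form, concept) entries.
# Only the Russian "ruka" example comes from the text; the rest is invented.
wordlist = [
    ("Russian", "ruka", "ARM"),
    ("Russian", "ruka", "HAND"),
    ("LanguageB", "ta", "ARM"),
    ("LanguageB", "ta", "HAND"),
    ("LanguageB", "mo", "MIND"),
    ("LanguageB", "mo", "BRAIN"),
    ("English", "arm", "ARM"),
    ("English", "hand", "HAND"),
]


def colexification_network(entries):
    """Return {(concept_a, concept_b): number of languages colexifying them}."""
    # Step 1: collect all concepts expressed by the same form in one language.
    concepts_by_form = defaultdict(set)
    for language, form, concept in entries:
        concepts_by_form[(language, form)].add(concept)

    # Step 2: every pair of concepts sharing a form is one colexification;
    # remember which languages show it, so each language counts only once.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    # Step 3: the edge weight is the number of languages per concept pair.
    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


network = colexification_network(wordlist)
for (concept_a, concept_b), weight in sorted(network.items()):
    print(concept_a, concept_b, weight)
```

In this toy example the concepts are the nodes and the dictionary values are the edge weights; weighting by language family instead of by language would only require storing families instead of languages in step 2.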

Having proposed a first update of our Database of Cross-Linguistic Colexifications (CLICS) back in 2018, we have now been able to further increase the data. With this third installment of the database, we could double the number of language varieties, from 1,200 to 2,400. In addition, we could enhance the workflows that we use to aggregate data from different sources, in a rigorously reproducible way (Rzymski et al. 2020).

Current work

Even more interesting than these data, however, is a study initiated by colleagues from psychology from the University of North Carolina, which was recently published, after more than two years of intensive collaboration (Jackson et al. 2019). In this study, the colexifications for emotion concepts, such as "love", "pity", "surprise", and "fear", were assembled and the resulting networks were statistically compared across different language families. The surprising result was that the structures of the networks differed quite considerably from each other (an effect that we could not find for color concepts derived from the same data). Some language families, for example, tend to colexify "surprise" and "fear (fright)" (see our subgraph for "surprised"), while others colexified "love" and "pity" (see the subgraph for "pity").

Not all aspects of the network structures were different. An additional analysis involving informants showed that especially the criterion of valence (that is, whether something is perceived as negative or positive) played an important role for the structure of the networks; and similar effects could be found for the degree of arousal.

These results show that the way in which we express emotion concepts in our languages is, on the one hand, strongly influenced by cultural factors, while on the other hand there are some cognitive aspects that seem to be reflected similarly across all languages.

What we cannot conclude from the results, however, is that those who speak languages in which "pity" and "love" are represented by the same word will not know the difference between the two emotions. Here again, it is important to emphasize what I mentioned above with respect to color terms: if a particular distinction is not present in a given language, this does not mean that the speakers do not know the difference.

It may be tempting to dig out the old hypothesis of Sapir and Whorf in the context of the study on emotions; but the results do not, by any means, provide evidence that our thinking is directly shaped and restricted by the languages we speak. Many factors influence how we think. Language is one aspect among many others. Instead of focusing too much on the question as to which languages we speak, we may want to focus on how we speak the language in which we want to express our thoughts.


François, Alexandre (2008) Semantic maps and the typology of colexification: intertwining polysemous networks across languages. In: Vanhove, Martine (ed.): From polysemy to semantic change. Amsterdam: Benjamins, pp. 163-215.

Joshua Conrad Jackson, Joseph Watts, Teague R. Henry, Johann-Mattis List, Peter J. Mucha, Robert Forkel, Simon J. Greenhill and Kristen Lindquist (2019) Emotion semantics show both cultural variation and universal structure. Science 366.6472: 1517-1522. DOI: 10.1126/science.aaw8160

Rzymski, Christoph, Tiago Tresoldi, Simon Greenhill, Mei-Shin Wu, Nathanael E. Schweikhard, Maria Koptjevskaja-Tamm, Volker Gast, Timotheus A. Bodt, Abbie Hantgan, Gereon A. Kaiping, Sophie Chang, Yunfan Lai, Natalia Morozova, Heini Arjava, Nataliia Hübler, Ezequiel Koile, Steve Pepper, Mariann Proos, Briana Van Epps, Ingrid Blanco, Carolin Hundt, Sergei Monakhov, Kristina Pianykh, Sallona Ramesh, Russell D. Gray, Robert Forkel and Johann-Mattis List (2020): The Database of Cross-Linguistic Colexifications, reproducible analysis of cross- linguistic polysemies. Scientific Data 7.13: 1-12. DOI: 10.1038/s41597-019-0341-x

Benjamin Lee Whorf (1950) An American Indian Model of the Universe. International Journal of American Linguistics 16.2: 67-72.

Monday, January 20, 2020

Worldwide gender differences in amount of paid versus unpaid work

A few weeks ago, I wrote about National differences in the amount of paid and unpaid work. This involved a look at the time that people spend per day on each of various different activities, averaged across each year. The data came from the time-use surveys conducted by the Organisation for Economic Co-operation and Development (OECD) for its 30 member countries. I concluded that there are many similarities among countries that share strong cultural ties, although some countries stand out as unusual within this context.

Four main categories of time use are reported in the surveys: Paid Work or Study, Unpaid Work, Personal Care, and Leisure Time; these are described in more detail in my previous post. The aggregated results for each country are available online, including data for three non-OECD countries, for comparison (China, India, South Africa).

Of particular interest is that these data are actually aggregated separately for males and females (see Balancing paid work, unpaid work and leisure). This allows us to look at the various national time-management behaviors in the light of potential differences in gender roles within those countries.

Obviously, we expect some consistent gender differences, not least because in most cultures it is the females who have traditionally been the primary care-givers in a family, and this is one of the main unpaid work activities. We can use the OECD data to look at this in a bit more detail.

Overall gender differences

First, we can look at the overall time-management differences between the two genders.

In order to get an overview of the current differences between the 33 countries (30 OECD, 3 non-OECD), I have performed this blog's usual exploratory data analysis. The available data are multivariate, since there are five measured variables for each country: total paid work, total unpaid work, total personal care time, leisure time (each measured in average number of minutes per day), plus Other (to make a total of 1,440 minutes per day). One of the simplest ways to get a pictorial overview of the data patterns is to use a phylogenetic network, as a form of exploratory data analysis. For this network analysis, I first calculated the gender differences as Male time minus Female time (for each variable separately), and then calculated the similarity of the countries using the Manhattan distance. A Neighbor-net analysis was then used to display the between-country similarities.
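The two calculation steps that precede the Neighbor-net (the gender difference per variable, then the Manhattan distance between countries) can be sketched as follows. The country names and all minute values are hypothetical placeholders, not OECD figures, and the function names are my own; the Neighbor-net itself is not reproduced here, since it would be computed from the resulting distance matrix in dedicated network software.

```python
from itertools import combinations

# Order of the five variables (average minutes per day; each row sums to 1,440).
VARIABLES = ("paid", "unpaid", "personal_care", "leisure", "other")

# Hypothetical values for illustration only; these are NOT the OECD data.
time_use = {
    "CountryA": {"male": (320, 150, 650, 300, 20), "female": (250, 240, 660, 270, 20)},
    "CountryB": {"male": (400, 60, 640, 320, 20), "female": (150, 350, 650, 270, 20)},
    "CountryC": {"male": (300, 170, 650, 300, 20), "female": (260, 220, 650, 290, 20)},
}


def gender_difference(male, female):
    """Male time minus Female time, for each variable separately."""
    return tuple(m - f for m, f in zip(male, female))


def manhattan(u, v):
    """Manhattan distance: the sum of absolute differences per variable."""
    return sum(abs(a - b) for a, b in zip(u, v))


differences = {
    country: gender_difference(values["male"], values["female"])
    for country, values in time_use.items()
}

# Pairwise distances between countries, based on their gender differences.
distances = {
    (a, b): manhattan(differences[a], differences[b])
    for a, b in combinations(sorted(differences), 2)
}

for pair, distance in distances.items():
    print(pair, distance)
```

Countries with a small distance have similar gender differences across all five variables, which is what places them close together in the network.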

The resulting network is shown in the first graph. Countries that are closely connected in the network are similar to each other based on their average gender difference in time management, and those countries that are further apart are progressively more different from each other.

At the bottom of the network we see those countries with the biggest gender differences, progressing up to the top with those countries with the least difference.

So, the non-European countries show the most traditional separation of gender roles, with Portugal standing out as being the only one from Europe. China is not situated with the other two Asian countries (Japan, Korea), although why it should be similar to South Africa is not clear.

Indeed, the English-speaking part of the southern hemisphere does not do well, with all three countries (South Africa, Australia, New Zealand) showing stronger gender differences than any of the other English-speaking countries (Canada, USA, UK), except for the Irish (who thus have some explaining to do).

The Scandinavian countries are at the top (Sweden, Norway, Denmark), with the smallest gender differences, which will not surprise anyone who knows these people. On the other hand, the location of France may surprise those people who have a clichéd image of the behavior of Frenchmen. France is clearly separated from the more traditional societies of the other Mediterranean countries (Spain, Greece, Italy), appearing in the network with other northern countries (Belgium, Netherlands, Germany).

Finland and Estonia have strong historical ties, and they are distinct from the other Baltic countries (Latvia and Lithuania).

Work time differences

Having thus noted that there are some strong gender differences in time-management between countries, we can now proceed to look specifically at Paid versus Unpaid work.

First, we can simply take the total amount of reported Paid + Unpaid work, and compare gender differences across the various countries. This table lists the reported differences expressed as Male time minus Female time, in average minutes per day:
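[Table not recovered: gender difference in total Paid + Unpaid work (Male minus Female, average minutes per day) for the 33 countries; only the row labels New Zealand and South Africa survive.]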

These time differences between males and females become very large towards the bottom of the table, where in India it amounts to 1.5 hours per day, and is >1 hour for all of the bottom 8 countries. Note that only in the first four countries (out of the 33) does the total work time for males exceed that for females. It is unclear why the reported gender difference is so large for Norwegians; but maybe some of my readers might think that this could be a useful role model for the other countries!

We can now look at the balance between paid and unpaid work for the two genders. The following graph shows the difference as Male time minus Female time (in average minutes per day) for Paid work (horizontally) and Unpaid work (vertically). The pink line indicates the balance between the two types of work (ie. a decrease in paid work is balanced by a corresponding increase in unpaid work, and vice versa).

Gender differences in amount of paid versus unpaid work

The horizontal axis makes it clear that males always do more paid work than do females, on average, in every country, and up to 4 hours more in Mexico and Turkey. The vertical axis makes it clear that females always do more unpaid work than do males, on average, in every country, and up to 5 hours more in India.

These two variables must be correlated, since most people do either the one type of work or the other. However, in most countries the gender balance is not equal, as shown in the table above (females usually do more total work than do males). Some countries come close to a balance (indicated by the pink line), including the USA.
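The relation between the graph and the table can be made explicit with a small sketch. It assumes my reading of the pink line, namely that a point lies on it when the gender difference in unpaid work exactly cancels the difference in paid work; the distance from the line is then simply the difference in total work. The country names and numbers below are hypothetical, not values from the survey.

```python
def total_work_gap(paid_diff, unpaid_diff):
    """Male minus Female total work, in average minutes per day.

    Both arguments are Male minus Female differences. A result of 0 means the
    point lies on the balance line (unpaid_diff == -paid_diff); a negative
    result means that females do more total work than males.
    """
    return paid_diff + unpaid_diff


# Hypothetical points (paid difference, unpaid difference); not survey data.
points = {
    "CountryA": (240, -300),  # males +4 h paid, females +5 h unpaid
    "CountryB": (90, -90),    # exactly on the balance line
    "CountryC": (60, -40),    # males do slightly more total work
}

gaps = {country: total_work_gap(*diffs) for country, diffs in points.items()}

for country, gap in gaps.items():
    if gap == 0:
        verdict = "balanced"
    elif gap < 0:
        verdict = f"females work {-gap} min/day more in total"
    else:
        verdict = f"males work {gap} min/day more in total"
    print(country, verdict)
```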

Note that the country with the closest gender equality is the one with the best reputation in this regard: Sweden. For example, Swedish couples frequently share their workplace parental leave for new-born children, so that there is very little gender bias in who is the primary care-giver in a family. However, the gender bias still amounts to 5–7 minutes of work per day, even in Sweden.

At the other end of the scale, there are a number of countries that still abide by the traditional model of gender roles, of which five are labeled at the bottom of the graph. These cover quite a diversity of cultures, so that no generalizations can be made. However, the gender bias in India exceeds that in Mexico — the Indians report less total work time than do the Mexicans, but that time is organized in a more gender-biased manner. Once again, Portugal stands out among the European countries — the Portuguese work longer hours than do other Europeans, and that time is organized in a more gender-biased manner.

Other differences

Gender differences occur among the other survey variables, as well. As one simple example, we can consider the time reported as being spent Eating & Drinking. This graph shows the time (in minutes per day) spent by the males (horizontally) and the females (vertically) for each of the 33 countries.

Gender differences in amount of time spent eating and drinking

As you can see, there is not a big difference between the two genders, in any country. However, in most countries males do report spending more time feeding themselves than do the females (ie. the points are to the right of the pink line, which represents equal time).

The Mediterranean countries spend the most time eating and drinking, with Greece showing the biggest gender difference. The fast food preferred by Canadians and Americans clearly does not take much time to consume, in any given day, and females can apparently eat it just as fast as males.


The conclusion surprises no-one: all countries have clear gender differences in who does most of the unpaid work. Two Scandinavian countries stand out: Norway, because males do more total work than do females; and Sweden, where the gender imbalance between paid and unpaid work is smallest. Some countries still show strong gender bias, including India, Mexico, Turkey and Portugal.

Monday, January 13, 2020

Why we may want to map trait evolution on networks, pt. 2 – Topological ambiguity

In last week's Part 1, I gave an introduction to the problem of categorizing the polarity of morphological traits. How can we reconstruct which characters are primitive, or plesiomorphic according to Hennig, and which are derived, or apomorphic? This is something we need to do to reconstruct evolution, because most of the past is only preserved in the form of fossils, usually lacking any DNA. In this second part of the discussion, I'm going to take apart my own tree and show why we inevitably need networks, not trees.

There may be more than one tree

Even with more and more data at hand, some molecular phylogenies refuse to be unambiguous. Even worse, different, well-sampled molecular data sets may tell different stories — ie. there is more than one molecular tree to explain the diversity patterns. The ML tree used for the ML character mapping in Part 1 was pretty well supported, but not telling the entire truth.

For a start, there is no reason to assume that oaks are not monophyletic even though the data fail to resolve them as a clade (evolving something unique like the oaks twice would be a striking trick, even for gambling Mother Nature) — molecular trees may have misleading, sometimes just wrong, branches, even when they are highly supported.

In this case, one complication is that the oligogene dataset combines plastid and nuclear gene regions that not only differ in their information content but also infer different phylogenetic scenarios (and mask a lot of intra-generic and sub-generic incongruencies). This is illustrated in the following tanglegram.

Fig. 3 – A tanglegram, on the left the ML tree inferred from only the plastid gene regions (1406 DAP, alignment 15254 bp long), and on the right the corresponding nuclear data based tree (1691 DAP per only 4983 bp).

Even though the support along the backbone of the plastid tree is low (to non-existent; see footnote a), it well reflects the general diversification patterns in Fagaceae plastomes (see also the tree in Manos et al. 2008, Madroño 55:181–190; and Yan et al. 2019, BMC Evol. Biol. 19: 202, for an oak global picture). Plastid signatures show a strong geographic sorting (eg. New World vs. Old World), while the nuclear data provides most of the lineage-differentiating signal expressed in the combined tree (Part 1, Fig. 2).

Mapping along networks

How do we decide what is a real synapomorphy, a homoiology, or a good symplesiomorphy? Mapping the traits along all possible rooted trees is one option. Another option is to just map them along a consensus network of all trees, as shown next.
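To make the idea of a consensus network a little more concrete: such a network is built from the splits (taxon bipartitions) found in the input trees, so that branches shared by all trees appear as simple edges, and conflicting branches appear as box-like structures. The sketch below only extracts and counts the splits; drawing the network is left to dedicated software. The two toy trees with taxa A to E are hypothetical and do not represent the plastid and nuclear trees of Fig. 3.

```python
from collections import Counter


def leaves(tree):
    """Set of taxon labels below a node; trees are nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """All non-trivial splits (unordered taxon bipartitions) of a tree."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        cluster = leaves(node)
        rest = all_taxa - cluster
        # Skip trivial splits that separate fewer than two taxa.
        if len(cluster) > 1 and len(rest) > 1:
            found.add(frozenset([cluster, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return found


# Two hypothetical trees that disagree on the position of B and C.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

split_counts = Counter()
for tree in (tree_1, tree_2):
    split_counts.update(splits(tree))

for split, count in split_counts.items():
    sides = sorted("".join(sorted(side)) for side in split)
    print(" | ".join(sides), "found in", count, "tree(s)")
```

Here the split ABC | DE is found in both trees, while AB | CDE and AC | BDE each occur once and contradict each other; in a consensus network the latter two would form a box. A trait can then be mapped on every edge whose split matches its distribution, rather than on the branches of a single tree.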

Fig. 4 – Map of the seven characters on the consensus network of the nuclear and plastid trees shown in Fig. 3. Blue – genus autapomorphies, dark green – synapomorphies/terminal homoiologies, light green – symplesiomorphies, orange – deep homoiologies, red – randomly distributed trait, pink – genus-restricted reversals.

According to the mapping, the newly described South American Castanopsis rothwellii, assigned to the modern (Southeast Asian) genus Castanopsis, is a stem Castaneoideae / Fagaceae, while the "extinct" North American genus Castanopsoidea (then the "earliest megafossil evidence of Fagaceae": Crepet & Nixon 1989, Am. J. Bot. 76: 842–855) could be a stem / crown member of the Castanea-Castanopsis lineage. The difference to the ML trait mapping (Fig. 3 in Part 1) on the combined tree is that we get a better picture of what is a lineage-specific trait set in Castanea-Castanopsis, because the interference of the monophyletic(!) oak grade is minimized.

Another possibility is to map the characters directly along a distance-based network, and then compare the latter with the molecular-based topological alternatives. This is quite puzzling in this case, because the morphology (Fig. 1 in Part 1) matches neither the nuclear tree nor the plastid tree (Figs. 2–4) — the traits scored for the fossils cover largely morphological Play-Doh of the Fagaceae.

Fig. 5 – Neighbor-nets based on mean morphological distances. Top graph – polymorphisms treated as ambiguities (standard approach), bottom graph – polymorphisms treated as additional states (experimental approach). Text coloring as in Fig. 4, light blue – potential autapomorphy of the fossil American castaneoid lineage. Edge colors: green – edge representing a molecular clade/likely monophyletic group; orange – edge representing a paraphyletic group; red – edge rejected by molecular data; blue – edges supporting a distinct fossil American castaneoid lineage.
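The two ways of treating polymorphisms mentioned in the caption of Fig. 5 can be illustrated with a minimal sketch of a mean pairwise distance. The scoring rules are my simplifying assumptions, not necessarily those of the software used for the figure: under "ambiguity", two taxa differ in a character only if they share no state; under "state", every distinct combination of states counts as a state of its own. The two taxa and their character states are hypothetical.

```python
def mean_distance(taxon_a, taxon_b, polymorphism="ambiguity"):
    """Mean character difference between two taxa.

    Each taxon is a list of sets of observed states, one set per character;
    an empty set stands for missing data and the character is then skipped.
    """
    compared = 0
    differing = 0
    for states_a, states_b in zip(taxon_a, taxon_b):
        if not states_a or not states_b:
            continue  # missing data in at least one taxon
        compared += 1
        if polymorphism == "ambiguity":
            # Assumed rule: any shared state means no detectable difference.
            differs = states_a.isdisjoint(states_b)
        elif polymorphism == "state":
            # Assumed rule: a polymorphism such as {0, 1} is its own state.
            differs = states_a != states_b
        else:
            raise ValueError(f"unknown treatment: {polymorphism!r}")
        differing += differs
    if compared == 0:
        raise ValueError("no character scored in both taxa")
    return differing / compared


# Hypothetical scoring of four characters for two taxa.
taxon_x = [{0}, {0, 1}, {1}, set()]
taxon_y = [{0}, {1}, {0}, {1}]

print(mean_distance(taxon_x, taxon_y, "ambiguity"))
print(mean_distance(taxon_x, taxon_y, "state"))
```

With these toy data the second treatment gives the larger distance, because the polymorphic second character now counts as a difference; this is the kind of shift that can change the structure of a distance-based network.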

The likely primitive characters, irrespective of the evolutionary scenario we prefer, are those also found in the Eocene fossils (see footnote b). There are no derived traits/character suites pinning the fossils to Castanopsis. The fossils are a bit derived on their own terms (note their position in Fig. 5), and hence we can deduce that the fossils are either: (a) representing a relatively primitive extinct American sister lineage or (b) surviving, somewhat evolved members of the precursors of modern-day core Fagaceae. Note that the derived oaks evolved nearly 60 myrs ago, ie. 8 myrs before the oldest (Patagonian) Castaneoideae fossil was deposited. The earliest (known) Fagaceae and castaneoid pollen are from 80+ Ma old Upper Cretaceous sediments in western North America (Grímsson et al. 2016, Acta Palaeobot. 56: 247–305; open access) and Japan (Takahashi et al. 2008, Intl. J. Plant Sci. 169:899–907), giving them plenty of time to migrate into North and then South America during the Paleocene-Eocene greenhouse episode.

Fig. 6 – Earliest fossil record of Fagaceae and Castaneoideae mapped on Scotese's Paleoglobes (© Scotese 2013, GoogleEarth layover files are available from here). Note that although there was no continuous land bridge, North and South America were already connected by a chain of large and high islands, providing a corridor for intercontinental dispersal of near- and extra-tropical plant lineages. A potential crown-group Castanopsis (C. kaulii, cupule with associated seeds and pollen) has been recently recovered from the Baltic Amber (Sadowski et al. 2018, Am. J. Bot. 105: 2025–2036).

Both of the mapping procedures described above are crude, in the sense that they ignore the molecular branch lengths, and use Ockham's Razor. But they strike me as being not a bad start. They are better than just mapping along a single preferred molecular tree (as is done in many neontological papers; see Part 1) or along a morphology-based strict consensus cladogram (as is done in far too many paleontological papers; many palaeobotanical papers do neither the one nor the other: eg. Wilf et al., 2019, Science 364: eaaw5139). It's important to realize that if one taxon or subtree of our modern taxon set is characterized solely by the lack of shared derived traits or unstable expression of derived traits (like Castanopsis here, see position in both graphs in Fig. 5), ie. represents living fossils or little-evolved lineages, any ancient and primitive fossil, stem group, sister group or precursor, will be attracted by them in a total evidence or any other tree-based approach, especially when we rely on change-probability-naive parsimony as inference criterion. As we pointed out repeatedly: forming a clade in a tree is neither a necessary nor a sufficient criterion for monophyly.

All gone, what to do when we have no molecular data?

Morphology alone, like genes on their own, will inevitably get some things wrong (compare Fig. 4 with Fig. 5). Without molecular data, one may have little reason to reject the monophyly of the Castaneoideae (when using more than the seven characters scored by Wilf et al. 2019; see eg. the cladogram in Crepet & Nixon 1989, fig. 1 based on an undocumented 25-character matrix). In the process, we would misinterpret overall similarity, due to shared primitive character suites and the lack of shared derived traits, as evidence for an inclusive common origin (see footnote c).

What can we do if we have no or very few extant taxa, when we only have one set of data prone to circular reasoning? Then using networks is inevitable as well (see Fig. 5; and some examples provided in the reading list below). We need to explore in-depth the signal in our data matrix. Only extremely biased morphological matrices provide clear tree-like signals; comprehensive ones will have internal conflict and allow for inferring many, partly very different but more or less equally optimal trees.

Exploratory data analysis will not eliminate all possible errors — based only on the graph in Fig. 5, we would get the inter-generic phylogenetic relationships in Fagaceae partly wrong. However, this may lead to an informed decision as to which of the many equally probable evolutionary scenarios make more sense than others. It will help to reduce the alternatives, without eliminating those that are equally valid (which every tree does). If the time-coverage is good, exploring morphological differentiation over time can be an asset, too (see eg. Stacking neighbor-nets – a real-world example).


The matrices used, networks etc. can be accessed via figshare.

Selection of related posts on The Genealogical World of Phylogenetic Networks

Clades, Cladograms, Cladistics, and why networks are inevitable – illustrates why paleontologists should also be less tree-naive (see example in footnote c).
Has homoiology been neglected in phylogenetics? – why we should try to assess the phylogenetic quality of our traits.
Let's distinguish between Hennig and Cladistics – as said in the title, the post provides reasons why we should distinguish between Hennig's concepts and clades in phylogenetic trees.
Ockham's Razor applied, but not used: can we do DNA-scaffolding with seven characters? – the original post dealing with Wilf et al.'s (2019) "phylogenetic analysis", which obviously was not scrutinized during review.
Please stop using cladograms! – No matter whether you think evolution is tree-like or not, cladograms should be a matter of the past.
Should we try to infer trees on tree-unlikely matrices? – using well-known (among paleobotanists) examples, I show why networks reveal much more than any tree when we deal with fossils.
More non-treelike data forced into trees: a glimpse into the dinosaurs – the same but for a thunder lizard matrix.
Trivial data, but not so trivial graphs – an inference experiment using very simple artificial binary matrices.

a The main reason for the lack of branch support is that individuals of different genera growing in the same area can share plastid haplotypes, while individuals of the same genus / infra-generic lineage, even species, can be quite different. [Note that the standard 4x4 ML nucleotide model treats polymorphisms as such, not as missing data.] Plus, the different lineages show different levels of plastid diversity (highest in Quercus subgenus Cerris, but low in subgenus Quercus, the North American castanoids and Lithocarpus outside Borneo, Castanea-Castanopsis appear to be in-between the extremes), and there is a tendency to preferably mutate sequence patterns within a lineage that otherwise differentiate between lineages (for instance, inversions that distinguish two genera, can be found as intra-lineage variation in the third genus or one of the oak sections).

b The striking similarity between the newly found South American and long-known slightly older North American fossils is likely the reason for not discussing the latter in the original paper or including them in the "DNA-scaffold" analysis. As is obvious from the graphs, the slightly younger North American fossil could easily be a slightly more derived member of the same lineage than the South American fossil (Planchard et al. 2016 Paleont. Electr. 19.3.51A give a revised age of ≥ 49 Ma for the plant-bearing strata), and thus would have been at odds with the narrative of the authors (see also comment by Denk et al. 2019, Science 10.1126/science.aaz2189).

c As done by Wilf et al. (see also the argumentation in Wilf et al.'s response, Science 10.1126/science.aaz2297, to Denk et al.'s 2019 comment). The combination of circular reasoning, systematic bias, and (parsimony) tree-naivety is well expressed in Wilf et al.'s own words:
Fourth, Denk et al. erroneously contend that Castanopsis rothwellii, a fossil with so many diagnostic characters preserved that it could only be assigned to Castanopsis if “found alive” today (1), has plesiomorphic features and cannot be placed confidently in the extant genus [see Figs. 1–5 in this two-part post]. ... Denk et al.’s phylogenetic conclusions from their emended tree and matrix are misleading, in that any morphological matrix includes characters that are relevant only for the taxa included in the analysis. ... Because the fossils are castaneoid in all features, we did not include all Fagaceae in our original analysis (1) and likewise did not include all characters relevant to non-castaneoid fagaceous taxa. ... By adding just three relevant characters to the Denk et al. scaffold to accommodate the genera they added (Table 1), the fossil Castanopsis rothwellii is placed only with Castanopsis in the single [ie. the strict consensus of two equally parsimonious trees] most parsimonious tree (Fig. 1).
One of the three added traits ("expanded stigma") is exclusively shared by all five Castaneoideae genera, and the second ("nut generally rounded in cross section") is shared by all but one Castaneoideae and Quercus. Both are thus symplesiomorphies of core Fagaceae: shared primitive traits that can be expected in a precursor of several or all modern genera or their less evolved extinct sister lineages. Alternatively, they are positively selected homoiologies, ie. traits that evolved multiple times within the core Fagaceae. The third ("asymmetrical cupule") is an unstable convergence / deep parallelism and a trait of little phylogenetic value, since it is expressed as intra-generic (intraspecific?) variation in two distantly related genera: the monotypic Formanodendron, a trigonobalanoid, and Castanopsis. These are two genera that share only a very distant (and exclusive fide Hennig) common origin (see Part 1) but inhabit overlapping climate envelopes and ecological niches in modern-day East Asia.

Despite adding three hand-picked characters (from a set of at least 25 at hand, Crepet & Nixon 1989) and accepting a phylogeny closer to the reality, the Castanopsis "clade" in the new "scaffold tree" including the Patagonian fossil remains unsupported by any exclusive or even shared and stable derived trait/set of traits (as in the original study, Wilf et al. refrain from establishing any sort of node or branch support, or test of alternative placements).

Moreover, it is safe to assume that when one adds the extinct genus Castanopsoidea to the scaffold (Wilf et al. deliberately chose not to do so), it would compete with Castanopsis rothwellii for the placement next to the modern-day Castanopsis. According to Crepet & Nixon 1989, fig. 1, one possible placement of Castanopsoidea is a sister to "Castanopsis (1)". This is not necessarily because they share a direct common origin but because these fossils also lack uniquely derived characters or a clearly derived character suite defining all Fagaceae genera except for Castanopsis (which in Crepet & Nixon's morpho-tree, is paraphyletic to Lithocarpus, which, back then, included the potential oak sister genus Notholithocarpus — literally: the 'false Lithocarpus'). Personally, for the same reasons as outlined and applied in Bomfleur et al. 2017, PeerJ 5: e3433 (and like Denk et al. 2019), I would have no problem calling all these fossils Castanopsis by defining the genus as explicitly paraphyletic, which could include the modern-day species of Castanopsis (which are probably monophyletic) and Castanopsis-like fossils that may be more or less related to them and/or other core Fagaceae: the precursors and extinct but similar, underived sister lineages.

Monday, January 6, 2020

Why we may want to map trait evolution on networks, pt. 1 – Introduction

One of the more interesting aspects of studying evolution is to trace the evolution of the traits possessed by the organisms, whether those traits are physical or not (such as languages). That is, we usually infer phylogenetic trees and networks to see how things evolve, including both the organisms and their characteristics. However, this can easily lead to circular reasoning, as I will discuss here.


A phylogenetic tree may be enough to work out who is sister to whom. However, when thinking about evolution itself, we actually want to find out who comes from whom, instead. This may be the reason why Charles Darwin did not title his 'abstract' A Natural Order of Species but used instead The Origin of Species.

The tricky bit is this: in order to find the origin, we first need to establish ancestor-descendant relationships, so that we can then see how things like fossils fit in (ie. whether they represent extinct sister lineages or precursors of the modern-day taxa). When taxon B is a derivation of A (ie. B evolved from A), the character suite of A is not only primitive but also the original set. Now, let's assume that we have a third fossil taxon C, which is clearly related to A and B. As evolutionary biologists, we cannot be content merely with inferring sister relationships between A, B, and C, but instead we need to decide whether or not C is also descended from A.

Ironically (from a modern cladistic viewpoint, focusing on establishing sister-clade relationships), Willy Hennig provided us with some tools for doing this:
  • We all know that apomorphies are derived traits, either unique (aut-) or uniquely shared by a group with inclusive common origin (syn-). Aut- and syn-apomorphies define (Hennig's) monophyletic groups. According to Hennig, a synapomorphy is a necessary criterion for recognizing a monophyletic group, and also a sufficient one (although the latter easily falls prey to circular reasoning). More importantly, they tell us that the ancestor(s) of the group and its (potentially lost) sister lineages lacked this trait!
  • Sym-plesiomorphies are traits that are primitive within a certain lineage. They define paraphyletic groups, which are groups of exclusive common origin. Following Hennig, they need to be discarded for systematics. From an evolutionary viewpoint, however, symplesiomorphies have double information content: (a) they provide us with traits that, at some point back in time, were synapomorphies; and (b) any member of a lineage not carrying the symplesiomorphic trait, shows a derived one.
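To make the two group definitions concrete, here is a minimal Python sketch. It is not part of the original analysis: the toy tree and the taxon names A to D are hypothetical, and "paraphyletic" is simplified to the case of one clade minus a single nested clade.

```python
# Toy rooted tree as nested tuples; the taxa A-D are hypothetical placeholders.
TREE = ((("A", "B"), "C"), "D")


def leaves(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def clades(node):
    """Return every clade of the tree (tips included) as a frozenset of tips."""
    found = [frozenset(leaves(node))]
    if not isinstance(node, str):
        for child in node:
            found.extend(clades(child))
    return found


def is_monophyletic(tree, group):
    """True if the group is exactly the tip set of one subtree (inclusive common origin)."""
    return frozenset(group) in clades(tree)


def is_paraphyletic(tree, group):
    """True if the group is a clade minus one nested clade (simplified definition)."""
    group = frozenset(group)
    all_clades = clades(tree)
    if group in all_clades:
        return False
    return any(
        inner < outer and outer - inner == group
        for outer in all_clades
        for inner in all_clades
    )


print(is_monophyletic(TREE, {"A", "B"}))   # group defined by a synapomorphy
print(is_paraphyletic(TREE, {"C", "D"}))   # group left over after A+B acquired a derived trait
```

In this toy case, a trait that is derived in A and B would be a synapomorphy of the A+B clade, while the ancestral state retained by C and D would be a symplesiomorphy that only defines a paraphyletic group.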
Farris' cladistics is still the basis for systematics, and is widely applied in phylo-paleontology. The initial flaw of this approach is to assume that we can use morphological traits to infer a tree (with parsimony), and then map the same traits onto the inferred tree, allowing us to qualify the traits towards Hennig's objectives. How can this not be circular reasoning? We are mapping the traits onto a tree derived from those traits in the first place, so that the tree-building and mapping are not independent.

A simple (?) real-world example

For the purpose of this exercise, we will take the minute (seven character) matrix of Wilf et al. (2019) from this previous blog post (characters 5 and 7 corrected, and missing Fagaceae added; see also Denk et al. 2019, Science, 10.1126/science.aaz2189).

Wilf et al. found an Eocene fossil in South America, and argued that it must be a member of the modern genus Castanopsis, based on a parsimony DNA-scaffold approach (without actually using a DNA partition). Being a member of a modern genus, the fossil should have some aut-/synapomorphies or at least symplesiomorphies or homoiologies characterizing its sublineage of the Fagaceae, the paraphyletic Castaneoideae.

Based on the morphology, we can infer this tree:

Fig. 1 – Adams consensus tree of 3 most-parsimonious trees (11 steps, CI = 0.84, RI = 0.88), traits are mapped using Mesquite's default parsimony model. Castanopsis rothwellii is the Eocene fossil found by Wilf et al.
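For readers unfamiliar with parsimony mapping, the following is a minimal sketch of a Fitch-style down-pass for one unordered character on a fixed tree. It is an illustration only, not Mesquite's implementation; the tree, the taxon names and the character scores are hypothetical, and polymorphic tips are simply given as sets of states.

```python
# Toy rooted binary tree and one scored character; all names and scores are hypothetical.
TREE = ((("A", "B"), "C"), "D")
STATES = {"A": {1}, "B": {1}, "C": {0}, "D": {0}}


def fitch(node, tip_states):
    """Return (state_set, steps) for the subtree below node (Fitch down-pass)."""
    if isinstance(node, str):
        return set(tip_states[node]), 0
    left, right = node
    left_set, left_steps = fitch(left, tip_states)
    right_set, right_steps = fitch(right, tip_states)
    common = left_set & right_set
    if common:
        # The two subtrees agree on at least one state: no extra change needed.
        return common, left_steps + right_steps
    # No shared state: keep both options and count one change.
    return left_set | right_set, left_steps + right_steps + 1


root_set, steps = fitch(TREE, STATES)
print(root_set, steps)
```

The point relevant here is that the result depends entirely on the tree passed in: if that tree was inferred from the same characters, the mapping cannot be an independent test of them.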

Two characters qualify as near-synapomorphies (effectively there is only one: hemispheric indehiscent cupules) that define a crown-clade including Lithocarpus, Notholithocarpus, Castanopsis (as part of intrageneric variation) and Quercus. Most other putatively derived traits within the Fagaceae subclades are symplesiomorphies; two are potential homoiologies, one defining the Castanea-Chrysolepis clade. [Note the staircase-like tree topology, a common feature of parsimony trees dealing with extinct lineages.] The fossil's character suite is relatively derived: based on characters 6 (shared only with some Castanopsis) and 7 (reversal as in Quercus), the fossil could be interpreted as an extinct side lineage of the (paraphyletic) Castaneoideae.

This is not a bad analysis for seven characters, but it is likely to be quite wrong.

Fagaceae still exist today, and their DNA can be sampled. Below is a maximum-likelihood tree, based on a 2012 NCBI GenBank oligogene data harvest I did for a talk in Bordeaux; the alignment is 19,242 base pairs long, has 2,985 distinct alignment patterns and a gappiness of 35.8%. Each genus and major intra-generic lineage is represented by a strict consensus sequence based on all available data (checked for mislabeled or pseudogene accessions). [Oaks started to radiate > 50 Ma, Grímsson et al. 2015, Hipp et al. 2019; beeches about the same time, Denk & Grimm 2009, Renner et al. 2016.]

Fig. 2 – An ML tree based on strict genus/intrageneric consensus sequences (see also Oh & Manos, 2008, fig. 4, based only on data from the Crabs Claw gene, CRC; fig. 5 in the same paper shows a combined CRC + ITS tree)
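As an aside on the data preparation, here is a minimal sketch of how a strict consensus sequence and the alignment summary numbers could be computed. The scripts actually used are not given in this post, so everything here is an assumption: the toy alignment is hypothetical, polymorphic positions are assumed to be coded with IUPAC ambiguity codes, gaps are ignored whenever at least one sequence has a base, and gappiness is taken to be the share of gap or undetermined cells.

```python
# IUPAC ambiguity codes for sets of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def strict_consensus(seqs):
    """Collapse aligned sequences into one; disagreements become ambiguity codes."""
    out = []
    for column in zip(*seqs):
        bases = frozenset(b for b in column if b in "ACGT")
        out.append(IUPAC[bases] if bases else "-")
    return "".join(out)


def gappiness(seqs):
    """Proportion of gap or undetermined cells in the alignment."""
    cells = "".join(seqs)
    return sum(c in "-?N" for c in cells) / len(cells)


def distinct_patterns(seqs):
    """Number of distinct alignment columns (site patterns)."""
    return len(set(zip(*seqs)))


# Hypothetical toy alignment of three accessions of one genus.
toy = ["ACGT-", "ACAT-", "AC-TA"]
print(strict_consensus(toy), gappiness(toy), distinct_patterns(toy))
```

Coding intra-lineage variation as ambiguity codes, rather than picking one accession per genus, keeps the polymorphism visible to the tree inference.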

According to this analysis, Chrysolepis and Castanea are not sisters; Castanea, but not Lithocarpus, is a close relative of the oaks. The (monophyletic) Trigonobalanoideae should form a clade (Fig. 2) not a grade (Fig. 1).

The analysis is not circular anymore, when we infer a tree based on data that is, as far as we know, independent of the data we want to map onto the tree. With the invention of stochastic mapping methods, we also avoid the possible limitations of parsimony when it comes to character mapping — morphological evolution is often not parsimonious, at least for the traits we can observe back in time or study in detail today.

Fig. 3 – ML trait mapping on the tree in Fig. 2 (ie. considering molecular branch lengths). Note that the reconstructions of character states for the all-ancestor are ambiguous due to the extreme genetic distance between Fagus and the remainder of the Fagaceae. The situation in the scored fossils (Wilf et al. 2019, Denk et al. 2019) is shown for comparison.

For the ML mapping above, I scored intra-generic variations as additional states (ML ancestral-state reconstruction as implemented in the Mesquite program needs defined tips) and applied Mesquite's default model — this is essentially Lewis' Mk model for multi-state standard characters: one substitution category for any possible mutation. We can now compare the two mappings.
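For illustration, the Mk model and a marginal reconstruction at the root can be sketched in a few lines of Python. This is a simplified sketch, not Mesquite's code: the tree, branch lengths and tip scores are hypothetical, branch lengths are taken as expected changes per character, and all root states are assumed equally likely a priori.

```python
import math


def mk_prob(k, t, same):
    """Mk transition probability over a branch of length t for a k-state character."""
    decay = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * decay if same else 1 / k - decay / k


def conditional_likelihoods(node, tip_states, k):
    """Pruning step: likelihood of the tips below node, for each possible state at node."""
    if isinstance(node, str):
        return [1.0 if s == tip_states[node] else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, length in node:
        child_like = conditional_likelihoods(child, tip_states, k)
        for s in range(k):
            result[s] *= sum(
                mk_prob(k, length, s == c) * child_like[c] for c in range(k)
            )
    return result


def root_state_probabilities(tree, tip_states, k):
    """Relative support for each root state, assuming a flat prior on root states."""
    like = conditional_likelihoods(tree, tip_states, k)
    total = sum(like)
    return [x / total for x in like]


# Hypothetical tree: an internal node is a tuple of (child, branch_length) pairs.
INNER = (("A", 0.1), ("B", 0.1))
TREE = ((INNER, 0.05), ("C", 0.5))
TIPS = {"A": 1, "B": 1, "C": 0}

print(root_state_probabilities(TREE, TIPS, k=2))
```

With very long branches, mk_prob approaches 1/k for every state, which is one way to see why reconstructions become ambiguous when a lineage is genetically very distant from the rest of the tree.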

What our morphology-based tree recognized as derived was actually partly primitive. The near-synapomorphy (hemispheric indehiscent cupules) is in fact a symplesiomorphy of all Castaneoideae + Quercus. Traits shared by Castanea-Castanopsis (pro parte, ie. some species show the ancestral, others the derived state) and Quercus are primitive, while those unique to (or part of the intra-generic variation in) one or several Castaneoideae are derived.

Note that the alleged crown-group but old fossil Castanopsis rothwellii would fit at the base of the (core Fagaceae) tree (zero conflict) as well as close to its leaves (at least one conflicting character). Six of the seven traits can be pinpointed for the core Fagaceae ancestor. According to the reconstruction, it had three styles, scaly cupule appendages, hemispheric indehiscent cupules (vs. valvate in C. rothwellii), one flower per cupule, no valve dehiscence ("partial" in C. rothwellii), and inflorescences were unisexual and mixed (Wilf et al. state the Eocene fossils were unisexual, although the difference can only be assessed when investigating all inflorescences on a tree, see Denk et al.'s comment). The reconstruction is ambiguous regarding whether female flowers were clustered or solitary.

However, there is one implicit assumption held in common by all of the methods, including DNA-scaffolding, probabilistic and stochastic character mapping, total evidence dating, evolutionary placement algorithm (EPA) as implemented in the RAxML program, etc. That is: the inferred molecular tree is the true tree. This is the second fundamental flaw of cladistic approaches to evolution, as I will show in Part 2.

Data information

The morphological data used here is based on an emended version of the Wilf et al. matrix provided by my former colleague and co-author Thomas Denk (see also Denk et al. 2019, table 1). It can be accessed via figshare, together with the molecular data matrix used here.


Denk T, Grimm GW. (2009) The biogeographic history of beech trees. Review of Palaeobotany and Palynology 158: 83–100.

Denk T, Hill RS, Simeone MC, Cannon C, Dettmann ME, Manos PS. (2019) Comment on “Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests”. Science 366: eaaz2189.

Grímsson F, Zetter R, Grimm GW, Krarup Pedersen G, Pedersen AK, Denk T. (2015) Fagaceae pollen from the early Cenozoic of West Greenland: revisiting Engler's and Chaney's Arcto-Tertiary hypotheses. Plant Systematics and Evolution 301: 809–832.

Hipp AL, Manos PS, Hahn M, et al. (2019) Genomic landscape of the global oak phylogeny. New Phytologist doi:10.1111/nph.16162.

Oh S-H, Manos PS. (2008) Molecular phylogenetics and cupule evolution in Fagaceae as inferred from nuclear CRABS CLAW sequences. Taxon 57: 434–451.

Renner SS, Grimm GW, Kapli P, Denk T. (2016) Species relationships and divergence times in beeches: New insights from the inclusion of 53 young and old fossils in a birth-death clock model. Philosophical Transactions of the Royal Society B doi:10.1098/rstb.2015.0135.

Wilf P, Nixon KC, Gandolfo MA, Cúneo NR. (2019) Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests. Science 364: eaaw5139.

For more literature, see the post:
Ockham's Razor applied but not used: can we make a DNA-scaffolding with seven characters?