Monday, January 20, 2020

Worldwide gender differences in amount of paid versus unpaid work


A few weeks ago, I wrote about National differences in the amount of paid and unpaid work. This involved a look at the time that people spend per day on each of various different activities, averaged across each year. The data came from the time-use surveys conducted by the Organisation for Economic Co-operation and Development (OECD) for its 30 member countries. I concluded that there are many similarities among countries that share strong cultural ties, although some countries stand out as unusual within this context.


Four main categories of time use are reported in the surveys: Paid Work or Study, Unpaid Work, Personal Care, and Leisure Time; these are described in more detail in my previous post. The aggregated results for each country are available online, including data for three non-OECD countries, for comparison (China, India, South Africa).

Of particular interest is that these data are actually aggregated separately for males and females (see Balancing paid work, unpaid work and leisure). This allows us to look at the various national time-management behaviors in the light of potential differences in gender roles within those countries.

Obviously, we expect some consistent gender differences, not least because in most cultures it is the females who have traditionally been the primary care-givers in a family, and this is one of the main unpaid work activities. We can use the OECD data to look at this in a bit more detail.

Overall gender differences

First, we can look at the overall time-management differences between the two genders.

In order to get an overview of the current differences between the 33 countries (30 OECD, 3 non-OECD), I have performed this blog's usual exploratory data analysis. The available data are multivariate, since there are five measured variables for each country: total paid work, total unpaid work, total personal care time, leisure time (each measured in average number of minutes per day), plus Other (to make a total of 1,440 minutes per day). One of the simplest ways to get a pictorial overview of the data patterns is to use a phylogenetic network. For this network analysis, I first calculated the gender differences as Male time minus Female time (for each variable separately), and then calculated the pairwise distances between the countries using the Manhattan distance. A Neighbor-net analysis was then used to display the between-country similarities.
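As a rough illustration of the two calculation steps just described, here is a minimal Python sketch. The three countries and all of their minute values are made up for the example (they are not OECD data), and the function names are my own; only the procedure (Male minus Female per variable, then Manhattan distance between countries) follows the text.

```python
# Hypothetical minutes-per-day values, for illustration only (not OECD data).
# Variable order: paid work, unpaid work, personal care, leisure, other.
male = {
    "Country A": [330, 150, 640, 300, 20],
    "Country B": [380, 90, 630, 320, 20],
    "Country C": [340, 140, 650, 290, 20],
}
female = {
    "Country A": [270, 230, 650, 270, 20],
    "Country B": [200, 300, 640, 280, 20],
    "Country C": [275, 225, 655, 265, 20],
}


def gender_difference(male_row, female_row):
    """Male time minus Female time, separately for each variable."""
    return [m - f for m, f in zip(male_row, female_row)]


def manhattan(u, v):
    """Manhattan distance: the sum of absolute differences per variable."""
    return sum(abs(a - b) for a, b in zip(u, v))


diffs = {c: gender_difference(male[c], female[c]) for c in male}
countries = sorted(diffs)
distance = {
    (a, b): manhattan(diffs[a], diffs[b]) for a in countries for b in countries
}

for a in countries:
    print(a, [distance[(a, b)] for b in countries])
```

A distance matrix of this kind is what a Neighbor-net implementation would take as input; in this toy example Countries A and C would end up close together, with Country B further away.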

The resulting network is shown in the first graph. Countries that are closely connected in the network are similar to each other based on their average gender difference in time management, and those countries that are further apart are progressively more different from each other.


At the bottom of the network we see those countries with the biggest gender differences, progressing up to the top with those countries with the least difference.

So, the non-European countries show the most traditional separation of gender roles, with Portugal standing out as being the only one from Europe. China is not situated with the other two Asian countries (Japan, Korea), although why it should be similar to South Africa is not clear.

Indeed, the English-speaking part of the southern hemisphere does not do well, with all three countries (South Africa, Australia, New Zealand) showing stronger gender differences than any of the other English-speaking countries (Canada, USA, UK), except for the Irish (who thus have some explaining to do).

The Scandinavian countries are at the top (Sweden, Norway, Denmark), with the smallest gender differences, which will not surprise anyone who knows these people. On the other hand, the location of France may surprise those people who have a clichéd image of the behavior of Frenchmen. France is clearly separated from the more traditional societies of the other Mediterranean countries (Spain, Greece, Italy), appearing in the network with other northern countries (Belgium, Netherlands, Germany).

Finland and Estonia have strong historical ties, and they are distinct from the other Baltic countries (Latvia and Lithuania).

Work time differences

Having thus noted that there are some strong gender differences in time-management between countries, we can now proceed to look specifically at Paid versus Unpaid work.

First, we can simply take the total amount of reported Paid + Unpaid work, and compare gender differences across the various countries. This table lists the reported differences expressed as Male time minus Female time, in average minutes per day:

Country         Male minus Female (minutes per day)
Norway            18.5
New Zealand       10.0
Denmark            8.8
Netherlands        4.0
Japan             -3.3
Canada            -3.3
USA               -4.7
Australia         -7.3
Germany           -7.8
Mexico           -10.8
Sweden           -11.3
Turkey           -13.2
UK               -16.1
Korea            -16.4
Austria          -17.9
Belgium          -18.7
Ireland          -20.1
Poland           -24.6
Luxembourg       -27.4
France           -29.3
Latvia           -35.1
Finland          -39.6
China            -44.0
South Africa     -47.5
Slovenia         -54.1
Hungary          -61.3
Lithuania        -65.3
Estonia          -69.8
Spain            -73.7
Greece           -74.7
Italy            -87.9
Portugal         -90.8
India            -94.3

These time differences between males and females become very large towards the bottom of the table, where in India it amounts to 1.5 hours per day, and is >1 hour for all of the bottom 8 countries. Note that only in the first four countries (out of the 33) does the total work time for males exceed that for females. It is unclear why the reported gender difference is so large for Norwegians; but maybe some of my readers might think that this could be a useful role model for the other countries!
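The counts in this paragraph can be checked directly against the table. The following short Python sketch does that; the values are copied from the table above, while the variable names are my own.

```python
# Male minus Female total work time (paid + unpaid), in minutes per day,
# copied from the table above (same order as the table).
total_diff = {
    "Norway": 18.5, "New Zealand": 10.0, "Denmark": 8.8, "Netherlands": 4.0,
    "Japan": -3.3, "Canada": -3.3, "USA": -4.7, "Australia": -7.3,
    "Germany": -7.8, "Mexico": -10.8, "Sweden": -11.3, "Turkey": -13.2,
    "UK": -16.1, "Korea": -16.4, "Austria": -17.9, "Belgium": -18.7,
    "Ireland": -20.1, "Poland": -24.6, "Luxembourg": -27.4, "France": -29.3,
    "Latvia": -35.1, "Finland": -39.6, "China": -44.0, "South Africa": -47.5,
    "Slovenia": -54.1, "Hungary": -61.3, "Lithuania": -65.3, "Estonia": -69.8,
    "Spain": -73.7, "Greece": -74.7, "Italy": -87.9, "Portugal": -90.8,
    "India": -94.3,
}

# Countries where males report more total work than females.
more_male_work = [c for c, d in total_diff.items() if d > 0]

# Countries where females report more than one extra hour of work per day.
over_one_hour = [c for c, d in total_diff.items() if d < -60]

# The difference for India, expressed in hours per day.
india_hours = -total_diff["India"] / 60

print(more_male_work)
print(over_one_hour)
print(round(india_hours, 2))
```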

We can now look at the balance between paid and unpaid work for the two genders. The following graph shows the difference as Male time minus Female time (in average minutes per day) for Paid work (horizontally) and Unpaid work (vertically). The pink line indicates the balance between the two types of work (ie. a decrease in paid work is balanced by a corresponding increase in unpaid work, and vice versa).
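To make the meaning of the balance line concrete: a country sits on the line when the male surplus in paid work is exactly cancelled by the female surplus in unpaid work, so that the difference in total work is zero. Here is a minimal sketch of that arithmetic in Python; the two example countries and their values are hypothetical, not taken from the OECD data.

```python
def total_work_difference(paid_diff, unpaid_diff):
    """Male minus Female difference in total work (paid + unpaid), minutes per day.

    A result of 0 means the country lies exactly on the balance line;
    a negative result means females do more total work than males.
    """
    return paid_diff + unpaid_diff


# Hypothetical (paid, unpaid) differences, Male minus Female, minutes per day.
examples = {
    "Country X": (60.0, -60.0),    # on the balance line
    "Country Y": (150.0, -240.0),  # below the line: females work 90 min more
}

for name, (paid, unpaid) in examples.items():
    print(name, total_work_difference(paid, unpaid))
```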

Gender differences in amount of paid versus unpaid work

The horizontal axis makes it clear that males always do more paid work than do females, on average, in every country, and up to 4 hours more in Mexico and Turkey. The vertical axis makes it clear that females always do more unpaid work than do males, on average, in every country, and up to 5 hours more in India.

These two variables must be correlated, since most people do either the one type of work or the other. However, in most countries the gender balance is not equal, as shown in the table above (females usually do more total work than do males). Some countries come close to a balance (indicated by the pink line), including the USA.

Note that the country with the closest gender equality is the one with the best reputation in this regard: Sweden. For example, Swedish couples frequently share their workplace parental leave for new-born children, so that there is very little gender bias in who is the primary care-giver in a family. However, the gender bias still amounts to 5–7 minutes of work per day, even in Sweden.

At the other end of the scale, there are a number of countries that still abide by the traditional model of gender roles, of which five are labeled at the bottom of the graph. These cover quite a diversity of cultures, so that no generalizations can be made. However, the gender bias in India exceeds that in Mexico — the Indians report less total work time than do the Mexicans, but that time is organized in a more gender-biased manner. Once again, Portugal stands out among the European countries — the Portuguese work longer hours than do other Europeans, and that time is organized in a more gender-biased manner.

Other differences

Gender differences occur among the other survey variables, as well. As one simple example, we can consider the time reported as being spent Eating & Drinking. This graph shows the time (in minutes per day) spent by the males (horizontally) and the females (vertically) for each of the 33 countries.

Gender differences in amount of time spent eating and drinking

As you can see, there is not a big difference between the two genders, in any country. However, in most countries males do report spending more time feeding themselves than do the females (ie. the points are to the right of the pink line, which represents equal time).

The Mediterranean countries spend the most time eating and drinking, with Greece showing the biggest gender difference. The fast food preferred by Canadians and Americans clearly does not take much time to consume, in any given day, and females can apparently eat it just as fast as males.

Conclusion

The conclusion surprises no-one: all countries have clear gender differences in who does most of the unpaid work. Two Scandinavian countries stand out: Norway, because males do more total work than do females; and Sweden, where the gender imbalance between paid and unpaid work is smallest. Some countries still show strong gender bias, including India, Mexico, Turkey and Portugal.

Monday, January 13, 2020

Why we may want to map trait evolution on networks, pt. 2 – Topological ambiguity


In last week's Part 1, I gave an introduction to the problem of categorizing the polarity of morphological traits. How can we reconstruct which characters are primitive, or plesiomorphic according to Hennig, and which are derived, or apomorphic? This is something we need to do to reconstruct evolution, because most of the past is only preserved in the form of fossils, usually lacking any DNA. In this second part of the discussion, I'm going to take apart my own tree and show why we inevitably need networks, not trees.

There may be more than one tree

Even with more and more data at hand, some molecular phylogenies refuse to be unambiguous. Even worse, different, well-sampled molecular data sets may tell different stories — ie. there is more than one molecular tree to explain the diversity patterns. The ML tree used for the ML character mapping in Part 1 was pretty well supported, but not telling the entire truth.

For a start, there is no reason to assume that oaks are not monophyletic even though the data fail to resolve them as a clade (evolving something unique like the oaks twice would be a striking trick, even for gambling Mother Nature) — molecular trees may have misleading, sometimes just wrong, branches, even when they are highly supported.

In this case, one complication is that the oligogene dataset combines plastid and nuclear gene regions that not only differ in their information content but also infer different phylogenetic scenarios (and mask a lot of intra-generic and sub-generic incongruencies). This is illustrated in the following tanglegram.

Fig. 3 – A tanglegram, on the left the ML tree inferred from only the plastid gene regions (1406 DAP, alignment 15254 bp long), and on the right the corresponding nuclear data based tree (1691 DAP per only 4983 bp).

Even though the support along the backbone of the plastid tree is low [a] (to non-existent), it well reflects the general diversification patterns in Fagaceae plastomes (see also the tree in Manos et al. 2008, Madroño 55:181–190; and Yan et al. 2019, BMC Evol. Biol. 19: 202, for an oak global picture). Plastid signatures show a strong geographic sorting (eg. New World vs. Old World), while the nuclear data provides most of the lineage-differentiating signal expressed in the combined tree (Part 1, Fig. 2).

Mapping along networks

How do we decide what is a real synapomorphy, a homoiology, or a good symplesiomorphy? Mapping the traits along all possible rooted trees is one option. Another option is to just map them along a consensus network of all trees, as shown next.
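The idea behind a consensus network can be sketched in a few lines of Python: collect the groupings found in each tree, and keep all of them, not just those on which the trees agree. The two toy trees and the taxon labels A to E below are hypothetical, and for simplicity the sketch works with rooted clusters, whereas consensus-network software usually works with unrooted splits and a frequency threshold.

```python
def clusters(tree):
    """Return the non-trivial clusters (leaf sets below internal nodes)
    of a rooted tree written as nested tuples of taxon names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root cluster carries no information
    return found


# Two hypothetical, conflicting gene trees for the same five taxa.
nuclear_tree = ((("A", "B"), "C"), ("D", "E"))
plastid_tree = ((("A", "C"), "B"), ("D", "E"))

nuclear = clusters(nuclear_tree)
plastid = clusters(plastid_tree)

shared = nuclear & plastid       # groupings supported by both trees
conflicting = nuclear ^ plastid  # groupings found in only one tree
network = nuclear | plastid      # everything a consensus network would show

for group in sorted(network, key=sorted):
    status = "both trees" if group in shared else "one tree only"
    print(sorted(group), status)
```

A trait shared only by A and B would then map onto an edge that exists in the network but is contradicted by the other tree, which is exactly the kind of ambiguity a single tree hides.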

Fig. 4 – Map of the seven characters on the consensus network of the nuclear and plastid trees shown in Fig. 3. Blue – genus autapomorphies, dark green – synapomorphies/terminal homoiologies, light green – symplesiomorphies, orange – deep homoiologies, red – randomly distributed trait, pink – genus-restricted reversals.

According to the mapping, the newly described South American Castanopsis rothwellii, assigned to the modern (Southeast Asian) genus Castanopsis, is a stem Castanoideae / Fagaceae, while the "extinct" North American genus Castanopsoidea (then the "earliest megafossil evidence of Fagaceae": Crepet & Nixon 1989, Am. J. Bot. 76: 842–855) could be a stem / crown member of the Castanea-Castanopsis lineage. The difference to the ML trait mapping (Fig. 3 in Part 1) on the combined tree is that we get a better picture of what is a lineage-specific trait set in Castanea-Castanopsis, because the interference of the monophyletic(!) oak grade is minimized.

Another possibility is to map the characters directly along a distance-based network, and then compare the latter with the molecular-based topological alternatives. This is quite puzzling in this case, because the morphology (Fig. 1 in Part 1) matches neither the nuclear tree nor the plastid tree (Figs. 2–4) — the traits scored for the fossils cover largely morphological Play-Doh of the Fagaceae.

Fig. 5 – Neighbor-nets based on mean morphological distances. Top graph – polymorphisms treated as ambiguities (standard approach), bottom graph – polymorphisms treated as additional states (experimental approach). Text coloring as in Fig. 4, light blue – potential autapomorphy of the fossil American castaneoid lineage. Edge colors: green – edge representing a molecular clade/likely monophyletic group; orange – edge representing a paraphyletic group; red – edge rejected by molecular data; blue – edges supporting a distinct fossil American castaneoid lineage.
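The two treatments of polymorphism named in the caption can be illustrated with a small Python sketch. This is only one plausible way of defining the distances, not necessarily the exact calculation behind Fig. 5: I assume that under the "ambiguity" treatment two taxa differ only if their sets of observed states do not overlap, and that under the "additional state" treatment each distinct set of states counts as a state of its own. The two taxa and their scorings are hypothetical.

```python
def mean_distance(taxon1, taxon2, polymorphism="ambiguity"):
    """Mean character difference between two taxa.

    Each character is a frozenset of observed states, or None if missing.
    polymorphism="ambiguity": taxa differ only if their state sets are disjoint.
    polymorphism="state":     taxa differ whenever their state sets are not identical.
    """
    differences = []
    for s1, s2 in zip(taxon1, taxon2):
        if s1 is None or s2 is None:
            continue  # skip characters not scored in both taxa
        if polymorphism == "ambiguity":
            differ = s1.isdisjoint(s2)
        else:
            differ = s1 != s2
        differences.append(1.0 if differ else 0.0)
    return sum(differences) / len(differences) if differences else float("nan")


# Hypothetical scorings for four characters; {0, 1} is a polymorphic taxon.
taxon_x = [frozenset({0}), frozenset({1}), frozenset({0, 1}), frozenset({0})]
taxon_y = [frozenset({0}), frozenset({0, 1}), frozenset({1}), None]

print(mean_distance(taxon_x, taxon_y, "ambiguity"))
print(mean_distance(taxon_x, taxon_y, "state"))
```

Under these assumptions the same pair of taxa is identical in the first treatment and differs in two of three comparable characters in the second, which is why the two graphs can look quite different.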

The likely primitive characters, irrespective of the evolutionary scenario we prefer, are those also found in the Eocene fossils [b]. There are no derived traits/character suites pinning the fossils to Castanopsis. The fossils are a bit derived on their own terms (note their position in Fig. 5), and hence we can deduce that the fossils represent either (a) a relatively primitive extinct American sister lineage or (b) surviving, somewhat evolved members of the precursors of modern-day core Fagaceae. Note that the derived oaks evolved nearly 60 myrs ago, ie. 8 myrs before the oldest (Patagonian) Castanoideae fossil was deposited. The earliest (known) Fagaceae and castaneoid pollen are from 80+ Ma old Upper Cretaceous sediments in western North America (Grímsson et al. 2016, Acta Palaeobot. 56: 247–305; open access) and Japan (Takahashi et al. 2008, Intl. J. Plant Sci. 169:899–907), giving them plenty of time to migrate into North and then South America during the Paleocene-Eocene greenhouse episode.

Fig. 6 – Earliest fossil record of Fagaceae and Castanoideae mapped on Scotese's Paleoglobes (© Scotese 2013, GoogleEarth layover files are available from here). Note that although there was no continuous land bridge, North and South America were already connected by a chain of large and high islands, providing a corridor for intercontinental dispersal of near- and extra-tropical plant lineages. A potential crown-group Castanopsis (C. kaulii, cupule with associated seeds and pollen) has been recently recovered from the Baltic Amber (Sadowski et al. 2018, Am. J. Bot. 105: 2025–2036).

Both of the mapping procedures described above are crude, in the sense that they ignore the molecular branch lengths, and use Ockham's Razor. But it strikes me as being not a bad start. They are better than just mapping along a single preferred molecular tree (as is done in many neontological papers; see Part 1) or along a morphology-based strict consensus cladogram (as is done in far too many paleontological papers; many palaeobotanical papers do neither the one nor the other: eg. Wilf et al., 2019, Science 364: eaaw5139). It's important to realize what happens if one taxon or subtree of our modern taxon set is characterized solely by the lack of shared derived traits or unstable expression of derived traits (like Castanopsis here, see position in both graphs in Fig. 5), ie. represents living fossils or little-evolved lineages. Any ancient and primitive fossil, whether stem group, sister group or precursor, will then be attracted by it in a total evidence or any other tree-based approach, especially when we rely on change-probability-naive parsimony as the inference criterion. As we pointed out repeatedly: forming a clade in a tree is neither a necessary nor a sufficient criterion for monophyly.

All gone, what to do when we have no molecular data?

Morphology alone, like genes on their own, will inevitably get some things wrong (compare Fig. 4 with Fig. 5). Without molecular data, one may have little reason to reject the monophyly of the Castaneoideae (when using more than the seven characters scored by Wilf et al. 2019; see eg. the cladogram in Crepet & Nixon 1989, fig. 1 based on an undocumented 25-character matrix). In the process, we would misinterpret overall similarity, due to shared primitive character suites and the lack of shared derived traits, as evidence for an inclusive common origin [c].

What can we do if we have no or very few extant taxa, when we only have one set of data prone to circular reasoning? Then using networks is inevitable as well (see Fig. 5; and some examples provided in the reading list below). We need to explore in-depth the signal in our data matrix. Only extremely biased morphological matrices provide clear tree-like signals; comprehensive ones will have internal conflict and allow for inferring many, partly very different but more or less equally optimal trees.

Exploratory data analysis will not eliminate all possible errors — based only on the graph in Fig. 5, we would get the inter-generic phylogenetic relationships in Fagaceae partly wrong. However, this may lead to an informed decision as to which of the many equally probable evolutionary scenarios make more sense than others. It will help to reduce the alternatives, without eliminating those that are equally valid (which every tree does). If the time-coverage is good, exploring morphological differentiation over time can be an asset, too (see eg. Stacking neighbor-nets – a real-world example).

Data

The matrices used, networks etc. can be accessed via figshare.

Selection of related posts on The Genealogical World of Phylogenetic Networks

Clades, Cladograms, Cladistics, and why networks are inevitable – illustrates why paleontologists should also be less tree-naive (see example in footnote c).
Has homoiology been neglected in phylogenetics? – why we should try to assess the phylogenetic quality of our traits.
Let's distinguish between Hennig and Cladistics – as said in the title, the post provides reasons why we should distinguish between Hennig's concepts and clades in phylogenetic trees.
Ockham's Razor applied, but not used: can we do DNA-scaffolding with seven characters? – the original post dealing with Wilf et al.'s (2019) "phylogenetic analysis", which obviously was not scrutinized during review.
Please stop using cladograms! – No matter whether you think evolution is tree-like or not, cladograms should be a matter of the past.
Should we try to infer trees on tree-unlikely matrices? – using well-known (among paleobotanists) examples, I show why networks reveal much more than any tree when we deal with fossils.
More non-treelike data forced into trees: a glimpse into the dinosaurs – the same but for a thunder lizard matrix.
Trivial data, but not so trivial graphs – an inference experiment using very simple artificial binary matrices.



a The main reason for the lack of branch support is that individuals of different genera growing in the same area can share plastid haplotypes, while individuals of the same genus / infra-generic lineage, even species, can be quite different. [Note that the standard 4x4 ML nucleotide model treats polymorphisms as such, not as missing data.] Plus, the different lineages show different levels of plastid diversity (highest in Quercus subgenus Cerris, but low in subgenus Quercus, the North American castanoids and Lithocarpus outside Borneo, Castanea-Castanopsis appear to be in-between the extremes), and there is a tendency to preferably mutate sequence patterns within a lineage that otherwise differentiate between lineages (for instance, inversions that distinguish two genera, can be found as intra-lineage variation in the third genus or one of the oak sections).

b The striking similarity between the newly found South American and long-known slightly older North American fossils is likely the reason for not discussing the latter in the original paper or including them in the "DNA-scaffold" analysis. As is obvious from the graphs, the slightly younger North American fossil could easily be a slightly more derived member of the same lineage than the South American fossil (Planchard et al. 2016 Paleont. Electr. 19.3.51A give a revised age of ≥ 49 Ma for the plant-bearing strata), and thus would have been at odds with the narrative of the authors (see also comment by Denk et al. 2019, Science 10.1126/science.aaz2189).

c As done by Wilf et al. (see also the argumentation in Wilf et al.'s response, Science 10.1126/science.aaz2297, to Denk et al.'s 2019 comment). The combination of circular reasoning, systematic bias, and (parsimony) tree-naivity is well expressed in Wilf et al.'s own words:
Fourth, Denk et al. erroneously contend that Castanopsis rothwellii, a fossil with so many diagnostic characters preserved that it could only be assigned to Castanopsis if “found alive” today (1), has plesiomorphic features and cannot be placed confidently in the extant genus [see Figs. 1–5 in this two-part post]. ... Denk et al.’s phylogenetic conclusions from their emended tree and matrix are misleading, in that any morphological matrix includes characters that are relevant only for the taxa included in the analysis. ... Because the fossils are castaneoid in all features, we did not include all Fagaceae in our original analysis (1) and likewise did not include all characters relevant to non-castaneoid fagaceous taxa. ... By adding just three relevant characters to the Denk et al. scaffold to accommodate the genera they added (Table 1), the fossil Castanopsis rothwellii is placed only with Castanopsis in the single [ie. the strict consensus of two equally parsimonious trees] most parsimonious tree (Fig. 1).
One of the three added traits ("expanded stigma") is exclusively shared by all five Castaneoideae genera, the second ("nut generally rounded in cross section") shared by all but one Castaneoideae and Quercus, and thus are symplesiomorphies of core Fagaceae: shared primitive traits that can be expected in a precursor of several or all modern genera or their less evolved extinct sister lineages. Or positively selected homoiologies, ie. evolved multiple times within the core Fagaceae. The third ("asymmetrical cupule") is an unstable convergence / deep parallelism and a trait of little phylogenetic value, since expressed as intra-generic (intraspecific?) variation in two distantly related genera: the monotypic Formanodendron, a trigonobalanoid, and Castanopsis. These are two genera that share only a very distant (and exclusive fide Hennig) common origin (see Part 1) but inhabit overlapping climate envelopes and ecological niches in modern-day East Asia.

Despite adding three hand-picked characters (from a set of at least 25 at hand, Crepet & Nixon 1989) and accepting a phylogeny closer to the reality, the Castanopsis "clade" in the new "scaffold tree" including the Patagonian fossil remains unsupported by any exclusive or even shared and stable derived trait/set of traits (as in the original study, Wilf et al. refrain from establishing any sort of node or branch support, or test of alternative placements).

Moreover, it is safe to assume that when one adds the extinct genus Castanopsoidea to the scaffold (Wilf et al. deliberately chose not to do so), it would compete with Castanopsis rothwellii for the placement next to the modern-day Castanopsis. According to Crepet & Nixon 1989, fig. 1, one possible placement of Castanopsoidea is a sister to "Castanopsis (1)". This is not necessarily because they share a direct common origin but because these fossils also lack uniquely derived characters or a clearly derived character suite defining all Fagaceae genera except for Castanopsis (which in Crepet & Nixon's morpho-tree, is paraphyletic to Lithocarpus, which, back then, included the potential oak sister genus Notholithocarpus — literally: the 'false Lithocarpus'). Personally, for the same reasons as outlined and applied in Bomfleur et al. 2017, PeerJ 5: e3433 (and like Denk et al. 2019), I would have no problem calling all these fossils Castanopsis by defining the genus as explicitly paraphyletic, which could include the modern-day species of Castanopsis (which are probably monophyletic) and Castanopsis-like fossils that may be more or less related to them and/or other core Fagaceae: the precursors and extinct but similar, underived sister lineages.

Monday, January 6, 2020

Why we may want to map trait evolution on networks, pt. 1 – Introduction


One of the more interesting aspects of studying evolution is to trace the evolution of the traits possessed by the organisms, whether those traits are physical or not (such as languages). That is, we usually infer phylogenetic trees and networks to see how things evolve, including both the organisms and their characteristics. However, this can easily lead to circular reasoning, as I will discuss here.

Background

A phylogenetic tree may be enough to work out who is sister to whom. However, when thinking about evolution itself, we actually want to find out who comes from whom, instead. This may be the reason why Charles Darwin did not title his 'abstract' A Natural Order of Species but used instead The Origin of Species.

The tricky bit is this: in order to find the origin, we first need to establish ancestor-descendant relationships, so that we can then see how things like fossils fit in (ie. whether they represent extinct sister lineages or precursors of the modern-day taxa). When taxon B is a derivation of A (ie. B evolved from A), the character suite of A is not only primitive but also the original set. Now, let's assume that we have a third fossil taxon C, which is clearly related to A and B. As evolutionary biologists, we cannot be content merely with inferring sister relationships between A, B, and C, but instead we need to decide whether or not C is also descended from A.

Ironically (from a modern cladistic viewpoint, focusing on establishing sister-clade relationships), Willy Hennig provided us with some tools for doing this:
  • We all know that apomorphies are derived traits, either unique (aut-) or uniquely shared by a group with inclusive common origin (syn-). Aut- and syn-apomorphies define (Hennig's) monophyletic groups. According to Hennig, a synapomorphy is a necessary criterion for recognizing a monophyletic group, and also a sufficient one (although the latter easily falls prey to circular reasoning). More importantly, they tell us that the ancestor(s) of the group and its (potentially lost) sister lineages lacked this trait!
  • Sym-plesiomorphies are traits that are primitive within a certain lineage. They define paraphyletic groups, which are groups of exclusive common origin. Following Hennig, they need to be discarded for systematics. From an evolutionary viewpoint, however, symplesiomorphies have double information content: (a) they provide us with traits that, at some point back in time, were synapomorphies; and (b) any member of a lineage not carrying the symplesiomorphic trait, shows a derived one.
Farris' cladistics is still the basis for systematics, and is widely applied in phylo-paleontology. The initial flaw of this approach is to assume that we can use morphological traits to infer a tree (with parsimony), and then map the same traits onto the inferred tree, allowing us to qualify the traits towards Hennig's objectives. How can this not be circular reasoning? We are mapping the traits onto a tree derived from those traits in the first place, so that the tree-building and mapping are not independent.

A simple (?) real-world example

For the purpose of this exercise, we will take the minute (seven character) matrix of Wilf et al. (2019) from this previous blog post (characters 5 and 7 corrected, and missing Fagaceae added; see also Denk et al. 2019, Science, 10.1126/science.aaz2189).

Wilf et al. found an Eocene fossil in South America, and argued that it must be a member of the modern genus Castanopsis, based on a parsimony DNA-scaffold approach (without actually using a DNA partition). Being a member of a modern genus, the fossil should have some aut-/synapomorphies or at least symplesiomorphies or homoiologies characterizing its sublineage of the Fagaceae, the paraphyletic Castanoideae.

Based on the morphology, we can infer this tree:

Fig. 1 – Adams consensus tree of 3 most-parsimonious trees (11 steps, CI = 0.84, RI = 0.88), traits are mapped using Mesquite's default parsimony model. Castanopsis rothwellii is the Eocene fossil found by Wilf et al.

Two characters qualify as near-synapomorphies (effectively there is only one: hemispheric indehiscent cupules) that define a crown-clade including Lithocarpus, Notholithocarpus, Castanopsis (as part of intrageneric variation) and Quercus. Most other putatively derived traits within the Fagaceae subclades are symplesiomorphies; two are potential homoiologies, one defining the Castanea-Chrysolepis clade. [Note the staircase-like tree topology, a common feature of parsimony trees dealing with extinct lineages.] The fossil's character suite is relatively derived: based on characters 6 (shared only with some Castanopsis) and 7 (reversal as in Quercus), the fossil could be interpreted as an extinct side lineage of the (paraphyletic) Castanoideae.
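For readers unfamiliar with how parsimony mapping counts "steps", here is a minimal Python sketch of Fitch's algorithm for a single unordered character on a binary tree, together with the consistency index (CI) for that character. I am assuming here that the parsimony model treats states as unordered; the tree, the taxon labels and the scorings are hypothetical and are not the matrix used for Fig. 1.

```python
def fitch(tree, states):
    """Fitch parsimony for one unordered character.

    tree:   nested 2-tuples of taxon names (a rooted binary tree).
    states: dict mapping each taxon name to a set of observed states.
    Returns (set of optimal states at this node, number of steps below it).
    """
    if isinstance(tree, str):
        return set(states[tree]), 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no change needed here.
        return common, left_steps + right_steps
    # The children disagree: one change is needed on one of the two branches.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(n_states, steps):
    """CI = minimum possible number of steps / observed number of steps."""
    return (n_states - 1) / steps


# Hypothetical tree and binary character.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_states = {"A": {1}, "B": {1}, "C": {0}, "D": {0}, "E": {1}}

root_states, steps = fitch(toy_tree, toy_states)
ci = consistency_index(n_states=2, steps=steps)
print(root_states, steps, ci)
```

A CI below 1 for a character means that it needs more changes on the tree than its number of states strictly requires, ie. it shows homoplasy on that particular tree.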

This is not a bad analysis for seven characters, but it is likely to be quite wrong.

Fagaceae still exist today, and their DNA can be sampled. Below is a maximum-likelihood tree, based on a 2012 NCBI GenBank oligogene data harvest I did for a talk in Bordeaux; the alignment is 19,242 basepairs long, has 2,985 distinct alignment patterns and a gappiness of 35.8%. Each genus and major intra-generic lineage is represented by a strict consensus sequence based on all available data (checked for mislabeled or pseudogene accessions). [Oaks started to radiate > 50 Ma, Grímsson et al. 2015, Hipp et al. 2019; beeches about the same time, Denk et al. 2009, Renner et al. 2016.]

Fig. 2 – An ML tree based on strict genus/intrageneric consensus sequences (see also Oh & Manos, 2008, fig. 4, based only on data from the Crabs Claw gene, CRC; fig. 5 in the same paper shows a combined CRC + ITS tree).

According to this analysis, Chrysolepis and Castanea are not sisters; Castanea, but not Lithocarpus, is a close relative of the oaks. The (monophyletic) Trigonobalanoideae should form a clade (Fig. 2) not a grade (Fig. 1).

The analysis is not circular anymore, when we infer a tree based on data that is, as far as we know, independent of the data we want to map onto the tree. With the invention of stochastic mapping methods, we also avoid the possible limitations of parsimony when it comes to character mapping — morphological evolution is often not parsimonious, at least for the traits we can observe back in time or study in detail today.

Fig. 3 – ML trait mapping on the tree in Fig. 2 (ie. considering molecular branch lengths). Note that the reconstructions of the character states for the all-ancestor are ambiguous due to the extreme genetic distance between Fagus and the remainder of the Fagaceae. The situation in the scored fossils (Wilf et al. 2019, Denk et al. 2019) is shown for comparison.

For the ML mapping above, I scored intra-generic variations as additional states (ML ancestral-state reconstruction as implemented in the Mesquite program needs defined tips) and applied Mesquite's default model — this is essentially Lewis' Mk model for multi-state standard characters: one substitution category for any possible mutation. We can now compare the two mappings.
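To show what "one substitution category for any possible mutation" means in practice, here is a short Python sketch of the transition probabilities under an Mk-type model with k states. I parameterize the model by a single rate for each specific change from one state to another; the function name and the numeric values in the example are my own assumptions, and Mesquite's internal parameterization may differ.

```python
import math


def mk_transition_probability(k, rate, t, same):
    """Probability of ending in a given state after branch length t under Mk.

    k:    number of character states.
    rate: instantaneous rate of each specific change i -> j (equal for all pairs).
    t:    branch length.
    same: True for staying in the starting state, False for one specific other state.
    """
    decay = math.exp(-k * rate * t)
    if same:
        return 1 / k + (k - 1) / k * decay
    return 1 / k - decay / k


# Example: a three-state character on a short and on a very long branch.
for t in (0.1, 100.0):
    stay = mk_transition_probability(3, 1.0, t, same=True)
    change = mk_transition_probability(3, 1.0, t, same=False)
    print(t, round(stay, 4), round(change, 4))
```

On a very long branch all states become equally probable (1/k each), which is one way of seeing why reconstructions become ambiguous when the genetic distance is extreme.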

What our morphology-based tree recognized as derived was actually partly primitive. The near-synapomorphy (hemispheric indehiscent cupules) is in fact a symplesiomorphy of all Castanoideae + Quercus. Traits shared by Castanea-Castanopsis (pro parte, ie. some species show the ancestral, others the derived state) and Quercus are primitive, while those unique to (or part of intra-generic variation) one or several Castanoideae are derived.

Note that the alleged crown-group but old fossil Castanopsis rothwellii would fit at the base of the (core Fagaceae) tree (zero conflict) as well as close to its leaves (at least one conflicting character). Six of the seven traits can be pinpointed for the core Fagaceae ancestor. According to the reconstruction, it had three styles, scaly cupule appendages, hemispheric indehiscent cupules (vs. valvate in C. rothwellii), one flower per cupule, no valve dehiscence ("partial" in C. rothwellii), and inflorescences were unisexual and mixed (Wilf et al. state the Eocene fossils were unisexual, although the difference can only be assessed when investigating all inflorescences on a tree, see Denk et al.'s comment). The reconstruction is ambiguous regarding whether female flowers were clustered or solitary.

However, there is one implicit assumption held in common by all of the methods, including DNA-scaffolding, probabilistic and stochastic character mapping, total evidence dating, evolutionary placement algorithm (EPA) as implemented in the RAxML program, etc. That is: the inferred molecular tree is the true tree. This is the second fundamental flaw of cladistic approaches to evolution, as I will show in Part 2.

Data information

The morphological data used here are based on an emended version of the Wilf et al. matrix provided by my former colleague and co-author Thomas Denk (see also Denk et al. 2019, table 1); together with the molecular data matrix used here, it can be accessed via figshare.

References

Denk T, Grimm GW. (2009) The biogeographic history of beech trees. Review of Palaeobotany and Palynology 158: 83–100.

Denk T, Hill RS, Simeone MC, Cannon C, Dettmann ME, Manos PS. (2019) Comment on “Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests”. Science 366: eaaz2189.

Grímsson F, Zetter R, Grimm GW, Krarup Pedersen G, Pedersen AK, Denk T. (2015) Fagaceae pollen from the early Cenozoic of West Greenland: revisiting Engler's and Chaney's Arcto-Tertiary hypotheses. Plant Systematics and Evolution 301: 809–832.

Hipp AL, Manos PS, Hahn M, et al. (2019) Genomic landscape of the global oak phylogeny. New Phytologist doi:10.1111/nph.16162.

Oh S-H, Manos PS. (2008) Molecular phylogenetics and cupule evolution in Fagaceae as inferred from nuclear CRABS CLAW sequences. Taxon 57: 434–451.

Renner SS, Grimm GW, Kapli P, Denk T. (2016) Species relationships and divergence times in beeches: New insights from the inclusion of 53 young and old fossils in a birth-death clock model. Philosophical Transactions of the Royal Society B doi:10.1098/rstb.2015.0135.

Wilf P, Nixon KC, Gandolfo MA, Cúneo NR. (2019) Eocene Fagaceae from Patagonia and Gondwanan legacy in Asian rainforests. Science 364: eaaw5139.

For more literature, see the post:
Ockham's Razor applied but not used: can we make a DNA-scaffolding with seven characters?