Thursday, February 27, 2014

Roots and the phylogenetics of mythology

A few weeks ago I discussed the phylogenetic analysis of the tale of Little Red Riding Hood (The phylogenetics of Little Red Riding Hood). In that case, I pointed out that historical reconstructions require a rooted tree, and I discussed various possible methods for rooting the unrooted trees produced by the data analyses.

This is not the only time that phylogenetics has been applied to myths or tales. For example, d'Huy (2013a) has studied the prehistoric Polyphemus tale belonging to the European and North Amerindian areas, and d'Huy (2013b) has studied the mythological motif of the Cosmic Hunt linked to the Big Dipper constellation (typical for northern and central Eurasia and for the Americas but unknown on other continents). In the first case a binary matrix of 98 characteristics for 44 versions of the tale was used, and in the latter 93 characteristics for 47 versions. Both of these studies have rooted trees.

In the latter case, a novel method of rooting the tree was used. The unrooted tree was successively rooted with each of the likely versions of the tale as outgroup. In each case the ancestral tale (the protomyth) was reconstructed and the ancestral states of the tale's characteristics (called mythemes) were determined. The author then "selected the version that holds the majority of the wide shared mythemes (>50%) as the better root."
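To make the selection rule concrete, here is a minimal Python sketch of one possible reading of it. It is heavily simplified: it skips the ancestral-state reconstruction entirely and only scores each candidate outgroup by how many widely shared mythemes (present in more than 50% of versions) it holds. The function names, the version labels and the toy matrix are all hypothetical, and this is not the author's actual procedure or data.

```python
def widely_shared(matrix, threshold=0.5):
    """Return the indices of mythemes present in more than `threshold` of versions."""
    n_versions = len(matrix)
    n_mythemes = len(next(iter(matrix.values())))
    shared = []
    for j in range(n_mythemes):
        count = sum(row[j] for row in matrix.values())
        if count / n_versions > threshold:
            shared.append(j)
    return shared


def score_candidates(matrix, candidates, threshold=0.5):
    """Count, for each candidate outgroup, how many widely shared mythemes it holds."""
    shared = widely_shared(matrix, threshold)
    return {name: sum(matrix[name][j] for j in shared) for name in candidates}


# Hypothetical binary matrix: rows are tale versions, columns are mythemes (1 = present).
toy = {
    "version_A": [1, 1, 0, 1],
    "version_B": [1, 0, 0, 1],
    "version_C": [1, 1, 1, 0],
    "version_D": [0, 1, 0, 1],
}

scores = score_candidates(toy, ["version_A", "version_C"])
best = max(scores, key=scores.get)
print(scores, best)
```

Note that a rule of this kind can only ever return a single version as the outgroup, which is the limitation discussed further down.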

Unfortunately, this produced an unexpected root, as shown in the tree below. The colors in the tree refer to various geographical groupings of the tale versions.

So, I re-analyzed the data using the rooting methods that I previously applied to the Red Riding Hood analysis:
  • For the bayesian analysis, I used MrBayes (2 runs, 4 chains, 1,000,000 generations, sampling frequency 1000, 25% burnin) with a relaxed clock (with independent gamma rates model for the variation of the clock rate across lineages).
  • For the neighbor-joining tree I used the BioNJ algorithm in PAUP*, and found the midpoint root.
  • For the parsimony analysis, I used a 200-replicate parsimony-ratchet search via PAUP*, calculated the branch lengths of the majority-rule consensus tree with ACCTRAN optimization, and found the midpoint root.
These three alternative roots are also shown on the tree. They seem more likely than the published root.
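For readers unfamiliar with midpoint rooting, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the PAUP* implementation: it finds the two tips that are farthest apart on the unrooted tree and places the root halfway along the path between them. The tree, the node labels and the branch lengths are hypothetical.

```python
def build_tree(edges):
    """Build a symmetric adjacency map {node: {neighbour: branch_length}}."""
    tree = {}
    for u, v, length in edges:
        tree.setdefault(u, {})[v] = length
        tree.setdefault(v, {})[u] = length
    return tree


def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in tree[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best


def midpoint_root(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the root lies on branch u-v, `offset` away from u.
    """
    tips = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    a, _, _ = farthest(tree, tips[0])        # one end of the longest path
    b, longest, path = farthest(tree, a)     # the other end, plus the path itself
    half = longest / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length


# Hypothetical unrooted tree: tips A-D, internal nodes X and Y.
toy_tree = build_tree([
    ("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
    ("C", "Y", 5.0), ("D", "Y", 1.0),
])
print(midpoint_root(toy_tree))
```

In this toy tree the longest path runs between tips B and C, so the root falls on the long branch leading to C. The method therefore depends on branch lengths, which is why the parsimony tree needed branch lengths calculated before its midpoint could be found.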

Geographically, the root chosen by the author's method is within the red group (tales from Asia), based on the idea that "arguments in favour of localization of protypical Cosmic Hunt in Asia seem persuasive (Berezkin 2005)." Unfortunately, this a priori argument seems to have excluded any testing of the possibility that more than one version is the sister to the remaining tales — that is, only single outgroups were considered.

On the other hand, all three of the alternative roots group the tales into two major clades. For the bayesian-clock root the two clades have distinct animal motifs, a herbivore and a carnivore, respectively. These clades do not correspond to any of the three variants recognized by Berezkin (2005).

The bayesian-clock root puts the red-colored (Asia) versions of the tale into one of the two major clades, as it also does with the orange group (Africa). This makes it the root most consistent with the geographical groupings: every geographical group falls within only one of the two major clades, except for the purple group (American coast-plateau / British Columbia). The parsimony and NJ roots behave similarly, but in addition to the purple group they also split the pink group (northeastern America) between the two major clades, which reduces their geographical consistency compared to the bayesian-clock root.
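The comparison above amounts to counting, for each candidate root, how many geographical groups end up in both major clades. A minimal Python sketch of that count follows. The version labels and clade assignments are invented to mimic the pattern described in the text; they are not the actual Cosmic Hunt data, and the function name is my own.

```python
from collections import defaultdict


def split_groups(clade_of, group_of):
    """Return the geographical groups whose versions fall in more than one clade."""
    clades_per_group = defaultdict(set)
    for version, group in group_of.items():
        clades_per_group[group].add(clade_of[version])
    return sorted(g for g, clades in clades_per_group.items() if len(clades) > 1)


# Hypothetical versions and their color-coded geographical groups.
group_of = {
    "v1": "red", "v2": "red", "v3": "orange",
    "v4": "purple", "v5": "purple", "v6": "pink", "v7": "pink",
}

# Hypothetical clade membership (1 or 2) under two different roots.
clock_root = {"v1": 1, "v2": 1, "v3": 1, "v4": 1, "v5": 2, "v6": 2, "v7": 2}
midpoint_root_clades = {"v1": 1, "v2": 1, "v3": 1, "v4": 1, "v5": 2, "v6": 1, "v7": 2}

print(split_groups(clock_root, group_of))
print(split_groups(midpoint_root_clades, group_of))
```

A root that splits fewer groups is more consistent with geography in this simple sense, although the count says nothing about whether the geography itself should be expected to match the tree.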

The bayesian-clock root does not support the suggestion that the Cosmic Hunt myth originated in Asia. Indeed, the bayesian tree does not support any particular geographical location. Furthermore, the polyphyly of the purple group presents an intriguing aspect of the tale's history.


Yuri Berezkin (2005) The cosmic hunt: variants of a Siberian—North-American myth. Folklore 31: 79-100.

Julien d'Huy (2013a) Polyphemus (Aa. Th. 1137): a phylogenetic reconstruction of a prehistoric tale. Nouvelle Mythologie Comparée 1: 1-21.

Julien d'Huy (2013b) A cosmic hunt in the Berber sky: a phylogenetic reconstruction of a Palaeolithic mythology. Les Cahiers de l’AARS 16: 93-106.


  1. Thank you for your comment. Note that the rooting in Asia is made on the basis of observed and agreed ethnological facts, and not on the basis of a tree-clock assumption. You appear to think that the argument of molecular clock evolution is valid here. However, mythological evolution seems to be driven by the law of punctuated equilibria. That causes great deviation from the molecular clock and disagrees with your conclusion.
    Thank you again for the opportunity to discuss and to make these comments.

    1. Dear Julien, I used a relaxed clock, which assumes only that the clock applies on average, unlike the strict clock, which assumes a constant rate. Thus, variation in rate is accounted for by a relaxed clock. Also, having read Berezkin's paper, I am not convinced that he shows an origin in Asia, nor that the sister group must consist of only one tale, as your method assumes. So, finding out what root or roots are supported by the data themselves is an obvious thing to try. /David

  2. Dear David,

    Thank you for your reply.

    First, I agree that whether the sister group must consist of only one tale is still under debate. In my opinion, the story is too complex and has probably not been created two or more times; many versions would have been conceived in Asia in a consecutive manner and would have subsequently migrated to Africa and America. That would make more sense of the ethnological data. Yet it remains a research question.

    Second, about the relaxed clock, I may have difficulty understanding: I thought that it was still unclear how relaxed clock models perform in the presence of punctuated molecular rate shifts between lineages (e.g. Dornburg et al. 2011; Ho 2009).
    Moreover, we can read in the MrBayes manual that "Unlike a non-clock model, [a Relaxed Clock Model] produces a rooted tree, but the information about the position of the root is not as strong as in a non-clock analysis. Because the information about the position of the root might be weak, it is often beneficial to add a rooting constraint to a relaxed clock analysis." So I am rather sceptical about the ability of a relaxed clock to root this tree.

    Finally, regarding this rooting point, could you provide a minor clarification? By "not convinced", do you mean "not convinced by some of Berezkin's arguments"? I ask because Berezkin himself wrote recently that "The Cosmic Hunt myth and the interpretation of Belt of Orion in its context probably also emerged somewhere in Central Eurasia and were brought from there to North America and to Africa."

    Once again I thank you for your feedback and for your interest, and I look forward to your reply.

    1. Julien, First, what you write about the limitations of a relaxed-clock root is correct, as far as I know. All methods have limitations, including the "preferred" one of having an outgroup. That is why phylogeneticists often try different possible analyses, to see whether there are possible problems from the limitations. My use of the clock is simply an attempt to see what the data say (all three methods root the tree in approximately the same place). If the data root does not match any a priori root (such as for the Cosmic Hunt, and Little Red Riding Hood) then that indicates something worth investigating further. Second, I mean that Berezkin does not provide any convincing *evidence* for his choice within Asia for the origin, as opposed to somewhere geographically nearby. /David