A couple of weeks ago we received an unexpected influx of visitors to this blog, being directed here by at article at the NBC News site. This article cited one of our blog posts (Network analysis of Genesis 1:3) as an example of the use of phylogenetic analysis in stemmatology (the discipline that attempts to reconstruct the transmission history of a written text). The NBC article itself is about a recently published paper that applies these same techniques to an oral tradition instead — the tale of Little Red Riding Hood. This paper has generated much interest on the internet, being reported in many blog posts, on many news sites, and in many twitter tweets. After all, the young lady in red has been known for centuries throughout the Old World.
Needless to say, I had a look at this paper (Jamshid J. Tehrani. 2013. The phylogeny of Little Red Riding Hood. PLoS One 8: e78871). The author collated data on various characteristics of 58 versions of several folk tales, such as plot elements and physical features of the participants. These tales included Little Red Riding Hood (known as Aarne-Uther-Thompson tale ATU 333), which has long been recorded in European oral traditions, along with variants from other regions, including Africa and East Asia (where it is known as The Tiger Grandmother), as well as another widespread international folk tale The Wolf and the Kids (ATU 123), which has been popular throughout Europe and the Middle East. As the author notes: "since folk tales are mainly transmitted via oral rather than written means, reconstructing their history and development across cultures has proven to be a complex challenge."
He produced phylogenetic trees from both parsimony and bayesian analyses, along with a neighbor-net network. He concluded: "The results demonstrate that ... it is possible to identify ATU 333 and ATU 123 as distinct international types. They further suggest that most of the African tales can be classified as variants of ATU 123, while the East Asian tales probably evolved by blending together elements of both ATU 333 and ATU 123." His network is reproduced here.
There is one major problem with this analysis: all three graphs are unrooted, and you can't determine a history from an unrooted graph. A phylogeny needs a root, in order to determine the time direction of history. Without time, you can't distinguish an ancestor from a descendant — the one becomes the other if the time direction is reversed. Unfortunately, the author makes no reference to a root, at all.
So, his recognition of three main "clusters" in his graphs is unproblematic (ATU 333; East Asian; and ATU 123 + African) although the relationship of these clusters to the "India" sample is not clear (as shown in the network). On the other hand, his conclusions about the relationships among these three groups is not actually justified in the paper itself.
Rooting the trees
So, the thing to do is put a root on each of the graphs. We cannot do this for the network, but we can root the two trees, and we can take the nearest tree to the network and root that, instead.
There are several recognized ways to root a tree in phylogenetics (Huelsenbeck et al. 2002; Boykin et al. 2010):
- a character transformation series (i.e. non-reversible substitution models)
- an outgroup
- mid-point rooting
- assume clock-like character replacement (e.g. the molecular clock).
I therefore did the following:
- For the parsimony analysis, I imported the author's consensus tree into PAUP* (the program he used to produce it), calculated the branch lengths with ACCTRAN optimization, and found the midpoint root.
- For the bayesian analysis, I re-ran the MrBayes analysis exactly as described by the author, except that I added a relaxed clock (with independent gamma rates model for the variation of the clock rate across lineages).
- For the phylogenetic network, the neighbor-net is basically the network equivalent of a neighbor-joining tree, and so I calculated this in SplitsTree (the program the author used), and found the midpoint root.
- Also, the strict clock version of a neighbor-joining tree is a UPGMA tree, which I calculated using SplitsTree.
Having the East Asian samples as the sister to the other tales does not match what would be expected for the historical scenario suggested by the original author from his unrooted graphs — that the East Asian tales "evolved by blending together elements of both ATU 333 and ATU 123".
Instead, this placement exactly matches an alternative theory that the author explicitly rejects: "One intriguing possibility raised in the literature on this topic ... is that the East Asian tales represent a sister lineage that diverged from ATU 333 and ATU 123 before they evolved into two distinct groups. Thus, ... the East Asian tradition represents a crucial 'missing link' between ATU 333 and ATU 123 that has retained features from their original archetype ... Although it is tempting to interpret the results of the analyses in this light, there are several problems with this theory."
The UPGMA root, on the other hand, would be consistent with the blending theory for the origin of the East Asian tales. However, this tree actually presents the African tales as distinct from ATU 123, rather than being a subset of it.
Anyway, the bottom line is that you shouldn't present scenarios without a time direction. History goes from the past towards the present, and you therefore need to know which part of your graph is the oldest part. A family tree isn't a tree unless it has a root.
Boykin LM, Kubatko LS, Lowrey TK (2010) Comparison of methods for rooting phylogenetic trees: a case study using Orcuttieae (Poaceae: Chloridoideae). Molecular Phylogenetics & Evolution 54: 687-700.
Huelsenbeck J, Bollback J, Levine A (2002) Inferring the root of a phylogenetic tree. Systematic Biology 51: 32-43.