Monday, February 4, 2013

Network analysis of Genesis 1:3

This idea was stolen blatantly from the Laboratory Exercises in Evolution at the Biology Department, University of Virginia (Janis Antonovics, Joanna Vondrasek, Doug Taylor), where it is set as a class exercise for learning phylogenetic analysis. In turn, these people credit a similar idea to Barbrook et al. (1998. The phylogeny of the Canterbury Tales. Nature 394: 839), although the originators of the idea appear to be Robinson and O'Hara (1996. Cladistic analysis of an Old Norse manuscript tradition. Research in Humanities Computing 4: 115-137). It is an exercise in stemmatology, which can be a lot more tricky than you might think.

Stemmatology is the discipline that attempts to reconstruct the transmission history of a written text on the basis of relationships between the various extant versions (eg. manuscripts or printings). These relationships can be revealed using phylogenetic networks, which is the approach that I present here. A network is more appropriate than a phylogenetic tree, for reasons that will become obvious — the evolution of books is not a simple thing.


The original text of the Christian Bible was written mostly in Hebrew and Aramaic for the Old Testament, and in Greek for the New Testament. It was later translated into Latin, which was then standardized as the "Vulgate", and this was then almost the only version used in churches for the best part of a millennium. The only texts in Old English usually consisted of either the Gospels or the Psalms.

This situation was challenged in the late 14th century, when the first Middle English translations of the whole Bible appeared. There was active resistance to this by the formal Church, and so the idea of an English translation was dropped until the mid-16th century, when the Reformation inspired attempts to translate the books into Modern English as part of a new Protestant religion. These moves were sanctioned by the government, with first the Great Bible (1539) and then the King James Version (1611). Various revisions of the latter have appeared, especially since the late 19th century. These days, there is a veritable cottage industry producing new versions of the Bible for various purposes, usually based on the original texts rather than on earlier translations, with various translation principles being employed (eg. Formal Equivalence, Dynamic Equivalence, Closest Natural Equivalence, etc).

You can consult the various versions of the English-language Bible at one or more of several online sites:
The data used below were all obtained from these sites. These sites suggest that the most famous English-language versions of the Bible are: the Geneva Bible (1560), as used throughout the Reformation, and by William Shakespeare as well as by the "Pilgrim Fathers" in America; and the King James Version (1611), which was the standard English text for a quarter of a millennium. The most widespread current Bible is apparently the New International Version, which has been updated several times since its first appearance in 1973.


The text that I use is the third sentence of the Bible — Genesis 1:3. (The biblical text was first numbered in the Geneva Bible of 1560.) Here is a dated listing of that sentence in all of the early English translations, plus most of the revisions up to the mid-20th century, and a sample of the many recent versions:

1382 Wycliffe Bible  And God seide, Be maad li3t; and maad is li3t.
1395 Later Wycliffe  And God seide, li3t be maad; and li3t was maad.
1530 Tyndale Bible  Then God sayd: let there be lyghte and there was lyghte.
1535 Coverdale Bible  Than God sayd: let there be light: & there was lyght.
1537 Matthew Bible  And God sayde: let there be light, and there was light.
1539 Great Bible  And God sayde: let there be made lyght, and there was light made.
1560 Geneva Bible  Then God saide, Let there be light: And there was light.
1568 Bishop's Bible  And God sayde, let there be light: and there was light.
1609 Douay-Rheims Bible  And God said: Be light made And light was made.
1611 King James Version  And God said, Let there be light: and there was light.
1750 Challoner Revision  And God said: Be light made. And light was made.
1769 Blayney Revision  And God said, Let there be light: and there was light.
1833 Webster's Bible  And God said, Let there be light: and there was light.
1862 Young's Literal Translation  and God saith, 'Let light be;' and light is.
1885 English Revised Version  And God said, Let there be light: and there was light.
1890 Darby Bible  And God said, Let there be light. And there was light.
1901 American Standard Version  And God said, Let there be light: and there was light.
1950 Knox Bible  Then God said, Let there be light; and the light began.
1952 Revised Standard Version  And God said, "Let there be light"; and there was light.
1971 New American Standard Bible  Then God said, "Let there be light"; and there was light.
1973 New International Version  And God said, "Let there be light," and there was light.
1976 Good News Bible  Then God commanded, "Let there be light" — and light appeared.
1982 New King James Version  Then God said, "Let there be light"; and there was light.
1995 God's Word Translation  Then God said, "Let there be light!" So there was light.
1996 New Living Translation  Then God said, "Let there be light," and there was light.
2011 Common English Bible  God said, "Let there be light." And so light appeared.

The first thing we need to do is align the text of these 26 versions, including both words and punctuation. This allows us to directly compare each of the elements of the sentence, comparing like with like as far as their features are concerned.
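Before any alignment can be attempted, each sentence has to be split into its words and punctuation marks. A minimal sketch of that step in Python follows; the `tokenize` helper and its regular expression are assumptions made for illustration, not the procedure actually used for this post.

```python
import re

# One token per run of word characters, or per single punctuation mark.
# Digits count as word characters, so Middle English "li3t" stays whole.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return the words and punctuation marks of a sentence, in order."""
    return TOKEN_PATTERN.findall(sentence)


king_james = "And God said, Let there be light: and there was light."
wycliffe = "And God seide, Be maad li3t; and maad is li3t."

print(tokenize(king_james))
print(tokenize(wycliffe))
```

Treating punctuation as separate tokens means that a colon versus a semicolon becomes a comparable feature in its own right, just as a change of word does.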

This is not as easy as it sounds. In this alignment I have separated words when they seem to have a different intent: for example, "was made" is not equivalent to "appeared". I can see endless arguments about the alignment of any text; indeed, disagreements about the intent of the original text are what have led to so many different versions of the Bible being created in English.
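To give a rough idea of what a column-wise alignment looks like, the sketch below pads two tokenized sentences with gap symbols so that matching tokens fall in the same column. It uses Python's `difflib.SequenceMatcher`, which matches tokens purely by spelling. That is a mechanical stand-in for the by-hand alignment described above, not a reproduction of it, because it knows nothing about the intent of the words. The `align` helper and the `-` gap symbol are assumptions made for the example.

```python
from difflib import SequenceMatcher


def align(tokens_a, tokens_b, gap="-"):
    """Pad two token lists with gaps so that matching tokens share a column."""
    row_a, row_b = [], []
    # autojunk is a heuristic for very long sequences; it is switched off here.
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        chunk_a, chunk_b = tokens_a[i1:i2], tokens_b[j1:j2]
        # Pad the shorter chunk so both rows stay the same length.
        width = max(len(chunk_a), len(chunk_b))
        row_a.extend(chunk_a + [gap] * (width - len(chunk_a)))
        row_b.extend(chunk_b + [gap] * (width - len(chunk_b)))
    return row_a, row_b


# Pre-tokenized sentences (words and punctuation separated by spaces).
great_bible = "And God sayde : let there be made lyght , and there was light made .".split()
king_james = "And God said , Let there be light : and there was light .".split()

row_a, row_b = align(great_bible, king_james)
for a, b in zip(row_a, row_b):
    print(f"{a:8}{b}")
```

A spelling-based matcher treats "sayde" and "said" as unrelated tokens, which is exactly the kind of decision that a human aligner would make differently.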

This alignment then needs to be coded as a set of characters, which define the hypothesized homology between the various elements of the text. In this case I ended up with 50 additive binary characters for analysis. In general, I used Young's Literal Translation to determine the ancestral state for each character, as this translation was an explicit attempt to emulate the Hebrew original. A nexus-formatted version of the dataset is available here.
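The coding step can be sketched as follows. The four characters below are hypothetical and were chosen only for illustration, since the 50 characters actually used are not listed in this post. They are also plain presence/absence characters, which is simpler than additive binary coding. The taxon labels and the `code_text` and `to_nexus` helpers are likewise assumptions.

```python
# Hypothetical characters for illustration only; they are NOT the post's 50.
# Each test returns True (state 1) or False (state 0) for a sentence.
CHARACTERS = [
    ("opens with Then/Than", lambda s: s.lower().startswith(("then", "than"))),
    ("contains 'let there be'", lambda s: "let there be" in s.lower()),
    ("contains made/maad", lambda s: "made" in s.lower() or "maad" in s.lower()),
    ("contains quotation marks", lambda s: '"' in s),
]

TEXTS = {
    "Wycliffe_1382": "And God seide, Be maad li3t; and maad is li3t.",
    "Geneva_1560": "Then God saide, Let there be light: And there was light.",
    "Douay_Rheims_1609": "And God said: Be light made And light was made.",
    "King_James_1611": "And God said, Let there be light: and there was light.",
    "NIV_1973": 'And God said, "Let there be light," and there was light.',
}


def code_text(sentence):
    """Return the character states of one sentence as a string of 0s and 1s."""
    return "".join("1" if test(sentence) else "0" for _name, test in CHARACTERS)


def to_nexus(texts):
    """Build a minimal NEXUS DATA block from a {taxon: sentence} mapping."""
    width = max(len(name) for name in texts)
    rows = [f"    {name.ljust(width)}  {code_text(s)}" for name, s in texts.items()]
    return "\n".join([
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(texts)} NCHAR={len(CHARACTERS)};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
        *rows,
        "  ;",
        "END;",
    ])


print(to_nexus(TEXTS))
```

With these toy characters the Wycliffe and Douay-Rheims rows come out identical, which shows how much the result depends on which characters are chosen and how finely they are split.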

Various network methods could be used to summarize the character data. First, I have used a NeighborNet based on Hamming distances, as I usually do (see my earlier analyses). As you can see from the graph, there are no simple tree-like relationships among these texts, which calls into question any simplistic attempt at stemmatology. (Note that in two cases there are multiple texts that have identical sentences, and thus they appear at the same location in the graph.)
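For readers unfamiliar with the distance measure: the Hamming distance between two texts is simply the number of characters at which their coded rows differ. The sketch below computes it for every pair of rows. The four rows and their labels are hypothetical, not the dataset analysed here, and the NeighborNet itself is not reproduced; this covers only the distance step that feeds it.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length character rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of characters")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical coded rows for illustration (not the post's dataset).
CODED = {
    "Text_A": "0010",
    "Text_B": "1100",
    "Text_C": "0010",
    "Text_D": "0100",
}

# Distance for every unordered pair of texts.
distances = {
    (p, q): hamming(CODED[p], CODED[q]) for p, q in combinations(CODED, 2)
}

for (p, q), d in distances.items():
    print(p, q, d)

# Pairs at distance zero have identical rows.
identical = [pair for pair, d in distances.items() if d == 0]
print(identical)
```

Texts at distance zero cannot be separated by any distance-based method, which is why identical sentences occupy the same location in the graph.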

It is worth pointing out here that Barbrook et al. (1998) produced a bush-like graph from their data for the Canterbury Tales, but only after deleting 14 of their 58 manuscripts, "as they were likely to have been copied from more than one exemplar, either by deliberate conflation of readings or by changing the exemplar during the course of copying." A similar explanation is likely to apply for some of the texts for Genesis 1:3, although many of them were translated directly from the original Hebrew rather than from later translations (eg. the Latin "Vulgate").

Nevertheless, there is a general separation of the older Genesis texts on the right of the graph and the more recent texts on the left. This might be easier to assess if we simplify the graph.

As a simpler summary of the same relationships, I have used a Reduced Median Network, based on r = 2 (the program default). Note that the time order is reversed in this graph, with the older texts on the left and the more recent texts on the right. The only major discrepancy between the two graphs is the relative placement of the Bishop's Bible. (Also, I have not labelled the two cases where there are several texts that have identical sentences.)

Historically, we would expect the Tyndale Bible, Coverdale Bible, Matthew Bible and Great Bible texts to be closely related, but the Great Bible seems not to fit this expectation. Similarly, we would expect a similarity between the Geneva Bible and the Bishop's Bible, which is also not reflected in the study sentence; nor is the acknowledged debt of the King James Version to the Tyndale Bible.

However, the fact that the Wycliffe Bible and Later Wycliffe are written in Middle English rather than Modern English is clear from their distant relationship to the other texts; and the close historical relationship of the Challoner Revision and the Douay-Rheims Bible is also clear.

Several texts show isolated relationships. The Knox Bible, for example, is unique among the modern texts in being taken from the Latin rather than the original Hebrew, while the Common English Bible is unusual in trying to balance two translation principles (Dynamic Equivalence and Formal Equivalence) rather than using only one.

On the other hand, the New International Version is clearly a very traditional version of the text, given its relationships as shown in the two graphs, which perhaps explains its popularity.

The close association of the Good News Bible with Young's Literal Translation is interesting, given that the former is an (often criticized) free paraphrase of the original Hebrew text while the latter is a literal translation of that same text — you can't get more different translation principles.


The lack of any simple tree-like relationship among these biblical texts makes any attempt to study their phylogeny difficult. My own look at the business of stemmatology suggests that the results here are quite typical of any study of written texts. Part of the problem seems to be that ideas developed in one historical lineage can be transferred to other lineages, and even transferred to earlier parts of those lineages (see my previous post: Time inconsistency in evolutionary networks). So, even though there is a general historical trend through time, that trend is not consistent enough for a tree-based historical analysis to be effective.

Note: there is a later blog post on this same topic — Trees and networks of written manuscripts.


  1. The important point, IMHO, is to show that the Bible has a history of variation and selection itself - tree or network pattern is a trivial issue compared with literalistic interpretations.

    1. Indeed, such a study would be very interesting. It would be several lifetimes' worth of work, though!

  2. This is very interesting. How important (if at all) to the final analysis is your dating of the versions? I have not gone through in detail, but there is a mistake in the dating of the NIV and the GNB.

    The Good News Bible, including the Old Testament, was indeed published in 1976. However, it began with Good News for Modern Man, published in 1966, which included only the New Testament. The New International Version published in 1973 included only the New Testament. The NIV edition with the Old Testament was not published until 1978.

    1. I appreciate your interest, Bill. The dates are not used in the analysis at all. They merely put labels on the output, to aid interpretation. The dates I used are based on the three web sites that I have listed. Thanks for updating me on the publication years. /David