The Genealogical World of Phylogenetic Networks: December 2016

Sunday, December 25, 2016

James Bond, alcoholic

Merry Christmas to everyone. As usual for this blog at this time of year, for your Christmas reading we will take a look at a particular aspect of human consumption, in this case alcohol.

James Bond was created in 1953 by Ian Fleming (who also created Chitty-Chitty-Bang-Bang, The Magical Car), and over a 14-year period there was a series of 12 novels and two short-story collections. The rights to the character were purchased for the film world in the 1960s, so that over the past 50 years we have had a franchise of 24 official films, plus two other licensed ones (Casino Royale in 1967, and Never Say Never Again in 1983).

Actually, the first licensed Bond film was a long-forgotten one made for CBS TV in 1954. This was a 1-hour version of Casino Royale, starring Barry Nelson as Bond, Peter Lorre as Le Chiffre, and Linda Christian as a renamed Vesper Lynd (see Barry Nelson - den bortglömde Bond).

This movie infographic (excluding the 2015 film, and the unofficial films) is from The Economist.

The Bond character

James Bond has been portrayed in films officially by six different actors, but the character remains essentially the same, although somewhat different from the one depicted in the books.

In early 1997, the monthly magazine Men's Health published an article in which doctors and psychologists commented on the life and lifestyle of the Bond character, the world's most un-secret secret agent (see Sprit, kvinnor och cigarretter tog livet av James Bond). The results were not good — Bond was either dead or close to it, as he was a paranoid, impotent alcoholic.

Bond's psychological profile was that of an emotionally stunted psychopath of type A who suffers from post-traumatic stress. According to Fleming's books, Bond was orphaned at age 11 (his parents died in a mountaineering accident), he lost his virginity in a brothel in Paris at 16, and killed his first mistress the following year. An ideal man to be a licensed assassin.

His massive daily alcohol consumption (all carefully documented in both the books and films) makes him a category 3 alcoholic. This means that he couldn't possibly have done his actual job competently; and it should also have led to violent temper outbursts (which may explain the government-sanctioned killing sprees). The liquor should also have led to a shrinking of his genitals, and have damaged his liver to the extent that it could no longer break down estrogen, so that he started to develop breasts and become impotent. His well-documented sexual excesses would also make him a prime candidate for sexually transmitted diseases. On top of this, the books (but not the films) also document a comprehensive smoking habit.

Bond was, of course, a form of wish-fulfillment for his creator, Ian Fleming, who was also a heavy drinker and smoker. He died of a heart attack at age 56, an age that Bond himself could not possibly have out-lived. Bond was more in danger from his own lifestyle than from SMERSH, or anyone else bent on world domination.

Bond is thus more a collection of memes than an actual character. This infographic is from the GBShowPlates website, and summarizes Bond's lifestyle.

The Bond drinks

Just about every aspect of Bond's career has been analyzed, and ranked, from the music to the cars to the watches, and most especially the women (the so-called "Bond girls"). However, much of the interest seems to lie in the booze, which is what we will look at here.

Along with coffee (and, once, tea), Bond has consumed copious amounts of alcohol, which he tends to drink alone, or in private settings. He is also what is known as a "label drinker", in that the brand is at least as important as the bottle's contents. This is a gift for the liquor industry, who, along with the car industry, are perpetually looking for opportunities for "brand placement" in films and sporting events. Fleming was chastised for introducing this into his books, but he simply replied that it was an attempt to round-out the character.

The novels have received special medical attention from Graham Johnson, Indra Neil Guha and Patrick Davies (2013. Were James Bond’s drinks shaken because of alcohol induced tremor? British Medical Journal 347: f7255). They recorded every drink consumed in every book, calculated the number of alcohol units involved, and then converted that to daily intake (since the books are quite clear about their time span).

Their results are summarized in this infographic, from their article.

Basically, the medical results were as before:

Across 12 of the 14 books, 123.5 days were described, though Bond was unable to consume alcohol for 36 days because of external pressures (admission to hospital, incarceration, rehabilitation). During this time he was documented as consuming 1150.15 units of alcohol. Taking into account days when he was unable to drink, his average alcohol consumption was 92 units a week (1150 units over 87.5 days). Inclusion of the days incarcerated brings his consumption down to 65.2 units a week. His maximum daily consumption was 49.8 units (From Russia with Love day 3). He had 12.5 alcohol free days out of the 87.5 days on which he was able to drink.

Furthermore, when we plotted Bond's alcohol consumption over time, his intake dropped in the middle of his career but gradually increased towards the end. This consistent but variable lifetime drinking pattern has been reported in patients with alcoholic liver disease.

UK NHS [National Health Service] recommendations for alcohol consumption state that an adult male should drink no more than 21 units a week, with no more than 4 units on any one day, and at least two alcohol free days a week. James Bond's drinking habits are well in excess of each of these three parameters. This level of consumption makes him a category 3 drinker (>60 g alcohol / day) and therefore in the highest risk group for malignancies, depression, hypertension, and cirrhosis. He is also at high risk of suffering from sexual dysfunction, which would considerably affect his womanising.
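The arithmetic behind the quoted weekly figures can be checked in a few lines. This is only a sketch using the numbers quoted above; the conversion of 8 g of pure alcohol per UK unit is an assumption added here (it is the standard UK definition, but it is not stated in the quotation), and the variable names are purely illustrative.

```python
# Figures quoted from Johnson et al. (2013) above.
TOTAL_UNITS = 1150.15      # units of alcohol documented
DAYS_DESCRIBED = 123.5     # days described in the books analyzed
DAYS_UNABLE = 36.0         # days in hospital, prison or rehabilitation

# Assumption (not in the quotation): one UK unit is 8 g of pure alcohol.
GRAMS_PER_UK_UNIT = 8.0


def units_per_week(total_units, days):
    """Average weekly intake, given total units over a number of days."""
    return total_units / days * 7


days_able = DAYS_DESCRIBED - DAYS_UNABLE
weekly_able = units_per_week(TOTAL_UNITS, days_able)
weekly_all = units_per_week(TOTAL_UNITS, DAYS_DESCRIBED)
grams_per_day = weekly_able / 7 * GRAMS_PER_UK_UNIT

print(f"Days able to drink: {days_able}")
print(f"Units per week (drinking days only): {weekly_able:.1f}")
print(f"Units per week (all days): {weekly_all:.1f}")
print(f"Grams of alcohol per day: {grams_per_day:.0f}")
```

Under the 8 g assumption, the daily intake comes out well above the 60 g per day threshold quoted for a category 3 drinker.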

Analyzing the films is more difficult. A number of people have tackled this task, including Nerdist, The Grocer, and Atomic Martinis (now defunct, but repeated on the website of the world's only James Bond Museum, in Sweden), and David Leigh. The basic problem seems to be whether the alcohol is "spotted either in hand, glass or in the background". Also, "The major problem is 007’s frequent enjoyment of multiple bottles of champagne, or portions of bottles of liquor ... it is often impossible to determine exactly how many separate drinks came from a given bottle."

The following infographic (not including the 2015 movie or the unofficial films) is derived from one produced at Buddy Loans. However, some of the people at Reddit were not happy with the original, so it was redesigned, as shown here.

The people at Nerdist took the data from this film infographic, converted it from units of alcohol to grams of alcohol, and then used this to estimate Bond’s total alcohol content. This yields a Blood Alcohol Content of 3.7%. "While some humans have survived a BAC of past 1%, it generally holds that anything past 0.5% will either kill you or leave you seriously poisoned. Therefore ... Bond’s tipsy tally is enough to put a man past a safe limit seven times over."

At The Grocer, they have also pointed out the relative booziness of the various Bond incarnations, by calculating the average intake per film by each actor, in units of alcohol:

Sean Connery: 11
George Lazenby: 9
Roger Moore: 11
Timothy Dalton: 4.5
Pierce Brosnan: 12
Daniel Craig: 20

Finally, we need a phylogenetic network, of course. I collated the presence/absence of each drink type for each book and movie (excluding the 2015 film) from the book by David Leigh (2012. The Complete Guide to the Drinks of James Bond, 2nd edition. Kindle), and then updated this where it clearly disagrees with other sources. (For example, no mention is made of sherry, and yet it is involved in one of the most popular Bond scenes from the film version of Diamonds are Forever.) I then analyzed the data using a NeighborNet. (James Bond Memes has tried an ordination analysis of the same data source.)
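For readers who want to try something similar, here is a minimal sketch of the step that precedes a NeighborNet analysis: turning a presence/absence table into pairwise distances. The works, the drink types and the choice of a simple mismatch (Hamming) distance are all assumptions made for illustration; they are not the collated data set, nor necessarily the distance used for the network described above.

```python
from itertools import combinations

# Hypothetical presence/absence data (1 = the drink type appears in the work).
# Work names and drink types are placeholders, not the real data set.
DRINK_TYPES = ["champagne", "vodka martini", "sherry", "beer"]
WORKS = {
    "Work A": [1, 1, 0, 0],
    "Work B": [1, 1, 1, 0],
    "Work C": [1, 0, 0, 1],
}


def hamming_distance(a, b):
    """Proportion of drink types on which two works differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(works):
    """Symmetric matrix of pairwise distances, as a dict of dicts."""
    names = list(works)
    dist = {name: {name: 0.0} for name in names}
    for p, q in combinations(names, 2):
        d = hamming_distance(works[p], works[q])
        dist[p][q] = d
        dist[q][p] = d
    return dist


matrix = distance_matrix(WORKS)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in WORKS])
```

A matrix like this could then be exported to a program that computes NeighborNets, such as SplitsTree.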

The books are shown in red, and the early films starring Connery and Lazenby are shown in blue (including Connery's later Never Say Never Again). These books and films are almost all at the top and right of the network, indicating that they have a distinct collection of drink types compared to the later films. I suspect that this reflects increasing use of "product placements" in the films. The only book plus movie combination that has similar drinks is You Only Live Twice. Interestingly, the Skyfall movie (from 2012) seems to return to the drinks genre of the earlier works, even though the alcohol consumption is much higher. The most unusual works were the Goldfinger and On Her Majesty's Secret Service books, where a number of drink styles were consumed that appeared nowhere else in the canon.

As noted by Johnson et al. (quoted above):

Despite his alcohol consumption, [Bond] is still described as being able to carry out highly complicated tasks and function at an extraordinarily high level. This is likely to be pure fiction.

Tuesday, December 20, 2016

Isogloss maps are hypergraphs are bipartite networks

Linguists are a very special people. They are very proud, especially when biologists tell them how to do phylogenetic analyses; but their pride is often also justified, as many phylogenetic concepts were initially or independently developed by linguists, be it the family tree model, proposed years before Darwin's (1859) tree by Čelakovský (1853), or even the cladistic principle of synapomorphies, which are called "exclusively shared innovations" in linguistics (see Brugmann 1884).

Linguists also invented one interesting kind of data-display which so far has never been used by biologists (at least as far as I know): maps of isogloss boundaries. The term "isogloss" is an unfortunate one, as it has multiple usages in linguistics, and its history seems to go back to a naive borrowing from chemistry (but I have not really followed the literature here). On most occasions, it just means "shared trait". That is, it denotes a feature shared between two or more languages; and given that languages may share many different features, isoglosses for a group of related languages may yield a very complex type of data. Isoglosses are somehow related to the wave theory, the arch-enemy of the family tree in linguistics, which I described as a mystical theory some time ago, since it never really made it to a clear-cut model that could be formalized (The Wave Theory: the predecessor of network thinking in historical linguistics).

Some linguists, nevertheless, insist that the waves that are the core of the wave theory are nothing other than isoglosses. More specifically, the waves represent innovations that contribute to the separation of languages (a change in pronunciation of a word here, a change in grammar there), but which are not transmitted vertically — they spread across the speakers of a language and may even cross linguistic borders. One early visualization of these waves can be found in Bloomfield (1933), as shown here:

What Bloomfield essentially does here is pick certain traits of Indo-European languages, calling them isoglosses, and arrange them on a quasi-geographic map of Indo-European languages in such a way that all languages sharing a trait are inside one of these isogloss boundaries.

Only recently did I realise what this actually means, when I found the "Bible of Network Theory" by Newman (2010) and started reading at a random page, which, as it turned out, treated hypergraphs. Hypergraphs, as I learned from Newman, are graphs in which one edge can connect more than two nodes, and Newman used exactly the same visualization for these hyperedges as Bloomfield had done in 1933. Bloomfield could not have known that he was actually proposing a rather complex network structure.

Even more interesting than the complex graph structure is that hypergraphs can be likewise displayed as bipartite networks, in which we distinguish two fundamental kinds of nodes, and in which connections are only allowed between nodes of different kinds, without losing any information. In order to do so, one just converts all hyperedges into a node that connects to all nodes (languages in our case) to which the edges connect in the hypergraph. In the same way that Bloomfield labeled the hyperedges in his legend, we can label the isogloss nodes that connect to the languages. The following image shows the resulting bipartite network for Bloomfield's hypergraph:

If you now ask what this tells us after all, I will disappoint you — so far it does not tell us anything, it is just a display of data in a different fashion. Note, however, that hypergraph visualization is not a trivial problem, and if you have enclaves not sharing a trait, it may even be impossible to visualize hypergraphs in a two-dimensional space by just using one line that connects to all nodes. Bipartite networks are easier to handle in this regard. Even more importantly, however, bipartite graphs are also easy to handle algorithmically, and biologists are currently developing new methods to handle them (Corel et al. 2016).
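The conversion from hyperedges to isogloss nodes described above can be sketched in a few lines of code. The two isoglosses used here are only the ones spelled out later in this post, not Bloomfield's complete figure, and the function names are invented for this illustration.

```python
# Two isoglosses as described in this post; Bloomfield's figure contains more.
HYPEREDGES = {
    "sibilants for velars": {"Balto-Slavic", "Indo-Iranian", "Armenian", "Albanian"},
    "past e-": {"Greek", "Indo-Iranian"},
}


def hypergraph_to_bipartite(hyperedges):
    """Turn each hyperedge into an isogloss node linked to its languages."""
    edges = []
    for isogloss, languages in sorted(hyperedges.items()):
        for language in sorted(languages):
            edges.append((isogloss, language))
    return edges


def bipartite_to_hypergraph(edges):
    """Rebuild the hyperedges from the (isogloss, language) pairs."""
    hyperedges = {}
    for isogloss, language in edges:
        hyperedges.setdefault(isogloss, set()).add(language)
    return hyperedges


edges = hypergraph_to_bipartite(HYPEREDGES)
for isogloss, language in edges:
    print(f"{isogloss} -- {language}")

# The round trip recovers the original hypergraph: no information is lost.
recovered = bipartite_to_hypergraph(edges)
print(recovered == HYPEREDGES)
```

Every edge in the result links an isogloss node to a language node, never two nodes of the same kind, which is what makes the network bipartite. An edge list of this form can also be loaded into network software such as Cytoscape.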

If we visualize the Bloomfield data in a bipartite network using network visualization software such as Cytoscape, we can conveniently explore the data, and arrange the nodes in order to search for patterns in the isoglosses. The following visualization, for example, shows that Bloomfield chose the data well in order to illustrate the amount of conflicting, apparently non-tree-like, signal in Indo-European languages (remember that linguists tend to dislike trees, but not necessarily in a productive way), as the data describes more of a circular structure than a strict hierarchy.

In order to really interpret this kind of data, however, we should not forget that this is still a data-display network. It is by no means a phylogenetic analysis, as we only show how a certain amount of data selected by a scholar is distributed over the given language groups. A true phylogenetic analysis will need to interpret these data, making bold claims about the history of those shared traits.

The existence of sibilants (s-like sounds, like [s, z, ʃ, ʒ]) for certain velar sounds (k-like sounds, like [k, g, x]), for example, is a trait shared by Balto-Slavic, Indo-Iranian, Armenian, and Albanian, but this does not mean that they all inherited it from a common ancestor, as the process of palatalization, by which velar sounds turn into affricates and fricatives (compare French cent, which was pronounced kentum in Latin), is very frequent in the languages of the world, and may well reflect independent evolution.

Apart from independent development, which would actually force us to revise our network by deleting the respective edges (because the traits are not homologous in the strict sense), we may also have to deal with differential loss. This quite likely happened with the shared feature labeled as "past e-" in the network, referring to the past tense in Ancient Greek and Indo-Iranian, which was augmented by the prefix e-.

A further reason for those commonalities labelled as isoglosses by linguists may also be simple lateral transfer due to language contact.

Proponents of the wave theory have taken this kind of data as proof that the family tree model is essentially wrong. While I would agree that the family tree model shows only a certain aspect of language evolution, and may therefore be boring at times (and even wrong, if we do not manage to correctly interpret the nature of shared traits), I have a hard time understanding why linguists still insist that isogloss maps are an alternative model of language evolution. They are surely not, in the same way in which splits graphs are not phylogenetic networks, as David emphasized in a recent blogpost.

Unless we add the missing time dimension and analyse how the shared traits originated, isogloss maps and hypergraphs will remain nothing more than an interesting form of data visualization. Given the recent research on bipartite networks, however, we may have some hope that the mysterious waves in historical linguistics may not only find a formal model of representation, but even bring us to the point where we gain new insights into the history of our languages.

References

Bloomfield, L. (1973) Language. Allen & Unwin: London.
Brugmann, K. (1884) Zur Frage nach den Verwandtschaftsverhältnissen der indogermanischen Sprachen [Questions regarding the closer relationship of the Indo-European languages]. Internationale Zeitschrift für allgemeine Sprachwissenschaft 1. 228-256.
Čelakovský, F. (1853) Čtení o srovnavací mluvnici slovanské [Lectures on comparative grammar of Slavic]. V komisí u F. Řivnáče: Prague.
Corel, E., P. Lopez, R. Méheust, and E. Bapteste (2016) Network-thinking: graphs to analyze microbial complexity and evolution. Trends Microbiol. 24.3: 224-237.
Darwin, C. (1859) On the origin of species by means of natural selection, or, the preservation of favoured races in the struggle for life. John Murray: London.
Newman, M. (2010) Networks. An Introduction. Oxford University Press: Oxford.

Tuesday, December 13, 2016

Motivations for producing the earliest pedigrees

The stemmata in ancient Roman houses (depicting portraits of ancestors) were used to assert the nobility of the nobles by right of family descent — stemmata distinguished between the patrician class (those with noble ancestry) and plebeians (commoners). It is therefore unsurprising that the Medieval nobility subsequently started to produce diagrams, as their way of illustrating their own succession in unambiguous terms (although it was not until much later that genealogies became common).

For example, as discussed in my post on The first royal pedigree, the earliest known illustration of a family tree is from c.1000 CE (see Schmid 1994), in which Cunigunde of Luxembourg's ancestry is traced in a tree-like manner to include the emperor Charlemagne (Charles the Great), thus legitimizing her claim to being of royal descent — she married Henry, Duke of Bavaria, in 999 CE, and he became King Henry II of Germany in 1002, at which point she became Queen consort of Germany (1002-1024).

However, pedigrees were also produced for the opposite purpose — to try to prevent marriages, for example on the basis that they violated church law. The earliest known such case involved the marriage, in 1043 CE, of King Henry III of Germany (1016-1056, later Emperor Heinrich of the Holy Roman Empire) to Agnès of Poitou (1025-1077).

Heinrich was briefly (1036-1038) married to Gunhilda of Denmark. After her death, for political reasons he wanted to remarry, this time to someone from France. He chose the young daughter of Duke William V of Aquitaine. She thus became Queen consort of Germany (1043-1056) and then Empress consort of the Holy Roman Empire (1046-1056); from 1056 to 1061 she acted as regent of the Holy Roman Empire during the minority of her son Henry IV.

The official basis for objecting to this marriage was that the bride's and groom's maternal great-grandmothers were half-sisters, so that Henry and Agnes were third cousins. Moreover, on Henry's father's side they were also fourth cousins once removed. This is illustrated in the following genealogy from Michel Parisse (2004).

Note that Henry III appears twice, once as the son of his father and once as the son of his mother, thus simplifying the network to a tree; this is a point that I have commented on before.

The person formally objecting to this marriage was Siegfried of Gorze, who researched the family history and drew the first version of the pedigree. As discussed by Bouchard (2001):

Abbot Siegfried of the reformed monastery at Gorze wrote very shortly before [the marriage] to his friend Abbot Poppo of Stablo [or Stavelot], who possessed the confidence and respect of Henry, urging him even at the eleventh hour, and at risk of a possible loss of the king's favor, to do all that he possibly could to prevent it. Neither Poppo, nor Bishop Bruno of Toul (later Pope Leo IX), to whom Siegfried addresses still more severe reproaches, nor Henry himself, paid much heed to these representations.

Henry apparently rebutted Siegfried's claim by (falsely) asserting that the pedigree was at fault (i.e. the great-grandmothers were not half-sisters). Nevertheless, various published versions of Siegfried's pedigree continued to appear over the subsequent 500 years (see Gädeke 1992). You can read Siegfried's original Latin letters (without the accompanying family tree) in the paper by Michel Parisse (2004). Jean-Baptiste Piggin has a transcription of the genealogy, taken from an early 11th century book (see the blog post: Two medieval drawings).

Part of the issue here is the change in the church laws relating to consanguinity (the degrees of relationship within which marriage was uncanonical), which had occurred during the first half of the ninth century. At that time, the number of forbidden degrees was increased from four to seven, and the method of calculating those degrees was also changed. These two changes are illustrated here (from Bumke 1991).

So, the church councils held at Rome (during the first half of the eighth century) forbade marriage only between: siblings; parents and offspring; grandparents with grandchildren; a man and his niece (but not a woman and her nephew!); and first cousins. However, the canonical changes during the subsequent century forbade everything out to sixth cousin. The reasoning behind these extreme changes is not fully understood.
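The effect of changing both the limit and the counting method can be sketched as follows. The two counting rules used here are assumptions based on the usual description of these systems (the older method adds the generations on both sides of the common ancestor, the newer one counts only the longer line), and the function names are invented for this illustration. A plain threshold is also a simplification: it does not reproduce the asymmetry noted above for nieces and nephews, and it catches a few pairings that the list above does not mention.

```python
def older_degree(gen_a, gen_b):
    """Older count (assumed): generations up to the common ancestor
    from one person, plus generations down to the other."""
    return gen_a + gen_b


def newer_degree(gen_a, gen_b):
    """Newer count (assumed): generations from the common ancestor
    along the longer of the two lines."""
    return max(gen_a, gen_b)


def forbidden_old(gen_a, gen_b, limit=4):
    """Forbidden under the older rules: four degrees, older count."""
    return older_degree(gen_a, gen_b) <= limit


def forbidden_new(gen_a, gen_b, limit=7):
    """Forbidden under the newer rules: seven degrees, newer count."""
    return newer_degree(gen_a, gen_b) <= limit


# Each pair gives the generations separating the two people
# from their nearest common ancestor.
CASES = {
    "first cousins": (2, 2),
    "third cousins (Henry and Agnes)": (4, 4),
    "fourth cousins once removed": (5, 6),
    "sixth cousins": (7, 7),
    "seventh cousins": (8, 8),
}

for label, (a, b) in CASES.items():
    print(f"{label}: old={forbidden_old(a, b)}, new={forbidden_new(a, b)}")
```

Under these assumptions, third cousins such as Henry and Agnes would have been free to marry under the older rules but not under the newer ones.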

Needless to say, these new laws of consanguinity created an impossible situation when, as Bumke (1991) puts it:

in the course of the tenth and the first half of the eleventh century a small number of royal and princely families, already connected by marriage ties in the past, emerged and ruled most of western and central Europe.

Under the new rules, it would not take long for a restricted group of people to become too closely related to inter-marry at all — royalty could not marry royalty. So, Henry set a precedent for his kin when he managed to bypass the new rules, which the aristocracy were likely to ignore anyway. These rules remained in force until 1215 (the Fourth Lateran Council), when the degrees were reduced again to four, but still counted in the "new" way.

As a final note, this sort of religious interference was not always unsuccessful. For example, in the early 1100s Henry I of England suggested marrying one of his (illegitimate) daughters to William de Warenne (2nd Earl of Surrey), but was dissuaded by Archbishop Anselm of Canterbury, who pointed out the prohibited degrees involved. Shortly afterwards, Bishop Ivo of Chartres successfully intervened in the proposed marriage of another of Henry's (illegitimate) daughters to Hugh fitz Gervaise of Châteauneuf-en-Thymerais.

References

Constance Brittain Bouchard (2001) Those of My Blood: Constructing Noble Families in Medieval Francia. University of Pennsylvania Press, Philadelphia.

Joachim Bumke (1991) Courtly Culture: Literature and Society in the High Middle Ages. University of California Press, Berkeley.

Nora Gädeke (1992) Zeugnisse bildlicher Darstellung der Nachkommenschaft Heinrichs I. Arbeiten zur Frühmittelalterforschung 22. De Gruyter, Berlin.

Michel Parisse (2004) Sigefroid, abbé de Gorze, et le mariage du roi Henri III avec Agnès de Poitou (1043). Un aspect de la réforme lotharingienne. Revue du Nord 356-357: 543-566.

Karl Schmid (1994) Ein verlorenes Stemma Regum Franciae. Zugleich ein Beitrag zur Entstehung und Funktion karolingischer (Bild-)Genealogien in salisch-staufischer Zeit. Frühmittelalterliche Studien 28: 196-225.

Tuesday, December 6, 2016

Why are splits graphs still called phylogenetic networks?

This is an issue that has long concerned me, and which I think causes a lot of confusion among biologists. A phylogenetic tree is usually a clear concept — to a biologist, it is a diagram that displays a hypothesis of evolutionary history. The expectation, then, is that a phylogenetic network does the same thing for reticulate evolutionary histories. However, this is not true of splits graphs; and so there is potential confusion.

Mathematically, of course, a phylogenetic tree is a directed acyclic graph. It is usually constructed, in practice, by first producing an undirected graph based on some pattern-analysis procedure, and then nominating one of the nodes or edges as the root (say, by specifying an outgroup). So, the mathematics is not really connected to the biological interpretation. To a mathematician, the tree is a set of nodes connected by directed edges, and the nodes could represent anything at all, as could the edges. It is the biologist who artificially imposes the idea that the nodes represent real historical organisms connected by the flow of evolution, that is, ancestors connected to descendants by evolutionary events.
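The two-step construction just described (an undirected graph first, then a root chosen by an outgroup) can be sketched as follows. The tree, the taxon names and the function name are hypothetical; this is only one simple way of rooting, by placing a new root node on the edge that leads to the outgroup and directing every edge away from it.

```python
from collections import deque

# Hypothetical unrooted tree, stored as an undirected adjacency list.
# "x" and "y" are internal nodes; the other nodes are leaves.
UNROOTED = {
    "A": ["x"],
    "B": ["x"],
    "C": ["y"],
    "Out": ["y"],
    "x": ["A", "B", "y"],
    "y": ["x", "C", "Out"],
}


def root_at_outgroup(adjacency, outgroup):
    """Place a root on the outgroup's edge and direct all edges away from it.

    Returns a dict mapping each node to the list of its children.
    """
    neighbours = adjacency[outgroup]
    if len(neighbours) != 1:
        raise ValueError("the outgroup must be a leaf")
    attach = neighbours[0]

    # The new root splits the edge between the outgroup and its neighbour.
    children = {"root": [outgroup, attach], outgroup: []}
    visited = {"root", outgroup, attach}
    queue = deque([attach])

    # Breadth-first traversal: every unvisited neighbour becomes a child.
    while queue:
        node = queue.popleft()
        children[node] = []
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                children[node].append(neighbour)
                queue.append(neighbour)
    return children


rooted = root_at_outgroup(UNROOTED, "Out")
for parent, kids in rooted.items():
    print(parent, "->", kids)
```

Nothing in the code knows what the nodes stand for; the evolutionary reading of the directed edges is supplied entirely by the biologist.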

A phylogenetic network should logically be a generalization of this idea of a phylogenetic tree, adding the possibility of evolutionary relationships due to gene flow, in addition to the ancestor-descendant relationships. This can be done, but it is only partly done by splits graphs.

That is, a splits graph generalizes the idea of an undirected acyclic graph (an unrooted tree), but not a directed acyclic graph (a rooted tree). It follows the same logic of using a pattern-analysis procedure to produce an undirected graph, although the graph can have reticulations, and thus is a network rather than necessarily being a bifurcating tree. However, it is not straightforward to specify a root in a way that will turn this into a directed acyclic graph. So, in general it does not represent a phylogeny.

Indeed, splits graphs are simply one form of multivariate pattern analysis, along with clustering and ordination techniques, which are familiar as data-display methods in phenetics (see Morrison D.A. 2014. Phylogenetic networks — a new form of multivariate data summary for data mining and exploratory data analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4: 296-312). In this sense, it makes no difference whatsoever what the data represent — they can be data used for phylogenetics, or they could be any other form of multivariate data. Indeed, this point is illustrated in many of the posts in this blog, which can be accessed in the Analyses page.

So, unlike unrooted trees, unrooted splits graphs are not a route to producing a phylogenetic diagram. Mind you, they are a very useful form of multivariate data analysis in their own right, and I value them highly as a form of exploratory data analysis. But that doesn't make them phylogenetic networks in the biological sense.

So, isn't it about time we stopped calling splits graphs "phylogenetic networks"? They aren't, to a biologist, so why call them that?