Showing posts with label Hybridization network. Show all posts

Wednesday, January 21, 2015

Darwin, hybridization and networks


Charles Darwin's metaphor of the Tree of Life was not a tree, even in The Origin of Species. As noted by Franz Hilgendorf (see The dilemma of evolutionary networks and Darwinian trees) "the branches of a tree do not fuse again", and yet in his book Darwin discusses at least one circumstance when they do precisely that — hybridization.

Darwin's discussion of hybridization occupies all of chapter 8 of the Origin. His stated motivation is to address what many people might see as a fatal objection to his theory of species origins by means of natural selection. One of Darwin's main arguments in the book is that "descent with modification" is continuous, and therefore the distinction between species and varieties (and subspecies, etc.) is an arbitrary cut in a continuum of biodiversity. However, it was conventionally accepted that varieties within the same species could cross-breed freely, whereas any attempt to hybridize distinct species would always fail. Darwin opposes this view by citing extensive evidence that varying degrees of sterility are encountered in efforts to cross-breed different species of plants (and a few birds): if the species are closely related, then there will often be a small degree of fertility in the hybrid offspring. So, as two related forms diverge from one another in the course of evolution, their ability to inter-breed gradually diminishes and eventually falls to zero (absolute sterility).

It is important to note that his motivation for writing about hybridization was independent of his ideas about phylogeny, so he seems not to have noticed the consequences of hybridization for phylogenetic patterns.


This is similar to the situation regarding his so-called "tree diagram", in chapter 4. His motivation for the diagram (the only figure in his book) was a discussion of descent with modification, and particularly the continuity of evolutionary processes. He was expressing his idea about uninterrupted historical connections. In particular, this was part of his concern that there is no fundamental distinction between varieties and species, because evolutionary divergence is continuous — it is all a matter of degree, without sharp boundaries. His Tree of Life image expressed the continuity of evolutionary connections, not phylogenetic patterns. This is clear from his poetic invocation of the biblical Tree of Life, which is about the inter-connectedness of all living things along tree branches, not about patterns of biodiversity.

Implicit in this world view is the idea that the Tree of Life is still a tree in spite of hybridization. That is, Darwin failed to see that his "tree simile" (chapter 4) had to ignore hybridization (chapter 8) in order to work. His figure does not show any evidence of hybridization, only divergence. It was not intended to be what we would now call a phylogeny, but merely an idealized view of divergence and continuity of descent. When introducing the Tree of Life, he was using religious imagery to stimulate the imagination of his readers, and in so doing presented a contradictory argument — there is continuity along the branches as well as continuity of inter-connections.

The alternative conception is that Darwin's Tree of Life was never a tree: it was a network. From this point of view, Hilgendorf's dilemma was actually irrelevant. He commented:
An observation which, as far as I know, contradicts these previously discussed views, [would be], that formerly separate species approach each other and finally merge with each other. This would not fit the beautiful image that Darwin presented about the connection of species in a branch-rich tree; the branches of a tree do not fuse again.
Well, they do, even in a Darwinian tree.

Wednesday, January 7, 2015

Complex hybridizations in wheat


There has sometimes been discussion about the structural complexity of phylogenetic networks. At one extreme, species phylogenies are seen as trees with occasional reticulations, and at the other there is a whole cobweb of reticulations with no visible tree. In this context, comments are sometimes made about the plausibility of network-program outputs that show extensive gene flow. If biologists do not believe that the history of "their" organisms involves extensive reticulation, then they may dismiss the algorithmic outputs as unrealistic.

Here I present one well-known example of extensive hybridization, in which the computer programs seem to agree on the same complex solution — the history of common bread wheat.


The data and analyses are from:
Marcussen T, Sandve SR, Heier L, Spannagl M, Pfeifer M, International Wheat Genome Sequencing Consortium, Jakobsen KS, Wulff BB, Steuernagel B, Mayer KF, Olsen OA (2014) Ancient hybridizations among the ancestral genomes of bread wheat. Science 345: 1250092.
The hybridization network shown above is a montage of two different phylogenies from the original paper. It shows four splits, one homoploid hybridization, and two polyploid hybridizations. The time is shown in the circles in units of millions of years (note that the scale is not linear).

The first split (6.5 million years ago) is between the genera Triticum (wheat) and Aegilops (goatgrasses), which are morphologically highly distinct, with Aegilops having rounded glumes rather than keeled glumes. There are currently c.20 recognized species in both Aegilops and Triticum, so only a small part of the diversity is shown in the network.

Domesticated bread wheat (T. aestivum) is a hexaploid species, whose three constituent diploid genomes are known as A, B and D. Their lineages are labeled and colored in the network diagram. The D-genome lineage is the result of a homoploid hybridization (and it has been taxonomically treated as part of Aegilops). Bread wheat is then the recent result of two successive allopolyploid hybridizations, with a tetraploid lineage as the intermediate.

Of the other species shown in the network, all of the goatgrasses are wild diploid species, as is T. urartu. T. monococcum is also diploid, with domesticated einkorn wheat being derived from the wild ancestor. T. turgidum is a tetraploid species, with domesticated emmer wheat being derived from the wild ancestor; emmer has recently diversified into many modern wheat species.

This is one of the most complex phylogenetic networks known, although that complexity is at least partly the result of leaving out most of the other diploid species in the Triticum and Aegilops clades. Program outputs that are more complex than this are unlikely to be realistic.

Wednesday, December 17, 2014

Current methods for evolutionary networks


It has been noted before that we have a wide range of mathematical techniques available for producing data-display networks, most notably the many variants of splits graphs (see Huson & Scornavacca 2011). For example, NeighborNets and Consensus networks are commonly encountered in the phylogenetics literature, and Reduced median networks and Median-joining networks are commonly used for haplotype networks in population biology.

However, there are few techniques used to produce evolutionary networks. Studies of reticulate evolutionary histories, which include recombination networks, hybridization networks, introgression networks and HGT networks, have no unifying theme as yet. So, the biological literature has many papers in which biologists struggle with reticulate evolutionary histories using ad hoc collections of techniques, which often boil down to simply presenting incongruent phylogenetic trees from different datasets (see Morrison 2014a).

So, maybe a brief look at the current state of play with evolutionary networks would be useful. There are enough worthwhile techniques out there for people to be using them more often than they are.

Assumptions

Almost all current phylogenetic methods assume that the basic building unit is a non-recombining sequence block, for which the evolutionary history is strictly tree-like. We tend to call these blocks "genes" and their history "gene trees", but this is just for semantic convenience. In practice, we first collect data for various loci, and we then simply assume that there is recombination between the loci but not within them. This is basically the assumption of independence between loci. At the limit, each nucleotide along a chromosome has a tree-like history, but for any aggregation of nucleotides a tree-like history is only an assumption.

Furthermore, we assume that there are no data errors that will confound any reconstruction of the phylogenetic trees. Possible sources of error include: incorrect data (e.g. contamination), inappropriate sampling (taxa or characters), and model mis-specification. Any of these errors will lead to stochastic variation at best and to bias at worst.

Gene-tree incongruence

Reticulate evolutionary processes lead to gene trees that are not all congruent. However, two other processes have been widely recognized as also producing gene-tree incongruence without involving reticulation in the strict sense: incomplete lineage sorting (ILS; also called deep coalescence or ancestral polymorphism), and gene duplication-loss (DL).

Many studies have now shown that stochastic variation due to ILS can be very large (see Degnan & Rosenberg 2009), and that this varies in relation to both the population sizes of the taxa and the times between divergence events. The expectation of completely congruent gene trees is thus very naive, even when the evolutionary history of the taxa has been strictly tree-like. A number of methods have been developed to reconstruct species trees in the face of ILS (Nakhleh 2013).
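To make that dependence concrete: for a rooted three-taxon species tree ((A,B),C) with one gene copy sampled per species, the multispecies coalescent gives the probability that a gene tree matches the species tree as 1 - (2/3)exp(-T), where T is the internal branch length in coalescent units (generations divided by 2N for a diploid population of constant size N). The Python sketch below evaluates this formula and checks it with a small simulation; the function names are illustrative, and it assumes constant population size, neutrality and no reticulation.

```python
import math
import random


def prob_concordant(t_internal: float) -> float:
    """Probability that a 3-taxon gene tree matches the rooted species tree ((A,B),C).

    t_internal is the internal branch length in coalescent units
    (generations / 2N for a diploid population of constant size N).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


def simulate_concordance(t_internal: float, n_genes: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the coalescent."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_genes):
        # The A and B lineages may coalesce within the internal branch (rate 1 per pair).
        if rng.expovariate(1.0) < t_internal:
            matches += 1
        # Otherwise all three lineages reach the ancestral population, and the first
        # coalescence joins one of the three pairs uniformly at random.
        elif rng.randrange(3) == 0:
            matches += 1
    return matches / n_genes


# Same time between divergences (10,000 generations), different population sizes.
for n_generations, pop_size in [(10_000, 100_000), (10_000, 1_000)]:
    t = n_generations / (2 * pop_size)
    print(f"N={pop_size}: P(concordant) = {prob_concordant(t):.3f}")
```

With 10,000 generations between divergences, a population of 100,000 gives a concordance probability of only about 0.37 (barely above the 1/3 expected from a random choice among the three topologies), whereas a population of 1,000 gives about 0.996.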

DL involves gene duplication (which can be repeated to create gene families) followed by selective gene loss. The phylogenetic history of the genes is usually presented as an unfolded species tree, in which each gene copy has its own part of the tree. A number of methods have been developed to reconstruct gene DL histories given a "known" species tree, a task called gene-tree reconciliation (Szöllősi et al 2015). However, our interest here is in the reverse process, in which reconstructed but incongruent gene trees are combined into a single species tree, given a model of duplication and selective loss. This is called species-tree inference, and it is the same problem as cophylogeny reconstruction (Drinkwater & Charleston 2014).
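Gene-tree reconciliation can be illustrated with the classic LCA-mapping approach: each gene-tree node is mapped to the lowest common ancestor, in the species tree, of the species below it, and a node is a duplication when it maps to the same species node as one of its children. The Python sketch below assumes binary rooted trees written as nested tuples, gene leaves labelled by the species they were sampled from, and no transfer or ILS; the function names are hypothetical.

```python
def index_species_tree(species_tree):
    """Map every clade of a nested-tuple species tree to its parent and depth."""
    parent, depth = {}, {}

    def walk(node, par, d):
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree clades."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree):
    """Count duplications and losses implied by LCA-mapping a binary gene tree."""
    parent, depth = index_species_tree(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def walk(g):
        if isinstance(g, str):
            return g  # a gene copy maps to the species it was sampled from
        left, right = walk(g[0]), walk(g[1])
        m = lca(left, right, parent, depth)
        is_dup = m in (left, right)  # a child maps to the same species node
        if is_dup:
            counts["duplications"] += 1
        for child_map in (left, right):
            gap = depth[child_map] - depth[m]
            # Species-tree edges skipped between parent and child mappings imply
            # losses; below a speciation node one of those edges is expected.
            counts["losses"] += gap if is_dup else gap - 1
        return m

    walk(gene_tree)
    return counts


species = (("A", "B"), "C")
print(reconcile((("A", "B"), "C"), species))  # {'duplications': 0, 'losses': 0}
print(reconcile((("A", "C"), "B"), species))  # {'duplications': 1, 'losses': 3}
```

A gene tree that conflicts with the species tree is explained here by one duplication and three losses, which shows how DL alone can produce incongruence without any reticulation.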

Reticulations

Known biological processes such as recombination, reassortment, hybridization, introgression and horizontal gene transfer all create reticulate phylogenetic histories. However, it is a moot point as to whether these processes can be distinguished from each other solely in the context of an evolutionary network (Holder et al 2001; Morrison 2015). These evolutionary processes operate by distinct biological mechanisms, but the evolutionary patterns that they create can all be rather similar. The processes all result in gene flow among contemporaneous organisms (usually called horizontal flow or transfer), whereas other evolutionary processes involve gene flow from parent to offspring (usually called vertical inheritance), including ILS and DL. These gene flows create incongruent gene histories, which we may detect directly in the data or via reconstructed gene trees. The patterns of incongruence do not necessarily allow us to infer the causal process.

There are a number of differences in pattern, but the consistency of these is doubtful. Polyploid hybridization produces the most distinctive pattern, because there is duplication of the genome in the hybrid. However, subsequent aneuploidy will serve to obscure this pattern. Homoploid hybridization nominally involves 50% of the genome coming from different sources, while introgression ultimately involves a smaller percentage. However, in practice, genome mixtures vary continuously from 0 to 50%. HGT also involves a small percentage of the genome, but in theory it too can vary from 0 to 50%. Reassortment produces mixtures of viral genes, which can be so numerous that reconstructing the history is severely problematic.

So, in the absence of independent experimental evidence, distinguishing one form of evolutionary network from another is almost a matter of definition. This has become increasingly obvious in the methodological literature, where semantic confusion abounds.

For example, a network produced directly from a set of characters has usually been called a "recombination network", while one produced from a set of trees has usually been called a "hybridization network", irrespective of what processes the gene trees represent. Furthermore, models that add reticulation events to DL trees have usually referred to the horizontal gene flow as "HGT", whereas models that add reticulation events to ILS trees have usually referred to the horizontal gene flow as "hybridization" (Morrison 2014a). Studies of horizontal gene flow during human evolution have usually referred to "admixture", which is a more process-neutral term.

In many, if not most, cases we might all be better off if network methods simply distinguished gene flow among contemporaries (horizontal) from gene inheritance between generations (vertical), rather than trying to infer a process; process inference can often best take place after network construction. This does not help anthropologists, of course, who are dealing with evolutionary networks where oblique gene flow is possible, so that their networks need not be time-consistent (see Time inconsistency in evolutionary networks).

Methods

There seems to be a dichotomy of purposes to current method development, which are neatly summarized by the contrasting theoretical views of Mindell (2013) and Morrison (2014b). These views each recognize that evolutionary history involves both vertical and horizontal processes, but they reconstruct the resulting evolutionary patterns as a species tree and a species network, respectively. Obviously, this blog is dedicated to the latter point of view, but it is the former one (the so-called Tree of Life) that seems to currently dominate the literature.

Focussing on gene-tree inference, Szöllősi et al (2015) provide a comprehensive review of the various models that have been used to describe the dependence between gene trees and species trees. Essentially, gene trees are contained within the species tree, and they may differ from it in relative branch lengths and/or topology. The differences between genes and species are the result of population-level processes, often modeled using the coalescent. These authors recognize four current classes of probabilistic model that combine different evolutionary processes:
  • the DLCoal model, which combines coalescence and DL
  • the DTLSR model and the ODT model, both of which combine gene transfer and DL
  • models that combine hybridization and ILS
  • models of allopolyploidization.
When inferring species trees from gene trees (species-tree inference), we basically combine the scores for all of the gene trees, and then search for the species tree with the best overall score. This involves adding the scores in parsimony analyses, or multiplying the conditional probabilities in likelihood analyses (i.e. in a maximum-likelihood or bayesian context). Many methods have been developed for inferring a species tree based on multi-locus data. These differ in whether the gene and species trees are estimated simultaneously or sequentially, and in how the gene trees are used to infer the species tree. Nakhleh (2013) and Szöllősi et al (2015) discuss both parsimony and likelihood methods for species-tree inference based on either ILS or DL models.
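The "add the scores, then search" recipe can be sketched with one parsimony criterion for ILS, minimizing deep coalescences: a gene tree's cost in a species tree is the number of extra gene lineages that must persist through the species-tree branches. For a species clade, the number of lineages leaving it is taken as the number of maximal gene-tree clusters nested inside it. The Python sketch assumes one gene copy per species, rooted trees as nested tuples, and an exhaustive search over a small hypothetical candidate list; the names are illustrative.

```python
def clusters(tree):
    """All clusters (leaf sets) of a nested-tuple tree, including leaves and root."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaf_set = frozenset([node])
        else:
            leaf_set = frozenset().union(*(walk(child) for child in node))
        out.append(leaf_set)
        return leaf_set

    walk(tree)
    return out


def extra_lineages(gene_tree, species_tree):
    """Deep-coalescence cost of one gene tree (one copy per species) in a species tree."""
    gene_clusters = clusters(gene_tree)
    cost = 0
    for clade in clusters(species_tree):
        inside = [c for c in gene_clusters if c <= clade]
        # Lineages leaving this species clade = maximal gene clusters nested inside it.
        maximal = [c for c in inside if not any(c < d for d in inside)]
        cost += len(maximal) - 1
    return cost


def best_species_tree(gene_trees, candidates):
    """Parsimony: add the per-gene costs and keep the cheapest candidate species tree."""
    scores = {cand: sum(extra_lineages(g, cand) for g in gene_trees) for cand in candidates}
    return min(candidates, key=scores.get), scores


candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
genes = [(("A", "B"), "C")] * 5 + [(("A", "C"), "B")] * 2 + [(("B", "C"), "A")]
tree, scores = best_species_tree(genes, candidates)
print(tree, scores[tree])  # (('A', 'B'), 'C') 3
```

A likelihood version would follow the same outline, replacing the summed costs with summed log-probabilities (i.e. multiplied probabilities) of the gene trees.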

Extending these ideas to infer networks (rather than species trees) is trickier, and most of the work to date has involved combining hybridization and ILS. There has been no recent summary of the ideas. However, calculating the parsimony score of a network, given a set of gene-tree topologies, has been addressed by Yu et al (2011); and Yu et al (2013a) have extended these ideas to heuristically search the network space for the optimal network (the one that minimizes the number of extra reticulation lineages in a species tree). Furthermore, methods for computing the likelihood of a phylogenetic network, given a set of gene-tree topologies, have been devised by Yu et al (2012, 2013b); and Yu et al (2014) have extended these ideas to heuristically search for the maximum-likelihood network for limited cases of introgression or hybridization (since they differ only in degree).
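As a rough illustration of a network likelihood computed from gene-tree topologies, consider the simplest case: taxon B is a hybrid whose single sampled lineage is inherited from an A-side parent with probability gamma and from a C-side parent with probability 1 - gamma. Under the simplifying assumption that the gene-tree distribution is then a gamma-weighted mixture of the two species trees displayed by the network, the likelihood of observed gene-tree counts can be maximized over gamma by grid search. This is a hypothetical sketch, not the algorithm of Yu et al; the internal branch lengths t1 and t2 (in coalescent units) are assumed known, and all names are illustrative.

```python
import math

TOPOLOGIES = ("AB|C", "BC|A", "AC|B")  # rooted 3-taxon gene trees, named by their cherry


def tree_probs(cherry: str, t_internal: float) -> dict:
    """Gene-tree topology probabilities under one 3-taxon species tree (coalescent)."""
    discordant = math.exp(-t_internal) / 3.0
    return {topo: (1.0 - 2.0 * discordant if topo == cherry else discordant)
            for topo in TOPOLOGIES}


def network_log_likelihood(counts: dict, gamma: float, t1: float, t2: float) -> float:
    """Log-likelihood of gene-tree counts for a hybrid B between A-side and C-side parents.

    Simplification: the gene-tree distribution is assumed to be a gamma-weighted
    mixture of the two displayed species trees ((A,B),C) and (A,(B,C)).
    """
    p_left = tree_probs("AB|C", t1)
    p_right = tree_probs("BC|A", t2)
    return sum(n * math.log(gamma * p_left[topo] + (1 - gamma) * p_right[topo])
               for topo, n in counts.items())


def fit_gamma(counts: dict, t1: float, t2: float, steps: int = 100) -> float:
    """Grid search for the maximum-likelihood inheritance probability gamma."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: network_log_likelihood(counts, g, t1, t2))


counts = {"AB|C": 312, "BC|A": 565, "AC|B": 123}
print(fit_gamma(counts, t1=1.0, t2=1.0))  # 0.3
```

For these example counts the estimate is 0.3, meaning that about 30% of B's loci are inferred to come from the A-side parent. Because homoploid hybridization and introgression differ only in this proportion, the same calculation covers both.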

There are also several methods that simply use gene-tree incongruence to infer reticulation events in a species network (Huson et al 2010). Basically, these methods combine gene trees into "hybridization networks" by minimizing the number of reticulations required for reconciliation, measured either by counting the reticulations or calculating the network level. The combinatorial optimization can be based on trees, triplets or clusters, using parsimony as the optimality criterion. These methods model homoploid hybridization by assuming that reticulation is the sole cause of all gene-tree incongruence. This means that they are likely to overestimate the amount of reticulation in a dataset when other processes are co-occurring.
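The starting point of the cluster-based approach can be sketched simply: collect the clusters (clades) from every gene tree and list the pairs that are incompatible, meaning that they overlap without either one containing the other. No single tree can display such a pair, so each one signals the need for at least one reticulation. Finding the network with the fewest reticulations that accommodates all of the clusters is the hard combinatorial part, and is not attempted here. Trees are nested Python tuples and the function names are hypothetical.

```python
from itertools import combinations


def clusters(tree):
    """Non-trivial clusters of a nested-tuple tree, as frozensets of leaf labels."""
    out = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])  # single leaves are trivial and not recorded
        leaves = frozenset().union(*(walk(child) for child in node))
        out.add(leaves)
        return leaves

    all_leaves = walk(tree)
    out.discard(all_leaves)  # the root cluster is trivial too
    return out


def compatible(a, b):
    """Two clusters fit in one tree if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def incompatible_pairs(trees):
    """All pairs of clusters, pooled across trees, that no single tree can contain."""
    pooled = set().union(*(clusters(t) for t in trees))
    ordered = sorted(pooled, key=sorted)
    return [(a, b) for a, b in combinations(ordered, 2) if not compatible(a, b)]


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(incompatible_pairs([t1, t2]))  # one pair: {A, B} versus {A, C}
```

Because every incompatibility is attributed to reticulation, this kind of analysis inherits the weakness noted above: incongruence caused by ILS or DL is also counted as evidence of hybridization.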

The most completely developed network methods involve data for allopolyploid hybrids. Here, there are multiple copies of each gene, one in each copy of the genome, so that allopolyploid hybrids have more copies than do their diploid parent taxa. To construct a hybridization network topology, Huber et al (2006) developed a parsimony method based on first estimating a multi-labeled gene tree, and then searching for the single-labeled network that best accommodates the multiple gene patterns. The model has been extended to heuristically include ILS (Marcussen et al 2012), as well as dates for the internal nodes (Marcussen et al 2015). Jones et al (2013) have also developed models that incorporate ILS in a bayesian context, but only for the case of a single hybridization event between two diploid species (an allotetraploid).
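The bookkeeping behind a multi-labeled gene tree can be illustrated with a hypothetical example in which taxon P, a putative allotetraploid, appears twice. In the ideal case, each copy of P sits next to the lineage that donated that subgenome, so listing the sister group of every repeated label suggests the parental taxa. This is only an informal first step, not the Huber et al (2006) search for the best single-labeled network. It assumes a rooted binary gene tree written as nested Python tuples, and the function names are illustrative.

```python
from collections import Counter


def leaves(node):
    """Leaf labels of a nested-tuple tree, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def sister_of_copies(tree):
    """For each label occurring more than once, list the sister group of each copy."""
    counts = Counter(leaves(tree))
    repeated = {label for label, n in counts.items() if n > 1}
    result = {label: [] for label in repeated}

    def walk(node):
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if isinstance(child, str) and child in repeated:
                sisters = [s for j, s in enumerate(node) if j != i]
                result[child].append(sorted(l for s in sisters for l in leaves(s)))
            walk(child)

    walk(tree)
    return result


mul_tree = ((("A", "P"), "B"), ("C", "P"))
print(sister_of_copies(mul_tree))  # {'P': [['A'], ['C']]}
```

Here the two copies of P group with A and with C respectively, which is the pattern expected if P is an allopolyploid derived from A-like and C-like diploid parents.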

Species-tree inference for a pair of gene phylogenies that may be networks rather than trees has been considered in terms of parsimony by Drinkwater & Charleston (2014).

This brings us to the matter of introgression. The massive recent influx of genome-scale data for hominids has led to the development of methods explicitly for the analysis of what is termed admixture among the lineages. These methods basically work by constructing a phylogenetic tree that includes admixture events, with the topology inferred from allele frequencies. There has been no formal comparison of the methods, and not much application to non-humans. Three such methods have been produced so far (Patterson et al 2012; Pickrell & Pritchard 2012; Lipson et al 2013).
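To give a flavour of how allele frequencies carry the signal, one statistic described by Patterson et al (2012) is the three-population test f3(C; A, B): the average over sites of (c - a)(c - b), where a, b and c are the allele frequencies in populations A, B and C. A clearly negative value is evidence that C is admixed between populations related to A and B. The sketch below is a naive version with a hypothetical function name; it omits the correction for finite sample sizes and the block-jackknife standard errors that a real analysis would need.

```python
def f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-site allele frequencies.

    Averages (c - a) * (c - b) over sites, with no finite-sample correction.
    A clearly negative value suggests the target is a mixture of the two sources.
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


# Hypothetical frequencies: the target is an exact 50:50 mixture of the two sources.
a = [0.1, 0.9, 0.2, 0.8]
b = [0.9, 0.1, 0.7, 0.3]
c = [(x + y) / 2 for x, y in zip(a, b)]
print(f3(c, a, b))  # negative, as expected for an admixed target
```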

Recombination has been something of a poor cousin to the other causes of reticulation, as most network methods assume it to be absent. Nevertheless, Gusfield (2014) has recently provided a comprehensive survey of the methods available to date.
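A classic starting point for detecting recombination from character data is the four-gamete test. Assuming binary sites and the infinite-sites model (each site mutates at most once), a pair of sites at which all four combinations 00, 01, 10 and 11 occur among the sampled sequences cannot be explained by a single tree, so recombination must have occurred between them (or recurrent mutation, if the assumption fails). The Python function name below is hypothetical.

```python
from itertools import combinations


def incompatible_sites(haplotypes):
    """Pairs of site indices that fail the four-gamete test.

    haplotypes: equal-length strings of '0'/'1', one per sampled sequence.
    """
    n_sites = len(haplotypes[0])
    failing = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # 00, 01, 10 and 11 all present
            failing.append((i, j))
    return failing


print(incompatible_sites(["00", "01", "10", "11"]))  # [(0, 1)]
print(incompatible_sites(["00", "01", "11"]))  # []
```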

References

Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution 24: 332-340.

Drinkwater B, Charleston MA (2014) An improved node mapping algorithm for the cophylogeny reconstruction problem. Coevolution 2: 1-17.

Gusfield D (2014) ReCombinatorics: the Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks. MIT Press, Cambridge.

Holder MT, Anderson JA, Holloway AK (2001) Difficulties in detecting hybridization. Systematic Biology 50: 978-982.

Huber KT, Oxelman B, Lott M, Moulton V (2006) Reconstructing the evolutionary history of polyploids from multilabeled trees. Molecular Biology & Evolution 23: 1784-1791.

Huson DH, Rupp R, Scornavacca C (2010) Phylogenetic Networks: Concepts, Algorithms, and Applications. Cambridge University Press, Cambridge.

Huson DH, Scornavacca C (2011) A survey of combinatorial methods for phylogenetic networks. Genome Biology & Evolution 3: 23-35.

Jones G, Sagitov S, Oxelman B (2013) Statistical inference of allopolyploid species networks in the presence of incomplete lineage sorting. Systematic Biology 62: 467-478.

Lipson M, Loh P-R, Levin A, Reich D, Patterson N, Berger B (2013) Efficient moment-based inference of population admixture parameters and sources of gene flow. Molecular Biology & Evolution 30: 1788-1802.

Marcussen T, Heier L, Brysting AK, Oxelman B, Jakobsen KS (2015) From gene trees to a dated allopolyploid network: insights from the angiosperm genus Viola (Violaceae). Systematic Biology 64: 84-101.

Marcussen T, Jakobsen KS, Danihelka J, Ballard HE, Blaxland K, Brysting AK, Oxelman B (2012) Inferring species networks from gene trees in high-polyploid north American and Hawaiian violets (Viola, Violaceae). Systematic Biology 61: 107-126.

Mindell DP (2013) The Tree of Life: metaphor, model, and heuristic device. Systematic Biology 62: 479-489.

Morrison DA (2014a) Phylogenetic networks: a review of methods to display evolutionary history. Annual Research and Review in Biology 4: 1518-1543.

Morrison DA (2014b) Is the Tree of Life the best metaphor, model or heuristic for phylogenetics? Systematic Biology 63: 628-638.

Morrison DA (2015, in press) Pattern recognition in phylogenetics: trees and networks. In: Elloumi M, Iliopoulos CS, Wang JTL, Zomaya AY (eds) Pattern Recognition in Computational Molecular Biology: Techniques and Approaches. Wiley, New York.

Nakhleh L (2013) Computational approaches to species phylogeny inference and gene tree reconciliation. Trends in Ecology & Evolution 28: 719-728.

Patterson NJ, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D (2012) Ancient admixture in human history. Genetics 192: 1065-1093.

Pickrell JK, Pritchard JK (2012) Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics 8: e1002967.

Szöllősi GJ, Tannier E, Daubin V, Boussau B (2015) The inference of gene trees with species trees. Systematic Biology 64: e42-e62.

Yu Y, Barnett RM, Nakhleh L (2013a) Parsimonious inference of hybridization in the presence of incomplete lineage sorting. Systematic Biology 62: 738-751.

Yu Y, Degnan JH, Nakhleh L (2012) The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genetics 8: e1002660.

Yu Y, Dong J, Liu KJ, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. Proceedings of the National Academy of Sciences of the USA 111: 16448-16453.

Yu Y, Ristic N, Nakhleh L (2013b) Fast algorithms and heuristics for phylogenomics under ILS and hybridization. BMC Bioinformatics 14: S6.

Yu Y, Than C, Degnan JH, Nakhleh L (2011) Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Systematic Biology 60: 138-149.

Wednesday, December 10, 2014

Circular phylograms for phylogenetic networks


Phylogenetic trees have been drawn in many formats, including what are known as vertical, horizontal, multidirectional, radial, hyperbolic (restricted to interactive trees) and figurative (ie. looking like an actual tree). Radial, or circular, trees are used when there are many taxa — the root is placed at the centre, and the increasing length of the circumference is used to display the increasing number of nodes. An example is shown in the earlier blog post Why do we still use trees for the dog genealogy?
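The radial format can be sketched as follows, for a rooted tree written as nested Python tuples: the leaves are spaced equally around the circle, each internal node is placed at the mean angle of its children, and the distance from the centre is simply the number of edges from the root. This is a minimal, hypothetical cladogram-style layout; real programs would also use branch lengths and draw the connecting arcs.

```python
import math


def radial_layout(tree):
    """Map every node of a nested-tuple tree to (x, y) coordinates in a circular layout."""
    leaves = []

    def collect(node):
        if isinstance(node, str):
            leaves.append(node)
        else:
            for child in node:
                collect(child)

    collect(tree)
    step = 2 * math.pi / len(leaves)
    angle_of = {leaf: i * step for i, leaf in enumerate(leaves)}
    coords = {}

    def place(node, depth):
        if isinstance(node, str):
            theta = angle_of[node]
        else:
            # Leaves of a subtree occupy a contiguous arc, so a plain mean is safe.
            theta = sum(place(child, depth + 1) for child in node) / len(node)
        coords[node] = (depth * math.cos(theta), depth * math.sin(theta))
        return theta

    place(tree, 0)  # the root sits at the centre (radius 0)
    return coords


for node, (x, y) in radial_layout((("A", "B"), ("C", "D"))).items():
    print(node, round(x, 3), round(y, 3))
```

Because the circumference grows with the radius, the many nodes of later generations get progressively more room, which is exactly the property exploited for large genealogies.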

Here, I point out that the radial format also makes it much easier to display reticulations in an evolutionary network. My example comes from The Nam Family: a Study in Cacogenics (Arthur H. Estabrook and Charles B. Davenport. 1912. Eugenics Record Office Memoir No. 2. Cold Spring Harbor, NY). This book involves, among other things, a pedigree study of an extended family in New York state, with a large amount of inbreeding. Two large pedigrees are presented, representing the genealogies of two different parts of the extended family in a place called "Nam Hollow".


One of these pedigrees is drawn in the vertical format, with the earliest generations at the top. The other pedigree is drawn in the radial format, with the earliest generations in the centre.


The difference in choice of format seems to be a result of the fact that in the second case there is extensive reticulation within the earlier generations, and this is obviously much easier to display in the centre of a circle, with increasing circumference for the large number of descendants. Nevertheless, the first pedigree would also be easier to read in the radial format. It is surprising that this format is not used more often.

Eugenics

The study under discussion was one of several projects that arose from the eugenics movement in the USA. The reports include Hill Folk: Report on a Rural Community of Hereditary Defectives (Davenport. 1912), The Kallikak Family: a Study in the Heredity of Feeblemindedness (Henry Herbert Goddard. 1912), and The Jukes (Estabrook. 1916). Eugenics arose in the wake of research on Mendelian inheritance, applying it to the study of human societies. This was thus the initial phase of what we now call the study of human genetics, and large amounts of detailed data were collected in many parts of the world.

Unfortunately, the researchers greatly over-estimated the role of genetics in human behavior, attributing many of the by-products of poverty to "constitutional" characteristics. In particular, many of what we now consider to be environmental aspects of poverty were attributed to inbreeding (which is another feature common in poor communities). This is in contrast to previous studies of the same US families, such as that of Richard L. Dugdale (1874-1877. The Jukes: a Study in Crime, Pauperism, Disease and Heredity), which placed more emphasis on the environment as a factor in criminality, disease and poverty.

So, the eugenics researchers tended to collect data that we would now consider to be seriously biased, where the observations are inextricably confounded with interpretations. For example:
V-166 [person #166 in generation V] is a temperate, sociable, and licentious man, who married his cousin, V-183, a Nam-like, stolid shy, reticent, suspicious harlot. They had eight children ... All have the characteristic slowness in movement, and indolence and lack of ambition of the Nams. They vary little except that some are more reticent and shy than others, and there is some licentiousness. All are illiterate, and probably without the capacity for learning from books. VI-257, who is especially careless, disorderly, and shy, had an illegitimate son, who died of infantile diarrhea. Here again we see the uniformity resulting from inbreeding.
What was worse, the eugenics movement did not stop at mere scientific enquiry. They indulged, with governmental support, in what they politely called "social prophylaxis". For example:
Although our primary aim is to present the bare facts [!] we cannot altogether neglect the natural inquiry as to the proper treatment of such condition as we have described. Various possible modes of treatment will be considered.
First there is the method of laissez faire. The Nam community takes care of itself to a large extent; why do anything? Unfortunately, the community is not wholly isolated. From it families have gone to Minnesota and other points in the West and there formed new centers of degeneration. Harlots go forth from here and become prostitutes in our cities. The tendency to larceny, burglary, arson, assault, and murder have gone, with the wandering bodies in which they are incorporated, throughout the State and to great cities like New York. Nam Hollow is a social pest spot whose virus cannot be confined to its own limits. No state can afford to neglect such a breeding center of feeble-mindedness, alcoholism, sex-immorality, and infanticide as we have here. A rotten apple can infect the whole barrel of fruit. Unless we abandon the ideal of social progress throughout the State we must attempt an improvement here.
The authors seem to be almost foaming at the mouth by the end of their spiel. Option two, "improving the conditions of the persons in the Hollow" is dismissed as "supplying a veneer of good manners to a punky social body." Option three, "scattering the people" is seen as "fraught with danger". Nevertheless, this was the option preferred by the British government in the late 1700s and early 1800s, when they founded penal colonies in Australia for crimes like "stealing five cheeses". The assumption that poverty is hereditary certainly has a long history, and a wide geographical spread.

Option four, preventing the people from breeding, by isolating them, is the recommended one. The final note is: "Of course, asexualization would produce the same result; but it is doubtful if public sentiment would favor such treatment, quite within the province of the State though it be." We now know this to be a very naive conclusion. By the 1930s many western countries had active compulsory sterilization programs (see Wikipedia); and many still do, including states of the USA.

However, eugenics did have positive outcomes alongside the obvious negative ones. For example, the first demonstration of simple Mendelian inheritance of a human medical condition concerned Unverricht-Lundborg disease, a form of epilepsy. This was first reported in 1891 by Heinrich Unverricht, in Estonia. However, it was Herman Lundborg, a Swedish physician, who first identified its genetic component (1903. Die progressive Myoclonus-Epilepsie (Unverricht’s Myoclonie). Almqvist and Wiksell, Uppsala).

He traced the ancestry of 17 affected people in one family from southern Sweden, showing that they were all descended from the same ancestors. The pedigree showed the pattern of disease occurrence expected from Mendelian inheritance of a single recessive locus. This study was facilitated by frequent inbreeding within the family (20% of households had first-cousin parents), which Lundborg referred to as "unwise marriages". We now know that the disease results from a mutation in the CCC-CGC-CCC-GCG repeat region of the cystatin B gene — unaffected people have 3-4 repeats while affected people have 40+ repeats.

Lundborg himself was an active member of the eugenics movement in Sweden (which was referred to as 'race biology'), and most of his writings about the epileptic family were as bad as those quoted above (their "degeneration" was attributed to the fact that "they distilled their own alcohol, and thus became drunkards"). He eventually became Professor for Racial Hygiene; and he was influential in the implementation of forced sterilization programs in Sweden, believing that "The future belongs to the racially fine people", which obviously included himself.

Wednesday, November 26, 2014

An outline history of phylogenetic trees and networks


This is the 300th post on this blog, and so I thought we might have a bit of a summary. Here is the early history of phylogenetic trees and networks as we currently know it. There may, of course, be as yet undetected sources. Details of each of these historical notes (including illustrations) can be found elsewhere in this blog; you can use the search feature in the right side-bar to find them.

Biology

Genealogies as pedigrees (the history of individuals) have a long history. For example, they appear in inscriptions concerning the pharaohs of Ancient Egypt, although these are very imprecise and have caused many headaches for modern scholars. They appear as chains of ancestors and descendants in the Old Testament of the Christian Bible, often contradicting each other and claiming impossible lifespans. Most importantly for modern usage, they were employed in the New Testament to legitimize Jesus as the messiah foretold in the Old Testament. The first known illustration of this appeared in c.400 AD, and it was actually a network, as there were two lineages leading to Jesus (via both Joseph and Mary).

The apparent success of this application (later called the Tree of Jesse, pictures of which started appearing in the 10th century) has meant that both royalty and the nobility have subsequently used pedigrees to assert their own right to be regal and noble. The first known illustration of this is from c.1000 AD, in which Cunigunde of Luxembourg's ancestry was traced in a tree-like manner to include Charlemagne, thus legitimizing her claim to being royal.

Also, up until 1215 AD marriage within seven degrees of separation was not allowed by the Christian church, and intestate inheritance applied the same relationship limit. So, a record of blood ties among relatives was often needed; and these started appearing in family bibles, for example. The first recorded tree-like illustrated pedigree was for Lambert of Saint-Omer, which appeared in 1122 AD in his personal copy of his book Liber Floridus.

It seems obvious, then, to also construct genealogies for groups of organisms, which we now call phylogenies (a word coined by Ernst Haeckel in 1866). The Great Chain of Being was for a long time the most popular iconography for relationships, mainly because it neatly tied in with the Christian philosophy of a chain of intellectual ideas, leading from pragmatic earthly concerns and culminating in the idealistic heavens. Humans were, of course, at the head of the chain of earthly beings, and capable of ascending to the heavens.

However, this did not work from a purely observational point of view. Observed pedigrees were not linear, but branched with each generation and often fused again via marriage. Furthermore, biodiversity (the patterns among groups of organisms) also seemed to have multiple relationships. This led Vitaliano Donati in 1750 (Della Storia Naturale Marina dell' Adriatico) to suggest that:
In addition, the links of the chain are joined in such a way within the links of another chain, that the natural progressions should have to be compared more to a net than to a chain, that net being, so to speak, woven with various threads which show, between them, changing communications, connections, and unions. [from the original Italian]
He was not alone in this thought, although others chose different metaphors. For example, Carl von Linné in 1751 (Philosophia Botanica) wrote this:
All plants show affinities on either side, like territories in a geographical map. [from the original Latin]
Neither author published a reticulating diagram to illustrate their thoughts, although one of Linné's students subsequently produced a version of his ideas in 1792 (Caroli a Linné, Praelectiones in Ordines Naturales Plantarum).

So, it was Georges-Louis Leclerc, Comte de Buffon, who produced the first empirical phylogeny in 1755 (Histoire Naturelle Générale et Particulière, Tome V). This was a network showing the evolutionary origin of domesticated dog breeds. This was followed by Antoine Nicolas Duchesne in 1766 (Histoire Naturelle des Fraisiers), who produced a network showing the evolutionary origin of strawberry cultivars. In both cases the evolutionary process illustrated by the reticulations in the network was hybridization. Note that both of these diagrams refer to within-species genealogies, rather than to relationships between species; and neither author seems to have contemplated the idea of among-species phylogenies.

Thus, in both theory and practice modern phylogenetic metaphors started as networks, not trees. It was Peter Simon Pallas in 1776 (Elenchus Zoophytorum) who first suggested using a tree as a simplified metaphor:
As Donati has already judiciously observed, the works of Nature are not connected in series in a Scale, but cohere in a Net. On the other hand, the whole system of organic bodies may be well represented by the likeness of a tree that immediately from the root divides both the simplest plants and animals, [but they remain] variously contiguous as they advance up the trunk, Animals and Vegetables; [from the original Latin]
Again, no diagram was forthcoming to illustrate this. It was Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck, who finally produced an empirical phylogeny in 1809 (Philosophie Zoologique). This was a small tree showing the evolutionary relationships among the major groups of animals. However, it represented what we would now call transformational evolution, as Lamarck did not believe in extinction, and thus he showed one group transforming into another. This differed from both Buffon and Duchesne, who were illustrating a process of increasing diversity of groups. It also differed by referring to supra-species relationships.

For the next 50 years, diagrams showing biodiversity relationships illustrated what we now call patterns of affinity, rather than showing historical relationships. These affinity diagrams showed apparent similarities among groups of organisms, without any implication that the relationships were the result of evolutionary history. The majority of these diagrams were networks rather than trees, indicating that groups of organisms had observed similarities with several other groups.

It is Charles Darwin and Alfred Russel Wallace who are credited with introducing, in 1858, the idea that natural selection could be the important process by which new species arise, although the idea of natural selection itself had been "in the air" for more than half a century with respect to within-species variation. (Patrick Matthew had also suggested a role for it in the origin of new species, in 1831: On Naval Timber and Arboriculture; with Critical Notes on Authors who have Recently Treated the Subject of Planting.)

As was by now becoming a tradition, neither Darwin nor Wallace (nor Matthew) produced a diagram to illustrate their thoughts. Darwin did draw a theoretical diagram in his subsequent 1859 book (On the Origin of Species by Means of Natural Selection), but he used it to illustrate continuity of evolutionary descent and the processes of extinction and diversification, rather than strictly as representing a phylogeny. His famous "Tree of Life" metaphor had nothing to do with the diagram (it was a Biblical metaphor, to stimulate the imagination of his readers).

The first person to get into print what we could call an empirical diagram representing Darwin's idea was Johann Friedrich Theodor Müller in 1864 (Für Darwin), who drew a small (three-species) tree of amphipods. This was followed by St George Jackson Mivart in 1865 (Contributions towards a more complete knowledge of the axial skeleton in the primates. Proceedings of the Zoological Society of London 33: 545-592). This was a much more extensive diagram illustrating possible evolutionary relationships among primate species (including humans) based solely on their body skeleton.

Confusion between trees and networks reappeared at this time. In particular, Franz Martin Hilgendorf had produced an unpublished PhD thesis in 1863 (Beiträge zur Kenntniß des Süßwasserkalkes von Steinheim) during which he constructed an empirical network of relationships among extinct snail species; but he rejected this because it did not match the Darwinian idea of an evolutionary tree. He later collected more data, and instead published a phylogenetic tree in 1866 (Planorbis multiformis im Steinheimer Süßwasserkalk: ein beispiel von gestaltveränderung im laufe der zeit).

Thus, we last saw an explicit evolutionary network in 1766, referring to within-species variation. The first person to publish an evolutionary network showing relationships among species was apparently Ferdinand Albin Pax in 1888 (Monographische übersicht über die arten der gattung Primula. Botanische Jahrbücher für Systematik, Pflanzengeschichte und Pflanzengeographie 10: 75-241). He produced 14 networks of various Primula species, apparently showing affinity relationships, but three of these also illustrate hybridization, which is strictly an evolutionary process.

Anthropology

Genealogies appear in anthropology as well as in biology. Any human creation can be considered to have a history of "descent with modification" if copies are passed from generation to generation (eg. languages, books, tales). For our purposes here, the most important historical developments were in linguistics (language studies) and in stemmatology (manuscript studies).

Georg Stiernhielm appears to have been the first linguist to draw a genealogy, when he produced a small network of Germanic languages in 1671 (De Linguarum Origine Præfatio, the preface to his edition of Evangelia ab Ulfila Gothorum). This was followed by Félix Gallet in c.1800 (Arbre Généalogique des Langues Mortes et Vivantes), who produced a single broadsheet with a network of Indo-European languages.

Note that, as for biology, the modern metaphors started as networks, not trees. More importantly, note that Stiernhielm's diagram pre-dated Buffon's dog network by more than 80 years — evolutionary ideas were less revolutionary in linguistics than they were in biology.

Darwin explicitly noted a connection between language genealogies and biology genealogies in 1859. However, the first people to get into print what we could call empirical diagrams representing Darwin's idea did so before Darwin published anything on the subject. In 1853 František Ladislav Čelakovský published a tree depicting a history of the Slavic languages (Čtení o Srovnávací Mluvnici Slovanské na Universitě Pražskě), and August Schleicher published one on the development of the Indo-Germanic language family (Die ersten Spaltungen des Indogermanischen Urvolkes. Allgemeine Monatsschrift für Wissenschaft und Literatur 1853: 786-787).

Stemmatology differs from linguistics and biology in first producing a tree rather than a network. Hans Samuel Collin and Carl Johan Schlyter produced this in 1827 (first volume of Corpus Iuris Sueo-Gotorum Antiqui), with a tree of relationships among hand-written copies of documents containing the Medieval laws of Sweden. This was also a tree that represented Darwin's genealogical idea, and so it may be considered to be the first one of that type to be published (ie. more than 25 years before Čelakovský and Schleicher, and more than 30 years before Darwin).

This early lead was followed by the first network in 1832, when Friedrich Wilhelm Ritschl's stemma of a book by Thomas Magister (Thomae Magistri sive Theoduli Monachi Ecloga vocum Atticarum) explicitly showed sources of contamination among the manuscript copies — that is, different parts of a manuscript were copied from different sources, rather than by strict ancestor-descendant copying.

Interestingly, the tree metaphor didn’t endure in anthropology as well as it did in biology. It was quickly replaced by alternative metaphors, such as wave, web, warp & weft, lattice and other continuously reticulating images. Horizontal flow of information has always been seen as a dominant force in anthropological histories.

Timeline

Networks

1671 Georg Stiernhielm — small language network
1750 Vitaliano Donati — biology network suggestion
1751 Carl von Linné — biology map suggestion
1755 Georges-Louis Leclerc, Comte de Buffon — intra-species network
1766 Antoine Nicolas Duchesne — intra-species network
1792 Carl von Linné — map
1800 Félix Gallet — language network
1832 Friedrich Wilhelm Ritschl — small manuscript network
1863 Franz Martin Hilgendorf — unpublished inter-species network
1888 Ferdinand Albin Pax — inter-species network

Trees

1776 Peter Simon Pallas — biology tree suggestion
1809 Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck — small inter-species tree
1827 Hans Samuel Collin and Carl Johan Schlyter — manuscript tree
1853 František Ladislav Čelakovský — language tree
1853 August Schleicher — language tree
1859 Charles Robert Darwin — generalized tree
1864 Johann Friedrich Theodor Müller — small inter-species tree
1865 St George Jackson Mivart — large inter-species tree
1866 Franz Martin Hilgendorf — large inter-species tree

Wednesday, October 1, 2014

A fundamental limitation of pedigrees and networks but not trees


It would be nice to think that genealogical history can be reconstructed with ease. However, this is known not to be so. In particular, being able to reconstruct an overall history from a collection of sub-histories, which can be thought of as the "building blocks", is not necessarily guaranteed.

That is, even given a complete collection of all of the sub-histories it is not necessarily possible to reconstruct a unique overall history. In other words, there can be pairs of graphs that do not represent the same evolutionary histories, but still display exactly the same collection of building blocks. ("Display" means roughly that a building block can be obtained by simply deleting some of the edges and vertices in the graph.) Mathematically, the sub-histories do not determine (or encode) the history.


For example, it is known that pedigrees cannot necessarily be reconstructed from a collection of all of the sub-pedigrees (Thatte 2008). Pedigrees are the traditional "family trees" showing the ancestry of individuals. Pedigrees differ from phylogenies in that all of the individuals have two parents (rather than possibly having a single immediate ancestor) and there are probably multiple roots (unless there is considerable inbreeding).

Phylogenetic trees, on the other hand, can be uniquely reconstructed from a collection of all of the possible sub-trees (see Dress et al. 2012). This is one of the things that makes trees valuable as a phylogenetic model — it is theoretically possible to collect enough information to construct a unique phylogenetic tree.

Rooted phylogenetic networks do not, however, share this property. For some time it has been known that networks cannot necessarily be built from their building blocks, whether those blocks are rooted trees (Willson 2011) or triplets (= rooted 3-taxon trees) or clusters (= rooted sub-trees = clades) (Gambette and Huber 2012).

This is illustrated in the next figure (adapted from Huber et al.), which shows two networks at the top and below that the four trees that are displayed by both of them (by deleting one of each pair of incoming edges at the two reticulation nodes). Given these four trees we cannot reconstruct a unique network, and yet they are the only four trees associated with either network.
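To make the idea of a network "displaying" a tree concrete, here is a minimal sketch of that deletion procedure. It assumes a rooted network stored as a list of (parent, child) edges, with reticulation nodes being those that have more than one parent; each displayed tree is summarized by its set of clusters (the taxa below each node), since a rooted tree is determined by its clusters. The network itself is a hypothetical 4-taxon example with two reticulation nodes (h1, h2), not either of the networks in the figure, and the code only enumerates trees from a network; it does not attempt the much harder reverse problem.

```python
from itertools import product

# Hypothetical rooted network on taxa a, b, c, d with two reticulation
# nodes, h1 (parents x and y) and h2 (parents h1 and z).
EDGES = [
    ("r", "x"), ("r", "y"),
    ("x", "a"), ("x", "h1"),
    ("y", "h1"), ("y", "z"),
    ("z", "c"), ("z", "h2"),
    ("h1", "b"), ("h1", "h2"),
    ("h2", "d"),
]
TAXA = {"a", "b", "c", "d"}


def clusters(edges, root="r"):
    """Return the non-empty clusters (sets of taxa below each node)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    below = {}

    def visit(node):
        if node not in below:
            taxa = {node} if node in TAXA else set()
            for child in children.get(node, []):
                taxa |= visit(child)
            below[node] = frozenset(taxa)
        return below[node]

    visit(root)
    return {c for c in below.values() if c}


def displayed_trees(edges):
    """Yield the cluster set of each tree obtained by keeping exactly one
    incoming edge at every reticulation node."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    retics = sorted(n for n, ps in parents.items() if len(ps) > 1)
    for choice in product(*(parents[h] for h in retics)):
        kept_parent = dict(zip(retics, choice))
        kept = [(p, c) for p, c in edges
                if c not in kept_parent or kept_parent[c] == p]
        yield frozenset(clusters(kept))


trees = set(displayed_trees(EDGES))
print(len(trees))  # 4
```

With two reticulation nodes, each having two parents, there are 2 × 2 = 4 ways to delete edges, which is why a network like this displays (at most) four trees. The reconstruction problem asks the opposite question: given only such a set of trees, is there exactly one network that displays them? The results cited here show that the answer can be no.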


To make matters worse, Huber et al. (in press) have now revealed that we can't reconstruct rooted phylogenetic networks even from sub-networks. To do this they show that networks cannot necessarily be built from trinets (= rooted 3-taxon networks). Certain types of networks (e.g. level-1, level-2, tree-child) can be reconstructed (van Iersel and Moulton 2014), but Huber et al. show the example in the second figure, which shows two networks at the top and below that the four trinets that are displayed by both of them. Given these four trinets we cannot reconstruct a unique network, and yet they are the only four trinets associated with either network.


This means that "even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history." The best analogy here is Humpty Dumpty — even given all of the pieces, we literally might not be able to put him back together again. We could if he is a rooted tree, but we cannot guarantee it if he is a rooted network or pedigree.

This may not matter in practice, given that we don't yet know the circumstances under which it is possible to uniquely reconstruct networks, but it does mean that we acquire a certain degree of uncertainty as we move from "tree thinking" to "network thinking".

References

Dress A, Huber KT, Koolen J, Moulton V, Spillner A (2012) Basic Phylogenetic Combinatorics. Cambridge University Press.

Gambette P, Huber K (2012) On encodings of phylogenetic networks of bounded level. Journal of Mathematical Biology 65: 157-180.

Huber KT, van Iersel L, Moulton V, Wu T (in press) How much information is needed to infer reticulate evolutionary histories? Systematic Biology

van Iersel L, Moulton V (2014) Trinets encode tree-child and level-2 phylogenetic networks. Journal of Mathematical Biology 68: 1707-1729.

Thatte BD (2008) Combinatorics of pedigrees I: counterexamples to a reconstruction problem. SIAM Journal of Discrete Mathematics 22: 961-970.

Willson SJ (2011) Regular networks can be uniquely constructed from their trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8: 785-796.

Monday, September 15, 2014

Guitars and networks


I have noted before that the evolutionary history of musical instruments is likely to be a reticulating network rather than being tree-like (Cornets: from a tree to a network). As another illustration of the pattern, we can consider the evolution over the past few centuries of the Spanish or flamenco guitar (taken from the Origem do nome Violão blog post).


This genealogy (with time proceeding from left to right) shows three basic characteristics that seem to be common in anthropological histories. First, there are multiple roots — in this case, three different instruments from the 16th century have provided input into the modern acoustic guitar. Second, there is an early history of reticulation, with ideas for new instrumentation being taken freely from among the existing instruments, in this case presumably in the search for better sound reproduction. Third, there is simple transformational evolution, with new models replacing the previous ones in popularity — for example, over the past 100 years the Spanish guitar has simply gotten larger (this is Cope's Rule).

Wednesday, July 23, 2014

Evolutionary fitness and incest


I have written before about the expected genetic problems associated with inbreeding, including consanguinity and incest (relationships between people who are first cousins or closer). Conventionally, the evolutionary advantage of sexual over non-sexual reproduction is considered to be the creation of genetic diversity through heterozygosity. Inbreeding, by reducing heterozygosity, then seems to negate the advantages of sexual reproduction — it leads to the propagation of deleterious recessive alleles and thus inbreeding depression. So, there is a clear evolutionary dimension to the fact that incest avoidance is nearly universal in humans.

The best known exceptions to this situation are among royalty, including the family "trees" of the ancient Egyptian 18th Dynasty (see Tutankhamun and extreme consanguinity) and the Egyptian Ptolemaic dynasty (see Cleopatra, ambition and family networks), which were hybridization networks rather than conventional trees. The presence of consanguinity and incest among royal families then requires a biological explanation. As noted by van den Berghe & Mesher (1980):
Royal incest is best explained in terms of the general sociobiological paradigm of inclusive fitness ... Royal incest (mostly brother-sister; less commonly father-daughter) represents the logical extreme of hypergyny. Women in stratified societies maximize fitness by marrying up; the higher the status of a woman, the narrower her range of prospective husbands. This leads to a direct association between high status and inbreeding.
The benefits of inclusive fitness refer to the increased number of offspring in future generations that result from increasing the reproductive success of close relatives. This is achieved via choice of mate. In other words, close relatives share genes, and the success of any relative in leaving offspring is a success for all relatives. Therefore, evolutionary fitness is a combination of individual fitness plus the fitness of close relatives. Inbreeding may reduce individual fitness but can increase inclusive fitness, as noted by Puurtinen (2011):
Theoretical work has shown that inclusive fitness benefits can favor close inbreeding even when this results in substantial reduction in offspring fitness. These models have identified the boundary level of inbreeding depression limiting the evolution of inbreeding among first-order relatives, that is, between full siblings, or between parents and offspring.
So, there is a stable level of inbreeding in those populations that practice mate choice for optimal inbreeding. For example, the genetic risks of close inbreeding can be more than accounted for by the production of a highly related heir who has access to a wide choice of mates. Nevertheless:
For a wide range of realistic inbreeding depression strengths, mating with intermediately related individuals maximizes inclusive fitness.
In other words, mating with very close relatives is unlikely to evolve via natural selection because it is not an optimal strategy; and we must thus look to a sociological component to incest (such as retaining wealth within the family), as well as a biological one.


In this context, it is interesting to note exceptions to the usual restriction of incest to the aristocracy. The society of Graeco-Roman Egypt (from c. 300 BCE to 300 CE) provides the best-documented case (eg. see Hopkins 1980; Shaw 1992; Parker 1996; Scheidel 1997; Huebner 2007; Remijsen & Clarysse 2008). [This era starts with the Ptolemaic dynasty, which marks the collapse of Egyptian rule of Egypt.] During this time a significant proportion of all marriages noted in official Roman census declarations were between full brothers and sisters. That is, the Roman-era Egyptians did not limit this type of inbreeding to any small group, but spread it across several social classes (mainly Greek settlers rather than native Egyptians).

As noted by Scheidel (1997):
According to official census returns from Roman Egypt (first to third centuries CE) preserved on papyrus, 23·5% of all documented marriages in the Arsinoites district in the Fayum (n=102) were between brothers and sisters. In the second century CE, the rates were 37% in the city of Arsinoe and 18·9% in the surrounding villages. Documented pedigrees suggest a minimum mean level of inbreeding equivalent to a coefficient of inbreeding of 0·0975 in second century CE Arsinoe. Undocumented sources of inbreeding and an estimate based on the frequency of close-kin unions indicate a mean coefficient of inbreeding of F=0·15-0·20 in Arsinoe and of F=0·10-0·15 in the villages at the end of the second century CE. These values are several times as high as any other documented levels of inbreeding.
For comparison, the inbreeding coefficients (F) of the offspring of matings between these relatives are:
self: 0.500
parent-offspring, or full siblings: 0.250
uncle-niece, or double first cousins: 0.125
first cousins: 0.063
first cousins once removed: 0.031
second cousins: 0.016
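The F values listed above follow from Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over the common ancestors of the two mates, where n1 and n2 are the numbers of generations from each mate back to that ancestor and F_A is the ancestor's own inbreeding coefficient. Here is a minimal sketch that reproduces the list, assuming all common ancestors are themselves non-inbred (F_A = 0); the way each relationship is encoded as a list of (n1, n2) pairs is my own illustrative choice.

```python
from fractions import Fraction

# One (n1, n2) pair per common ancestor of the two mates, where n1 and n2
# are the generations from each mate back to that ancestor.
# Assumes every common ancestor is non-inbred (F_A = 0).
RELATIONSHIPS = {
    "self": [(0, 0)],                        # the individual is its own "common ancestor"
    "parent-offspring": [(0, 1)],            # the parent is the common ancestor
    "full siblings": [(1, 1), (1, 1)],       # both parents shared
    "uncle-niece": [(1, 2), (1, 2)],         # uncle's parents = niece's grandparents
    "double first cousins": [(2, 2)] * 4,    # all four grandparents shared
    "first cousins": [(2, 2), (2, 2)],       # two grandparents shared
    "first cousins once removed": [(2, 3), (2, 3)],
    "second cousins": [(3, 3), (3, 3)],      # two great-grandparents shared
}


def inbreeding_coefficient(paths, f_ancestor=Fraction(0)):
    """Wright's path-counting formula for the offspring's F."""
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) * (1 + f_ancestor)
               for n1, n2 in paths)


for name, paths in RELATIONSHIPS.items():
    f = inbreeding_coefficient(paths)
    print(f"{name}: F = {f} ({float(f)})")
```

Exact fractions are used so that, for example, first cousins come out as exactly 1/16 = 0.0625, which the list above rounds to 0.063. If the common ancestors were themselves inbred, as would often have been the case in Roman-era Arsinoe, each term grows by the factor (1 + F_A), which is one reason why population-level values can exceed those for any single relationship.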

However, inbreeding depression seems not to have been a notable problem during this historical time. As noted by John Hawkes:
There is not a single mention in the evidence that links sibling marriage to negative genetic effects or unhappy marriages.
This does not mean that there were no problems, but merely that any problems were not documented, as noted by Scheidel (1997):
Even in the absence of explicit references to inbreeding depression from Roman Egypt, there is no compelling reason to assume that brother–sister marriage could have remained entirely without negative consequences for the Arsinoites. It is however possible that, due to a low incidence of lethal recessives, such effects were considerably weaker than in some western samples. The census returns do not suggest lower levels of fertility or smaller numbers of children among sibling couples ...
The practice seems to have stopped solely because it was contrary to Roman Law:
Before a.d. 212 the Romans had accepted discrepancies between their own legal practice and prevailing local customs and traditions in the Eastern provinces. Papyri from Roman Egypt, the Talmud, and the Romano-Syrian law book indeed reveal legal procedures which differed significantly from Roman law in matters such as marriage, guardianship, paternal authority, sales, and debts. The Constitutio Antoniana, however, made all free men and women of the Roman Empire into Roman citizens, and so Roman law became applicable to all inhabitants of Egypt. Brother-sister marriages cease to be documented in our Roman census returns from the early third century on. Our last [incest] testimony dates to a.d. 229.

References

Hopkins K (1980) Brother-sister marriage in Roman Egypt. Comparative Studies in Society and History 22: 303-354.

Huebner SR (2007) "Brother-sister" marriage in Roman Egypt: a curiosity of humankind or a widespread family strategy? Journal of Roman Studies 97: 21-49.

Parker S (1996) Full brother-sister marriage in Roman Egypt: Another look. Cultural Anthropology 11: 362-376.

Puurtinen M (2011) Mate choice for optimal (k)inbreeding. Evolution 65: 1501-1505.

Remijsen S, Clarysse W (2008) Incest or adoption? Brother-sister marriage in Roman Egypt revisited. Journal of Roman Studies 98: 53-61.

Scheidel W (1997) Brother-sister marriage in Roman Egypt. Journal of Biosocial Science 29: 361-371.

Shaw BD (1992) Explaining incest: brother-sister marriage in Graeco-Roman Egypt. Man 27: 267-299.

Monday, May 19, 2014

Cleopatra, ambition and family networks


There is an old saying in English that "Behind every great man there is a woman ... telling him to be great". This is intended to indicate that even in patrilineal societies women have influenced history, even if history has chosen not to formally recognize them (or historians have, anyway). However, every so often a woman has also stepped into the spotlight for herself, and recognizably influenced events in a way that has brought her name down through history.

The most famous of these is probably Cleopatra (or more properly Kleopatra), the last ruler of Ancient Egypt (as Cleopatra VII). Sadly, her ambition to become Empress of the known world seems to have destroyed two successive Roman rulers (Julius Caesar and Marc Antony) as well as her own two brothers (who would have ruled in her place); and her failure seems to have lost the country of which she was queen, so that Egypt became a Roman dependency. She ruled from 51-30 BCE, and modern Egypt did not regain its independence until 1953. This was one seriously influential woman.

As noted by Schiff (2010):
She lost her kingdom once; regained it; nearly lost it again; amassed an empire; lost it all. At the height of her power she controlled virtually the entire eastern Mediterranean coast, the last great kingdom of any Egyptian ruler. For a fleeting moment she held the fate of the Western world in her hands ... Catastrophe reliably cements a reputation, and Cleopatra's end was sudden and sensational.
Her interest to us, however, is her role in a dynasty that favored incest, and thus had a "family tree" that was a hybridization network, as shown in the figure. This particular family history is rather complex. Note that Cleopatra herself had at least four liaisons, two with her brothers (who successively ruled jointly with her as Ptolemy XIII and Ptolemy XIV, respectively) and two with Romans (Julius Caesar and Marc Antony). Later, she also ruled jointly with her son by Julius Caesar (as Ptolemy XV).


Adapted from the Too Much Information blog, based on the information at Ian Mladjov's Genealogical Tables

The Ptolemaic dynasty was founded after the death of Alexander the Great (aka Alexander III of Macedon), when his empire was divided up among his Greek generals, and in 323 BCE Egypt ended up in the hands of Ptolemy, who subsequently ruled as the pharaoh Ptolemy I from 305-282 BCE. As Dray (2012) has noted:
His daughter, Arsinoe II, would start the tradition of incest. Married off to an old King of Thrace when she was still a teenager, she was the ultimate survivor. Her life was frequently in danger and she made many narrow escapes ... At some point, Arsinoe seems to have decided that if she wanted to be safe, she couldn’t trust anyone outside her immediate family. So, she returned to Egypt and married her full brother, Ptolemy II.
Now, the Greeks didn’t have a tradition of incest in their ruling families … but the pharaohs of Egypt did. By marrying her brother, Arsinoe was able to help create a link between the new Ptolemaic dynasty and the very old traditions of the native Egyptians. It served her extremely well as she became the first female pharaoh of the Ptolemaic dynasty, ruling not just as the wife of the king, but as a king in her own right.
Meeg (2009) suggests that:
According to tradition, incestuous marriages between the pharaohs and their sisters were common. If this was the case, it could have been done to emulate the god Osiris and his sister / wife the goddess Isis (the product of that union was Horus, the alleged ancestor of the Pharaoh), and/or to keep the sacred bloodline pure. When Alexander the Great's general Ptolemy seized control of Egypt around 323 BC, his descendants would continue the local custom of pharaonic brother-sister marriages. This practice was unknown among Greeks and Macedonians.
Indeed, Wikipedia notes:
In ancient Egypt, royal women carried the bloodlines and so it was advantageous for a pharaoh to marry his sister or half-sister; in such cases a special combination between endogamy and polygamy is found. Normally the old ruler's eldest son and daughter (who could be either siblings or half-siblings) became the new rulers. All rulers of the Ptolemaic dynasty from Ptolemy II were married to their brothers and sisters, so as to keep the Ptolemaic blood "pure" and to strengthen the line of succession. Cleopatra VII (also called Cleopatra VI) and Ptolemy XIII, who married and became co-rulers of ancient Egypt following their father's death, are the most widely known example.
Bevan (1927) continues the story [Note: he uses one number less for the Cleopatras and Ptolemies]:
Cleopatra VI found herself queen of Egypt at the age of seventeen or eighteen. By the custom of the house, and according to the will and testament of Ptolemy Auletes, the elder of her two brothers, then only nine or ten, was associated with her, as king (Ptolemy XII). They probably had, as a pair, the style of "Father-loving Gods" (Theoi Philopatores), though neither during the reign of Cleopatra with Ptolemy XII, nor during her reign, later on, with the younger brother, Ptolemy XIII (then about twelve), do the coins bear any head or name but that of the queen, and in Egyptian sepulchral inscriptions put up during the reign of Cleopatra with her younger brother (regnal years 5, 6, and 7 of Cleopatra) the regnal year of the boy-king is ignored. Ptolemy XIV was the acknowledged son of Julius Caesar and Cleopatra, and ruled as child king with his mother.
The involvement of royalty in consanguinity and incest is widespread. As noted by Dobbs (2010):
While virtually every culture in recorded history has held sibling or parent-child couplings taboo, royalty have been exempted in many societies, including ancient Egypt, Inca Peru, and, at times, Central Africa, Mexico, and Thailand [and also Hawaii].
I have already discussed incest in the family "trees" of the Egyptian 18th Dynasty, in Tutankhamun and extreme consanguinity (the other set of pharaohs where this was common); and I have covered the persistent inbreeding in the downfall of the modern Spanish branch of the Habsburgs, in Family trees, pedigrees and hybridization networks.

Not unexpectedly, this phenomenon has received attention from modern evolutionary biologists. Conventionally, the evolutionary advantage of sexual over non-sexual reproduction is considered to be the creation of genetic diversity through heterozygosity. Inbreeding, by reducing heterozygosity, then seems to negate the advantages of sexual reproduction. So, the near universality of incest avoidance in humans has a clear genetic dimension. Indeed, as I have noted in previous blog posts this is easily demonstrated in well-known families — (i) Charles Darwin's family pedigree network, (ii) Toulouse-Lautrec: family trees and networks.

The presence of incest among royal families then requires a biological explanation, and van den Berghe & Mesher (1980) have provided one:
Royal incest (mostly brother-sister; less commonly father-daughter) represents the logical extreme of hypergyny. Women in stratified societies maximize [evolutionary] fitness by marrying up; the higher the status of a woman, the narrower her range of prospective husbands. This leads to a direct association between high status and inbreeding. Royal incest is a fitness maximizing strategy if the following conditions are met: polygyny, patrilineal succession, and parental control of royal succession. Under those conditions, the genetic risks of close inbreeding are more than accounted for by the production of a highly related male heir who has, himself, access to a large harem. Data from Ancient Egypt, Inca Peru, Hawaii, Thailand, Monomotapa, Bunyoro, Ankole, Buganda, Shilluk, Zande, Nyanga and Dahomey confirm hypotheses derived from the sociobiological paradigm of inclusive fitness.
Finally, to return to Cleopatra, she is usually credited with being fatally attractive due to her great beauty. However, there is no evidence that this was actually the case. Her attractiveness to men seems to have come much more from a strong personality, including determined diplomacy and an easy facility with languages. Also, her ancestors were Macedonian Greeks rather than native Egyptians, giving her stronger genetic and cultural ties to Europe than to Africa, which must have helped when trying to woo the rulers of the Roman Empire. It was this ancestry that the dynasty's consanguinity and incest were intended to protect. The Egyptian populace certainly didn't benefit from it.

Indeed, Cleopatra seems simply to have been the ultimate expression of her dynasty's heritage, as noted by Ager (2006):
royal incest, as practised by the Ptolemies, was only one of a larger set of behaviours, all of which were symbolic of power, and all of which were characterized by lavishness, immoderation, excess and the breaching of limits in general.
Interestingly, the potentially negative aspects of inbreeding seem not to have affected this dynasty: there is no convincing evidence of infertility, infant mortality or genetic defects, for example (Ager 2006). Instead, their main historical legacy has been the bizarre juxtaposition of marrying each other, murdering each other, and sometimes both. Cleopatra's activities in this regard were no different from those of her ancestors.

References

Ager SL (2006) The power of excess: royal incest and the Ptolemaic dynasty. Anthropologica 48: 165-186.

Bevan ER (1927) The House of Ptolemy. Methuen Publishing, London.

Dobbs D (2010) The risks and rewards of royal incest. National Geographic Magazine.

Dray S (2012) Keeping it in the (Ptolemaic) family: when incest is best.

Meeg (2009) Royal inbreeding in Ancient Egypt.

Ian Mladjov's Genealogical Tables — The Ptolemies, kings of Egypt.

Schiff S (2010) Cleopatra: a biography. Little, Brown and Co, New York. [excerpted in Smithsonian Magazine]

van den Berghe PL, Mesher GM (1980) Royal incest and inclusive fitness. American Ethnologist 7: 300-317.

Wikipedia. Inbreeding.

Wednesday, March 26, 2014

Tutankhamun and extreme consanguinity


Consanguineous relationships involve people who are first cousins or more closely related. Apparently, about 15 percent of all marriages worldwide involve consanguineous partners, although this number has been higher in the past (Bittles 2012).

Our interest for this blog is that such relationships emphasize that so-called family trees (pedigrees) are hybridization networks not trees (see Pedigrees and phylogenies are networks not trees). Everyone can trace their maternal and paternal ancestors back into the past to a point where the lineages fuse again, and consanguineous marriages mean that this happens in the recent past rather than the distant past. To this end, we have had posts about Charles Darwin (Charles Darwin's family pedigree network), Henri Toulouse-Lautrec (Toulouse-Lautrec: family trees and networks) and Albert Einstein (Albert Einstein's consanguineous marriage). Not unexpectedly, it is royalty that provide the best-known examples (see Family trees, pedigrees and hybridization networks).

However, many cultures have taken consanguinity even further, as noted by Dobbs (2010):
While virtually every culture in recorded history has held sibling or parent-child couplings taboo, royalty have been exempted in many societies, including ancient Egypt, Inca Peru, and, at times, Central Africa, Mexico, and Thailand [and also Hawaii].
The reference to ancient Egypt includes both Cleopatra and Tutankhamun, each of whom was part of a dynasty that apparently adopted the practice of incest. As noted by Wikipedia:
In ancient Egypt, royal women carried the bloodlines and so it was advantageous for a pharaoh to marry his sister or half-sister; in such cases a special combination between endogamy and polygamy is found. Normally the old ruler's eldest son and daughter (who could be either siblings or half-siblings) became the new rulers.

Tutankhamun

Tutankhamun briefly ruled as Pharaoh from 1333 to 1323 BCE, at the end of the Amarna period of the 18th Dynasty. His failure to leave an heir ended the direct line of succession, and ultimately resulted in the transition to the 19th Dynasty, founded by Rameses I. Tutankhamun seems to have been a rather minor king, becoming ruler at age 9 and dying at 19. He was surrounded by the power struggle that followed his father's attempt to found the first monotheistic religion, and as a child he probably had little influence on the events of the time (Antanovskii 2013).

He became famous in 1922, when his near-intact tomb was discovered. He had been buried in a tomb not intended for royalty, and its location and even its existence were quickly forgotten at the time, because amid the political turmoil his successors had deleted nearly all traces of the Amarna kings. Ironically, this situation made Tutankhamun's tomb safe from the robbers who removed much of the contents of other tombs in the Valley of the Kings. Thus, more than 5,000 artifacts were found in his tomb, along with the well-preserved mummies (see the death mask pictured above). This has made Tutankhamun ("King Tut") a better-known name than anyone else from his period.

A note on names: Tutankhamun's father was Amenhotep IV, who tried to replace the polytheistic worship associated with Amun (or Amen) and the other gods of the national pantheon with the monotheistic worship of Aten ("the disk of the sun"). He thus changed his name from Amenhotep ("Amun is satisfied") to Akhenaten ("beneficial to Aten"). His son was named Tutankhaten ("the living spirit of Aten"), but this was changed to Tutankhamun ("the living spirit of Amun") when the state religion was restored during his reign.

The history of the period surrounding Akhenaten and Tutankhamun is particularly confused, as Tutankhamun did not become pharaoh until 2 years after his father's death (Hawass 2010; Gabolde 2011). Nevertheless, the preservation of Tutankhamun's tomb has allowed us to reconstruct a possible genealogy for this period, as shown next.


Hawass et al. (2010) compared the DNA of the mummy of Tutankhamun with that of 10 royal mummies from the same period, ranging from 1410 to 1324 BCE. The mummy genetically identified as his father, found in grave No. 55 of the Valley of the Kings, is considered to be Akhenaten. The mummy identified as his mother, found in grave No. 35, was also identified as a sister of Akhenaten. This is surprising, because only two wives of Akhenaten, Nefertiti and Kiya, are known to have had the title of Great Royal Wife, which the mother of the royal heir should bear.

Hawass et al. (2010) also looked for evidence of possible genetic effects of the consanguineous relationship (eg. homozygous genetic disorders):
An accumulation of malformations in Tutankhamun's family was evident. Several pathologies including Köhler disease II were diagnosed in Tutankhamun; none alone would have caused death. Genetic testing for genes specific for Plasmodium falciparum revealed indications of malaria tropica in four mummies, including Tutankhamun's. These results suggest avascular bone necrosis in conjunction with the malarial infection as the most likely cause of death in Tutankhamun. Walking impairment and malarial disease sustained by Tutankhamun is supported by the discovery of canes and an afterlife pharmacy in his tomb.
Incestuous marriages were nothing new to the pharaohs of Dynasty 18 (see Ian Mladjov's detailed genealogy). Part of the genealogy of its founding is shown in the next figure. Aahotep I and Sequenenra III were sister and brother, as were Aahmes-Nefertari and Aahmes (or Ahmose II). Aames (or Ahmose III) and Thotmes I were either sister and brother or half-siblings (the records are unclear).


Circles refer to females and squares to males.
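As a rough illustration of why repeated sibling marriages compound, here is a minimal sketch of the classical recurrence for continued full-sib mating, F_t = (1 + 2F_{t-1} + F_{t-2}) / 4. It assumes that every generation is a full brother-sister pairing and that the founding pair's own parents were unrelated and non-inbred. The real Dynasty 18 genealogy does not fit these assumptions exactly (the records are unclear about full versus half siblings), so the numbers are illustrative only.

```python
from fractions import Fraction

def full_sib_inbreeding(generations):
    """Inbreeding coefficient F_t after t generations of brother-sister mating.

    Uses the classical recurrence F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4,
    starting from unrelated, non-inbred founders (F_{-1} = F_0 = 0).
    """
    prev2, prev1 = Fraction(0), Fraction(0)
    history = []
    for _ in range(generations):
        current = (1 + 2 * prev1 + prev2) / 4
        history.append(current)
        prev2, prev1 = prev1, current
    return history

for t, f in enumerate(full_sib_inbreeding(4), start=1):
    print(t, f, float(f))
```

Under these assumptions F climbs from 1/4 after one generation to 3/8, 1/2 and then 19/32, so each further sibling marriage adds less than the one before but the total keeps rising.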

Finally, it is worth noting that Marc Gabolde has an alternative explanation for the apparent genetic closeness of King Tutankhamun's parents (see Powell 2013). He suggests that Tutankhamun's mother was not his father's sister but rather his father's first cousin, Nefertiti. The apparent genetic closeness would then result not from a single brother-sister mating but from three successive first-cousin marriages. Nefertiti is recorded to have had six daughters with Akhenaten, but no son.

References

Antanovskii R (2013) Unmasking Tutankhamun: the figure behind the fame. Heritage Daily – Archaeology.

Bittles AH (2012) Consanguinity in Context. Cambridge University Press.

Dobbs D (2010) The risks and rewards of royal incest. National Geographic Magazine.

Gabolde M (2011) The end of the Amarna Period. BBC History – Ancient History in Depth.

Hawass Z (2010) King Tut's family secrets. National Geographic Magazine.

Hawass Z, et al. (2010) Ancestry and pathology in King Tutankhamun's family. Journal of the American Medical Association 303: 638-647.

Ian Mladjov's Genealogical Tables — The pharaohs of the New Kingdom in Egypt c. 1540-1070 BC.

Powell A (2013) A different take on Tut. Harvard Gazette.

Wednesday, March 19, 2014

Pedigrees and phylogenies are networks not trees


There is nothing in the etymology of the words 'genealogy' and 'phylogeny' that necessarily implies that they must be tree-like. Indeed, all genealogies are networks. For example, a human family "tree" is a tree only if it includes one sex alone. Otherwise, it must be a network when traced backwards from any single individual through both parents, because the lineages must eventually coalesce in a pair of shared common ancestors. This must happen if there is a single origin for Homo sapiens (ie. the species is monophyletic). The coalescence may lie thousands of years in the past, or it may be quite recent.

So, all pedigrees of sexually reproducing species involve conjoined lineages at both "ends", one in the common ancestor and one in the contemporary offspring.
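The coalescence can be made visible by simply counting ancestors. Here is a minimal sketch, using a hypothetical toy pedigree in which the focal person's parents are first cousins. Going back g generations there are 2^g ancestral "slots", but once lineages fuse, some individuals fill more than one slot, and those shared ancestors are exactly the points where the "tree" becomes a network.

```python
from collections import Counter

# Hypothetical toy pedigree: person -> (father, mother), or None for founders.
PEDIGREE = {
    "X": ("F", "M"),                          # the focal individual
    "F": ("P1", "Q1"), "M": ("P2", "Q2"),     # X's parents are first cousins...
    "P1": ("G1", "G2"), "P2": ("G1", "G2"),   # ...because P1 and P2 are siblings
    "Q1": ("H1", "H2"), "Q2": ("H3", "H4"),
    **{name: None for name in ["G1", "G2", "H1", "H2", "H3", "H4"]},
}

def ancestor_slots(pedigree, person, generations):
    """For each generation back, report (g, slots, distinct ancestors, shared ancestors).

    'Slots' counts paths through both parents (2**g if the pedigree is complete);
    'shared' lists ancestors reached by more than one path, where lineages fuse.
    """
    current = Counter({person: 1})   # individual -> number of paths reaching it
    rows = []
    for g in range(1, generations + 1):
        nxt = Counter()
        for individual, paths in current.items():
            parents = pedigree.get(individual)
            if parents:
                for parent in parents:
                    nxt[parent] += paths
        shared = sorted(a for a, n in nxt.items() if n > 1)
        rows.append((g, sum(nxt.values()), len(nxt), shared))
        current = nxt
    return rows

for row in ancestor_slots(PEDIGREE, "X", 3):
    print(row)
```

In this toy example the great-grandparent generation has 8 slots but only 6 distinct people, because G1 and G2 each appear twice. The same bookkeeping applied to a real pedigree is what reveals "pedigree collapse" among closely intermarried families.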

Given the extent of inbreeding among royal families, this ancestral coalescence is quite likely to be recent among monarchs. For example, the most recent common ancestors of all of the currently reigning monarchs of Europe are John William Friso, Prince of Orange (1687-1711), and his wife, Marie Louise of Hesse-Kassel, Princess consort of Orange (1688-1765). This situation has existed since the abolition of the Albanian monarchy in 1939 (this particular monarchy was not related to the house of Orange).

Marie Louise (left) and her two children.

There used to be a Wikipedia page listing the contemporary descendants of this royal Dutch couple, but it has been deleted. It is, however, still available in the Internet Archive WayBack Machine (Royal descendants of John William Friso, Prince of Orange). This page shows that the lineages of all of the current monarchs coalesce in this couple within 7-11 generations. This is true of all 10 current monarchs (in Belgium, Denmark, Liechtenstein, Luxembourg, Monaco, the Netherlands, Norway, Spain, Sweden, the United Kingdom), many former monarchies (13 or so), many so-called pretenders or claimants (at least 21), plus two royal consorts. Interestingly, the progenitor couple achieved this set of family relationships even though they had only one daughter (Princess Amalia of Nassau-Dietz) and one son (William IV, Prince of Orange), who was born six weeks after his father's death by drowning.

Family trees were originally devised as a way for nobles to assert their nobility, by tracing their direct male ancestry from some "important" progenitor (see the picture below). The female lineages were usually ignored in such ancestries, with each woman appearing alone, solely as an isolated wife and mother. This was, of course, modelled on the genealogies listed in the Christian Bible, in both Genesis 5 and 11, in which females are mentioned but only males appear to be named. However, the ancestral relationships of the current European monarchs do involve females as part of the direct lines of descent, in all cases (ie. none of the direct lines of descent can be traced solely through males).

On the left is part of a genealogy of Christ (from c. 1130-1205);
on the right is a genealogy of the House of Habsburg (c. 1540).
Reproduced from the Visual Complexity blog.

Thus, in the modern world, we should be constructing family networks not family trees, with all of the male and female lineages sharing equal prominence. This will make it clear that genealogies are networks not trees. This assumes, of course, that enough historical information can be collected to locate the actual points of coalescence. This is unlikely to be so for the likes of you and me, but the nobility seem to be able to do it quite regularly.

Family networks that reticulate within a few generations are not necessarily good things, of course. Sex-linked recessive traits such as haemophilia B are widespread among the royalty of Europe (Stevens 1999, Rogaev et al. 2009), as are autosomal dominant traits such as variegate porphyria (Cox et al. 2005). These diseases are much rarer amongst commoners.

A similar situation applies to phylogenies showing species relationships. If there is a single origin to life, then tracing phylogenies backwards in time must lead to the eventual coalescence of all lineages. Any species whose ancestry involves hybridization, introgression or horizontal gene transfer must form a network. Parts of this network might be tree-like if isolated from the rest, but the whole phylogeny cannot be anything other than a network.

Consider the following points:

Definitions:
A network is a series of overlapping groups
A tree is a set of nested groups

Observation:
Each evolutionary event defines a group (all of the descendants of the ancestor in which the event occurred)

Conclusions:
Dichotomous speciation leads to a tree, by definition
Other processes will lead to a network, by definition
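These definitions translate directly into a simple check. Here is a minimal sketch, treating each evolutionary event as the set of taxa descended from it: a collection of groups is tree-like exactly when every pair of groups is either disjoint or nested, and any pair that overlaps without nesting signals a reticulation. The taxa A to E and the group names are hypothetical.

```python
from itertools import combinations

def overlapping_pairs(groups):
    """Return pairs of groups that share members without one containing the other."""
    sets = {name: frozenset(members) for name, members in groups.items()}
    conflicts = []
    for (n1, s1), (n2, s2) in combinations(sets.items(), 2):
        if s1 & s2 and not (s1 <= s2 or s2 <= s1):
            conflicts.append((n1, n2))
    return conflicts

def is_tree(groups):
    """Nested or disjoint groups only: the groups form a tree."""
    return not overlapping_pairs(groups)

# Hypothetical groups produced by dichotomous speciation: nested or disjoint.
tree_groups = {"AB": {"A", "B"}, "ABC": {"A", "B", "C"}, "DE": {"D", "E"}}
# Add a hypothetical hybridization event linking C with D.
network_groups = {**tree_groups, "CD": {"C", "D"}}

print(is_tree(tree_groups))               # dichotomous speciation only
print(overlapping_pairs(network_groups))  # the reticulation creates overlaps
```

Adding the single hybrid group {C, D} makes it overlap both {A, B, C} and {D, E}, so no tree can display all of the groups at once, whereas a network can.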

We know that in biology there are both vertical (speciation) and horizontal (reticulation) evolutionary processes. Therefore, no biological data fit a tree perfectly (unless the data are carefully selected to do so). A network analysis will allow you to evaluate the relative contribution of the horizontal and vertical processes that have occurred.

References

Cox TM, Jack N, Lofthouse S, Watling J, Haines J, Warren MJ (2005) King George III and porphyria: an elemental hypothesis and investigation. Lancet 366: 332-335.

Rogaev EI, Grigorenko AP, Faskhutdinova G, Kittler ELW, Moliaka YK (2009) Genotype analysis identifies the cause of the "Royal Disease". Science 326: 817.

Stevens R (1999) The history of hemophilia in the royal families of Europe. British Journal of Haematology 105: 25-32.

Monday, January 6, 2014

Albert Einstein's consanguineous marriage


In previous blog posts, I have mentioned several well-known people who were involved in consanguineous marriages, defined as the union of two people who are more closely related than second cousins. In the first post (Charles Darwin's family pedigree network) I discussed Charles Darwin in detail (he married his first cousin); and in a later post (Toulouse-Lautrec: family trees and networks) I discussed the artist Henri Toulouse-Lautrec, who was the offspring of a marriage between first cousins. Now, it is the turn of Albert Einstein (1879-1955).

Einstein's first marriage (in 1903) was to a former fellow physics student, Mileva Marić (1875-1948). They had three children: Lieserl (1902-?), who was born the year before they married, Hans Albert (1904-1973) and Eduard (1910-1965). Einstein seems to have been far from the ideal husband or father, as detailed in the book by Roger Highfield & Paul Carter (The Private Lives of Albert Einstein, St. Martin's Griffin, 1994). Some brief information is given below.

When the marriage ended, Einstein married (in 1919) Elsa Löwenthal (née Einstein) (1876-1936), who brought with her two daughters from her own first marriage: Ilse (1897-1934) and Margot (1899-1986). As shown in the family pedigree below, Albert and Elsa were first cousins through their mothers (traced in red) and second cousins through their fathers (traced in blue). [NB. This is only part of the family tree.]
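Being related in two ways at once adds up. Here is a minimal sketch using Sewall Wright's path-counting formula, F = Σ (1/2)^n (1 + F_A), to estimate the inbreeding coefficient of a hypothetical child of Albert and Elsa. It assumes (these are simplifications, not facts from the pedigree) that their mothers were full sisters, that their fathers were first cousins through a pair of full-sibling fathers, and that none of the shared ancestors was themselves inbred (F_A = 0).

```python
from fractions import Fraction

def wright_F(paths):
    """Wright's path-counting formula: F = sum over paths of (1/2)**n * (1 + F_A).

    Each path is (n, F_A): n counts every individual on the loop from one parent
    up to the common ancestor and back down to the other parent (ancestor included);
    F_A is that common ancestor's own inbreeding coefficient.
    """
    return sum((Fraction(1, 2) ** n * (1 + F_A) for n, F_A in paths), Fraction(0))

# Assumed, simplified structure; all shared ancestors taken as non-inbred (F_A = 0).
# First cousins: Albert - his mother - shared grandparent - her mother - Elsa (n = 5),
# once through each of the two shared grandparents.
first_cousins = [(5, 0), (5, 0)]
# Second cousins via the fathers: two more generations on the loop (n = 7).
second_cousins = [(7, 0), (7, 0)]

F_child = wright_F(first_cousins + second_cousins)
print(F_child, float(F_child))
```

Under these assumptions the maternal link contributes 1/16 and the paternal link 1/64, giving F = 5/64 (about 0.078), somewhat higher than for an ordinary first-cousin marriage.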


The main issue here is that this pedigree is a reticulating hybridization network rather than a diverging tree, which clearly shows the problem with consanguineous marriages. Any individual born from such a marriage has reduced genetic diversity and a much higher risk of expressing recessive alleles in their phenotype, many of which cause serious health problems. For example, several of Darwin's children died young, and several others were apparently infertile. As well, Toulouse-Lautrec is well-known for his short stature and genetic deformities; his brother died young, and several of his cousins (also the offspring of a consanguineous marriage) had the same genetic problems as himself. Consanguineous marriages are not encouraged if children are an intended outcome (see Bennett et al. 2002. Genetic counseling and screening of consanguineous couples and their offspring: recommendations of the National Society of Genetic Counselors. Journal of Genetic Counseling 11: 97-119).
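The increased risk of expressing a recessive allele can be sketched with the standard population-genetics result P(aa) = q²(1 - F) + qF, where q is the frequency of the harmful allele and F is the child's inbreeding coefficient. The allele frequency of 1% used here is a hypothetical value chosen purely for illustration; real disease alleles vary enormously.

```python
def recessive_risk(q, F):
    """Probability that a child is homozygous for a recessive allele of frequency q.

    Standard result: P(aa) = q**2 * (1 - F) + q * F, where F is the child's
    inbreeding coefficient (F = 0 for a child of unrelated parents).
    """
    return q * q * (1 - F) + q * F

q = 0.01  # hypothetical frequency of a rare harmful recessive allele
baseline = recessive_risk(q, 0.0)
for label, F in [("unrelated parents", 0.0), ("first cousins", 1 / 16), ("siblings", 1 / 4)]:
    risk = recessive_risk(q, F)
    print(f"{label}: P(aa) = {risk:.6f} ({risk / baseline:.1f}x baseline)")
```

For this rare allele, a child of first cousins is roughly 7 times as likely as a child of unrelated parents to be affected, and a child of siblings roughly 26 times. The rarer the allele, the larger this relative increase becomes, because the qF term dominates q².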

Elsa and Albert are not known to have had any children (but see the note below), and it has been assumed that they had a relatively platonic relationship. So, this particular story does not have the same sad ending as those of Darwin and Toulouse-Lautrec. It would be interesting to know whether Albert and Elsa's childless state was a deliberate decision (in light of the possible genetic problems for any child), a consequence of age (they were in their 40s when they married, which makes pregnancy risky), or a result of (unreported) miscarriages.

The following note about Einstein as a husband is from The other side of Albert Einstein:
Einstein was far from the ideal husband. A year before they married, Maric gave birth to a daughter, Lieserl, while Einstein was away. The child's fate is unknown – she is presumed to have been given up for adoption, perhaps under pressure from Einstein, who is thought to have never seen his first born. After the marriage, Mileva bore two sons but the family was not to stay together. Einstein began an affair with his cousin Elsa Löwenthal while on a trip to Berlin in 1912, leaving Mileva and his family two years later. Einstein and Mileva finally divorced in 1919 ... Einstein married Elsa soon after the divorce [he had been living with Elsa for nearly five years], but a few years later began an affair with Betty Neumann, the niece of a friend. By one account, Elsa allowed Einstein to carry on with this affair to prevent him sneaking around. That relationship ended in 1924, but Einstein continued to have liaisons with other women until well after Elsa's death in 1936.
For information about a possible child of Albert and Elsa in 1932, see Einstein's son? It's a question of relativity.

Composers and consanguinity

There are many other people whose names are well-known and who were involved in a consanguineous marriage. Notably, there have been several composers of classical music:
  • Johann Sebastian Bach married his second cousin, Maria Barbara Bach. The pair had seven children together, but only four survived to adulthood.
  • Edvard Grieg married his first cousin, Nina Hagerup. Their only child, a daughter, died at the age of one. Around the same time Nina also had a miscarriage.
  • Sergei Rachmaninoff married his first cousin, Natalya Satina. They had two daughters who survived to adulthood.
  • Igor Stravinsky married his first cousin, Yekaterina Nossenko. They had four children surviving to adulthood – two sons and two daughters.
Note that this type of marriage was very unusual for Rachmaninoff and Stravinsky, because the Russian Orthodox Church explicitly forbids marriage between first cousins (both couples needed to get permission from the Czar), and so the families involved also opposed their marriages. Apparently, the relevant families also opposed Grieg's marriage. It is also reported that Edvard and Nina were surprised and disappointed to find out that they were not able to have children together.