The Genealogical World of Phylogenetic Networks: May 2015

Wednesday, May 27, 2015

Naudin, Wallace and Darwin — the tree idea

Charles Darwin's most poetic published words concern his image of the Tree of Life. However, he did not claim to have originated the image. For example, Alfred Russel Wallace had already used it. Recently, the Natural History Apostilles blog has mentioned another important predecessor of both Englishmen, the Frenchman Charles Naudin, who deserves wider recognition.

Darwin's well-known words from On the Origin of Species (1859) are:

The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during each former year may represent the long succession of extinct species ... As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever branching and beautiful ramifications.

Wallace seems to have developed the Tree of Life metaphor quite independently (1855. On the law which has regulated the introduction of new species. Annals and Magazine of Natural History, 2nd series 16: 184-196):

"the analogy of a branching tree [is] the best mode of representing the natural arrangement of species ... a complicated branching of the lines of affinity, as intricate as the twigs of a gnarled oak ... we have only fragments of this vast system, the stem and main branches being represented by extinct species of which we have no knowledge, while a vast mass of limbs and boughs and minute twigs and scattered leaves is what we have to place in order, and determine the true position each originally occupied with regard to the others."

Darwin freely admitted having read Wallace's work. Moreover, he was well aware of his other predecessor, Charles Naudin, because on p. 167 of his 'Books Read' and 'Books to be Read' notebook of 1852-1860 (see Darwin Online CUL-DAR128) he recorded:

"Revue Horticol Imp. 1852. p. 102. Naudin Consid. Phil, sur l'espèce"

Charles Naudin's words are these, roughly translated from the original French (1852. Considérations philosophiques sur l'espèce et la variété. Revue Horticole, 4th series 1: 102-109) [NB. the long convoluted sentences are in the original]:

This doctrine of inbreeding among organic beings of the same family, the same class, and perhaps of the same kingdom, is not new; men of talent, both in France as well as abroad, among them our learned Lamarck, have supported it with all of the authority of their names. We do not deny that, on more than one occasion, they have reasoned upon assumptions which were not adequately supported by observation, that they did sometimes apply to the facts forced interpretations, that finally resulted in exaggerations that have mainly helped to push their ideas. But these defects in details do not diminish the greatness and perfect rationality of the whole system that, alone, reflects, by the community of origin, the great fact of the organizational community of the other living beings of the same kingdom, the primary basis of our rankings of species into genera, families, orders and phyla. In the opposing system now in vogue, in this system which involves many partial and independent creations we recognize or think we recognize as distinct species, one is forced to be logical, to admit the similarities exhibited by these species are only fortuitous coincidence, that is to say an effect without a cause, concluding that the reason is not acceptable. In our own [system], on the contrary, these similarities are both the consequence and proof of a relationship, not metaphorical, but real, that they hold a common ancestor, which they left at times more or less remote and through a series of intermediaries greater or fewer in number; so they express the true relationships between species by saying that the sum of their mutual similarities is the expression of their degree of relationship, as the sum of the differences is that of the distance they are from the common stock from which they derive their origin.

Considered from this point of view, the plant kingdom would present itself not as a linear series whose terms would increase or decrease in organizational complexity, depending on which end we start from; nor would it be a disordered tangle of intersecting lines, like a geographical map, whose regions, different in shape and size, would touch at a greater or lesser number of points; it would be a tree the roots of which, mysteriously hidden in the depths of cosmological time, would have given birth to a limited number of successively divided and subdivided stems. These first stems would represent the primordial types of the kingdom; their last ramifications would be the current species.

It follows from there that a perfect and rigorous classification of the other organized beings of the same kingdom, of the same order, of the same family, would be nothing other than the family tree of the species itself, indicating the relative age of each, its degree of speciation and the line of ancestors from which it descended. Thereby would be represented, in a manner of some sort so palpable and material, the different degrees of relationship of the species, such as that of groups of varying degrees, dating back to the primordial kinds. Such a classification, summarized in a graphical table, would be seized with much facility by the mind through the eyes, and present the most beautiful application of this principle generally accepted by naturalists: that nature is avaricious [stingy?] of causes and prodigal of effects.

This is quite clearly a description of a modern phylogenetic tree, and the taxonomic consequences of adopting that conception.

It is, however, rather a pity that he explicitly rejects a network ("a disordered tangle of intersecting lines") as a suitable model, along with the chain ("a linear series").

Monday, May 25, 2015

Walking can be more dangerous than cycling

We are often told that flying is the safest way to travel, at least as far as the use of commercial airlines is concerned. In an early stand-up comedy routine, Shelley Berman noted: "Statistics prove that flying is the safest way to travel. I don't know how much consideration they've given to walking!" Well, actually, they have included walking.

Governments like to keep track of these things, and the Department for Transport in Great Britain has released statistics on "Passenger casualty rates for different modes of travel" for 2003-2012. These modes include:

Air (passenger casualties in accidents involving UK registered airline aircraft)
Rail (passenger casualties involved in train accidents and accidents occurring through movement of railway vehicles)
Water (passenger casualties on UK registered merchant vessels)
Bus or coach (passenger casualties)
Car (driver and passenger casualties)
Van (driver and passenger casualties)
Motorcycle (driver and passenger casualties)
Pedal cycle
Pedestrian

The data are yearly averages for Great Britain from 2003-2012 inclusive, standardized as persons per billion passenger kilometres. The data are provided separately for the number of people killed, seriously injured, or slightly injured.

As usual, we can employ a phylogenetic network as a form of exploratory data analysis for these data. I first used the Manhattan distance to calculate the dissimilarity between the seven transportation modes for which there are complete data, followed by a Neighbor-net analysis to display the between-mode similarities as a phylogenetic network. So, modes that are closely connected in the network are similar to each other based on their accident figures across the ten years, and those that are further apart are progressively more different from each other.

The probability of incidents increases from right to left in the graph.

Some notable conclusions from the data are:

The probabilities of being killed, seriously injured or even slightly injured are all minuscule for air travel compared to anything else. This is a topic explored more thoroughly in an earlier blog post (A network analysis of airplane disasters).
You are much more likely to be injured in a bus than in a van, but more likely to be killed in the van than in the bus.
You are slightly more likely to be killed walking than cycling, but much more likely to be injured cycling.
A motorbike is the most effective way to get killed or seriously injured in Britain.

The walking versus cycling data are likely to surprise many people, but the average data across the 10 years are clear:

Yearly averages, 2003-2012, per billion passenger kilometres:

                     Pedestrian   Pedal cycle   Motorcycle
Killed                       31            27           92
Seriously injured           328           550        1,043
Slightly injured          1,245         3,190        2,997

Danny Yee (Walking and cycling: relative risks) provides one explanation:

People who wouldn't even contemplate wearing special high-visibility clothing or a helmet for a walk to the shops do so when cycling the same route.
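The first step of the analysis described above, computing Manhattan (city-block) distances between transport modes, can be sketched in Python. This is a minimal illustration, not the analysis behind the network: it uses only the three modes in the table of averages, applies the distance to the raw per-billion-km rates without rescaling (an assumption; the post does not say whether the variables were standardized), and omits the Neighbor-net step itself.

```python
from itertools import combinations

# Yearly averages for Great Britain, 2003-2012, per billion passenger km:
# (killed, seriously injured, slightly injured), from the table in this post.
rates = {
    "Pedestrian": (31, 328, 1245),
    "Pedal cycle": (27, 550, 3190),
    "Motorcycle": (92, 1043, 2997),
}


def manhattan(a, b):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(data):
    """Pairwise Manhattan distances, keyed by (mode1, mode2)."""
    return {(p, q): manhattan(data[p], data[q]) for p, q in combinations(data, 2)}


if __name__ == "__main__":
    for (p, q), d in distance_matrix(rates).items():
        print(f"{p} vs {q}: {d}")
    # Pedestrian vs Pedal cycle: 2171
    # Pedestrian vs Motorcycle: 2528
    # Pedal cycle vs Motorcycle: 751
```

On these raw figures the slightly-injured column dominates the distances (1,945 of the 2,171 separating walking from cycling), which is one reason an analyst might choose to rescale the columns before building a network.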

Wednesday, May 20, 2015

A limitation of turning splits graphs into reticulate networks

Splits graphs are a useful way of displaying contradictory information within evolutionary datasets, either incompatible characters (ie. those that cannot fit onto a single tree) or incompatible trees. Since the graphs are unrooted, they are usually treated as a form of multivariate data display, rather than interpreted as depicting evolutionary history.

However, it is possible to turn a splits graph into an evolutionary network (sometimes called a reticulation network) once a root is specified (Huson and Klöpper 2007). This is true irrespective of whether the splits are derived from character data (Huson and Kloepper 2005), in which case it is usually called a recombination network, or whether they come from a set of trees (Huson et al. 2005), in which case it is usually called a hybridization network.

The SplitsTree4 program (Huson and Bryant 2006) carries out the relevant calculations under algorithms entitled Reticulation Network, Recombination Network or Hybridization Network, although these all produce the same outcome once the set of splits has been determined. These options are no longer available from the menu system (in the current release of the program), but they can still be effected via the Configure Pipeline menu option.

The purpose of this post is to note that these calculations are affected by the same limitation that has been described before in other circumstances (see the post A fundamental limitation of hybridization networks?). That is, reticulation cycles with three or fewer outgoing arcs are not uniquely defined with respect to rooted splits; there are three equally optimal mathematical solutions. In practice, this means that where two taxa are involved in producing a third taxon, we cannot decide from the splits alone which is the reticulate taxon and which are the two "parents" (eg. which one is the hybrid).

An example

I will illustrate this point with a simple example. The data are taken from Wendel et al. (1991). The data consist of the presence-absence of 76 nuclear allozyme loci and 13 nuclear restriction sites, for five plant taxa, one of which is the outgroup. The first graph shows the splits graph using the default options in SplitsTree4 — both the NeighborNet and the ParsimonySplits analyses produce the same graph, which identifies a single reticulation.

In SplitsTree4, the outgroup for rooting the splits graph must be the first taxon in the datafile, which in this case is Gossypium robinsonii. The following three graphs are the result of then choosing the ReticulateNetwork analysis. They differ by having, respectively, Gossypium bickii as the final taxon in the dataset, Gossypium sturtianum as the final taxon, and Gossypium australe + Gossypium nelsonii as the final two taxa. Note that the ReticulateNetwork algorithm always identifies the dataset's final taxon as the reticulate one.

So, the hybrid taxon is indeterminable from the data given, and the algorithm simply makes a (consistent) choice from among the three possibilities. [That is, the algorithm chooses as the reticulate arc whichever of the three outgoing arcs is latest in the dataset.]
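The tie-breaking behaviour described above can be mimicked in a few lines of Python. This is a sketch of my reading of that behaviour, not SplitsTree4's code: it assumes that "latest in the dataset" means the outgoing arc whose cluster contains the taxon appearing last in the data file. Only the final taxa of each ordering come from this post; the order of the intermediate taxa is an assumption. The cluster set never changes, so the "hybrid" reported depends only on taxon order.

```python
# Outgoing clusters of the single reticulation cycle in the Gossypium example
# (the outgroup G. robinsonii is excluded); the cycle has three outgoing arcs.
CLUSTERS = [
    frozenset({"G. bickii"}),
    frozenset({"G. sturtianum"}),
    frozenset({"G. australe", "G. nelsonii"}),
]


def choose_reticulate(clusters, taxon_order):
    """Return the cluster containing the taxon that comes last in the file.

    The splits are identical for every ordering, so this choice reflects
    only the order of the taxa, not anything in the data.
    """
    position = {taxon: i for i, taxon in enumerate(taxon_order)}
    return max(clusters, key=lambda c: max(position[t] for t in c))


# Three data-file orderings (outgroup first); only the final taxa follow the post.
ORDERINGS = [
    ["G. robinsonii", "G. sturtianum", "G. australe", "G. nelsonii", "G. bickii"],
    ["G. robinsonii", "G. bickii", "G. australe", "G. nelsonii", "G. sturtianum"],
    ["G. robinsonii", "G. bickii", "G. sturtianum", "G. australe", "G. nelsonii"],
]

if __name__ == "__main__":
    for order in ORDERINGS:
        print(sorted(choose_reticulate(CLUSTERS, order)))
    # ['G. bickii']
    # ['G. sturtianum']
    # ['G. australe', 'G. nelsonii']
```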

The original authors suggest that the nuclear and other data "indicate a biphyletic ancestry of G. bickii. Our preferred hypothesis involves an ancient hybridization, in which G. sturtianum, or a similar species, served as the maternal parent with a paternal donor from the lineage leading to G. australe and G. nelsoni." This doesn't quite match any of the three rooted networks shown above.

References

Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254-267.

Huson DH, Kloepper TH (2005) Computing recombination networks from binary sequences. Bioinformatics 21: ii159-ii165.

Huson DH, Klöpper TH (2007) Beyond galled trees – decomposition and computation of galled networks. Lecture Notes in Bioinformatics 4453: 211-225.

Huson DH, Klöpper T, Lockhart PJ, Steel MA (2005) Reconstruction of reticulate networks from gene trees. Lecture Notes in Bioinformatics 3500: 233-249.

Wendel JF, Stewart JM, Rettig JH (1991) Molecular evidence for homoploid reticulate evolution among Australian species of Gossypium. Evolution 45: 694-711.

Monday, May 18, 2015

An unusual genealogy

"Genealogies" produced on the web are frequently no such thing, they are merely timelines. However, the following alleged Genealogy of Automobile Companies seems to really be one, and it has a number of odd characteristics. These characteristics are quite common among manufactured products.

It is described as "A flowing history of more than 100 automobile companies across the complete time span of the automobile industry." Actually, it focuses on companies in the USA, up to 2012. You can zoom in on the details by visiting the original image at HistoryShots InfoArt.

First, note that the genealogy has multiple roots. Second, lineages coalesce forwards through time rather than diverging, so that the lineages become clustered. Moreover, some lineages do not connect to any others. Finally, there is horizontal transfer, because parts of companies get sold to other companies.

There is also a similar Genealogy of US Airlines, and a Genealogy of International Airlines.

Wednesday, May 13, 2015

Homology and cognacy: fundamental historical relations between words

This is a guest blog post, following on from his previous post, by:

Johann-Mattis List

Centre des Recherches Linguistiques sur l'Asie Orientale, Paris, France

Introduction

All languages constantly change. Words are lost when speakers cease to use them, new words are gained when new concepts evolve, and even the pronunciation of the words changes slightly over time. Slight modifications that can barely be noticed during a person's lifetime sum up to great changes in the system of a language over centuries. When the speakers of a language diverge, their speech keeps on changing independently in the two communities, and at a certain point of time the independent changes are so great that they can no longer communicate with each other — what was one language has become two.

Demonstrating that two languages once were one is one of the major tasks of historical linguistics. If no written documents of the ancestral language exist, one has to rely on specific techniques for linguistic reconstruction (see the examples in this previous post). These techniques require us to first identify those words in the descendant languages that presumably go back to a common word form in the ancestral language. In identifying these words, we infer historical relations between them. The most fundamental historical relation between words is the relation of common descent. However, just as in evolutionary biology, where homology can be further subdivided into the more specific relations of orthology, paralogy, and xenology, more specific fundamental historical relations between words can be defined for historical linguistics, depending on the underlying evolutionary scenario.

Homology and Cognacy in Linguistics and Biology

In evolutionary biology there is a rather rich terminological framework describing fundamental historical relations between genes and morphological characters. Discussions regarding the epistemological and ontological aspects of these relations are still ongoing (see the overview in Koonin 2005, but also this recent post by David). Linguists, in contrast, have rarely addressed these questions directly. Rather, they have assumed that the fundamental historical relations between words are more or less self-evident, with only a few counter-examples, which have largely been ignored in the literature (Arapov and Xerc 1974; Holzer 1996; Katičić 1966). As a result, our traditional terminology to describe the fundamental historical relations between words is very imprecise and often leads to confusion, especially when it comes to computational applications that are based on software originally developed for applications in evolutionary biology.

As an example, consider the fundamental concept of homology in evolutionary biology. According to Koonin (2005: 311), it "designates a relationship of common descent between any entities, without further specification of the evolutionary scenario". The terms orthology, paralogy, and xenology are used to address more specific relations. Orthology refers to "genes related via speciation" (Koonin 2005: 311); that is, genes related via direct descent. Paralogy refers to "genes related via duplication" (ibid.); that is, genes related via indirect descent. Xenology, a notion which was introduced by Gray and Fitch (1983), refers to genes "whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters" (Fitch 2000: 229); i.e. to genes related via descent involving lateral transfer.

In historical linguistics, the only relation that is explicitly defined is cognacy (also called cognation). Cognacy usually refers to words related via “descent from a common ancestor” (Trask 2000: 63), and it is strictly distinguished from descent involving lateral transfer (borrowing). The term cognacy itself, however, covers both direct and indirect descent. Hence, traditionally, German Zahn 'tooth' is cognate with English tooth, and German selig 'blessed' with English silly, and German Geburt 'birth' with English birth, although the historical processes that shaped the present appearance of these three word pairs are quite different. Apart from the sound shape, Zahn and tooth have regularly developed from Proto-Germanic *tanθ-; selig and silly both go back to Proto-Germanic *sæli- 'happy', but the meaning of the English word has changed greatly; Geburt and birth stem from Proto-Germanic *ga-burdi-, but the English word has lost the prefix as a result of specific morphological processes during the development of the English language (all examples follow Kluge and Seebold 2002, with modifications for the pronunciation of Proto-Germanic). Thus, of the three examples of cognate words given, only the first would qualify as having evolved by direct inheritance, while the inheritance of the latter two could be labelled as indirect, involving processes which are largely language-specific and irregular, such as meaning shift and morpheme loss. Trask (2000: 234) suggests the term oblique cognacy to label these cases of indirect inheritance, but this term seems to be rarely used in historical linguistics; and at least in the mainstream literature of historical linguistics I could not find even a single instance where the term was employed (apart from the passage by Trask).

In the table above (with modifications taken from List 2014: 39), I have tried to contrast the terminology used in evolutionary biology and historical linguistics by comparing to which degree they reflect fundamental historical relations between words or genes. Here, common descent is treated as a basic relation which can be further subdivided into relations of direct common descent, indirect common descent, and common descent involving lateral transfer. As one can easily see, historical linguistics lacks proper terms for at least half of the relations, offering no exact counterparts for homology, orthology, and xenology in evolutionary biology.

Cognacy in historical linguistics is often deemed to be identical with homology in evolutionary biology, but this is only true if one ignores common descent involving lateral transfer. One may argue that the notion of xenology is not unknown to linguists, since the borrowing of words is a very common phenomenon in language history. However, the specific relation which is termed xenology in biology has no direct counterpart in historical linguistics: the term borrowing refers to a distinct process, not a relation resulting from the process. There is no common term in historical linguistics which addresses the specific relation between such words as German kurz 'short' and English short. These words are not cognate, since the German word has been borrowed from Latin cŭrtus 'mutilated' (Kluge and Seebold 2002). They share, however, a common history, since Latin cŭrtus and English short both (may) go back to Proto-Indo-European *(s)ker- 'cut off' (Vaan 2008: 158). The specific history behind these relations is illustrated in the following figure.

A specific advantage of the biological notion of homology as a basic relation covering any kind of historical relatedness, compared to the linguistic notion of cognacy as a basic relation covering direct and indirect common descent, is that the former is much more realistic regarding the epistemological limits of historical research. Up to a certain point, it can be fairly reliably demonstrated that the basic entities in the respective disciplines (words, genes, or morphological characters) share a common history. Demonstrating that more detailed relations hold, however, is often much harder. The strict notion of cognacy has forced linguists to set goals for their discipline which may often be far too ambitious to achieve. We need to adjust our terminology accordingly and bring our goals into balance with the epistemological limits of our discipline. In order to do so, I have proposed to refine our current terminology in historical linguistics to the schema shown in the table below (with modifications taken from List 2014: 44):

Fifty Shades of Cognacy

In a recent blog post, David pointed to the relative character of homology in evolutionary biology, emphasizing that it "only applies locally, to any one level of the hierarchy of character generalization". Recalling his example of bat wings compared to bird wings, which are homologous when compared as forelimbs but analogous when compared as wings, we can find similar examples in historical linguistics.

If we consider words for 'to give' in the four Romance languages Portuguese, Spanish, Provencal and French, then we can state that Portuguese dar and Spanish dar are homologous, as are Provencal douna and French donner. The former pair go back to the Latin word dare 'to give', and the latter pair go back to the Latin word donare 'to gift (give as a present)'. In the times when Latin was commonly spoken, dare and donare were clearly distinct words with distinct meanings, used in distinct contexts. The verb donare itself was derived from Latin donum 'present, gift'. As in English, where nouns can easily be used as verbs, Latin allowed nouns to be turned into verbs. In contrast to English, however, this process required the form of the noun to be modified (compare English gift vs. to gift with Latin donum vs. donare).

What the ancient Romans (who spoke Latin as their native tongue) were not aware of is that Latin donum 'gift' and Latin dare 'to give' themselves go back to a common word form. This was no longer evident in Latin, but it was in Proto-Indo-European, the ancestor of the Latin language. Thus, Latin dare goes back to Proto-Indo-European *deh₃- 'to give', and Latin donum goes back to Proto-Indo-European *deh₃-no- 'that which is given (the gift)' (Meiser 1999; what is written as *h₃ in this context was probably pronounced as [x] or [h]). The word form *deh₃-no- is a regular derivation from *deh₃-, so at the Indo-European level both forms are homologous, since one is derived from the other. That means, in turn, that Latin dare and donum are also homologs, since they are the residual forms of the two homologous words in Proto-Indo-European. And since Latin donare is a regular derivation of donum, this means, again, that Latin dare and donare are also homologous, as are the words in the four descendant languages, Portuguese dar, Spanish dar, Provencal douna, and French donner. Depending on the time depth we apply, we will arrive at different homology decisions. I have tried to depict the complex history of the words in the following figure:

Judging from the treatment in linguistic databases, many scholars do not regard these different "shades of homology" as a real problem. In most cases, scholars use a "lumping approach" and label as cognates all words that go back to a common root, no matter how far that root goes back in time (compare, for example, the cognate labeling for reflexes of Proto-Indo-European *deh₃- in the IELex).

This labeling practice, however, may be contrary to the models that are used to analyze the data afterwards. All computational analyses model language evolution as a process of word gain and word loss. The words for the analyses are sampled from an initial set of concepts (such as 'give', 'hand', 'foot', 'stone', etc.) which are translated into the languages under investigation. If we did not know about the deeper history of Latin dare and donare, we would assume a regular process of language evolution here: at some point, the speakers of Gallo-Romance would cease to use the word dare to express the meaning 'to give' and use the word donare instead, while the speakers of Ibero-Romance would keep on using the word dare. This well-known process of lexical replacement (illustrated in the graphic below), which may provide strong phylogenetic signals, is lost in the current encoding practice where all four words are treated as homologs. Our current practice of cognate coding masks vital processes of language change.
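The coding problem can be made concrete with the binary presence/absence matrices that such gain-and-loss analyses typically take as input. The following is a minimal sketch in Python, assuming one column per cognate set and one row per language; the helper `binary_matrix` and the data layout are hypothetical illustrations, not taken from the IELex or from any particular software.

```python
# Words for 'to give' and the Latin word each continues, as given in this post.
WORDS = {
    "Portuguese": ("dar", "dare"),
    "Spanish": ("dar", "dare"),
    "Provencal": ("douna", "donare"),
    "French": ("donner", "donare"),
}

# Lumping: both Latin sources trace back to Proto-Indo-European *deh₃-.
PIE_ROOT = {"dare": "*deh₃-", "donare": "*deh₃-"}


def binary_matrix(words, assign):
    """Hypothetical helper: code each language 1/0 for every cognate set.

    `assign` maps a Latin source word to the label of its cognate set, so
    the same words can be coded at different time depths.
    """
    sets = sorted({assign(source) for _, source in words.values()})
    rows = {
        language: [int(assign(source) == s) for s in sets]
        for language, (_, source) in words.items()
    }
    return sets, rows


# Lumping approach: a single cognate set for everything going back to *deh₃-.
lumped_sets, lumped = binary_matrix(WORDS, lambda source: PIE_ROOT[source])
# Splitting by the Latin source word: dare vs. donare.
split_sets, split = binary_matrix(WORDS, lambda source: source)

if __name__ == "__main__":
    print(lumped_sets, lumped)
    print(split_sets, split)
```

Under lumping, the single column is 1 in every language and carries no information about the replacement of dare by donare; coding by the Latin source yields two columns that separate Portuguese and Spanish from Provencal and French.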

Outlook

Historical linguistics needs a more serious analysis of the fundamental processes of language change and the fundamental historical relations resulting from these processes. In the last two decades a large arsenal of quantitative methods has been introduced in historical linguistics. The majority of these methods come from evolutionary biology. While we have quickly learned to adapt and apply these methods to address questions of language classification and language evolution, we have forgotten to ask whether the processes these methods are supposed to model actually coincide with the fundamental processes of language evolution. Rather than adopting only the methods from evolutionary biology, we should also consider adopting the habit of having deeper discussions regarding the very basics of our methodology.

References

Arapov MV, Xerc MM (1974) Математические методы в исторической лингвистике [Mathematical methods in historical linguistics]. Moscow: Nauka. German translation: Arapov, M. V. and M. M. Cherc (1983). Mathematische Methoden in der historischen Linguistik. Trans. by R. Köhler and P. Schmidt. Bochum: Brockmeyer.

Fitch WM (2000) Homology: a personal view on some of the problems. Trends in Genetics 16.5, 227-231.

Gray GS, Fitch WM (1983) Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Molecular Biology and Evolution 1.1, 57-66.

Holzer G (1996) Das Erschließen unbelegter Sprachen. Zu den theoretischen Grundlagen der genetischen Linguistik. Frankfurt am Main: Lang.

Katičić R (1966) Modellbegriffe in der vergleichenden Sprachwissenschaft. Kratylos 11, 49-67.

Kluge F, Seebold E (2002) Etymologisches Wörterbuch der deutschen Sprache. 24th ed. Berlin: de Gruyter.

List J-M (2014) Sequence Comparison in Historical Linguistics. Düsseldorf: Düsseldorf University Press.

Meiser G (1999) Historische Laut- und Formenlehre der lateinischen Sprache. Darmstadt: Wissenschaftliche Buchgesellschaft.

Trask RL (2000) The Dictionary of Historical and Comparative Linguistics. Edinburgh: Edinburgh University Press.

Vaan M (2008) Etymological Dictionary of Latin and the Other Italic Languages. Leiden and Boston: Brill.

Monday, May 11, 2015

The evolution of humor

Actually, if you do a search you will find that there are lots of non-humorous papers on the evolution of humor, in the variational sense not the transformational one, as used here.

Wednesday, May 6, 2015

Pattern and process: computation and biology

It is obvious that there is a big cultural difference between biologists and computationalists, irrespective of whether we think it's a good idea or not. This follows simply from the nature of the activities in the two professions — the activities are different and therefore different personalities are attracted to those professions.

Some of these differences are well known. For example, computations require algorithmic repeatability, along with proof that the algorithms achieve the explicitly stated goal. This means that computationalists have to be pedants in order to succeed. On the other hand, no-one can be pedantic and succeed in biology. Biodiversity is a concept that makes it clear that there are no rules to biological phenomena — any generalization that you can think of will turn out to have numerous exceptions. In the biological sciences we do not look for universal "laws" (as in the physical sciences), because there are none; and if you can't handle that fact then you should not try to become a biologist.

This leads to a further difference between the two professions that I think is sometimes poorly appreciated. In general, computationalists focus on patterns, whereas biologists focus on processes. Many processes can produce the same patterns, and therefore the same computations can be used to detect those patterns; and this is of interest to people who are developing algorithms. On the other hand, in biology processes can produce many different patterns, so that patterns are often unpredictable. Biologists are aware that patterns and processes can be poorly connected, and the biological interest is primarily on understanding the processes, because these are frequently more generalizable than are the patterns.

As a simple example of this dichotomy, consider the following diagram (from Loren H. Rieseberg and Richard D. Noyes. 1998. Genetic map-based studies of reticulate evolution in plants. Trends in Plant Science 3: 254-259). It shows the eight haploid chromosomes of a particular plant species.

Perusal of the figure will lead you to identify the pattern, and this is straightforward to detect computationally. Each chromosomal segment is triplicated, but the triplicates are arranged arbitrarily and are sometimes segmented.

On its own this is of little biological interest. The interest lies in the processes that led to the pattern. These processes could produce an infinite number of similar patterns, and so predicting the exact pattern in this species is impossible. We use abduction to proceed from the pattern to the processes (see What we know, what we know we can know, and what we know we cannot know).

We appear to be looking at a case of allopolyploidy (the nuclear genome is hexaploid) followed by recombination. Neither of these processes necessarily produces patterns that can be predicted in detail.

So, the computation focuses on the pattern and the biology on the process. Sometimes biologists forget this, and naively interpret patterns as inevitably implying a particular process. And sometimes computationalists naively expect patterns to be predictable when they are not.
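To underline how simple the pattern side is, here is a toy Python sketch that checks whether every chromosomal segment occurs exactly three times. The karyotype is entirely hypothetical (four chromosomes carrying three segments labelled A to C), not the data of Rieseberg and Noyes, and the check ignores the segmentation of copies. Note that nothing in it says anything about the allopolyploidy and recombination that produced the pattern.

```python
from collections import Counter

# Entirely hypothetical toy karyotype: each chromosome is listed as the
# sequence of segment labels it carries.
chromosomes = {
    "chr1": ["A", "B"],
    "chr2": ["C", "A"],
    "chr3": ["B", "C"],
    "chr4": ["A", "B", "C"],
}


def copy_numbers(karyotype):
    """Count how many times each segment label occurs across all chromosomes."""
    return Counter(seg for segments in karyotype.values() for seg in segments)


def all_triplicated(karyotype):
    """True if every segment occurs exactly three times."""
    return all(n == 3 for n in copy_numbers(karyotype).values())


if __name__ == "__main__":
    print(copy_numbers(chromosomes))     # A, B and C each counted three times
    print(all_triplicated(chromosomes))  # True
```

The same counts would arise from many different chromosomal arrangements, which is exactly why the pattern alone cannot be used to predict, or uniquely recover, the process.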

Monday, May 4, 2015

A geek network

I have noted before that many of the diagrams on the web purporting to show "evolution" actually show transformational evolution rather than variational evolution, as is done in biology and the historical social sciences (eg. Non-phylogenetic trees; Evolution and timelines; The evolutionary March of Progress in popular culture).

This diagram seems to be an improvement, however. Perhaps its geekiness is responsible for this?

This is an evolutionary network because it is rooted, at "Geekus Prime". You will note that it is a population network rather than strictly a phylogenetic network. That is, many of the internal nodes are labeled with extant taxa, so that both ancestors and their descendants appear. It is a network rather than a tree, because the "World of Warcraft Geek" is a hybrid between the "Dungeons and Dragons Geek" and the ancestor of the "Video Game Geek".
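The criteria used above (a single root, internal nodes that are themselves extant "taxa", and a node with two parents) can be checked mechanically. A minimal Python sketch, assuming the network is stored as a hypothetical list of (parent, child) edges: only the nodes named in this post are included, "Video Game ancestor" is a placeholder for the unnamed ancestor, and the direct links from Geekus Prime are a simplification rather than a transcription of the diagram.

```python
# Simplified, partly hypothetical fragment of the geek network as
# (parent, child) edges; intermediate nodes of the diagram are omitted.
EDGES = [
    ("Geekus Prime", "Dungeons and Dragons Geek"),
    ("Geekus Prime", "Video Game ancestor"),
    ("Video Game ancestor", "Video Game Geek"),
    ("Dungeons and Dragons Geek", "World of Warcraft Geek"),
    ("Video Game ancestor", "World of Warcraft Geek"),
]


def in_degrees(edges):
    """Number of parents per node; nodes seen only as parents get 0."""
    deg = {}
    for parent, child in edges:
        deg.setdefault(parent, 0)
        deg[child] = deg.get(child, 0) + 1
    return deg


def roots(edges):
    """Nodes without parents; exactly one means the network is rooted once."""
    return sorted(n for n, d in in_degrees(edges).items() if d == 0)


def reticulations(edges):
    """Nodes with more than one parent; any such node makes this a network, not a tree."""
    return sorted(n for n, d in in_degrees(edges).items() if d > 1)


def internal_nodes(edges):
    """Nodes with at least one child (ancestors in the diagram)."""
    return sorted({parent for parent, _ in edges})


if __name__ == "__main__":
    print(roots(EDGES))           # ['Geekus Prime']
    print(reticulations(EDGES))   # ['World of Warcraft Geek']
    print(internal_nodes(EDGES))
```

Here the "Dungeons and Dragons Geek" appears both as an extant geek and as an internal node, which is what makes this a population network rather than a strictly phylogenetic one.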
