Monday, May 28, 2018

Comparing reconstruction systems in historical linguistics

The term linguistic reconstruction has a very specific meaning in historical linguistics, usually referring to the techniques used to infer how a given language was originally pronounced, even though it is not attested in written sources. In previous posts, I have occasionally pointed to reconstructed forms, the so-called proto-forms, which linguists usually mark as such by putting an asterisk in front of them. For example, Indo-European *ph₂tér- is a reconstructed proto-form for the supposed Indo-European word "father".

While reconstruction techniques are usually limited to languages for which we have no written record, they can in principle also be applied to find out in detail how ancient languages such as Latin and Greek were pronounced (Sturtevant 1920). For languages like Chinese, whose writing system leaves almost no clues about pronunciation, linguistic reconstruction is the only way to investigate the pronunciation of the oldest stages of the language.

When dealing with different reconstruction systems for Old Chinese phonology, it is quite difficult, even for experienced scholars, to spot the actual differences between the systems. That these differences exist, and that they can be quite substantial, is beyond question — and easy to understand, if one takes into account that Old Chinese is reconstructed with the help of a philological (as opposed to a mainly comparative) approach, by which data from different sources is sifted and individually weighed (cf. Jarceva 1990: 409 and List 2008).

When comparing different reconstruction systems, it is not enough simply to look at the inventories of proto-phonemes proposed by different scholars. Even if two proto-inventories (the sets of the reconstructed sounds) are exactly the same, it is possible that scholars will provide different reconstructions for individual characters. The only way to compare two or more reconstruction systems is therefore to compare the concrete reconstructions for a certain number of characters.

In addition to the sample of words, however, we also need a clear account of which segments (which proto-sounds) should be compared with each other. When comparing proto-forms for Chinese 一 ‘one’ in different Old Chinese reconstruction systems, such as Karlgren (1950) *ʔi̯ĕt, Li (1971) *ʔjit, Wáng (1980) *iet, and Baxter and Sagart (2014) *ʔi[t], we would obviously not compare the medial *i̯ of Karlgren with the initial *ʔ of Baxter and Sagart.

When adding more reconstructions, such as the one for 七 ‘seven’ across the four systems, for which the authors give *ts'i̯ĕt, *tshjit, *tshiet, and *[tsh]i[t], respectively, we can further see that there are not only differences in the segments at the same positions, but also in the interpretation of the words. Although the authors differ from one another in the medials, main vowels, and finals they propose for the two words, each of them is structurally consistent in giving both words the same segments for medial, nucleus, and coda.
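
To make the idea of comparing like with like concrete, the two sets of forms quoted above can be stored as aligned slots and checked programmatically. The following Python sketch is only an illustration: the slot labels (initial, medial, nucleus, coda), the use of "-" for an empty slot, and the segmentation of each form are assumptions made for this example, not part of any of the cited systems.

```python
# Aligned proto-forms for 一 'one' and 七 'seven' as quoted in the text,
# split into four slots; "-" marks an empty slot. The segmentation and
# the slot labels are assumptions made for this illustration.
SLOTS = ("initial", "medial", "nucleus", "coda")

ALIGNED = {
    "Karlgren 1950": {"一": ("ʔ", "i̯", "ĕ", "t"), "七": ("ts'", "i̯", "ĕ", "t")},
    "Li 1971": {"一": ("ʔ", "j", "i", "t"), "七": ("tsh", "j", "i", "t")},
    "Wáng 1980": {"一": ("-", "i", "e", "t"), "七": ("tsh", "i", "e", "t")},
    "Baxter & Sagart 2014": {"一": ("ʔ", "-", "i", "[t]"), "七": ("[tsh]", "-", "i", "[t]")},
}


def segment(system, char, slot):
    """Return the segment that `system` proposes for `slot` of `char`."""
    return ALIGNED[system][char][SLOTS.index(slot)]


def same_rhyme(system, a, b):
    """True if `system` gives characters `a` and `b` the same medial, nucleus and coda."""
    return all(segment(system, a, s) == segment(system, b, s) for s in SLOTS[1:])


def column(char, slot):
    """Compare like with like: one slot of one character across all systems."""
    return {system: segment(system, char, slot) for system in ALIGNED}


for system in ALIGNED:
    print(f"{system}: same rhyme for 一 and 七 -> {same_rhyme(system, '一', '七')}")
print(column("一", "medial"))
```

Storing the forms slot by slot means that a cross-system comparison can only ever pair a medial with a medial, which is exactly what the alignment is meant to guarantee.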

What we can see from this example is that a difference in the sound segments, such as the choice of initials or the concrete solution proposed for a problem, does not necessarily reflect an important difference between the reconstruction systems. If two scholars simply choose a different symbol for a distinction that they both recognize and acknowledge, this does not render their reconstructions incompatible. Such a difference should therefore not be used as a criterion for dismissing a given reconstruction system, at least not in a first step. If two systems are structurally equivalent, then they have equivalent predictive power for the descendant language(s) they are supposed to reconstruct.
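
The point about notational choices can be checked mechanically for a single slot: two systems that use different symbols for the same distinctions relate to each other through a one-to-one correspondence of symbols. The following Python sketch illustrates this idea; the function name and the toy data (characters c1 to c4 and their vowels) are hypothetical, and a real comparison would need to tolerate some noise rather than demand a perfect mapping.

```python
from collections import defaultdict


def structurally_equivalent(a, b):
    """Test whether two systems differ only in the symbols they use for one slot.

    `a` and `b` map a character to the segment a system proposes for it
    (for example its main vowel). The systems count as structurally
    equivalent here if the symbols correspond one-to-one: each symbol of `a`
    always co-occurs with the same symbol of `b`, and vice versa.
    Characters missing from either system are ignored.
    """
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for char in a.keys() & b.keys():
        a_to_b[a[char]].add(b[char])
        b_to_a[b[char]].add(a[char])
    return all(len(s) == 1 for s in a_to_b.values()) and all(
        len(s) == 1 for s in b_to_a.values()
    )


# Hypothetical main vowels for four hypothetical characters c1-c4.
x = {"c1": "a", "c2": "ə", "c3": "ə", "c4": "i"}
y = {"c1": "a", "c2": "ɨ", "c3": "ɨ", "c4": "i"}  # ə written as ɨ: same structure
z = {"c1": "a", "c2": "ɨ", "c3": "a", "c4": "i"}  # c3 regrouped: different structure

print(structurally_equivalent(x, y))  # True
print(structurally_equivalent(x, z))  # False
```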

This abstractionist notion of proto-forms, which can be found in the early work of Saussure (1916) and Meillet (1903), is problematic for the endeavour of linguistic reconstruction, and is usually not strictly followed (Lass 2017). Nevertheless, it is important to keep the potentially abstract nature of proto-forms in mind when comparing different reconstruction systems. When we distinguish the structural differences (which result from the direct interpretation of the data and the identification of regular sound correspondences) from the substantial differences (which result from a phonetic and phonological interpretation of the identified correspondences), we get a much clearer account of the core of the differences, and of whether they are worth our consideration or not.

But how can we compare reconstruction systems structurally? First, we need to have the data assembled in aligned form, to make sure that we only compare like with like (e.g., medial with medial, and vowel with vowel). A sample illustration showing alignments of the proto-forms for ‘seven’ and ‘one’, produced with the help of the EDICTOR tool (List 2017), is given in the figure below.

Comparing reconstruction proposals with the help of alignments.

Alternatively, we can select a single aspect, such as the vowel system proposed in different reconstruction systems. Having assembled a substantial number of proto-forms in this way, we can model the structural comparison between different reconstruction systems as a comparison of different cluster analyses, or, more accurately, partitioning analyses. A partitioning analysis assigns a given number of objects to a certain number of different groups. When dealing only with the vowels proposed by different reconstruction systems, we can say that a given reconstruction, like Karlgren's, assigns each Chinese character for which a proto-form is given to a particular group, depending on the main vowel selected for the reconstruction.
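
Turning a set of reconstructions into such a partitioning analysis is a simple grouping step. The Python sketch below assumes that the main vowel has already been extracted for each character; apart from the main vowels of 一 and 七 in Karlgren (1950), the characters c3 and c4 and their vowels are hypothetical.

```python
from collections import defaultdict


def partition_by_vowel(reconstruction):
    """Group characters by the main vowel that a system proposes for them.

    `reconstruction` maps a character to its reconstructed main vowel;
    the result maps each vowel to the set of characters assigned to it,
    i.e. one partitioning analysis of the characters.
    """
    groups = defaultdict(set)
    for char, vowel in reconstruction.items():
        groups[vowel].add(char)
    return dict(groups)


# The main vowels of 一 and 七 in Karlgren (1950), plus two hypothetical
# characters c3 and c4 with made-up vowels for illustration.
karlgren = {"一": "ĕ", "七": "ĕ", "c3": "a", "c4": "o"}
print(partition_by_vowel(karlgren))
```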

If, for a given set of reconstructions, we model each reconstruction system as a partitioning analysis based on the main vowel proposed by the system, we can use standard metrics from graph theory and Natural Language Processing to compare different reconstruction systems with each other. Very straightforward measures for the comparison of two partitioning analyses are the so-called B-Cubed scores (Amigó et al. 2009), which have proven particularly useful for evaluating automatic cognate detection methods in historical linguistics against a gold standard (Hauer and Kondrak 2011, List et al. 2017a).

Being an evaluation measure, B-Cubed scores come in the typical three flavors of precision, recall, and F-score. Roughly speaking, precision measures to what extent items grouped together in the test analysis are also grouped together in the reference (penalizing false positives), and recall measures to what extent items grouped together in the reference are also grouped together in the test analysis (penalizing false negatives). For the purpose of comparing reconstruction systems, only the F-score, the harmonic mean of precision and recall, is needed, as it is a symmetric measure: the notions of false positives and false negatives are meaningless unless we decide to blindly trust one of the given systems. As with precision and recall, the F-score ranges between 0 and 1, with 1 indicating that the two partitioning analyses are identical.
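
For readers who want to see the arithmetic, here is a minimal Python sketch of B-Cubed precision, recall, and F-score in the sense of Amigó et al. (2009), for two labelings that map each character to a group label (here, a main vowel). The function name and the toy data are hypothetical; this is not the code used for the analysis in this post.

```python
def bcubed(test, reference):
    """B-Cubed precision, recall and F-score for two partitions.

    Both arguments map an item (a character) to a group label (a vowel).
    Only items present in both partitions are scored.
    """
    items = test.keys() & reference.keys()
    if not items:
        raise ValueError("the partitions share no items")
    test_groups, ref_groups = {}, {}
    for item in items:
        test_groups.setdefault(test[item], set()).add(item)
        ref_groups.setdefault(reference[item], set()).add(item)
    precision = recall = 0.0
    for item in items:
        t = test_groups[test[item]]       # items grouped with `item` in the test
        r = ref_groups[reference[item]]   # items grouped with `item` in the reference
        overlap = len(t & r)              # always >= 1, since `item` is in both
        precision += overlap / len(t)
        recall += overlap / len(r)
    precision /= len(items)
    recall /= len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data: system a merges three hypothetical characters under one vowel,
# system b separates c3 from c1 and c2.
a = {"c1": "e", "c2": "e", "c3": "e"}
b = {"c1": "i", "c2": "i", "c3": "ə"}
p, r, f = bcubed(a, b)       # p = 5/9, r = 1, f = 5/7
p2, r2, f2 = bcubed(b, a)    # precision and recall swap; F stays the same
```

Swapping the two arguments swaps precision and recall but leaves their harmonic mean unchanged, which is why the F-score can serve as a symmetric measure when neither system is taken as the reference.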

In order to compare more than two reconstruction systems, we can make use of techniques for exploratory data analysis (Morrison 2014); the most straightforward way to do this is, of course, to use the NeighborNet algorithm (Bryant and Moulton 2004), as provided by the SplitsTree package (Huson 1998).

To illustrate how data-display networks can be used to study differences among Old Chinese reconstruction systems, I designed a little experiment based on data taken from List et al. (2017b), who provide Old Chinese reconstructions for all rhyme words in the Shījīng based on eight different reconstruction systems (Baxter and Sagart 2014, Karlgren 1950, Li 1971, Pān 2000, Schuessler 2007, Starostin 1989, Wáng 1980, Zhèngzhāng 2003).

In order to keep the analysis simple, I extracted only the different reconstructions of the main vowel for each character in each system, and carried out a pairwise comparison of all eight systems, computing the B-Cubed F-scores for each pair, omitting characters for which no reconstruction could be found in the data. These scores were then converted to a distance matrix, and fed to the NeighborNet algorithm (the source code can be downloaded here). The resulting network is provided in the figure below.
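
The pipeline described here (pairwise F-scores, conversion to distances, export for SplitsTree) can be sketched in a few lines of Python. This is not the downloadable source code mentioned above: it assumes that a distance is computed as 1 minus the B-Cubed F-score, uses hypothetical toy systems, and writes a NEXUS Taxa and Distances block in a layout that SplitsTree is generally able to read; the exact block format should be checked against the SplitsTree documentation.

```python
from itertools import combinations


def bcubed_f(a, b):
    """B-Cubed F-score of two partitions (item -> label), on shared items only.

    Assumes the two systems share at least one character.
    """
    items = a.keys() & b.keys()  # drop characters missing from either system
    ga, gb = {}, {}
    for i in items:
        ga.setdefault(a[i], set()).add(i)
        gb.setdefault(b[i], set()).add(i)
    p = sum(len(ga[a[i]] & gb[b[i]]) / len(ga[a[i]]) for i in items) / len(items)
    r = sum(len(ga[a[i]] & gb[b[i]]) / len(gb[b[i]]) for i in items) / len(items)
    return 2 * p * r / (p + r)


def distance_matrix(systems):
    """Pairwise distances, assuming d = 1 - F (an assumption of this sketch)."""
    names = list(systems)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[x, y] = dist[y, x] = 1.0 - bcubed_f(systems[x], systems[y])
    return names, dist


def to_nexus(names, dist):
    """Render the matrix as a NEXUS Taxa + Distances block."""
    quoted = ["'" + n + "'" for n in names]
    lines = [
        "#NEXUS",
        "BEGIN Taxa;",
        f"  DIMENSIONS ntax={len(names)};",
        "  TAXLABELS " + " ".join(quoted) + ";",
        "END;",
        "BEGIN Distances;",
        f"  DIMENSIONS ntax={len(names)};",
        "  FORMAT labels diagonal triangle=both;",
        "  MATRIX",
    ]
    for n, q in zip(names, quoted):
        lines.append("  " + q + " " + " ".join(f"{dist[n, m]:.4f}" for m in names))
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical main-vowel reconstructions for characters c1-c5 in three
# made-up systems; c5 is missing from two systems and is skipped for them.
systems = {
    "System A": {"c1": "a", "c2": "a", "c3": "e", "c4": "i"},
    "System B": {"c1": "o", "c2": "o", "c3": "u", "c4": "y"},
    "System C": {"c1": "a", "c2": "e", "c3": "e", "c4": "i", "c5": "a"},
}
names, dist = distance_matrix(systems)
print(to_nexus(names, dist))
```

In this toy example, systems A and B use entirely different vowel symbols but group the characters identically, so their distance is 0; this is the structural view of reconstruction systems argued for above.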

NeighborNet reflecting the closeness of the different reconstruction systems
As one can see, the data roughly cluster into three subgroups, namely Schuessler, Baxter and Sagart, and Starostin vs. Pān and Zhèngzhāng vs. Karlgren, Li, and Wáng. On a larger scale, we can divide the data into the six-vowel systems versus the non-six-vowel systems (Karlgren, Wáng, Li). Given that Pān is a direct student of Zhèngzhāng, the closeness between their reconstruction systems is not surprising.

What may be surprising is the closeness of the Schuessler, Starostin, and Baxter and Sagart systems, given their notable differences with respect to the criterion of vowel purity tested by List et al. (2017b). Even if the network analysis cannot directly explain all of these differences in detail, it seems like a worthwhile enterprise, which should be further expanded by comparing not only the vowels, but fully aligned proto-forms.

Given the straightforwardness of the application, it also seems useful to test it on other language families where reconstructions are similarly disputed, as they are for Old Chinese phonology.


Amigó, E., J. Gonzalo, J. Artiles, and F. Verdejo (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 12.4. 461-486.

Baxter, W. and L. Sagart (2014) Old Chinese: a new reconstruction. Oxford University Press: Oxford.

Bryant, D. and V. Moulton (2004) Neighbor-Net. An agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution 21.2. 255-265.

Hauer, B. and G. Kondrak (2011) Clustering semantically equivalent words into cognate sets in multilingual lists. In: Proceedings of the 5th International Joint Conference on Natural Language Processing. AFNLP. 865-873.

Huson, D. (1998) SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14.1. 68-73.

Jarceva, V. (1990) Sovetskaja Enciklopedija: Moscow.

Karlgren, B. (1950) The Book of Odes. Chinese text, transcription and translation. Museum of Far Eastern Antiquities: Stockholm.

Lass, R. (2017) Reality in a soft science: the metaphonology of historical reconstruction. Papers in Historical Phonology 2.1. 152-163.

Li Fang-kuei 李方桂 (1971) Shànggǔyīn yánjiū 上古音研究 [Studies on Archaic Chinese phonology]. Qīnghuá Xuébào 清華學報 9.1-2. 1-60.

List, J.-M. (2008) Rekonstruktion der Aussprache des Mittel- und Altchinesischen. Vergleich der Rekonstruktionsmethoden der indogermanischen und der chinesischen Sprachwissenschaft [Reconstruction of the pronunciation of Middle and Old Chinese. Comparison of reconstruction methods in Indo-European and Chinese linguistics]. Magister thesis. Freie Universität Berlin: Berlin.

List, J.-M., S. Greenhill, and R. Gray (2017a) The potential of automatic word comparison for historical linguistics. PLOS One 12.1. 1-18.

List, J.-M. (2017) A web-based interactive tool for creating, inspecting, editing, and publishing etymological datasets. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. System Demonstrations. 9-12.

List, J.-M., J. Pathmanathan, N. Hill, E. Bapteste, and P. Lopez (2017b) Vowel purity and rhyme evidence in Old Chinese reconstruction. Lingua Sinica 3.1. 1-17.

Meillet, A. (1903) Introduction à l’étude comparative des langues indo-européennes. Hachette: Paris.

Morrison, D.A. (2014) Phylogenetic networks: a new form of multivariate data summary for data mining and exploratory data analysis. WIREs Data Mining and Knowledge Discovery 4: 296-312.

Pān Wùyún 潘悟云 (2000) Hànyǔ lìshǐ yīnyùnxué 汉语历史音韵学 [Chinese historical phonology]. Shànghǎi Jiàoyù 上海教育: Shànghǎi 上海.

de Saussure, F. (1916) Cours de linguistique générale. Payot: Lausanne.

Schuessler, A. (2007) ABC Etymological dictionary of Old Chinese. University of Hawai’i Press: Honolulu.

Starostin, S. (1989) Sravnitel’no-istoričeskoe jazykoznanie i leksikostatistika [Comparative-historical linguistics and lexicostatistics]. In: Kullanda, S., J. Longinov, A. Militarev, E. Nosenko, and V. Shnirel’man (eds.): Lingvističeskaja rekonstrukcija i drevnejšaja istorija Vostoka. Materialy k diskussijam na konferencii [Materials for the discussion at the conference]. Vol. 1. Institut Vostokovedenija: Moscow. 3-39.

Sturtevant, E. (1920) The pronunciation of Greek and Latin. University of Chicago Press: Chicago.

Zhèngzhāng Shàngfāng 郑张尚芳 (2003) Shànggǔ yīnxì 上古音系 [Old Chinese phonology]. Shànghǎi Jiàoyù 上海教育: Shànghǎi 上海.

Monday, May 21, 2018

Misunderstandings and misrepresentations about Linné's alleged family motto

This is a joint post by Magnus Lidén and David Morrison

The Swedish biologist Carl Linnaeus (1707-1778) is well known in biology as the father of modern taxonomic nomenclature, although he is better known in his own country for writing a series of travel books that cataloged the cultures and resources of Sweden.* He was knighted in 1757, and took the noble name Carl von Linné, as well as adopting a coat of arms (shown below).

It is often claimed that at the same time he adopted a family motto:
Deus creavit, Linnaeus disposuit [Latin]
God created, Linnaeus organized [English]
Gud skapade, Linné ordnade [Swedish]
Gott erschuf, Linné ordnete [German]
This claim is repeated around the internet, almost always attributing the words directly to the man himself: Deus creavit, Linnaeus disposuit he liked to say (Smithsonian Institution); Deus creavit, Linnaeus disposuit he took as his motto (Harvard University); Deus creavit, Linnaeus disposuit was how Linnaeus himself summed up his lifetime achievements (Uppsala University; and Svenska Linnésällskapet — the Swedish Linnaean Society).

The motto has been used both to mock him for his presumptuousness and to praise him for his piety. Primary references for this alleged motto are, however, conspicuously absent from all of these web sites, and neither our search of the literature nor consultation with Linné experts has turned up any evidence that he ever used this motto himself.

In the standard Linné biography by Fries (1903), it is simply referred to as an "illuminating epigram which admiring contemporaries used" (see Jackson 1923), which explains neither how it came to be attributed to Linnaeus nor where it came from. FV Hope (Anon. 1843) suspected it had originated as an act of malice. Although it has been used to that end by his adversaries, it was originally meant to express awe and admiration.

As far as we can determine, the first English-language use of the motto appears as the frontispiece of this book:
The Life of Sir Charles Linnæus, Knight of the Swedish Order of the Polar Star, &c, &c.
to which is Added a Copious List of His Works, and a Biographical Sketch of the Life of His Son

By D.H. Stoever, Ph.D.
Translated from the original German
By Joseph Trapp, A.M.
B. and J. White, Fleet Street, London

As you can see, the motto is used as a banner situated directly below the coat of arms of Linné, and to all appearances is a part of it, with a portrait in profile above. This gives the impression that the words were coined by Linné himself (as was the case for the coat of arms).

However, the original German-language version of the book reveals a very different situation:
Leben des Ritters Carl von Linné
Nebst den biographischen Merkwürdigkeiten seines Sohnes, des Professors Carl von Linné
und einem vollständigen Verzeichnisse seiner Schriften, deren Ausgaben, Übersetzungen, Auszüge und Commentare

von Dietrich Heinrich Stöver, Doctor der Philosophie
Benj. Gottl. Hoffmann, Hamburg

The frontispiece has the alleged motto flanking the coat of arms of Linnaeus, rather than being part of it. This makes all the difference to the interpretation. The portrait, incidentally, is a poor copper engraving, drawn from a plaster medallion by Inländer from 1773 (cf. Tullberg 1907).

Stöver reveals his source for the words in his 1792 preface:
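Das Motto unter dem Bildnisse Linné's [...] wird hoffentlich mit der Religiosität keines Lesers in Collision kommen. Es rührt von einem Manne her, der ein langer Freund des Verstorbnen war. (The motto beneath Linné's portrait [...] will, it is hoped, not collide with the religiosity of any reader. It comes from a man who was a long-time friend of the deceased.)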
However, in the 1794 English translation, "langer Freund" is embellished to the point of confusion:
The motto beneath the portrait of Linnaeus [...] will not, it is humbly presumed, offend the religious opinions of any reader. It originates with a man who has lived many years in the closest ties of intimacy with the deceased.**
Whoever devised it, it seems probable that this phrase is a post-Linnéan laudation communicated to Stöver orally or by letter. At any rate, it did not appear in print until 14 years after Linné's death.

This may seem like a rather harmless "factoid", but it highlights how easily erroneous beliefs can be established, even in a scientific environment.

Other myths

This brings us to a second myth, a misconstruction of the very core of Linné's views on classification, which has seriously distorted how the development of 18th century systematics is perceived. The widely held picture of Linné as an Aristotelian Essentialist, classifying nature by Medieval Scholastic Principles of Logical Division, dates from the work of Cain (1958; see Winsor 2006), and was uncritically accepted by several influential authors, such as Mayr (1982) and Futuyma (1998). But this is like stating that Darwin was a creationist!

On the contrary, Linné strongly criticized the scholastic approach. He was the first to clarify the conceptual difference between the top-down divisionis leges (which he claimed would by necessity result in artificial groupings and the disruption of natural taxa) and synthetic systematization. Linné emphasized that natural taxa are not defined by characters but must be built from the basic entities (species) upwards (Linnaeus 1737). He was far ahead of his time in doing this. The misrepresentation of Linné's views by Cain and his followers has been thoroughly debunked by, for example, Skvortsov (2002), Winsor (2006), and Müller-Wille (2013), but it seems to be hard to eradicate.

A more amusing misunderstanding concerns the so-called flower clock, reputedly planted by Linné in the Hortus Academicus of Uppsala (now called Linnéträdgården, the Linné Garden), which numerous visitors and journalists ask about each year. However, Linné's flower clock (1751) was a list of selected phenological observations; it never materialized in the Uppsala academic garden as an actual planting, nor was it ever meant to. Attempts to plant flower clocks in gardens have shown that they are not very accurate timekeepers across seasons and latitudes.

It seems to be quite common in English to insist on the use of titles for British people but not for foreigners. As noted by Stöver and Trapp in their book, "Carl von Linné" is best treated as the Swedish equivalent of "Sir Carl Linnaeus".


Anon. (1843) Summary of a lecture by F. V. Hope – on the portraits of Linnaeus – read for the Linnean society 21 Feb 1843 (E. Forster, Esq. in the chair). The Athenæum (Journal of English and Foreign Literature, Science and the Fine Arts) 801: 218. [in vol. 1 for the year 1843, installments 783 to 817]

Cain AJ (1958) Logic and memory in Linnaeus' system of taxonomy. Proceedings of the Linnean Society of London 169: 144-163.

Fries TM (1903) Linné. Lefnadsteckning, 2 vols. Stockholm.

Futuyma DJ (1998) Evolutionary Biology, 3 edn. Sinauer Associates, Sunderland MA.

Jackson BD (1923) Linnaeus. Abridged and adapted from Fries 1903. London.

Linnaeus C (1737) Genera Plantarum. Conrad Wishoff, Leiden.

Linnaeus C (1751) Philosophia Botanica. Godofr. Kiesewetter, Stockholm.

Mayr E (1982) The Growth of Biological Thought. Harvard University Press, Cambridge MA.

Müller-Wille S (2013) Systems and how Linnaeus looked at them in retrospect. Annals of Science 70: 305-317.

Skvortsov AK (2002) Systematics on the threshold of the 21st century: traditional principles and basics from the contemporary viewpoint. Zhurnal Obshchei Biologii 63: 82-93. [In Russian; abridged translation by Irina Kadis on WWW]

Tullberg T (1907) Linnéporträtt. Aktiebolaget Ljus, Stockholm.

Winsor MP (2006) Linnaeus' biology was not essentialist. Annals of the Missouri Botanical Garden 93: 2-7.

* On May 18 we had Linnés trädgårdsfest (Linné's garden party), which is Uppsala's celebration of Linné's working life in the town.

**According to Guido Grimm, a more literal translation would be: "It originates from an old friend of the deceased, who, being of rare noble character, summarized the widely accepted opinion(s) of experts".

Monday, May 14, 2018

Addition of a Message Board to the blog

This is a short post just to point out that there is now a Message Board on this blog, where people can post community information, such as jobs and scholarships, as well as any other requests or information. The link is at the upper-right of the blog pages.

To post a message to the Board, send an email to: Leo van Iersel.

Monday, May 7, 2018

Keeping it simple in phylogenetics

This is a post by Guido, with a bit of help from David.

There's an old saying in physics, to the effect that: "If you think you need a more complex model, then you actually need better data." This is often considered to be nonsense in the biological sciences and the humanities, because the data produced by biodiversity is orders of magnitude more complex than anything known to physicists:
The success of physics has been obtained by applying extremely complicated methods to extremely simple systems ... The electrons in copper may describe complicated trajectories but this complexity pales in comparison with that of an earthworm. (Craig Bohren)
Or, more succinctly:
If it isn’t simple, it isn’t physics. (Polykarp Kusch)
So, in both biology and the humanities there has been a long-standing trend towards developing and using more and more complex models for data analysis. Sometimes, it seems like every little nuance in the data is important, and needs to be modeled.

However, even at the grossest level, complexity can be important. For example, in evolutionary studies, a tree-based model is often adequate for analyzing the origin and development of biodiversity, but it is inadequate for studying many reticulation processes, such as hybridization and transfer (in biology or in linguistics, for example). In the latter case, a network-based model is more appropriate.

Nevertheless, the physicists do have a point. After all, it is a long-standing truism in science that we should keep things simple:
We may assume the superiority, all things being equal, of the demonstration that derives from fewer postulates or hypotheses. (Aristotle)
It is futile to do with more things that which can be done with fewer. (William of Ockham) 
Plurality must never be posited without necessity. (William of Ockham) 
Everything should be as simple as it can be, but not simpler. (Albert Einstein)
To this end, it is often instructive to investigate your data with a simple model, before proceeding to a more complex analysis.

Simplicity in phylogenetics

In the case of phylogenetics, there are two parts to a model: (i) the biodiversity model (eg. chain, tree, network), and (ii) the character-evolution model. A simple analysis might drop the latter, for example, and simply display the data unadorned by any considerations of how characters might evolve, or what processes might lead to changes in biodiversity.

This way, we can see what patterns are supported by our actual data, rather than by the data processed through some pre-conceived model of change. If we were physicists, then we might find the outcome to be a more reliable representation of the real world. Furthermore, if the complex model and the simple model produce roughly the same answer, then we may not need "better data".

Modern-day geographic distribution of Dravidian languages (Fig. 1 of Kolipakam, Jordan, et al., 2018)

Historical linguistics of Dravidian languages

Vishnupriya Kolipakam, Fiona M. Jordan, Michael Dunn, Simon J. Greenhill, Remco Bouckaert, Russell D. Gray, and Annemarie Verkerk (2018, A Bayesian phylogenetic study of the Dravidian language family, Royal Society Open Science) dated the splits within the Dravidian language family in a Bayesian framework. Aware of the uncertainty regarding the phylogeny of this language family, they constrained and dated several topological alternatives. Furthermore, they checked how stable the age estimates are when using different, increasingly elaborate linguistic substitution models implemented in the software BEAST2.

The preferred result and the unconstrained result of the Bayesian analysis are shown in their Figs 3 and 4, respectively (their Fig. 2 shows the neighbour-net).

Fig. 3 of Kolipakam et al. (2018), constraining the North (purple), South I (red) and South II (yellow) groups as clades (PP := 1)
Fig. 4 of Kolipakam et al. (2018), result of the Bayesian dating using the same model but no constraints. The Central and South II groups are intermixed.

As you can see, many branches have rather low PP support, which is a common (and inevitable) phenomenon when analyzing non-molecular data matrices providing non-trivial signals. This is a situation where support consensus networks may come in handy, which Guido pointed out in his (as yet unpublished) comment to the paper (find it here).

On Twitter, Simon Greenhill (one of the authors) posted a Bayesian PP support network as a reply.

A PP consensus network of the Bayesian tree sample, probably the one used for Fig. 3 of Kolipakam et al. 2018, constraining the North, South I, and South II groups as clades (S. Greenhill, 23/3/2018, on Twitter).

Greenhill himself didn't find it too revealing, but for fans of exploratory data analysis it shows, for example, that the low support for Tulu as sister to the remainder of the South I clade (PP = 0.25) is due to a lack of decisive signal. In the case of the low support (PP = 0.37) for the North-Central clade, one faces two alternatives: it is equally likely that the Central languages Parji and Ollari Gadba are instead related to the South II group, which, together with the third language of the Central group, forms a highly supported clade (PP = 0.95; one of the topological alternatives tested by the authors).

A question that pops up is: when we want to explore the signal in this matrix, do we need to consider complex models?

Using the simplest-possible model

The maximum-likelihood inference used here is naïve in the sense that each binary character in the matrix is treated as independent. The matrix, however, is a binary encoding of the concepts in the lexica of the Dravidian languages (see the original paper for details).

For instance, the first, invariant, character encodes "I" (the same for all languages and coded as "1"), characters 2–16 encode "all", and so on. Whereas "I" (character 1) may be independent of "all" (characters 2–16), the binary encodings for "all" are inter-dependent, and effectively encode a micro-phylogeny for the concept "all": characters 2–4 are parsimony-informative (ie. each splits the taxon set into two subsets of at least two taxa, and these splits are mutually compatible); the remainder are parsimony-uninformative (ie. unique to a single taxon).
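
The relation between a multistate concept and its binary characters, and the distinction between parsimony-informative and parsimony-uninformative characters, can be illustrated with a small Python sketch. It assumes simple presence/absence coding (one binary character per cognate set) and one cognate set per language; the languages L1 to L6 and their cognate-set labels are hypothetical and not taken from the Dravidian matrix.

```python
def binarize(concept):
    """Presence/absence coding of one concept.

    `concept` maps a language to the cognate set of its word for the concept.
    Returns one binary character per cognate set: {set_id: {language: 0/1}}.
    """
    languages = sorted(concept)
    return {
        cog: {lang: int(concept[lang] == cog) for lang in languages}
        for cog in sorted(set(concept.values()))
    }


def classify(character):
    """Classify a binary character as invariant, uninformative or informative."""
    ones = sum(character.values())
    zeros = len(character) - ones
    if ones == 0 or zeros == 0:
        return "invariant"
    if min(ones, zeros) >= 2:
        return "informative"  # splits the taxa into two groups of at least two
    return "uninformative"  # one state is unique to a single taxon


# Hypothetical cognate sets for one concept in six hypothetical languages.
concept = {"L1": "A", "L2": "A", "L3": "B", "L4": "B", "L5": "B", "L6": "C"}
for cog, character in binarize(concept).items():
    print(cog, character, classify(character))
```

Because every language belongs to exactly one cognate set here, the binary characters derived from one concept are not independent of each other, which is the point made about the characters encoding "all".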

The binary sequence for "all" defines three non-trivial splits, visualized as branches, which are partly compatible with the Bayesian tree; eg. Kolami groups with members of South I, and within South II we have two groups matching the subclades in the Bayesian tree.

Two analyses were run by the original authors, one using the standard binary model, Lewis’ Mk (1-parameter) model, and allowing for site-specific rate variation modelled using a Gamma distribution (option -m BINGAMMA). As in the case of morphological data matrices (or certain SNP data sets), and in contrast to molecular data matrices, most of the characters in linguistic matrices are variable (not constant). The lack of invariant sites may lead to so-called “ascertainment bias” when optimizing the substitution model and calculating the likelihood.
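
What the ascertainment correction does can be shown on a toy example. The following Python sketch computes the likelihood of binary characters under the two-state Mk model on a fixed tree with fixed branch lengths, using Felsenstein's pruning algorithm, and optionally applies the conditioning on variable characters that Lewis introduced along with the Mk model: the likelihood of each site is divided by the probability of observing a variable site at all. It is a simplification (no Gamma rate variation, no optimization of branch lengths) and not RAxML's implementation; the tree, branch lengths, and taxon names are hypothetical.

```python
import math


def transition(t):
    """2-state Mk transition matrix for branch length t (expected changes per site)."""
    same = 0.5 + 0.5 * math.exp(-2.0 * t)
    return ((same, 1.0 - same), (1.0 - same, same))


def partials(node, site):
    """Conditional likelihoods of states 0 and 1 at `node` (Felsenstein pruning).

    A node is a taxon name (str) or a tuple of (child, branch_length) pairs;
    `site` maps each taxon to its observed state, 0 or 1.
    """
    if isinstance(node, str):
        return [1.0 if site[node] == state else 0.0 for state in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        below = partials(child, site)
        P = transition(length)
        for state in (0, 1):
            result[state] *= P[state][0] * below[0] + P[state][1] * below[1]
    return result


def site_likelihood(tree, site):
    """Likelihood of one character, with equal state frequencies at the root."""
    return sum(0.5 * value for value in partials(tree, site))


def log_likelihood(tree, sites, taxa, correct=False):
    """Log-likelihood of all sites; with correct=True, condition on variability."""
    total = sum(math.log(site_likelihood(tree, site)) for site in sites)
    if correct:
        # Probability of an invariant pattern (all 0 or all 1) under the model.
        p_invariant = sum(
            site_likelihood(tree, {taxon: state for taxon in taxa}) for state in (0, 1)
        )
        total -= len(sites) * math.log(1.0 - p_invariant)
    return total


# Hypothetical three-taxon tree ((A, B), C) with made-up branch lengths.
tree = (((("A", 0.1), ("B", 0.1)), 0.05), ("C", 0.2))
taxa = ("A", "B", "C")
sites = [{"A": 1, "B": 1, "C": 0}, {"A": 0, "B": 1, "C": 0}]  # variable sites only
print(log_likelihood(tree, sites, taxa), log_likelihood(tree, sites, taxa, correct=True))
```

Under this conditioning, invariant patterns are by construction impossible, so constant characters have to be excluded before the corrected likelihood is computed.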

Hence, RAxML includes an option to correct for this bias in morphological or other binary or multi-state matrices. In the case of the Dravidian language matrix, four of the more than 700 characters (sites) are invariant; these were removed before rerunning the analysis with the correction (option -m ASC_BINGAMMA). The results of both runs are highly correlated: the Pearson correlation coefficient of the bipartition frequencies (bootstrap support, BS) is 0.964. Nonetheless, BS support for individual branches can differ by up to 20 (which may be a genuine or a random result; we don't know yet). The following figures show the bootstrap consensus networks of the standard analysis and of the analysis correcting for the ascertainment bias.
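
A comparison of two bootstrap runs of this kind can be sketched as follows: collect the bipartitions seen in either run, correlate their support values, and report the largest difference. In this Python sketch, giving a bipartition that is missing from one run a support of 0 is an assumption, and the taxa T1 to T5 and all support values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def compare_support(bs_a, bs_b):
    """Correlate bipartition frequencies from two bootstrap analyses.

    Each argument maps a bipartition (e.g. a frozenset of the taxa on one,
    consistently chosen, side) to its bootstrap support in percent. A
    bipartition missing from one analysis is given 0 there.
    Returns the Pearson r and the largest absolute difference in support.
    """
    splits = list(bs_a.keys() | bs_b.keys())
    xs = [bs_a.get(s, 0) for s in splits]
    ys = [bs_b.get(s, 0) for s in splits]
    return pearson(xs, ys), max(abs(x - y) for x, y in zip(xs, ys))


# Hypothetical support values for bipartitions of taxa T1-T5.
standard = {
    frozenset({"T1", "T2"}): 100,
    frozenset({"T3", "T4"}): 49,
    frozenset({"T1", "T2", "T5"}): 30,
}
corrected = {
    frozenset({"T1", "T2"}): 100,
    frozenset({"T3", "T4"}): 69,
    frozenset({"T1", "T2", "T5"}): 15,
    frozenset({"T4", "T5"}): 12,
}
r, max_diff = compare_support(standard, corrected)
print(round(r, 3), max_diff)
```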

Maximum likelihood (ML) bootstrap (BS) consensus network for the standard analysis. Green edges correspond to branches seen in the unconstrained Bayesian tree in Kolipakam et al. (2018, fig. 4), the olive edges to alternatives in the PP support network by S. Greenhill. Edge values show ML-BS support, and PP for comparison.

ML-BS consensus network for the analysis correcting for the ascertainment bias. BSasc annotated at edges in bold font, with BSunc and PP (graph before) provided for comparison. Note the higher tree-likeness of the graph.

Both graphs show that this character-naïve approach is relatively decisive, even more so when we correct for the ascertainment bias. The graphs show relatively few boxes, which would indicate competing, tree-incompatible signals in the underlying matrix.

Differences involve Kannada, a language that is resolved as equally related to Malayalam-Tamil and Kodava-Yeruva (BSasc = 39/35 when correcting for ascertainment bias, but BSunc < 20/40 in the standard analysis), and Kolami, which is supported as sister to Koya-Telugu (BSasc = 69 vs. BSunc = 49) rather than to Gondi (BSasc < 20, BSunc = 21).

They also show that, from a tree-inference point of view, we don't need highly sophisticated models. All branches with high (or unambiguous) PP in the original analysis are also inferred, and can be supported, using maximum likelihood with the simple 1-parameter Mk model. This also means that if the scoring were to include certain biases, the models may not correct for them. At best, they help to increase the support and minimize the alternatives, although the opposite can also be true.

For relationships within the Central-South II clade (in both the unconstrained and the constrained analyses), the PP were low. The character-naïve maximum-likelihood analysis reflects some signal ambiguity, too, and its BS can occasionally be higher than the PP. BS > PP values are directly indicative of issues with the phylogenetic signal (eg. lack of discriminative signal, topological ambiguity), because in general PP tend to overestimate support and BS to underestimate it. The only obvious difference is that maximum likelihood failed to provide support for the putative sister relationship between Ollari Gadba and Parji of the Central group.

The crux with using trees

When inferring a tree as the basis of our hypothesis testing, we do this under the assumption that a series of dichotomies can model the diversification process. Languages are particularly difficult in this respect, because even when we clean the data of borrowings, we cannot be sure that the formation of languages represents a simple split of one unit into two units. Support consensus networks based on the Bayesian or bootstrap tree samples can open a new viewpoint by visualizing internal conflict.

This tree-model conflict may be genuine. For example, when languages evolve and become established, they may be closer to or farther from their respective sibling languages, and may have undergone some non-dichotomous sorting process. Alternatively, the conflict may be due to character scoring, that is, the way one transforms a lexicon into a sequence of (here) binary characters. The support networks allow us to explore these phenomena beyond the model question. Ideally, a BS of 40 vs. 30 means that 40% of the binary characters support one alternative and 30% support the competing one.

In this respect, historical-linguistic matrices and morphological matrices in biology have a lot in common. Languages and morphologies can provide tree-incompatible signals, or contain signals that support different topologies. By mapping the characters onto the alternatives, we can investigate whether this is a genuine signal or one related to our character coding.
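
Mapping characters onto alternative splits boils down to a compatibility test between bipartitions. The Python sketch below is one simple way to do this, not the procedure used for the figure: it treats each binary character as the bipartition it induces, counts it as supporting a candidate split if it induces exactly that split, and as conflicting if the two splits are incompatible. The taxa T1 to T5, the characters, and the candidate split are hypothetical.

```python
def character_split(character):
    """One side of the bipartition a binary character induces: the taxa scored 1."""
    return frozenset(taxon for taxon, state in character.items() if state == 1)


def compatible(side_a, side_b, taxa):
    """Two splits are compatible if one of the four intersections of their sides is empty."""
    return any(
        not (x & y)
        for x in (side_a, taxa - side_a)
        for y in (side_b, taxa - side_b)
    )


def map_characters(characters, splits):
    """For each candidate split, list the characters that induce exactly that split
    and those that are incompatible with it. Characters unique to one taxon
    (trivial splits) are compatible with every split, so they never conflict."""
    taxa = frozenset(next(iter(characters.values())))
    report = {}
    for name, side in splits.items():
        side = frozenset(side)
        support, conflict = [], []
        for cid, character in characters.items():
            induced = character_split(character)
            if induced in (side, taxa - side):
                support.append(cid)
            elif not compatible(induced, side, taxa):
                conflict.append(cid)
        report[name] = (support, conflict)
    return report


# Hypothetical binary characters for five hypothetical taxa T1-T5.
characters = {
    "c1": {"T1": 1, "T2": 1, "T3": 0, "T4": 0, "T5": 0},
    "c2": {"T1": 0, "T2": 1, "T3": 1, "T4": 0, "T5": 0},
    "c3": {"T1": 0, "T2": 0, "T3": 0, "T4": 1, "T5": 0},
}
print(map_characters(characters, {"T1+T2": {"T1", "T2"}}))
# {'T1+T2': (['c1'], ['c2'])}
```

Character c3 is unique to a single taxon and is therefore neither support nor conflict, mirroring the observation that language-specific binary characters do not produce conflict with the edges of a network.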

Mapping the binary sequence for the concept "all" (the example used above to illustrate the basic properties of the matrix, comprising 15 binary characters) onto the ML-BS consensus network. We can see that its evolution is in pretty good agreement with the overall reconstruction. Two binary characters support the sister relationship of the South II languages Koya and Telugu, and a third collects most members of the South I group. All other binary characters are specific to one language and hence do not produce a conflict with the edges in the network.