Monday, October 22, 2018

Controversies about structural data in historical linguistics

In the past, there have been many controversies about structural data, that is, the kind of data I introduced in the post written last month. Given the misinterpretation of structural data as being "grammatical", along with the unproven and misleading claim by Nichols (2003) that certain grammatical features are more stable than lexical ones, one can often read about a controversy in linguistics: which aspects are more stable, and therefore more useful for studying deep linguistic relationships, the lexicon or the grammar?

In this context, it is often ignored that we are not talking chiefly about the grammar when applying phylogenetic studies to structural datasets. It is also ignored that the original idea of the importance of "grammar" pointed to homologies in complex and concrete morphological paradigms, that is, to individual word forms, which are predominantly lexical traits. This idea was most prominently discussed by Meillet (1954 [1925]) and later popularized by Nichols (1996). "Grammar" never pointed to abstract similarities as they are captured in most structural datasets (see the excellent discussion by Dybo and Starostin 2008).

"Grammar" as evidence for deep language relations

Leading scholars in historical linguistics have provided convincing arguments that genetic relationships among languages can only be demonstrated by illustrating regular sound correspondences in concrete form-meaning pairs across the languages under investigation (see especially the very good analysis by Campbell and Poser 2008). In spite of this, the rumor that "grammar" (i.e., structural datasets) might provide a shortcut to detect deep, so far unnoticed, relationships among the languages of the world is very persistent, as reflected in many different studies.

Among the examples, Dunn et al. (2008) claimed that language relationships for Papuan languages of Island Melanesia could be uncovered by means of phonological and grammatical (abstract) structural features; and Longobardi et al. (2015) used syntactic features to compare the development of European languages with the development of European populations. Zhang et al. (2018) used phonological inventories of more than 100 different Chinese dialects, coding the data for simple presence and absence of each of the more than 200 different sounds in the database, and analyzing the data with the STRUCTURE software (Pritchard et al. 2000), whose results tend to be notoriously misinterpreted.
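To make the kind of coding described for phoneme inventories concrete, here is a minimal sketch of how inventories can be turned into presence/absence vectors. The variety names and inventories are invented toy data, and the function name `binary_matrix` is hypothetical; this is not the pipeline used in any of the studies cited above.

```python
from typing import Dict, List, Set, Tuple


def binary_matrix(
    inventories: Dict[str, Set[str]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Code each variety as a presence/absence vector over all attested sounds."""
    # The columns are all sounds found in at least one inventory, in sorted order.
    sounds = sorted(set().union(*inventories.values()))
    matrix = {
        variety: [1 if sound in inventory else 0 for sound in sounds]
        for variety, inventory in inventories.items()
    }
    return sounds, matrix


# Hypothetical toy inventories, invented purely for illustration.
inventories = {
    "VarietyA": {"p", "t", "k", "m", "n"},
    "VarietyB": {"p", "t", "k", "m", "ŋ"},
    "VarietyC": {"p", "t", "m", "n", "ŋ"},
}

sounds, matrix = binary_matrix(inventories)
for variety, row in matrix.items():
    print(variety, row)
```

Note that a "1" in the same column for two varieties only records that both have the sound. It says nothing about whether the sound was inherited from a common ancestor, borrowed, or developed independently.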

What is important about these studies is that none of them (perhaps with the exception of the study by Dunn et al. 2008, but I am in no position to actually judge the findings) could make a convincing case for why the structural datasets would provide evidence of deeper relationships than the lexicon could. Even the study by Dunn et al., which tests the suitability of their small questionnaire of only 115 structural traits on Oceanic languages, has since then not led to any new insights into so far undetected language relationships, contrary to the hope expressed by the authors, "that structural phylogeny is an important new tool for exploring historical relationships between languages" (ibid.: 734).

Structural data as a shortcut?

Some scholars who work on structural datasets may find my claims harsh and unjustified. In fact, there are studies that seem to provide evidence that structural datasets perform similarly or equally well compared to phylogenetic methods based on lexical data.

For example, Longobardi et al. (2016) carry out experiments on structural data of phoneme inventories, syntactic features, and "traditional" cognate sets for very small Indo-European datasets, concluding that all of the datasets yield similar results, and that syntactic or phonological features in structural datasets could be used instead of lexical phylogenies.

In contrast, Greenhill et al. (2017) compare lexical datasets with structural data for 81 Austronesian languages and find that, in general, lexical data is much more stable than structural data, although some structural features seem to be similar to lexical items regarding their stability.

A wish list for future tests

I see two major problems in the debate about the usefulness of structural data in historical linguistics.

First, the studies that confirm that structure might work equally well compared with lexical data, are all based on small samples of one specific language family that was analyzed based on very diverse features that were specifically designed to study the languages in question. For me, a true test that some features carry deep historical signal would need to be carried out on a large set of related and unrelated languages, not just on selected datasets.

Furthermore, to allow for an honest comparison with the lexicon, the selection of features should not contain any lexical characters or characters that could only be extracted with the help of lexical characters. Thus, asking whether the words for "fish", "I", and "five" are pronounced similarly in a language would not be allowed in such a feature collection, because this would follow lexical criteria, and we know very well that this property is a very good proxy for identifying Sino-Tibetan languages (Handel 2008).

Second, and more problematic, is the fact that structural datasets do not provide information on the relatedness of the traits under comparison. While this is no problem for typologists who study shared structural features out of interest in universal tendencies in the languages of the world, it is a problem for the application of phylogenetic software, since the typical approaches in biology treat homoplasy as an exception, while it may often be rather the norm than an exception in structural datasets.
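To illustrate what homoplasy means in practice, the following sketch counts the minimum number of state changes that a binary trait requires on a given tree, using Fitch parsimony. The tree, the language labels, and the trait codings are invented, and Fitch parsimony is chosen here only as a simple illustration, not as the method of any study discussed in this post. A binary trait that needs more than one change on the tree must have arisen or been lost more than once, which is what homoplasy amounts to.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum number of changes) for a subtree.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its observed trait value.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:
        # Both subtrees agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The subtrees disagree: at least one additional change is required.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical tree of four languages: ((A, B), (C, D)).
tree = (("A", "B"), ("C", "D"))

# Invented trait codings (1 = feature present, 0 = feature absent).
trait_consistent = {"A": 1, "B": 1, "C": 0, "D": 0}
trait_homoplastic = {"A": 1, "B": 0, "C": 1, "D": 0}

_, changes_consistent = fitch(tree, trait_consistent)
_, changes_homoplastic = fitch(tree, trait_homoplastic)
print(changes_consistent)   # 1: a single change explains the distribution
print(changes_homoplastic)  # 2: the trait must have changed at least twice
```

With lexical cognate sets, the second pattern is expected to be comparatively rare. With abstract structural features such as "has tone" or "verb-final word order", it may well be the common case, which is the concern raised above.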


In order to make structural data suitable for historical analyses, much more research needs to be carried out, including specifically a much more thorough study of parallel evolution and geographic convergence (due to language contact) in different language families of the world. A nice illustration for the Indo-European languages is provided by Cathcart et al. (2018).

I would be happy for our field if such research could reveal markers of deep genetic ancestry in the languages of the world, and help us to push the boundaries of linguistic reconstruction. For the time being, however, I remain highly skeptical, especially when scholars try to demonstrate the suitability of "grammatical" comparison with small datasets and idiosyncratically selected feature sets that are not comparable across datasets.


Campbell, L. and W. Poser (2008) Language Classification: History and Method. Cambridge University Press: Cambridge.

Cathcart, C., G. Carling, F. Larsson, R. Johansson, and E. Round (2018) Areal pressure in grammatical evolution: An Indo-European case study. Diachronica 35.1: 1-34.

Dunn, M., S. Levinson, E. Lindstroem, G. Reesink, and A. Terrill (2008) Structural phylogeny in historical linguistics: methodological explorations applied in Island Melanesia. Language 84.4: 710-759.

Dybo, A. and G. Starostin (2008) In defense of the comparative method, or the end of the Vovin controversy. In: Smirnov, I. (ed.) Aspekty komparativistiki 3. RGGU: Moscow, pp 119-258.

Greenhill, S., C. Wu, X. Hua, M. Dunn, S. Levinson, and R. Gray (2017) Evolutionary dynamics of language systems. Proceedings of the National Academy of Sciences 114.42: E8822-E8829.

Handel, Z. (2008) What is Sino-Tibetan? Snapshot of a field and a language family in flux. Language and Linguistics Compass 2.3: 422-441.

Longobardi, G., S. Ghirotto, C. Guardiano, F. Tassi, A. Benazzo, A. Ceolin, and G. Barbujani (2015) Across language families: Genome diversity mirrors linguistic variation within Europe. American Journal of Physical Anthropology 157.4: 630-640.

Longobardi, G., A. Buch, A. Ceolin, A. Ecay, C. Guardiano, M. Irimia, D. Michelioudakis, N. Radkevich, and G. Jaeger (2016) Correlated Evolution Or Not? Phylogenetic Linguistics With Syntactic, Cognacy, And Phonetic Data. In: The Evolution of Language: Proceedings of the 11th International Conference (EVOLANGX11).

Meillet, A. (1954) La méthode comparative en linguistique historique [The comparative method in historical linguistics]. Honoré Champion: Paris.

Nichols, J. (1996) The comparative method as heuristic. In: Durie, M. (ed.) The Comparative Method Reviewed. Oxford University Press: New York, pp 39-71.

Nichols, J. (2003) Diversity and stability in language. In: Joseph, B. and R. Janda (eds.) The Handbook of Historical Linguistics. Blackwell: Malden, Mass, pp 283-310.

Pritchard, J., M. Stephens, and P. Donnelly (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.


  1. the excellent discussion by Dybo and Starostin

    can be downloaded from here.

  2. You say that "...the studies that confirm that structure might work equally well compared with lexical data, are all based on small samples of one specific language family...". That's not quite correct. Take a look at Fig. 5 of Holman et al. 2008. There is a lot of information packed into that, but one of the clear results is that you are much better off comparing words than WALS-type features. This is based on many languages, across families. There is still something of a mystery lingering after the 10 years that have passed, which is that one can apparently blend some information from typological features into the lexical information and improve results. I think that it may actually be because the typological features carry a geographical signal (cf. Donohue et al.'s comment on Dunn et al. 2008 in Language 87.2: 369-383). And subgrouping of language families often also has a geographical correlate. So while the improvement may be real, it could be there for the wrong reasons. But, while typological features are more prone to areal influence than the basic lexicon, they also tend to have similar stabilities (cf. Wichmann and Holman 2009). So they also carry some genealogical information. There are some issues still to be explored here. By the way, there are several papers to cite that take views pro or con the whole affair. One is Gray et al. 2010 (The shape and fabric...), which showed a bad-looking tree based on WALS features. There are several other papers from 2005-2010, when the topic was hot.

    1. Thanks for the literature recommendations. I'd say, however, that the topic is still hot, to some degree, specifically if you look into regional debates about language classification, but also the more recent papers I quoted. And given that it is not clear to which degree ASJP can prove genetic relationship, but that some scholars claim that it can be proven with typological features alone, I don't know to which degree the comparison between ASJP and WALS would qualify as such a study. The argument of people defending the structure-is-history camp would probably point to problems in WALS. We'll see where the debate goes, especially if we manage to make more structural and lexical datasets available in our CLDF format (