Tuesday, October 31, 2017

"Man gave names to all those animals": cats and dogs

This is a joint post by Guido Grimm and Johann-Mattis List.

As specialists, we rarely dare to dive into cross-disciplinary research. However, in a small series of posts, we will now try to open a door between linguistics, phylogenetics, biogeography, and molecular genetics (with its various subdisciplines), using the curious cases of domestic animals, such as cat, dog, goat, and sheep, and what these are called in various Eurasian languages, with a special focus on Indo-European languages.

Today's post will introduce the little dataset that we have created, and discuss the findings for the names of cats and dogs. A follow-up post will be devoted to goats and sheep.

Domesticated animals and their names

Various types of archaeological and biological research revolve around the domestication of animals: Google Scholar gives tens of thousands of hits for search terms such as "cat domestication", and we have several blog posts about the need for networks to illustrate the genealogy of domestication. However, linguistic literature on these topics is rather sparse and often restricted to specific language families, such as domesticated animals in the Indo-European proto-society (Anthony and Ringe 2015).

Nevertheless, many studies mention the potential value of linguistic evidence as a specific kind of indirect evidence, which should be considered when carrying out research on domestication (see, for example, Kraft et al. 2014). Furthermore, the public interest in domestic animals such as cat, dog, goat, and sheep is reflected by the number of languages in which Wikipedia articles are available: the domestic dog (219 entries), our most trusted companion animal, narrowly beats the cat (211 entries), our least-productive domestic animal but, according to cliché, an obligatory accessory for e.g. literati, thinkers, and little old ladies (entry counts include extinct languages like Gothic). Articles on sheep are available in 166 languages, and on goats in 142.

One doesn't have to travel far to recognize substantial differences between the four animal names. For example, when Guido moved to Sweden, the most confusing thing was "Fåret Shaun", which he knew as "Shaun, das Schaf" in German, or "Shaun, the sheep" in English. [As an aside, Shaun's name is a pun in English, but not in German or Swedish.] While Swedish and German / English differ greatly in the pronunciation of the words they use to denote "sheep", the Swedish and German words for "cat" (katt vs. Katze), "dog" (hund vs. Hund), and "goat" (get vs. Geiß) are essentially the same (using Guido's dialect of German). They are also basically the same for many other essential items, such as "house" (hus vs. Haus), and "hand" (hand vs. Hand).

Since Guido moved to France, he has been watching "Shaun le mouton"; and Hund ("dog") has become chien. He now needs to look for chèvre ("goat") when choosing his cheeses; but his cats are called chats, which is similar in writing (and linguistic history) but phonetically rather different, as the word is pronounced [ʃa] (sha).

When Mattis visited China, he had few problems memorizing the word for "cat", as the Chinese word māo is quite similar to the sound which cats are alleged to make in many languages (see the list on Wikipedia for cross-linguistic similarities of onomatopoeia). The words for "sheep" and "goat", on the other hand, turned out to be surprisingly similar, as both are built on the same word yáng: the former is called miányáng, which roughly translates as "soft sheep/goat", while the latter is called shānyáng, which translates to "mountain sheep/goat".

Differences in animal naming

We were intrigued by these differences and similarities of animal names across different languages. So, we decided to investigate this further, by comparing pronunciation differences for "dog", "cat", "goat", and "sheep" across a larger sample of languages. For this purpose, we selected 28 different languages, and searched for the translations as they are given in the different Wikipedia articles. We then manually added the pronunciations, based on different sources, such as Wiktionary, our own knowledge of some of the languages, or specialized sources listing translations and transcriptions (Key and Comrie 2016; Huang et al. 1992).

We then used the overall pronunciation distances for all languages as proposed by Jäger (2015), who applied sophisticated alignment algorithms to a sample of 40 historically stable words per language for a large sample of North Eurasian languages (taken from the ASJP database). Since our sample contains languages which have never been shown to be historically related, the networks which we inferred from these distances should not be interpreted as true phylogenies, but rather as an aid for visualizing overall similarities among them.

To compare the pronunciation differences of our small datasets of animal names, we used the LingPy software (List and Forkel 2016, http://lingpy.org) to cluster the data into preliminary sets of phonetically similar words. As we lack the data to carry out deep inference of truly historical similarities, for this purpose we used the Sound-Class-Based Phonetic Alignment Algorithm (for details, see List et al. 2017). This algorithm compares words for shallow phonetic similarity with some degree of historical information. As a result, the inferred clusters do not (as we will see below) reflect true instances of cognacy (homology), but rather serve as a proxy for similarity of pronunciation.
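To give a rough idea of what clustering by sound classes involves, here is a minimal sketch in plain Python. It is not the Sound-Class-Based Phonetic Alignment Algorithm and it does not use LingPy: the sound-class table, the function names, the simplified transcriptions, and the threshold of 0.45 are all assumptions made for illustration, and a plain edit distance with single-linkage grouping stands in for the real alignment and clustering procedures.

```python
# Hypothetical, heavily simplified sound-class table (an assumption for this
# sketch; it is NOT the sound-class model shipped with LingPy).
SOUND_CLASSES = {
    "K": "kgqxhc",      # velar and similar back consonants
    "T": "tdθð",        # dental/alveolar stops and fricatives
    "P": "pbfv",        # labial obstruents
    "S": "szʃʒ",        # sibilants
    "M": "m",
    "N": "nŋ",
    "R": "rl",
    "W": "wj",
    "V": "aeiouəɛɔyø",  # all vowels collapse into one class
}
CHAR_TO_CLASS = {ch: cls for cls, chars in SOUND_CLASSES.items() for ch in chars}


def to_classes(word):
    """Map a transcription to a string of sound classes; unknown symbols are dropped."""
    return "".join(CHAR_TO_CLASS.get(ch, "") for ch in word.lower())


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def normalized_distance(word1, word2):
    """Edit distance on sound-class strings, scaled to the range 0..1."""
    a, b = to_classes(word1), to_classes(word2)
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def cluster(words, threshold=0.45):
    """Single-linkage grouping: a word joins (and merges) every existing cluster
    that contains at least one word within the distance threshold."""
    clusters = []
    for label, form in words.items():
        close = [c for c in clusters
                 if any(normalized_distance(form, words[other]) <= threshold
                        for other in c)]
        merged = [label]
        for c in close:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters


# Simplified transcriptions of "cat", chosen for illustration only.
CAT_WORDS = {"Swedish": "katt", "German": "katsə", "Spanish": "gato", "Mandarin": "mau"}

if __name__ == "__main__":
    for group in cluster(CAT_WORDS):
        print(sorted(group))
```

With these toy settings, the Swedish, German, and Spanish words end up in one group and the Mandarin word in a group of its own. As with the clusters in our study, such groups only indicate similar pronunciation; they say nothing reliable about cognacy.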

Cats and Dogs

It is commonly assumed that the dog (Canis lupus familiaris, literally the 'domestic wolf-dog') was the first animal domesticated by humans, although it has not yet been settled exactly when and where. Multiple domestication events are quite likely, given the natural behaviour of (grey) wolves (Canis lupus), which live in small family groups with a complex social structure, and their original distribution across Eurasia, although genetic studies have led to inconclusive results (compare the contradicting results in Frantz et al. 2016 and Botigue et al. 2017). Its trainability and pack-loyalty make the wolf an excellent hunting companion, and wolf packs migrate naturally over long distances, which perfectly fits early (pre-cultivation) human societies of hunters and gatherers. Accordingly, dates as early as 30,000 BC have been proposed for the dog's domestication (Botigue et al. 2017).

In contrast, the cat, Felis silvestris (literally the 'forest cat'), is a solitary, very elusive animal. It was domesticated much later, and most likely in the Near East (Driscoll et al. 2009). In contrast to other domestic animals, it has no direct use (other than luxury), and it trains its owners rather than being trained (e.g., there are no police cats, and very rarely circus cats). But the cat decimates rodents and other small mammals, as well as birds. Thus, the domestication of cats likely followed the cultivation of wheat, and was possibly instrumental in building up fixed settlements and agricultural societies (Driscoll et al. 2009). George R.R. Martin's fictional character Haviland Tuf may therefore be right when judging all human societies throughout the universe by how they treat cats: "civilized" people cherish them, "barbaric" societies don't!

Figure 1: Terms for cat in our sample

Thus, the hypothesis is that the dog was probably with us from the dawn of our civilization, while the cat opportunistically followed human settlements because these provide a surplus of food (and ultimately shelter). This idea is well reflected by the literal and phonetic variation of the words for "cat" (Figure 1) and "dog" (Figure 2). Cats are called by essentially the same names in all western Eurasian languages (be they Indo-European or not), but the word for dog can be phonetically very different in even closely related languages.

As you can see in the plot, the name for "cat" (English) is effectively similar across all of the Indo-European languages of western Eurasia in our sample, while the name for "dog" sounds quite different. Given that similar names for "cat" can be found in languages of northern Africa (Pfeifer 1993: s. v. "Katze"), this provides additional evidence for the Near-East domestication of the cat; and we can assume that the word traveled to Europe along with its carriers. On the other hand, the differences in the names for "dog" across all Indo-European languages in our sample reflect language change, rather than different naming practices. With the exception of Indic, Greek, and the Slavic languages, which coined new terms (cf. Derksen 2008: 431, and the cognate sets in IELex), the dog terms in Romance (with the exception of Spanish), Germanic (with the exception of English), Baltic, and Armenian all evolved from the same root.

Figure 2: Terms for dog in our sample

Given the genetics of the dog (origin unclear) and the cat (origin in the Near East), together with the migration history of European peoples, the most likely hypothesis, which is also supported by Indo-European linguists, is that the dog was already with humans before the Indo-European languages formed, and followed them on their migrations. Given the importance of the term, people may have avoided replacing it with a new term. This is also reflected in the cross-linguistic stability of the concept "dog", usually listed as one of the most stable concepts, which are rarely replaced by neologisms ("dog" ranks 16th on Starostin's 2007 stability scale; "cat" is not even included).

With linguistic methods for language comparison, we can show that these words share a common origin, but stability does not imply that the pronunciation of the words is not affected. It is difficult to say how fast pronunciation evolves in general, but assuming that greater phonetic differences indicate a greater amount of elapsed time is a useful proxy. Since many Indo-European languages arrived in Europe by migration waves from the steppes of Central Asia, it is little surprise that each of these waves brought its modified variant of the original term for "dog" in Proto-Indo-European to Europe. Given the importance of the term for the daily lives of the people, speakers of one language variety would also not necessarily feel obliged to borrow the terms from neighboring language communities.

In Hebrew (not included in Figure 1), the word for cat is חתול khatúl. The Celtic Irish term is cat, and even the Basques, with their entirely unrelated language, have the word katu, probably a borrowing from the surrounding Romance languages (cf. Spanish gato). When the Germanic tribes (BC) and Slavs (AD) arrived on horseback, accompanied by their *hunda- (Kroonen 2013: 256), or their *pesə (Derksen 2008: 431), they settled down, started farming, and then took up the *kattōn- and the *kotə from the locals. This is interesting, because we have to assume (based on genetics and the modern distribution of the wild subspecies of Felis silvestris) that there were always wild cats in the European woods. Either the word for them was lost in surviving languages, or the hunters and gatherers living in Europe never bothered to name a small furry animal that, at best, could just be glimpsed.

Notably, the South Asian Indo-European languages and the East Asian Sino-Tibetan languages have their own terms for cats (Figure 1), but the word is globally quite invariable, in stark contrast to the terms for "dog".

Where does this lead?

At this point, our graphs merely indicate many curiosities. Nevertheless, by mapping words associated with animals (or plants) that are crucial for the history of human civilisation, we may tap into a completely new data set to discuss different scenarios proposed by archaeologists and historians regarding domestication and beyond. While linguists, archaeologists, and geneticists have been working a lot on these questions on their own, examples of a rigorous collaboration, involving larger datasets and common research questions, are still rather rare, as far as we can tell from sifting the literature. Furthermore, most linguistic accounts are anecdotal. They provide valuable insights, but these insights are not amenable to empirical investigation, as they are only reflected in prose. As a result, recent articles concentrating on archaeogenetic studies often ignore linguistic evidence completely. Given the uncertainty about the origin of domesticated animals and plants, despite advanced methods and techniques in archaeology and genetics, it seems that this strategy of simply putting linguistic evidence to one side deserves some re-evaluation.

It seems to be about time to pursue these questions in data-driven frameworks. When doing so, however, we need to be careful in the way we treat linguistic data as evidence. What we need is a thorough understanding of the processes underlying "naming" in language evolution. We constantly modify our lexicon, be it (i) by no longer using certain words, (ii) by using certain previously unfashionable words more frequently, (iii) by coining new words, or (iv) by borrowing words from our linguistic neighbors. So far, we still barely understand under which conditions societies will tend to keep a certain word against pressure from linguistic neighbors who use a different term, or when they will prefer to coin their own new words for newly introduced techniques, animals, or plants, instead of taking the words along with the technology.

Linguists can say a few things about this; and etymological dictionaries, some of which we also consulted for this study, offer a wealth of information for some terms. However, without formalizing our linguistic knowledge, investing in standardization efforts (compare the Tsammalex or the Concepticon projects), and improving algorithms for automatic sequence comparison, linguists will have a hard time keeping pace with quickly evolving disciplines like archaeogenetics and archaeology.

References

  • Anthony, D. and D. Ringe (2015) The Indo-European homeland from linguistic and archaeological perspectives. Annual Review of Linguistics 1: 199-219.
  • Botigue, L., S. Song, A. Scheu, S. Gopalan, A. Pendleton, M. Oetjens, A. Taravella, T. Seregely, A. Zeeb-Lanz, R. Arbogast, D. Bobo, K. Daly, M. Unterlander, J. Burger, J. Kidd, and K. Veeramah (2017) Ancient European dog genomes reveal continuity since the Early Neolithic. Nature Communications 8: 16082.
  • Derksen, R. (2008) Etymological dictionary of the Slavic inherited lexicon. Brill: Leiden and Boston.
  • Driscoll, C., D. Macdonald, and S. O’Brien (2009) From wild animals to domestic pets, an evolutionary view of domestication. Proceedings of the National Academy of Sciences 106 Suppl 1: 9971-9978.
  • Frantz, L.A., V.E. Mullin, M. Pionnier-Capitan, O. Lebrasseur, M. Ollivier, A. Perri, A. Linderholm, V. Mattiangeli, M.D. Teasdale, E.A. Dimopoulos, A. Tresset, M. Duffraisse, F. McCormick, L. Bartosiewicz, E. Gal, É.A. Nyerges, M.V. Sablin, S. Bréhard, M. Mashkour, A. Bălăşescu, B. Gillet, S. Hughes, O. Chassaing, C. Hitte, J.-D. Vigne, K. Dobney, C. Hänni, D.G. Bradley, G. Larson (2016) Genomic and archaeological evidence suggest a dual origin of domestic dogs. Science 352: 1228-1231.
  • Huáng Bùfán 黃布凡 (1992) Zàngmiǎn yǔzú yǔyán cíhuì [A Tibeto-Burman lexicon]. Zhōngyāng Mínzú Dàxué 中央民族大学 [Central Institute of Minorities]: Běijīng 北京.
  • Jäger, G. (2015) Support for linguistic macrofamilies from weighted alignment. Proceedings of the National Academy of Sciences 112: 12752-12757.
  • Key, M. and B. Comrie (2016) The intercontinental dictionary series. Max Planck Institute for Evolutionary Anthropology: Leipzig.
  • Kraft, K., C. Brown, G. Nabhan, E. Luedeling, J. Luna Ruiz, G. Coppens d’Eeckenbrugge, R. Hijmans, and P. Gepts (2014) Multiple lines of evidence for the origin of domesticated chili pepper, Capsicum annuum, in Mexico. Proceedings of the National Academy of Sciences 111: 6165-6170.
  • Kroonen, G. (2013) Etymological dictionary of Proto-Germanic. Brill: Leiden and Boston.
  • List, J.-M. and R. Forkel (2016) LingPy. A Python library for historical linguistics. Max Planck Institute for the Science of Human History: Jena.
  • List, J.-M., S. Greenhill, and R. Gray (2017) The potential of automatic word comparison for historical linguistics. PLOS ONE 12: 1-18.
  • Pfeifer, W. (1993) Etymologisches Wörterbuch des Deutschen. Akademie: Berlin.
  • Starostin, S. (2007) Opredelenije ustojčivosti bazisnoj leksiki [Determining the stability of basic words]. In: S. A. Starostin: Trudy po jazykoznaniju [S. A. Starostin: Works on linguistics]. Languages of Slavic Cultures: Moscow. 580-590.

Final Remark

Given that we had little time to review all of the literature on domestication in these disciplines, we may well have missed important aspects, and we may well have even failed to be original in our claims. We would like to encourage potential readers of this blog to provide us with additional hints and productive criticism. In case you know more about these topics than we have reported here, please get in touch with us — we will be glad to learn more.


  1. Very nice post, Mattis and Guido!

    Some comments off the top of my head (no access to literature besides the web at this time).

    One of the reasons for the different evolutionary path of the words for dogs and cats, reflected in their stability, might be the differences "inside" the species and their "usages". Dogs come in all sizes and shapes (even in ancient times there were equivalents of today's lapdogs), and there have always been breeds associated with a particular geographic origin/people (as we can still see in the name of many breeds, German Shepherd and the like); cats, on the other hand, showed much less diversity until breed exhibitions from the 19th century on and have always been pets and small-animal predators.

    This might be reflected in some words. Regarding dogs, for example, there is some indication that the current English word was first applied to large, solidly built animals, like mastiffs, only taking over from "hound" later (it can still be found in 'dogo' breeds), restricting that word to hunting animals. As for the Portuguese word, for example, while "cão" exists, it is nowadays a somewhat outdated and literary term, with "cachorro" being the more neutral, especially in Brazil (a word of Basque origin, meaning "puppy", as it still does in Spanish, and originally applied to small dogs). The hypothesis would be that the specialization of dog types leads to a larger pool of terms, from which a general name might take over (I am thinking of a process similar to the well-known equus/caballus case from Latin).

    The same kind of reasoning might be applied to cats, with "cat" being "the domesticated feline for catching mice", opposed to whatever term the languages had for these small felines -- which, thinking about Greek αἴλουρος and Latin viverra, makes me think that they are under the same label as ferrets, squirrels, and the like. There is a second tree, with cognates such as Italian "micio", Spanish "micho", French "minet", Romanian "mutzu", but this is probably onomatopoeic, or maybe from Latin "musio" (synonym of "cattus" and literally "mouse-catcher" -- maybe a calque?).

    1. Also as a non-linguist, usage diversity indeed comes to mind as an explanation. Particularly when we compare this with sheep and goat: sheep seem more diversified, and they are probably also more domesticated in the sense that they have been bred to provide better wool as well as better meat and more milk. But there, too, Mattis has already found evidence for using the young-animal word for the entire organism (again, highlighting the importance of using word families). Overall, it looks like a parallel to dog.

      The cat is essentially a tamed but not really domesticated animal. From a biological point of view, it makes sense that early human societies did not recognise the wild cats as cats, but as some other small furry forest hunter.
      Oddly, I have so far not found any paper actually comparing directly the genetic diversity in (domestic) cats vs. dogs. That could be interesting to compare to the linguistic diversity.

    2. There is a paper in there, and one that could make the headlines of popular media ("News at eleven: humans mutated dogs but cats not so much.") ;)

      But seriously, looking forward to the next posts!

  2. By the way, De Vaan (2008) considers that "feles" (the old Latin word for "(wild) cat") might be cognate with Welsh "bele" ("marten"). Felines and Mustelidae in the same semantic field seems a good hypothesis.

    (citation from Wiktionary...) De Vaan, Michiel (2008) Etymological Dictionary of Latin and the other Italic Languages (Leiden Indo-European Etymological Dictionary Series; 7), Leiden, Boston: Brill

    1. Thanks for those comments, Tiago, the more I think about this, the more it becomes obvious that this study should ideally be done with word families, instead of fixed concepts. E.g., the "puppy" case you mention in Portuguese seems to be a general pattern across languages, namely, the small beast becomes the larger one (cf. also Engl. "chicken" which is cognate with German "Küken" = "chick"). Many more things to detect, yet the big question remains: how to study those systematically?

    2. Just an idea, but maybe compiling data not from the point of view of concepts, but of terms (the signifiant and not the signifié)? I mean, instead of listing concepts and assigning various cognacy indexes to the various languages, you would list clades, i.e., groups of "sibling words stemming from the same ancestor" (English "hound", German "Hund", Danish "hund"... all from Proto-Germanic *hundaz) and assign various concept indexes to the various languages (English HUNTING_ANIMAL, German/Danish DOG)?

      A semantic field such as animal could actually be a good choice for something like this.

    3. That's exactly what I mean by "word family", although without tracing semantics, it would not make sense, as we still want to know how semantics develops...

  3. Indeed, I can't think of a good solution on the spot -- semantics is far too bound by culture to use only linguistic data. Maybe start by manually reviewing colexifications, doing some annotations while thinking about an automatic method?

  4. In Semitic, cat words are diverse, but dog words are not. See John Huehnergard's article, "Qiṭṭa: Arabic cats" (available at his academia.edu page).

    1. That'd be an interesting fit to the genetics/domestication hypotheses pointing towards a high-latitude origin of dogs. A rule of thumb in population genetics is that the original population is more diverse in haplotypes because each migration comes with bottleneck situations (reduction of maternal lineages). This would be an interesting comparison (in case not yet done).

      A complication is, however, that wild and domestic cats and dog/wolf are fully compatible (in Germany, such hybrids are actively traced and often killed to maintain purity of the wild gene pool). So it's tricky to assess original (domestic) haplotype diversity vs. genetic enrichment by wild-domestic crosses. Maternal admixture is documented in e.g. Driscoll et al.'s 2007 (http://dx.doi.org/10.1126/science.1139518) dataset.

      In the pre-monitored past, at least wild cat genetic markers would have been easily propagated in the domestic cat populations. When visiting the National Park Harz, we were told that the wild-domestic cat hybrids are often drawn to human settlement in contrast to their pure-bred counterparts, which avoid any contact with humans.