Monday, July 30, 2018

Networks of polysemous and homophonous words

When I was very young, maybe even before I went to school, we often played a game with my parents and grandparents, during which we had to select two homophonous words (that is, one word form that expresses two rather different meanings), and the other people had to guess which word we had selected. This game is slightly different from its Anglo-Saxon counterpart, the homophone game.

In Germany, this game is called Teekesselchen: "little teapot". People now also use the word Teekesselchen to denote cases of homophony or very advanced polysemy. In this sense, the word Teekesselchen itself becomes polysemous, since it denotes both a little teapot and the phenomenon that word forms in a given language may often denote multiple meanings.

Homophony and polysemy

In linguistics, we learn very early that we should rigorously distinguish the phenomenon of homophony from the phenomenon of polysemy. The former refers to originally different word forms that have become similar (or even identical) due to the effects of sound change: compare French paix "peace" and pet "fart", which are now both pronounced as [pɛ]. The latter refers to cases where a word form has accumulated multiple meanings over time, each shifted from the original meaning: compare head as in head of department vs. head as in headache.

Given the difference of the processes leading to homophony on the one hand and polysemy on the other, it may seem justified to opt for a strict usage of the terms, at least when discussing linguistic problems. However, the distinction between homophony and polysemy is not always that easy to make.

In German, for example, we have the same word Decke for "ceiling" and "blanket" (Geyken 2010). At first sight, this may seem to reflect homophony: the meanings are so different that it seems simpler to assume a coincidence. However, it is in fact a case of polysemy (cf. Pfeifer 1993, s. v. «Decke»). This can easily be seen from the verb (be)decken "to cover", from which Decke was derived: while the ceiling covers the room, the blanket covers the body.

Given that we usually do not know much about the history of the words in our languages, we often have difficulties deciding whether we are dealing with homophonies or with polysemies when encountering ambiguous terms in the languages of the world. The problem with the two terms is that they are not descriptive, but explanatory (or ontological): they describe not only a phenomenon ("one word form is ambiguous, having multiple meanings"), but also the origin of this phenomenon (sound change or semantic change).

In this context, the recently coined term colexification (François 2008) has proven to be very helpful, as it is purely descriptive, referring to those cases where a given language has the same word form to express two or more different meanings. The advantage of descriptive terminology is that it allows us to identify a certain phenomenon but analyze it in a separate step — that is, we can already talk about the phenomenon before we have found out its specific explanation.

A new contribution

Having worked hard during recent years writing computer code for data curation and analysis (cf. List et al. 2018a), my colleagues and I have finally managed to present the fascinating phenomena of colexifications (homophonies and polysemies) in the languages of the world in an interactive web application. The application shows which colexifications occur frequently, and in which languages of the world.

In order to display how often the languages in the world express different concepts using the same word, we make use of a network model, in which the concepts (or meanings) are represented by the nodes in the networks, and links between concepts are drawn whenever we find that any of the languages in the sample colexifies the concepts. The following figure illustrates this idea.

Colexification network for concepts centering around "FOOD" and "MEAL".
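
The network model described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code behind CLICS: the language names, the word forms, and the function name colexification_network are all invented for the example. It treats concepts as nodes and draws a link between two concepts whenever some language uses the same word form for both, weighting each link by the number of languages in which the colexification is found.

```python
from collections import defaultdict
from itertools import combinations

# Toy wordlist of (language, concept, word form) triples.
# The languages and forms are invented for illustration only;
# they are NOT taken from the CLICS data.
WORDLIST = [
    ("lang_a", "FOOD", "kana"),
    ("lang_a", "MEAL", "kana"),
    ("lang_a", "FUR", "tilu"),
    ("lang_b", "FOOD", "miso"),
    ("lang_b", "MEAL", "miso"),
    ("lang_b", "FUR", "pelo"),
    ("lang_b", "HAIR", "pelo"),
    ("lang_c", "FUR", "wuli"),
    ("lang_c", "HAIR", "wuli"),
    ("lang_c", "WOOL", "wuli"),
]


def colexification_network(wordlist):
    """Return a dict mapping concept pairs to the number of languages
    that express both concepts with the same word form."""
    # Collect the set of concepts expressed by each form in each language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts sharing a form is one colexification;
    # remember which languages attest it.
    languages_by_edge = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_edge[pair].add(language)

    # Edge weight = number of distinct languages attesting the link.
    return {edge: len(langs) for edge, langs in languages_by_edge.items()}


network = colexification_network(WORDLIST)
for (concept_a, concept_b), weight in sorted(network.items()):
    print(f"{concept_a} -- {concept_b}: {weight} language(s)")
```

Note that the sketch is purely descriptive in the sense discussed earlier: it records that two concepts share a form, without saying whether the cause is sound change or semantic change.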

This database and web application is called CLICS, which stands for the Database of Cross-Linguistic Colexifications (List et al. 2018b), and was published officially during the past week. It can now be freely accessed by all who are interested. In addition, we describe the database in some more detail in a forthcoming article (List et al. 2018c), which is already available in the form of a draft.

The data give us fascinating insights into the way in which the languages of the world describe the world. At times, it is surprising how similar the languages are, even if they do not share any recent ancestry. My favorite example is the network around the concept FUR, shown below. When inspecting this network, one can find direct links of FUR to HAIR, BODY HAIR, and WOOL on one hand, as well as LEATHER, SKIN, BARK, and PEEL on the other. In some sense, the many different languages of the world, whose data was used in this analysis, reflect a general principle of nature, namely that the bodies of living things are often covered by some protective substance.

Colexification network for concepts centering around "FUR".

Although we have been working with these networks for a long time, we are still far from understanding their true potential. Unfortunately, nobody in our team is a true specialist in complex networks. As a result, our approaches are always limited to what we may have read by chance about all of those fascinating ways in which complex networks can be analyzed.

For the future, we hope to convince more colleagues of the interesting character of the data. At the moment, our networks are simple tools for exploration, and it is hard to extract any evolutionary processes from them. With more refined methods, however, it may even be possible to use them to infer general tendencies of semantic change in language evolution.


Geyken A. (ed.) (2010) Digitales Wörterbuch der deutschen Sprache DWDS. Das Wortauskunftssystem zur deutschen Sprache in Geschichte und Gegenwart. Berlin-Brandenburgische Akademie der Wissenschaften: Berlin.

François A. (2008) Semantic maps and the typology of colexification: intertwining polysemous networks across languages. In: Vanhove, M. (ed.) From Polysemy to Semantic Change, pp 163-215. Benjamins: Amsterdam.

List J.-M., M. Walworth, S. Greenhill, T. Tresoldi, R. Forkel (2018a) Sequence comparison in computational historical linguistics. Journal of Language Evolution 3.2.

List J.-M., S. Greenhill, C. Anderson, T. Mayer, T. Tresoldi, R. Forkel (eds.) (2018b) CLICS: Database of Cross-Linguistic Colexifications. Max Planck Institute for the Science of Human History: Jena.

List J.-M., S. Greenhill, C. Anderson, T. Mayer, T. Tresoldi, R. Forkel (2018c, forthcoming) CLICS². An improved database of cross-linguistic colexifications: Assembling lexical data with help of cross-linguistic data formats. Linguistic Typology 22.2.

Pfeifer W. (1993) Etymologisches Wörterbuch des Deutschen. Akademie: Berlin.


  1. Is there a similar effort anywhere for the aggregation of published cross-linguistic semantic maps?

    1. Not that I know of, but it would depend on some reference catalog for comparative concepts in grammar (like Martin Haspelmath's idea of establishing a Grammaticon, similar to our Concepticon). Without this, one could not compare across different datasets, as people use so many different terms.

  2. I am all in favour of using "colexification" as a cover term for "polysemy" and "homophony"; indeed, I have also advocated use of an even more general term, "coexpression", in order to make it possible to remain neutral with regard to the question whether a given form whose functions or meanings are under discussion is an independent word or, alternatively, a bound form.

    However, I am puzzled by your claim that "polysemy" and "homophony" are diachronic concepts, pertaining to the ways in which meanings have changed over time. My understanding of these terms is that they are also synchronic in nature, indispensable items in the toolkit of a descriptive linguist, who as such need not necessarily be concerned with diachrony. Specifically, homophony is when a form has two or more distinct meanings that are not related in any way in the mind of the speaker, as inferred by the usual criteria governing the efficacy of linguistic descriptions. Conversely, polysemy is when a form has two or more meanings that are related in the mind of the speaker, as reflected in the optimal synchronic description of the language. To this I would add a third term, "monosemy", describing the case in which a form appears initially, from the perspective of the linguist's theoretical assumptions, or perhaps his or her native language, to have two or more distinct meanings, but which, upon further analysis, turns out to be associated with a single broader meaning unifying the would-be two or more supposed meanings. In fact, I cannot see how a purely synchronic linguistics can do without making the distinction between monosemy, polysemy and homophony.

    Of course, there are typical diachronic paths by which polysemy and homophony arise in languages, and Mattis alludes to two of these. But these are hardly the only possible paths, and crucially, they cannot be definitional of polysemy and homophony respectively. Mattis associates polysemy with diachronic processes of meaning extension, which is surely true, but not necessarily exceptionless. For example, imagine a chain of sequential meaning extensions from A to B to ... all the way to N, in which the intermediate meanings are subsequently lost, say through replacement, and the form in question ends up with just meanings A and N: in the minds of speakers, these will no longer be related, and therefore this will be considered a case of homophony.

    (In fact, I think I may have recently come across exactly such a case. In a group of Western Malayo-Polynesian languages including Malayic, Kenyah and Malagasy, an allative marker *k has undergone extensions, on the one hand to dative and benefactive, and on the other hand to future. Each stage in the extension connects closely related meanings; however, the end-points in the process are quite distant from each other. Now it so happens that in Malagasy, the form in question, a cognate *hu, expresses future and benefactive but not allative and dative, which are now expressed with different unrelated forms. Thus, in Malagasy, future/benefactive coexpression appears to be a case of synchronic homophony, even though the forms are diachronically related.)

    My point is simply that terms such as "homophony", "polysemy" and "monosemy" are primarily of synchronic nature, just like their cover terms "colexification" and "coexpression". Once this is recognized, we can then study the ways in which such patterns arise through diverse diachronic processes. But that's another story.

    1. Dear David, thanks for these remarks. I completely agree with your synchronic definition of "polysemy" and "homophony", especially since you make it very concrete by taking "the speaker" as your reference (this is important, as many scholars tend to omit this, which makes it extremely difficult to judge). Based on your definition, my example of "Decke" meaning both "cover (sheet)" and "ceiling (of a room)" would then probably reflect polysemy in most German speakers' minds. However, in many textbooks, polysemy is indeed defined in the terms outlined by me, and the definition is historical and explanatory, as people take the development as the defining criterion for the distinction. I'd even argue that this is the norm in the definitions that one can find in the literature. If we replaced the diachronic definition with your definition based on speaker judgments (some kind of a diachronic competence by the speakers of a language), I'd be more than glad, as it would be more consistent. But even originally, in Bréal's "Essai de sémantique", where to my knowledge the term "polysemy" was coined, the definition was in fact historical. My blog is sloppy in terms of references: I had little time to write it up and did not look for proper examples from the literature. Maybe I am even wrong, and most synchronic literature follows your definition. If this is indeed the case, all the better, as I am completely in favour of abandoning the diachronic reading of polysemy and homophony, since we can't gain anything useful from it.