Wednesday, November 11, 2015

Networks in Chinese poetry

Structure in Poetry

Poetry is a dangerous topic in science, since we never know whether the structures we propose are really there or not. When it comes to the search for structure in poetry, Matthew and Luke were right: those who search will find, provided they have enough creativity.

When I had Latin lessons in school, some of my classmates were incredibly diligent in trying to find alliterations (instances in which words in a sentence start with the same letter) in Cicero's speeches. This was less out of interest in the structure of the speeches than an attempt to divert the teacher's attention away from translation.

The problem with structure in poetry is that, in the end, we never know whether the people who created the poetry did things on purpose or not. Consider, for example, the following lines of a famous verse:

Apart from the fact that people might disagree whether songs by Eminem are poetry, it is interesting to look at the structures one may (or may not) detect. We know that rap and hip hop allow for rather loose rhyming schemes, which may give the impression that they were produced in an ad-hoc manner. We also know that the question of what counts as a rhyme is strictly cultural. In German, for example, employ could rhyme with supply (thanks to Goethe and other poets who would superimpose on the standard language rhyme patterns that made sense in their home dialects). If I were given Eminem's poem in an exam, I would mark its rhyming structure as follows:

I do not know whether any teacher of English would agree that music can rhyme with own it, but if Germans can rhyme [ai] (as in supply) with [ɔi] (as in employ), why not allow [ɪk] (as in music) to rhyme with [ɪt] (as in own it)? I bet that if one investigated all the rhymes that Bob Dylan has produced so far, one would find at least a few instances where he tolerates Eminem's rhyme pattern.

The point here is that rhymes are important evidence to infer how Ancient Chinese was pronounced.

The Pronunciation of Ancient Chinese

The Chinese writing system gives only minimal hints regarding the pronunciation of the characters. If one writes a character like 日, which means 'sun', the writing system gives us no clue as to its pronunciation; and from the modern form in which the character is written, it is also difficult to see the image of a sun in the character. Thus, the current situation in Chinese linguistics is that we have very ancient texts, some dating back to 1000 BC, but we have no real clue as to how the language was pronounced back then.

That it was pronounced differently is clear from ancient Chinese poetry itself. When reading ancient poems with modern pronunciations, one often finds rhyme patterns which do not sound nice. Consider the following passage from Ode 28 of the Book of Odes (Shījīng 詩經), an ancient collection of poems written between 1050 and 600 BC (translation from Karlgren 1950):

Here, we find a modern rhyme between fēi and guī, which is fine, since the transliteration fails to give the real pronunciation, which is [fəi] versus [kuəi]; but we also find [in] rhyming with [nan], which is so strange (due to the strong difference in the vowels) that even Bob Dylan and Eminem would probably not tolerate it. But if we do not tolerate this rhyming pattern, and if we do not want to assume that the ancient masters of Chinese poetry simply failed at rhyming, we need to search for some explanation as to why the words no longer rhyme. The explanation is, of course, language evolution: the sound systems of languages constantly change, and if things do not rhyme in our modern pronunciation, they may have been perfect rhymes when they were originally created.

When Chinese scholars of the 16th century, who investigated their ancient poetry, became aware of this, they realized that the poetry could be a clue to reconstructing the ancient pronunciation of their language. They then began to investigate the ancient poems of the Book of Odes systematically for their rhyme patterns. It is thanks to this early work on linguistic reconstruction by Chinese scholars that we now have a rather clear picture of how Ancient Chinese was pronounced (see especially Baxter 1992, Sagart 1999, and Baxter and Sagart 2014).

Networks in Chinese Rhyme Patterns

But where are the networks in Chinese poetry, which I promised in the title of this post? They are in the rhyme patterns. It is rather straightforward to model rhyme patterns in poetry with the help of networks. Every node is a distinct word that rhymes with another word in at least one poem. A link between two nodes is created whenever one word rhymes with the other in a given stanza of a poem. So, even if we take only two stanzas of two poems of the Book of Odes, we can already create a small network of rhyme transitions, as illustrated in the following figure:

One needs, of course, to be careful when modeling this kind of data, since specific kinds of normalizations are needed to avoid exaggerating the weight assigned to specific rhyme connections. It is possible that poets just used a certain rhyme pattern because they found it somewhere else. It is also not yet entirely clear to me how to best normalize those cases in which more than two words rhyme with each other in the same stanza.
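To make the model concrete, here is a minimal sketch in Python of how such a rhyme network could be assembled. It is not the code behind the dataset described in this post. The function name `build_rhyme_network`, the input layout (stanzas as lists of rhyme groups), and the placeholder words `w1` to `w6` are all assumptions made for illustration. The normalization shown, in which every pair in a group of n rhyming words receives a weight of 1/(n-1), is just one possible answer to the open question of how to treat stanzas in which more than two words rhyme.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_network(stanzas):
    """Build a weighted, undirected rhyme network.

    `stanzas` is a list of stanzas; each stanza is a list of rhyme groups;
    each rhyme group is a list of words judged to rhyme with each other.
    Returns a dict mapping a sorted word pair to its accumulated weight.
    """
    weights = defaultdict(float)
    for stanza in stanzas:
        for group in stanza:
            words = sorted(set(group))
            n = len(words)
            if n < 2:
                continue  # a single word cannot form a rhyme link
            # Assumed normalization: each word spreads a total weight of 1
            # over its n - 1 partners, so large groups do not dominate.
            share = 1.0 / (n - 1)
            for a, b in combinations(words, 2):
                weights[(a, b)] += share
    return dict(weights)


# Hypothetical toy data: two stanzas, placeholder words instead of characters.
toy_stanzas = [
    [["w1", "w2"], ["w3", "w4", "w5"]],
    [["w2", "w6"], ["w3", "w4"]],
]
network = build_rhyme_network(toy_stanzas)
for pair, weight in sorted(network.items()):
    print(pair, weight)
```

With this scheme, the pair that rhymes in both toy stanzas ends up with a higher weight than the pairs that only co-occur once in the three-word group. Other choices, such as counting every pair with weight 1, are equally easy to swap in.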

But apart from these rather technical questions, it is quite interesting to look at the patterns that emerge from collecting the rhyme patterns of all poems found in the Book of Odes, and plotting them in a network. I prepared such a dataset, using the rhyme assessments by Baxter (1992). The whole dataset is now available in the form of an interactive web-application at

In this application, one can browse all characters that appear in potential rhyme positions in all 305 poems that constitute the Book of Odes. Additional meta-data, like reconstructions for the old pronunciations following Baxter and Sagart (2014), which were kindly provided by L. Sagart, have also been added. The core of the app is the "Poem View", in which one can see a poem, along with reconstructions for the rhyme words, and an explicit account of what experts think rhymed in the classical period, and what they think did not rhyme. The following image gives a screenshot of the second poem of the Book of Odes:

But let's now have a look at the big picture of the network we get when taking all words that rhyme into account. The following image was created with Cytoscape:

As we can see, the rhyme words in the 305 poems almost constitute a small-world network, and we have a very large connected component. For me, this was quite surprising, since I had assumed that rhyme patterns would be more distinct. It would be very interesting to see a network of the works of Shakespeare or Goethe, and to compare the amount of connectivity.
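The observation about the large connected component can be checked without a plotting tool. The following is a minimal sketch, assuming the network is available as a plain list of word pairs; the function name `connected_components` and the toy edges are hypothetical and are not taken from the actual data.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph,
    largest first, using a breadth-first search from each unseen node."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy edges with placeholder words.
toy_edges = [
    ("w1", "w2"), ("w2", "w6"), ("w6", "w3"), ("w3", "w4"),
    ("w7", "w8"),
]
components = connected_components(toy_edges)
total_nodes = sum(len(c) for c in components)
largest_share = len(components[0]) / total_nodes
print(len(components), "components; largest covers", largest_share, "of all nodes")
```

The share of nodes that fall into the largest component gives a single number that could be compared across collections, for example between the Book of Odes and the works of another poet.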

There are, of course, many things we can do to analyze this network of Chinese poetry, and I am currently trying to find out to what degree this may contribute to the reconstruction of the pronunciation of Ancient Chinese. But since this work is still at a preliminary stage, I will restrict this post to showing how the big network looks if we color the nodes in six different colors, based on which of the six main vowels ([a, e, i, o, u, ə]) scholars usually reconstruct in the rhyme word for Ancient Chinese:

As can be seen, even this simple annotation shows how interesting structures emerge, and how we see more than before.
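One way to put a number on what the coloring shows is to ask how much of the total edge weight connects words with the same reconstructed vowel. The sketch below is an illustration only and is not an analysis carried out for this post. The function name `same_vowel_share`, the toy edge weights, and the toy vowel assignments are all assumptions; in practice the vowel of each word would come from the reconstructions.

```python
def same_vowel_share(weighted_edges, vowel_of):
    """Return the fraction of edge weight that links two words
    carrying the same vowel label. Edges with an unlabeled word are skipped."""
    same = 0.0
    total = 0.0
    for (a, b), weight in weighted_edges.items():
        if a not in vowel_of or b not in vowel_of:
            continue
        total += weight
        if vowel_of[a] == vowel_of[b]:
            same += weight
    return same / total if total else 0.0


# Hypothetical toy data: placeholder words with invented vowel labels.
toy_weighted_edges = {
    ("w1", "w2"): 1.0,
    ("w3", "w4"): 1.5,
    ("w2", "w3"): 0.5,
}
toy_vowels = {"w1": "a", "w2": "a", "w3": "ə", "w4": "ə"}

share = same_vowel_share(toy_weighted_edges, toy_vowels)
print(round(share, 3))
```

A share close to 1 would mean that rhymes almost never cross vowel classes, while lower values would point to words whose reconstructed vowel deserves a second look.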

Many more things can certainly be done with this kind of data. We could compare the rhyme networks of different poets, maybe even the networks of one and the same poet at different stages of their life, asking questions like: "Do people rhyme more sloppily the older they get?" It's a pity that we don't have the data for this, since we lack automatic approaches to detect rhyme words in text, and there are no manual annotations of poem collections apart from the Book of Odes that I know of.

But maybe, one day, we can use networks to study the dynamics underlying the evolution of literature. We could trace the emergence of rap and hip hop, or the impact of the "Judas!"-call on Dylan's rhyme patterns, or the loss of structure in modern poetry. But that's music from the future, of course.

  • Baxter, William H. (1992) A handbook of Old Chinese phonology. Berlin: De Gruyter.
  • Baxter, William H. and Sagart, Laurent (2014) Old Chinese. A new reconstruction. Oxford: Oxford University Press.
  • Karlgren, Bernhard (1950) The Book of Odes. Stockholm: Museum of Far Eastern Antiquities.
  • Sagart, Laurent (1999) The roots of Old Chinese. Amsterdam: John Benjamins.


  1. I wonder to what extent your analysis above offers evidence that favors the Baxter-Sagart (2014) reconstruction of Old Chinese phonology.

    You mentioned that you are "currently trying to find out to what degree this may contribute to the reconstruction of the pronunciation of Ancient Chinese." Do you have any updates on whether analyzing rhyming schemes through the use of networks in Old Chinese poetry can contribute substantially to more accurate reconstructions of Old Chinese phonology?

    Have you tested this on any Middle Chinese poetry collections where scholarly reconstructions of phonology are on more solid ground and are less tentative?

  2. The teacher laid this out incorrectly; it should look like this:

    You better lose yourself in the music, the moment
    You own it, you better never let it go
    You only get one shot, do not miss your chance to blow
    This opportunity comes once in a lifetime you better

    The moment - you own it appears to be the rhyme's internal...include in that too: better never and the nagging repetitive command "you better".
    Also...take into consideration that this is the spoken word and meant to be as such - rappin' ain't writin'- sound trumps pen. Em lets words (and he pronounces them) and rhythms tumble and bleed into each other.