The Genealogical World of Phylogenetic Networks: From rhymes to networks (A new blog series in six steps)

Whenever one feels stuck in solving a particular problem, it is useful to split this problem into parts, in order to identify exactly where the problems are. The problem that is vexing me at the moment is how to construct a network of rhymes from a set of annotated poems, either by one and the same author, or by many authors who wrote during the same epoch in a certain country using a certain language.

For me, a rhyme network is a network in which words (or parts of words) occur as nodes, and weighted links between the nodes indicate how often the linked words have been found to rhyme in a given corpus.

An example

As an example, the following figure illustrates this idea for the case of two Chinese poems, where the rhyme words represented by Chinese characters are linked to form a network (taken from List 2016).

Figure 1: Constructing a network of rhymes in Chinese poetry (List 2016)
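
To make the definition of a rhyme network concrete, here is a minimal sketch in Python. It is not the procedure used in List (2016); the toy poems, the rhyme labels, and the function name `rhyme_network` are all made up for illustration. It also assumes that every pair of words sharing a rhyme label within one poem should be linked, which is only one of several possible choices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each poem is a list of (rhyme_word, rhyme_label)
# pairs, one per line. Words sharing a label within a poem are annotated
# as rhyming with each other.
poems = [
    [("light", "a"), ("day", "b"), ("night", "a"), ("way", "b")],
    [("night", "a"), ("light", "a"), ("bright", "a"), ("say", "b"), ("day", "b")],
]


def rhyme_network(poems):
    """Return a Counter mapping unordered word pairs to rhyme counts."""
    edges = Counter()
    for poem in poems:
        # Collect the distinct words of each rhyme group in this poem.
        groups = {}
        for word, label in poem:
            groups.setdefault(label, set()).add(word)
        # Assumption: link every pair of words within a rhyme group.
        for words in groups.values():
            for a, b in combinations(sorted(words), 2):
                edges[(a, b)] += 1
    return edges


network = rhyme_network(poems)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this toy corpus, "light" and "night" rhyme in both poems, so their link receives a weight of 2, while all other links have a weight of 1.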

One may think that it is silly to make a network from rhymes. However, experiments on Chinese rhyme networks (on which I have reported in the past) have proven to be quite interesting, specifically because they almost always show one large connected component. I find this fascinating, since I would have expected to see multiple connected components, representing very distinct rhymes.
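
For readers unfamiliar with the term, a connected component is a set of nodes that can all reach each other along links. The following sketch, again with a made-up edge list and a hypothetical function name, shows how a single imperfect rhyme can be enough to merge two otherwise separate rhyme groups into one component. It illustrates the concept only and is not the analysis behind the Chinese networks.

```python
from collections import defaultdict

# Hypothetical toy edge list: (word_a, word_b, weight).
edges = [
    ("light", "night", 2),
    ("bright", "night", 1),
    ("day", "way", 1),
    ("day", "say", 1),
    ("night", "say", 1),  # one imperfect rhyme bridging the two groups
]


def connected_components(edges):
    """Return the connected components of the network as a list of sets."""
    neighbors = defaultdict(set)
    for a, b, _weight in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


components = connected_components(edges)
without_bridge = connected_components(edges[:-1])
print(len(components), len(without_bridge))
```

With the bridging link, all six words end up in one component; without it, the network falls apart into two.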

It is obvious that some writers don't have a good feeling for rhymes and fail royally when they try to rhyme; this happens across all languages and cultures in which rhyming plays a role. However, it was much less obvious to me that rhyming can be seen to form at least some kind of continuum, as you can see from the rhyme networks that we have constructed from Chinese poetry in the past (taken from List et al. 2017).

Figure 2: A complete rhyme network of poems in the Book of Odes (ca. 1000 BC, List et al. 2017)

The current problem

My problem now is that I do not know how to do the same for rhyme collections in other languages. During recent months, I have thought a lot about the problem of constructing rhyme networks for languages such as English or German. However, I always came to a point where I felt stuck, where I realized that I actually did not know at all how to deal with this.

I thought, first, that I could write one blog post listing the problems; but the more I thought about it, the more I realized that there were so many problems that I could barely cover them in one blog post. So, I decided that I could just do another series of blog posts (after the nice experience with the series on open problems in computational historical linguistics that I posted last year), but this time devoted solely to the question of how one can get from rhymes to networks.

So for the next six months, I will discuss the four major issues that keep me from presenting German or English rhyme networks here and now. I hope that at the end of this discussion I may even have solved the problem, so that I will then be able to present a first rhyme network of Goethe, Shakespeare, or Bob Dylan. (I would not do Eminem, as the rhymes are quite complex and tedious to annotate.)

Summary of the series

Before we can start to think about the modeling of rhyme patterns in rhymed verse, we need to think about the problem in general, and discuss how rhyming shows up in different languages. So, I will start the series with the problem of rhyming in general, by discussing how languages rhyme, where these practices differ, and what we can learn from these differences. Having looked into this, we can think about ways of annotating rhymes in texts in order to acquire a first corpus of examples. So, the following post will deal with the problems that we encounter when trying to annotate the rhyme words that we identify in poetry collections.

If one knows how to annotate something, one will sooner or later get impatient, and long for faster ways to do these boring tasks. Since this also holds for the manual annotation of rhyme collections (which we need for our rhyme networks), it is natural to think about automated ways of finding rhymes in corpora, that is, to think about the inference of rhyme patterns, which can of course also be done semi-automatically. So the major problems related to automated rhyme detection will be discussed in a separate post.

Once this is worked out, and one has a reasonably large corpus of rhyme patterns, one wants to analyze it, and the way I want to analyze annotated rhyme corpora is with the help of network models. But, as I mentioned before, I realized that I was stuck when I started to think about rhyme networks of German and English (which are relatively easy languages, one should think). So, it will be important to discuss clearly what seems to be the best way to construct rhyme networks as a first step of analysis. This will therefore be dealt with in a separate blog post. In a final post, I then plan to tackle the second analysis step, by discussing very briefly what one can do with rhyme networks.

All in all, this makes for six posts (including this one); so we will be busy for the next six months, thinking about rhymes and poetry, which is probably not the worst thing one can do. I hope, but I cannot promise at this point, that this gives me enough time to stick to my ambitious annotation goals, and then present you with a real rhyme network of some poetry collection, other than the Chinese ones I already published in the past.

References

List, Johann-Mattis, Pathmanathan, Jananan Sylvestre, Hill, Nathan W., Bapteste, Eric, Lopez, Philippe (2017) Vowel purity and rhyme evidence in Old Chinese reconstruction. Lingua Sinica 3.1: 1-17.

List, Johann-Mattis (2016) Using network models to analyze Old Chinese rhyme data. Bulletin of Chinese Linguistics 9.2: 218-241.

2 comments:

Guillaume Jacques, April 28, 2020 at 9:05 AM
In the case of the Shijing, whose network you show in Fig. 2, one reason why the network is connected is that the corpus does not represent a homogeneous language: it includes poems from different periods, reflecting different phonological systems. Would we find the same in a more homogeneous corpus (let us say, Tang poetry)?
Johann-Mattis List, May 11, 2020 at 10:06 PM
My ultimate test will be Goethe and his "Faust". But for modern rhymes in Mandarin (dialects), the network shows a similar structure. I think this can have three reasons: a) all rhyme networks of Chinese and similar SEA languages are by and large connected, b) the Mandarin dataset also reflects a lot of dialect variation, c) rhyme networks are usually not connected when they come from a homogeneous source, but they become connected once one adds more data.

Ah, maybe there's a d) as well: this is language-specific, as not all languages license all kinds of rhymes; e.g., bisyllabic rhymes in German with one strong and one weak syllable should never be connected with monosyllabic rhymes. So rhyme practice may also play a key role.

But this shows one of the things I am very curious about: if we start looking at rhyme networks from other languages and rhyming traditions, will the patterns change? I think, to some extent, yes, but I am not sure about the bigger picture. Ultimately, I hope it could also help us to advance our reconstruction of Old Chinese, but this is a long-term goal going beyond this series, of course...

Monday, April 27, 2020
