Monday, August 24, 2020

Constructing rhyme networks (From rhymes to networks 5)


As is now happening for the summer, this little series on rhyme networks is also coming to its end. We have only two more blog posts to go, with this one discussing the construction of rhyme networks, and then one more post in September, discussing how rhyme networks can be analyzed.

A preliminary annotated collection of rhymed poetry in German

While my original plan was to have all of Goethe's Faust annotated by the end of this series, so that I could illustrate how to make rhyme analyses with a large dataset of rhyme patterns in a language other than Chinese, I now have to admit that this plan was way too ambitious.

Nevertheless, I have managed to assemble a larger collection of German rhymes from various pieces of literature, ranging from boring love poems to recent examples of German Hip-Hop; and all of the rhymes have been manually annotated by myself during recent months.

This little corpus currently consists of 336 German "œuvres" (the data collection itself has more poems and songs from different languages), which make up a total of 1,544 stanzas (deliberately excluding the refrains in songs). There are 3,950 words that rhyme in this collection; and together they occur 5,438 times in a total of 49,797 words written by 72 different authors. The following table summarizes major features of the German part of the database.

Aspect                       Count
components                     994
authors                         72
poems                          336
stanzas                      1,544
lines                        8,340
rhyme words (distinct)       3,950
rhyme word occurrences       5,438
words total                 49,797

The whole collection, which is currently available under the working title "AntRhyme: Annotated Rhyme Database", can be inspected online at https://digling.org/rhyant/, but due to copyright restrictions for texts from recent pop songs, not all of the poems can be displayed. In order to share the annotated rhymes along with the initial Python code that I wrote for this post, I have therefore created a version in which only the annotated rhyme words are provided, along with dummy words in which each character was replaced by a miscellaneous symbol. As a result, the song "Griechischer Wein" ("Greek wine") by Udo Jürgens from 1974 now looks as shown in the following figure.
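The masking step described above can be illustrated with a minimal sketch. It is not the code used to build the shared dataset: the function name mask_line, the input format (a list of words plus the positions of the annotated rhyme words), the symbol, and the toy line are all assumptions made for illustration.

```python
def mask_line(words, rhyme_positions, symbol="■"):
    """Keep annotated rhyme words, replace every character of the
    other words by a placeholder symbol (hypothetical helper)."""
    return " ".join(
        word if index in rhyme_positions else symbol * len(word)
        for index, word in enumerate(words)
    )


# Invented toy line, not taken from the corpus; only the last word is
# treated as an annotated rhyme word.
words = ["dies", "ist", "ein", "Beispiel", "mit", "Wind"]
masked = mask_line(words, rhyme_positions={5})
print(masked)
```

Word lengths and word boundaries survive the masking, so line structure stays visible while the protected text itself does not.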


Modeling rhymes with networks

As far as Chinese rhyme networks were concerned, I have always given the impression (and also truly thought this myself) that the reconstruction of a rhyme network is something rather trivial. Given a stanza in a given poem, all one has to do is to model the rhyme words in the stanza as nodes in the network, and then add connections for all of the words that rhyme with each other according to the annotation.

While I still think that this simple rhyme network model is a very good starting point, there are certain non-trivial aspects that one needs to carefully consider when working with this kind of rhyme network. First, there is the question of weighting. In the first study that I devoted to Old Chinese poetry (List 2016), I weighted the nodes by counting how often they appeared, and I weighted the edges by first counting how often they occurred and then normalizing this score in order to obtain a more balanced weighting. The normalization would first count each rhyme pair only once, even if the same word occurred more than once in the same stanza, and then apply a formula for normalization based on the number of words rhyming with each other within the same stanza (see ibid. 228 for details).

In the meantime, however, a young scholar, Aison Bu, has suggested an even better way of counting rhymes in an email conversation with me. [The pandemic prevented us from meeting in person at a conference in early April, so we could never follow this up.] Since rhyming is essentially linear, my original procedure of counting all rhymes that are assigned to the same rhyme partition in a given stanza may be misleading. Instead, Aison suggested counting only adjacent rhymes.

To provide a concrete example, consider the third stanza in the song "Griechischer Wein" by Udo Jürgens (shown above). Here, we have the rhyme group labeled as f, which occurs three times in the data, with the rhyme words Wind (wind), sind (they are), and Kind (child). The normalization procedure that I proposed in the study from 2016 would now construct a network in which all three words rhyme with each other. To normalize the edge weights, each individual edge weight would be modified by the factor 1 / (G-1), where G is the number of rhymes in the rhyme group in the stanza (3 in this case, as we have three words rhyming with each other). Aison's rhyme network construction, however, would only add two edges, one for Wind and sind, and one for sind and Kind, as they immediately follow each other in the verse. A specific normalization of the edge weights would not be needed in this case.
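The difference between the two constructions can be made concrete with a small sketch for the rhyme group f. This is an illustration only, not the code from the 2016 study or from the Gist: the function names are hypothetical, and the sketch assumes that G counts the distinct words of a rhyme group within one stanza.

```python
from collections import Counter
from itertools import combinations


def all_pairs_edges(rhyme_words):
    """Link every pair of distinct words in one rhyme group and weight
    each edge by 1 / (G - 1), with G the number of distinct words."""
    # Count each word only once per stanza, keeping the order.
    distinct = list(dict.fromkeys(rhyme_words))
    group_size = len(distinct)
    if group_size < 2:
        return {}
    weight = 1 / (group_size - 1)
    return {frozenset(pair): weight for pair in combinations(distinct, 2)}


def adjacent_edges(rhyme_words):
    """Link only rhyme words that directly follow each other in the
    rhyme group; every such edge counts as 1, without normalization."""
    edges = Counter()
    for first, second in zip(rhyme_words, rhyme_words[1:]):
        if first != second:  # skip a word that is repeated in a row
            edges[frozenset((first, second))] += 1
    return dict(edges)


group_f = ["Wind", "sind", "Kind"]  # order of appearance in the stanza
full = all_pairs_edges(group_f)
linear = adjacent_edges(group_f)
print(len(full), len(linear))
```

For this group, the all-pairs construction yields three edges with a weight of 0.5 each, while the adjacent construction yields two edges with a weight of 1 each and no edge between Wind and Kind.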

A first rhyme network

Unfortunately, I have not yet had time to test Aison's idea of drawing edges only for adjacent rhymes when constructing rhyme networks. However, with the data for more than 300 German poems and songs assembled, I have had enough time to construct a first and very simple network of German rhyme data.

For this network, I disregarded all normalization issues, and just added an edge for each pair of words that would have been assigned to the same rhyme group in my rhyme annotation. This network resulted in a rather sparse collection of 994 connected components. This is in strong contrast to the Chinese poems I have analyzed in the past (List 2016, List 2020), which were all very close to small-world networks, with one huge connected component, and very few additional components. However, it would be too early to conclude that German rhyme networks are fundamentally different from Chinese ones, given that the data may just be too sparse for this kind of experiment.
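A rough sketch of this construction and of the component count is given here, using only the Python standard library. It is not the code from the Gist: the input format (one dictionary per stanza, mapping a rhyme label to its rhyme words) and the function names are assumptions, and apart from Wind, sind, and Kind the example words are toy data.

```python
from collections import defaultdict
from itertools import combinations


def build_network(stanzas):
    """Add an unweighted edge for each pair of words that share a
    rhyme group in the same stanza (no normalization)."""
    graph = defaultdict(set)
    for stanza in stanzas:
        for words in stanza.values():
            for word in words:
                graph[word]  # register the node even without a partner
            for first, second in combinations(set(words), 2):
                graph[first].add(second)
                graph[second].add(first)
    return dict(graph)


def connected_components(graph):
    """Collect the connected components with a depth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Toy input: three stanzas with one rhyme group each.
stanzas = [
    {"f": ["Wind", "sind", "Kind"]},
    {"a": ["Haus", "aus"]},
    {"b": ["Kind", "blind"]},
]
components = connected_components(build_network(stanzas))
largest = max(components, key=len)
print(len(components), sorted(largest))
```

In the toy input, Kind appears in two stanzas and therefore merges two rhyme groups into one component. This is the same mechanism by which a single shared word, or a single annotation error, can join otherwise separate rhyme groups in the real data.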

At this stage of the analysis, it is therefore important to carefully inspect the networks, in order to explore to what degree the network modeling or the data annotation could be further improved. When looking at the largest connected component, shown in the following figure, for example, it is clear that typical rhyme groups that we would expect to find separated in rhyme dictionaries do cluster together. We find -aut on the left, -aus and -auf on the right, with the word auch (also) as a very central rhyme word, as well as Frau (woman).




While these words can be defended as rhymes, given that they share the diphthong au, we also find some strange matches. Among these is the cluster with -ut on the bottom left, which links via Mut (courage) to Bauch (belly) and resolut (straightforward). Another example is the link between Frau and trauern (mourn). The former link is due to an annotation error in the poem "Freundesbrief an einen Melancholischen" ("Friendly letter to a melancholic") by Otto Julius Bierbaum (1921), where I wrongly annotated Bauch and auch as rhyming with resolut and Mut.

The second example, however, is due to a modeling problem with rhymes that encompass more than one word. This pattern is very frequent in Hip-Hop texts, and I have not yet found a good way of handling it. In the case of Frau rhyming with trauern, the original text rhymes trauern with Frau an, the latter being part of the sentence "schaut euch diese Frau an" ("look at this woman"). Since my conversion of the text to rhyme networks only treats the first part of a multi-word rhyme as the rhyme word, it mistakenly displays Frau as rhyming with trauern. The rhyme is also shown in its original form in the figure below.


Conclusion

The initial construction of German rhyme networks which I have presented in this post has revealed some potential problems in the conversion of rhyme judgments to rhyme networks. First, we have to reckon with certain errors in the annotation (which seem to be inevitable when doing things manually). Second, certain aspects of the annotation, especially rhymes stretching over more than one word, need to be handled more carefully. Third, assuming that poetry is spoken, and spoken texts are realized in linear form, it may be useful to reconsider the current rhyme network construction, by which edges are added for all possible combinations of rhyme words occurring in the same rhyme group. For the final post in this series next month, I hope that I will find time to address all of these problems in a satisfying way.

References

List, Johann-Mattis (2016) Using network models to analyze Old Chinese rhyme data. Bulletin of Chinese Linguistics 9.2: 218-241.

List, Johann-Mattis (2020) Improving data handling and analysis in the study of rhyme patterns. Cahiers de Linguistique Asie Orientale 49.1: 43-57.

For those of you interested in data and code that I used in this study, you can find them in this GitHub Gist.
