The Genealogical World of Phylogenetic Networks: Analyzing rhyme networks (From rhymes to networks 6)

For this final post of my little series on rhyme networks, I set myself the ambitious goal of providing concrete examples of how rhyme networks for languages other than Chinese can be analyzed. Unfortunately, I have to admit that this goal turned out to be a bit too ambitious. Although I managed to create a first corpus of annotated German rhymes, I am still not entirely sure how to construct rhyme networks from this corpus. Even if this problem is solved pragmatically, I realized that the question of how to analyze the rhyme network data is far less straightforward than I originally thought.

I will nevertheless try to end this series by providing a detailed description of how a preliminary rhyme network of the German poetry collection can be analyzed. Since these initial ideas for analysis are still rather preliminary, I hope that they can be sufficiently enhanced in the near future.

Constructing directed rhyme networks

I mentioned in last month's post that it is not ideal to count, as rhyming with each other, all words that are assigned to the same rhyme cluster in a given stanza of a given poem, since this means that one has to normalize the weights of the edges when constructing the rhyme network afterwards (List 2016). I also mentioned the personal communication with Aison Bu, who shared the idea of counting only those rhymes that are somehow close to each other in a stanza.

During this month, I finally found time to think about how to account for this idea in practice, and I came up with a procedure that essentially yields a directed network. In this procedure, we first extract all of the rhyme words in a given stanza in the order of their appearance. We then proceed from the first rhyme word and iterate over the rest of the rhyme words until we find a match. Having found a match, we interrupt the loop and add a directed edge to our rhyme network, which goes from the first rhyme word to its first match. We then delete the first rhyme word from the list and proceed again.
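
The procedure can be illustrated with a minimal Python sketch. It is not the code from the Gist: the function name, the representation of a stanza as a list of (word, rhyme group) pairs, and the example stanza are all assumptions made for illustration. Iterating by position has the same effect as deleting the first rhyme word from the list after each step.

```python
from collections import Counter


def directed_rhyme_edges(stanza):
    """Collect directed edges for one stanza.

    `stanza` is a list of (word, rhyme_group) pairs in order of appearance
    (a hypothetical input format). Each rhyme word is linked only to the
    first later word that carries the same rhyme group.
    """
    edges = Counter()
    for i, (word, group) in enumerate(stanza):
        # Only look at rhyme words that follow the current one.
        for later_word, later_group in stanza[i + 1:]:
            if later_group == group:
                edges[word, later_word] += 1
                break  # stop at the first match
    return edges


# Hypothetical stanza with two rhyme groups, "a" and "b".
stanza = [
    ("haus", "a"),
    ("leben", "b"),
    ("aus", "a"),
    ("geben", "b"),
    ("maus", "a"),
]
edges = directed_rhyme_edges(stanza)
```

In this toy stanza, haus is linked to aus but not to maus, since aus is its first match; maus is instead reached from aus. Summing the counters of all stanzas would give the edge weights of the full network.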

This procedure yields a directed, weighted rhyme network. At first sight, one may not see any specific advantages in the directionality of the network, but in my opinion it does not necessarily hurt; and it is straightforward to convert the network into an undirected one by simply ignoring the directions of the edges and collapsing those which go in two directions in a given pair of rhyme words.
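
A minimal sketch of this conversion, assuming the directed edges are stored as a mapping from (source, target) pairs to weights. Adding up the weights of the two directions is an assumption of this sketch; if the weights count poems, as in the table of frequent rhyme pairs later in this post, a poem containing both directions would be counted twice, so a different way of combining them may be preferable.

```python
from collections import Counter


def to_undirected(directed_edges):
    """Collapse directed edges into undirected ones.

    Each pair of words is stored under a sorted key, so that A -> B and
    B -> A end up on the same edge. Weights are summed (an assumption).
    """
    undirected = Counter()
    for (source, target), weight in directed_edges.items():
        key = tuple(sorted((source, target)))
        undirected[key] += weight
    return undirected


# Toy input: three directed edges, two of which connect the same pair.
directed = Counter({
    ("aus", "haus"): 10,
    ("haus", "aus"): 9,
    ("triebe", "liebe"): 9,
})
undirected = to_undirected(directed)
```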

Handling complex rhymes

In last month's blog post, I also mentioned the problem of handling rhymes that stretch across more than one word. While these are properly annotated (in my opinion), I had problems handling them in the rhyme network I presented last week. We find similar problems when working with certain rhymes involving words with more than one syllable. As an example, consider the following words which are all taken from the song Cruisen, and which I further represent in syllabified form in phonetic transcription.

Rhyme Words	Stressed Syllable	Unstressed Syllable
Tube	tuː	bə
Bude	buː	də
Gurke	guɐ	kə
hupe	huː	pə
Kurve	kuɐ	və
Schurke	ʃuɐ	kə
Punkte	puŋ	tə

These words do not rhyme according to traditional poetry rules (where unstressed syllables following stressed syllables need to be identical), but they do reflect a common rhyme tendency in German Hip Hop, where rhyme practice has been evolving lately. In order to properly account for this, I assigned both the first and the second syllable of the words to their own rhyme group (one stressed syllable rhyme and one unstressed syllable rhyme).

When constructing the rhyme network, however, the separation into two rhyme groups turned out not to make much sense any longer, since the rhymes occur on a sub-morphemic level, where the parts do not themselves express a meaning anymore. To cope with this, I modified the network code slightly by treating only those words as rhyming with each other which show identical rhyme groups in all of their syllables.
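
The matching rule can be sketched as follows, assuming that each rhyme word comes with one rhyme group label per annotated syllable. The labels "u", "i" and "e" and the dictionary layout are hypothetical and do not reflect the actual annotation format.

```python
def same_rhyme(groups_a, groups_b):
    """Return True only if two words have identical rhyme groups in all
    of their syllables (same number of syllables, same label in each)."""
    return tuple(groups_a) == tuple(groups_b)


# Hypothetical annotation: (stressed syllable group, unstressed syllable group).
annotation = {
    "Tube": ("u", "e"),
    "Bude": ("u", "e"),
    "Liebe": ("i", "e"),
}

tube_bude = same_rhyme(annotation["Tube"], annotation["Bude"])
tube_liebe = same_rhyme(annotation["Tube"], annotation["Liebe"])
```

Here Tube and Bude count as rhyming, while Tube and Liebe do not, although they share the rhyme group of the unstressed syllable.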

Infomap communities and connected components

Having constructed the rhyme network in this new way, we can start with some preliminary analyses. As a first step, it is useful to check the general characteristics of the network. When using the new approach for network construction and the correction for complex rhymes, as reported above, the network consists of 3,104 nodes which together occur as many as 7,707 times. The network itself is only sparsely connected, being separated into 840 connected components.
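
For readers who want to reproduce this kind of count, connected components can be computed with a simple breadth-first search, as in the sketch below. It ignores edge directions, uses toy edges rather than the actual data, and only sees words that take part in at least one edge.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of a graph as a list of node sets.

    `edges` is an iterable of (word_a, word_b) pairs; directions are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from every node not yet visited.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Toy edges: one component of three words and one of two.
toy_edges = [("haus", "aus"), ("aus", "maus"), ("leben", "geben")]
components = connected_components(toy_edges)
```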

As a first and very straightforward analysis, I used the Infomap algorithm (Rosvall and Bergstrom 2008) to see whether the connected components could be split any further. This analysis resulted in 932 communities, indicating that quite a few of the larger connected components in the rhyme network seem to show an additional community structure.

Unfortunately, I have not had time for a complete revision of all of the communities, but when checking a few of the larger connected components that were later separated into several communities, it seemed that most of these cases are due to very infrequent rhymes that are only licensed in very specific situations. As an example, consider the figure below, in which a larger connected component is shown along with the three communities identified by the Infomap algorithm.

The three communities, marked by the color of the nodes in the network, reflect three basic German rhyme patterns, which we can label -ung, -um, and -und. Transitions between the communities are sparse, although they are surely licensed by the phonetic similarity of the rhyme patterns, since they share the same main vowel and only differ by their finals, which all show a nasal component. The Infomap analysis assigns the nodes rum and krumm wrongly to the -und pattern but, given how sparse the graph is (with weights of one occurrence only for all of the edges), it is not surprising that this can happen. Both instances where edges connect the communities are rhymes occurring in the same Hip Hop lyrics from the song Geschichten aus der Nachbarschaft, as can be seen from the following annotated line of the song.

Judging from quickly eyeballing the data, most of the communities that further split the connected components of the network reflect groups of very closely rhyming words (usually corresponding to what one might call perfect rhymes). Links between communities reflect either possible similarities between the rhyme words represented by the communities, or direct errors introduced by my encoding.

Unfortunately, I could not find time to further elaborate on this analysis. What would be interesting to do, for example, would be a phonetic alignment analysis of the communities, with the goal of identifying the most general sound sequence that might represent a given community. It would also help to measure to what degree transitions between communities conform to these patterns, or to what degree individual words might reflect the communities' consensus rhyming more or less closely.

But even the brief analysis here has shown me that, first, there are still many errors in my annotation, and, second, the Infomap algorithm for community detection seems to work just as well with German rhyme data as it works on Chinese rhyme data.

Frequent rhyme pairs and promiscuous rhyme words

As a last example of how rhyme networks can be analyzed, I want to have a look at frequently recurring patterns in the current poetry collection. A very simple first test we can do in this regard is to look at the edges with the highest weights in our networks. Poets typically try to be very original in their work, since nothing is considered as boring as repetition in literature. Nevertheless, since the pool of words from which poets can choose when creating their poems is, by nature, limited, there are always patterns that are more frequently used.

The following table shows those directed rhymes that occur most frequently in the German poetry database.

Rhyme Part A	Rhyme Part B	No. of Poems
sein	lein	10
aus	haus	10
haus	aus	9
triebe	liebe	9
leben	geben	9
geben	leben	9
zeit	keit	9
nein	sein	8
wieder	lieder	7
nur	tur	7

This collection may not tell you too much, if you are not a native speaker of German. But if you are, then you will easily see that most of these rhymes are very common, involving either very common words (sein "to be"), or suffixes that frequently recur in different words of the German lexicon (-lein either as diminutive suffix or as part of allein "alone"). We also find the very sad match of liebe (Liebe "love") and triebe (Triebe "urges"), which is mostly thanks to the poems by Rainer Maria Rilke (1875-1926), who wrote a lot about "love", and had the same problem as most German poets: there are not many words rhyming nicely with Liebe (the only other candidates I know of would be bliebe "would stay" and Hiebe "stroke or blow").
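
A ranking of this kind can be derived by counting, for each directed rhyme pair, the number of poems in which it occurs at least once. The sketch below assumes that the directed edges are available per poem; the poem identifiers and edges are invented for illustration.

```python
from collections import Counter


def pair_counts_by_poem(poems):
    """Count in how many poems each directed rhyme pair occurs.

    `poems` maps a poem identifier to the list of directed edges found in
    that poem (a hypothetical layout). A pair that recurs within the same
    poem is counted only once for that poem.
    """
    counts = Counter()
    for edges in poems.values():
        counts.update(set(edges))
    return counts


# Toy data: three invented poems.
poems = {
    "poem1": [("triebe", "liebe"), ("triebe", "liebe"), ("leben", "geben")],
    "poem2": [("triebe", "liebe")],
    "poem3": [("geben", "leben")],
}
counts = pair_counts_by_poem(poems)
top = counts.most_common(1)
```

Since the edges are directed, leben → geben and geben → leben are counted separately, which is why both directions can show up in a table like the one above.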

As a last example, we can consider promiscuous rhyme words, that is, rhyme words that tend to be reused in many poems with many other words as partners. The following table shows the top ten in terms of rhyme promiscuity in the German poetry dataset.

Rhyme Part	Rhyme Partners	Occurrences
sein	14	87
ein	9	34
bei	9	36
sagen	8	19
leben	8	39
schein	8	26
mehr	8	25
nicht		8
zeit	8	36
welt	7	32

Here, I find it rather interesting that we find so many words rhyming with -ein in this short list. However, when checking the community of -ein, we can see that there is, indeed, a rather large number of words from which one can choose (including basic words like Bein "leg", Schein "shine", Stein "stone"). Additionally, there is a larger number of verbs of the form -eien that are traditionally shortened in colloquial speech (compare the node schreien "to scream").
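
One way to compute such a promiscuity ranking is to collect, for every rhyme word, the set of distinct words it is connected to. Whether partners should be counted with or without regard to edge direction is not fixed by the description above; the sketch below ignores direction, which is an assumption, and uses invented edges.

```python
from collections import defaultdict


def rhyme_partners(edges):
    """Map each rhyme word to the set of distinct words it rhymes with,
    ignoring the direction of the edges."""
    partners = defaultdict(set)
    for source, target in edges:
        partners[source].add(target)
        partners[target].add(source)
    return partners


# Toy edges, not taken from the actual dataset.
toy_edges = [
    ("sein", "lein"),
    ("nein", "sein"),
    ("sein", "schein"),
    ("lein", "sein"),
    ("leben", "geben"),
]
partners = rhyme_partners(toy_edges)

# Rank words by number of partners (descending), then alphabetically.
ranking = sorted(partners, key=lambda word: (-len(partners[word]), word))
```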

Concluding remarks

When I started this series on rhyme networks, I was hoping to achieve more in the six months that I had ahead. In the light of my initial hopes, the analyses I have shown here are somewhat disappointing. However, even if I could not keep the promises I made to myself, I have learned a lot during these months, and I remain optimistic that many of the still untackled problems can be solved in the near future. What today's analysis has specifically shown me, however, is that more data will be needed, since the network produced from the small collection of 300 German poems is clearly too small to serve for a fully fledged analysis of rhymes in German poetry.

References

List, Johann-Mattis (2016) Using network models to analyze Old Chinese rhyme data. Bulletin of Chinese Linguistics 9.2: 218-241.

Rosvall, M. and Bergstrom, C. T. (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105.4: 1118-1123.

Data and Code

Data and code are available in the form of a GitHub Gist.

Monday, September 28, 2020
