Tuesday, November 28, 2017

“Man gave names to all those animals”: goats and sheep

This is a joint post by Guido Grimm, Johann-Mattis List, and Cormac Anderson.

This is the second of a pair of posts dealing with the names of domesticated animals. In the first part, we looked at the peculiar differences in the names we use for cats and dogs, two of humanity’s most beloved domesticated predators. In this, the second part (and with some help from Cormac Anderson, a fellow linguist from the Max Planck Institute for the Science of Human History), we’ll look at two widely cultivated and early-domesticated herbivores: goats and sheep.

Similar origins, but not the same

Both goats and sheep are domesticated animals that have an explicitly economic use; and, in both cases, genetic and archaeological evidence points to the Near East as the place of domestication (Naderi et al. 2007). The main difference between the two is the natural distribution of goats (providing nourishment and leather) and sheep (providing the same plus wool). This distribution is also reflected in the phonetic (dis)similarities of the terms used in our sample of languages (Figures 1 and 2).

Capra aegagrus, the species from which the domestic goat derives, is native to the Fertile Crescent and Iran. Other species of the genus, similar to the goat in appearance, are restricted to fairly inaccessible areas of the mountains of western Eurasia (see Figure 3, taken from Driscoll et al. 2009). On the other hand, Ovis aries (the sheep) and its non-domesticated sister species are found in hilly and mountainous areas throughout the temperate and boreal zones of the Northern Hemisphere. Whenever humans migrated into mountainous areas, there was the likelihood of finding a beast that:
Had wool on his back and hooves on his feet,
Eating grass on a mountainside so steep
[Bob Dylan: Man Gave Names to all those animals].

Goats were actively propagated by humans into every corner of the world, because they can thrive even in quite inhospitable areas. Reflecting this, differences in the terms for "goat" generally follow the main subgroups of the Indo-European language family (Figure 1), in contrast to "cat", "dog", and "sheep". From the language data, it seems that for the most part each major language expansion, as reflected in the subgroups of Indo-European languages, brought its own term for "goat", and that it was rarely modified too much or borrowed from other speech communities.

There is one exception to this, however. The terms in the Italic and Celtic languages look as though they are related, coming from the same Proto-Indo-European root, *kapr-, although the initial /g/ in the Celtic languages is not regular. In Irish and Scottish Gaelic, the words for "sheep" also come from the same root. In other cases, roots that are attested in one or other language have more restricted meanings in some other language; for example, the Indo-Iranic words for goat are cognate with the English buck, used to designate a male goat (or sometimes the male of other hooved animals, such as deer).

The German word Ziege sticks out from the Germanic form gait- (but note the Austro-Bavarian Goaß, and the alternative term Geiß, particularly in southern German dialects). The origin of the German term is not (yet) known, but it is clear that it was already present in the Old High German period (8th century CE), although it was not until Luther's translation of the Bible, in which he used the word, that it became the norm and successively replaced the older forms in other varieties of German (Pfeifer 1993: s. v. "Ziege").

Figure 1: Phonetic comparison of words for "goat"


The terms for sheep, however, are often phonetically very different even in related languages. The overall pattern seems to be more similar to that of the words for dog – the animal used to herd sheep and protect them from wolves. An interesting parallel is the phonetic similarity between the Danish and Swedish forms får (a word not known in other Germanic languages) and the Indic languages. This similarity is a pure coincidence, as the Scandinavian forms go back to a form fahaz- (Kroonen 2013: 122), which can be further related to Latin pecus "cattle" (ibid.) and is reflected in Italian [pɛːkora] in our sample.

This example clearly shows the limitations of pure phonetic comparisons when searching for historical signal in linguistics. Latin c (pronounced as [k]) is usually reflected as an h in Germanic languages, reflecting a frequent and regular sound change. The sound [h] itself can be easily lost, and the [z] became an [r] in many Scandinavian words. The fact that Italian, Danish, and Swedish have cognate terms for "sheep", however, does not mean that their common ancestors used the same term. It is much more likely that speakers in both communities came up with similar ways to name their most important herded animals. It is possible, for example, that this term generically meant "livestock", and that the sheep was the most prototypical representative at a certain time in both ancestral societies.
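The point about purely phonetic comparison can be illustrated with a minimal sketch. It computes a plain edit distance between the four forms mentioned above; this is an assumption made for illustration, not the alignment method behind the figures, and the forms mix orthography, reconstruction, and phonetic transcription exactly as they are quoted in the text.

```python
import unicodedata


def _nfc(text):
    """Normalize so that letters such as 'å' count as one symbol."""
    return unicodedata.normalize("NFC", text)


def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions cost 1 each."""
    a, b = _nfc(a), _nfc(b)
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer form (0 = identical)."""
    longest = max(len(_nfc(a)), len(_nfc(b)))
    return levenshtein(a, b) / longest if longest else 0.0


# Forms as quoted in the text; all four are etymologically related.
forms = {
    "Danish/Swedish": "får",
    "Proto-Germanic": "fahaz",
    "Latin": "pecus",
    "Italian": "pɛːkora",
}

names = list(forms)
for index, first in enumerate(names):
    for second in names[index + 1:]:
        score = normalized_distance(forms[first], forms[second])
        print(f"{first} vs {second}: {score:.2f}")
```

The Scandinavian and Italian forms come out as almost maximally different, although they are cognate. A surface measure of this kind cannot see the regular sound changes that connect them.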

Furthermore, we see substantial phonetic variation in the Romance languages surrounding the Mediterranean, where both sheep and goats have probably been cultivated since the dawn of human civilization. Each language uses a different word for sheep, with only the Western Romance languages being visibly similar to ovis, their ancestral word in Latin, while Italian and French show new terms.

Figure 2: Phonetic comparison of words for "sheep"

More interesting aspects

The wild sheep, found in hilly and mountainous areas across western Eurasia, was probably hunted for its wool long before mouflons (a subspecies of the wild sheep) were domesticated and kept as livestock. The word for "sheep" in Indo-European, which we can safely reconstruct, was *h₂owis, possibly pronounced as [xovis], and still reflected in Spanish, Portuguese, Romanian, Russian, and Polish. It survives in many more languages as a specific term with a different meaning, addressing the milk-bearing / birthing female sheep. These include English ewe, Faroese ær (which comes in more than a dozen combinations; Faroes literally means: “sheep islands”), French brebis (important to know when you want sheep-milk based cheese), and German Aue (extremely rare nowadays, having been replaced by Mutterschaf "mother-sheep"). In other languages it has been lost completely.

What is interesting in this context is that while the phonetic similarity of the terms for "sheep" resembles the pattern we observe for "dog", the history of the words is quite different. While the words for "dog" just continued in different language lineages, and thus developed independently in different groups without being replaced by other terms, the words for "sheep" show much more frequent replacement patterns. This also contrasts with the terms for "goat", which are all of much more recent origin in the different subgroups of Indo-European, and have remained rather similar after they were first introduced.

The reasons for these different patterns of animal terms are manifold, and a single explanation may never capture them all. One general clue with some explanatory power, however, may be how and by whom the animals were used. Humans, in particular nomadic societies, have relied on goats to colonize or survive in inhospitable environments, even into historic times. For instance, goats were introduced to South Africa by European settlers to eat up the thicket growing in the interior of the Eastern Cape Province. Once the thicket was gone, the fields were then used for herding cattle and sheep.

Figure 3: Map from Driscoll et al. (2009)

There are other interesting aspects of the plot.

For example, as mentioned before, in Chinese the term for "goat" literally means "mountain sheep/goat", and the term for "sheep" literally means "soft sheep/goat". While it is straightforward to assume that yáng, the term for "sheep/goat", originally only denoted one of the two organisms, either the sheep or the goat, it is difficult to say which came first. The term yáng itself is very old, as can also be seen from the Chinese character used, which serves as one of the base radicals of the writing system, depicting an animal with horns: 羊. The sheep seems to have arrived in China rather early (Dodson et al. 2014), predating the invention of writing, while the arrival of the goat was also rather ancient (Wei et al. 2014) (and might also have happened more than once). Whether sheep arrived before goats in China, or vice versa, could probably be tested by haplotyping feral and locally bred populations while recording the local names and establishing the similarity of words for goat and sheep.

While the similar names for goat and sheep may be surprising at first sight (given that the animals do not look all that similar), the similarity is reflected in quite a few of the world's languages, as can be seen from the Database of Cross-Linguistic Colexifications (List et al. 2014) where both terms form a cluster.

Source Code and Data

We have uploaded source code and data to Zenodo, where you can download them and carry out the tests yourself (DOI: 10.5281/zenodo.1066534). Many thanks go to Gerhard Jäger (Eberhard-Karls University Tübingen), who provided us with the pairwise language distances computed for his 2015 paper on "Support for linguistic macro-families from weighted sequence alignment" (DOI: 10.1073/pnas.1500331112).

Final remark

As in the case of cats and dogs, we have reported here merely preliminary impressions, through which we hope to encourage potential readers to delve into the puzzling world of naming those animals that were instrumental for the development of human societies. In case you know more about these topics than we have reported here, please get in touch with us; we will be glad to learn more.

  • Dodson, J., E. Dodson, R. Banati, X. Li, P. Atahan, S. Hu, R. Middleton, X. Zhou, and S. Nan (2014) Oldest directly dated remains of sheep in China. Sci Rep 4: 7170.
  • Driscoll, C., D. Macdonald, and S. O’Brien (2009) From wild animals to domestic pets, an evolutionary view of domestication. Proceedings of the National Academy of Sciences 106 Suppl 1: 9971-9978.
  • Jäger, G. (2015) Support for linguistic macrofamilies from weighted alignment. Proceedings of the National Academy of Sciences 112.41: 12752–12757.
  • Kroonen, G. (2013) Etymological dictionary of Proto-Germanic. Brill: Leiden and Boston.
  • List, J.-M., T. Mayer, A. Terhalle, and M. Urban (eds) (2014) CLICS: Database of Cross-Linguistic Colexifications. Forschungszentrum Deutscher Sprachatlas: Marburg.
  • Naderi, S., H. Rezaei, P. Taberlet, S. Zundel, S. Rafat, H. Naghash, et al. (2007) Large-scale mitochondrial DNA analysis of the domestic goat reveals six haplogroups with high diversity. PLoS One 2.10: e1012.
  • Pfeifer, W. (1993) Etymologisches Wörterbuch des Deutschen. Akademie: Berlin.
  • Wei, C., J. Lu, L. Xu, G. Liu, Z. Wang, F. Zhao, L. Zhang, X. Han, L. Du, and C. Liu (2014) Genetic structure of Chinese indigenous goats and the special geographical structure in the Southwest China as a geographic barrier driving the fragmentation of a large population. PLoS One 9.4: e94435.

Tuesday, November 21, 2017

Another test case for phylogenetics and textual criticism: the Bible

This is a two-part blog post. Here, I will introduce a particular stemmatological problem, along with the studies of it to date; and in a subsequent post I will discuss possible phylogenetic analyses that might be applied (see The Synoptic Gospels problem: preparing a phylogenetic approach).


This year marks the celebration of 500 years since Martin Luther famously proposed his 95 religious theses, thus presaging the Protestant Reformation of the Western Christian Church. In line with this, it is worth discussing a subfield of textual criticism and stemmatics deeply influenced by the Reformation: Biblical criticism. While the importance of written texts to Christianity begins at least in the 2nd century, the theological doctrine of sola scriptura (“by scripture alone”, regarding the infallible and final authority in all matters), along with translation work and individual study of the Bible, paved the way, sometimes unwillingly, to scientific approaches of Biblical criticism equivalent to those of secular literature.

The seminal figure in textual criticism of the New Testament was Hermann Reimarus (1694-1768), apparently the first to apply the methodology of literary texts to religious ones. As in the case of literary criticism, it is hardly a coincidence that Biblical criticism developed in the same cultural framework that would support and promote the idea of biological evolution and the tools for establishing genealogical trees and networks. This is especially so when considering the secularization of that society, in which proving the human origin and transmission of sacred texts was deemed an important act of civic freedom. Along with this was the parallel radicalization of some religious positions, such as the denouncement of scientific studies of religious texts as heresy (a position nowadays rejected by most Christian denominations, which affirm the imperative of serious research on the sacred texts).

A concrete problem: the synoptic gospels

The most important problem in the textual criticism of the New Testament is the “synoptic gospels” problem, involving the three Gospels of Mark, Matthew, and Luke. These gospels have strikingly similar narratives that relate many of the same stories, with similar or identical wording. Like the other canonical gospel, John, these texts were composed around the last quarter of the first century by literate Greek-speaking Christians, only becoming canonical at least a century after their composition.

The synoptic gospels differ from similar sources, such as the non-canonical Gospel of Thomas, in being biographies with a clear religious motivation, and not just a collection of sayings. When compared to the Gospel of John, the three synoptic gospels are distinct in apparently being written by and for a Jewish community that was not on the verge of breaking from the Jewish synagogue, also favoring short and simple sentences.

However, the most important proof of their genealogical relationship is the text itself. The table below shows the reconstructed Greek original of each gospel for the episode of Jesus’ recruitment of a tax collector (an episode missing from the non-synoptic Gospel of John). The text in blue is the material shared by any two of the gospels, and the text in red is common to all three of them. [This is adapted from Smith (2017); on Wikipedia there is a further example, referring to the episode of the cleansing of a leper, see https://en.wikipedia.org/wiki/Synoptic_Gospels#Example.]

Matthew 9,9: Καὶ παράγων ὁ Ἰησοῦς ἐκεῖθεν εἶδεν ἄνθρωπον καθήμενον ἐπὶ τὸ τελώνιον, Μαθθαῖον λεγόμενον, καὶ λέγει αὐτῷ· Ἀκολούθει μοι. καὶ ἀναστὰς ἠκολούθησεν αὐτῷ.

Mark 2,13-14: Καὶ ἐξῆλθεν πάλιν παρὰ τὴν θάλασσαν· καὶ πᾶς ὁ ὄχλος ἤρχετο πρὸς αὐτόν, καὶ ἐδίδασκεν αὐτούς. καὶ παράγων εἶδεν Λευὶν τὸν τοῦ Ἁλφαίου καθήμενον ἐπὶ τὸ τελώνιον, καὶ λέγει αὐτῷ· Ἀκολούθει μοι. καὶ ἀναστὰς ἠκολούθησεν αὐτῷ.

Luke 5,27-28: Καὶ μετὰ ταῦτα ἐξῆλθεν καὶ ἐθεάσατο τελώνην ὀνόματι Λευὶν καθήμενον ἐπὶ τὸ τελώνιον, καὶ εἶπεν αὐτῷ· Ἀκολούθει μοι. καὶ καταλιπὼν πάντα ἀναστὰς ἠκολούθει αὐτῷ.
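The overlap between the three passages can be counted mechanically. The following is a minimal sketch that treats each passage as a set of lower-cased word forms, which is a simplifying assumption: it ignores word order and repeated words, and it is not how Smith (2017) prepared the synopsis.

```python
import unicodedata

# Greek text of the three parallel passages quoted above.
PASSAGES = {
    "Matthew": (
        "Καὶ παράγων ὁ Ἰησοῦς ἐκεῖθεν εἶδεν ἄνθρωπον καθήμενον ἐπὶ τὸ τελώνιον, "
        "Μαθθαῖον λεγόμενον, καὶ λέγει αὐτῷ· Ἀκολούθει μοι. "
        "καὶ ἀναστὰς ἠκολούθησεν αὐτῷ."
    ),
    "Mark": (
        "Καὶ ἐξῆλθεν πάλιν παρὰ τὴν θάλασσαν· καὶ πᾶς ὁ ὄχλος ἤρχετο πρὸς αὐτόν, "
        "καὶ ἐδίδασκεν αὐτούς. καὶ παράγων εἶδεν Λευὶν τὸν τοῦ Ἁλφαίου καθήμενον "
        "ἐπὶ τὸ τελώνιον, καὶ λέγει αὐτῷ· Ἀκολούθει μοι. "
        "καὶ ἀναστὰς ἠκολούθησεν αὐτῷ."
    ),
    "Luke": (
        "Καὶ μετὰ ταῦτα ἐξῆλθεν καὶ ἐθεάσατο τελώνην ὀνόματι Λευὶν καθήμενον "
        "ἐπὶ τὸ τελώνιον, καὶ εἶπεν αὐτῷ· Ἀκολούθει μοι. "
        "καὶ καταλιπὼν πάντα ἀναστὰς ἠκολούθει αὐτῷ."
    ),
}


def word_types(text):
    """Return the set of lower-cased word forms, with punctuation removed."""
    text = unicodedata.normalize("NFC", text).lower()
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    return set(cleaned.split())


matthew, mark, luke = (word_types(PASSAGES[name])
                       for name in ("Matthew", "Mark", "Luke"))

# Word forms found in all three passages.
all_three = matthew & mark & luke

# Word forms found in exactly two of the passages.
exactly_two = {
    "Matthew+Mark": (matthew & mark) - luke,
    "Mark+Luke": (mark & luke) - matthew,
    "Matthew+Luke": (matthew & luke) - mark,
}

print("All three:", sorted(all_three))
for pair, shared in exactly_two.items():
    print(pair, "only:", sorted(shared))
```

A proper analysis would align the passages word by word rather than compare sets, but even this crude count shows how much of the wording is shared.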

The relationships between the gospels, such as the so-called “triple tradition”, are summarized by the graph below, from the Wikipedia article on the synoptic gospels. Mark, the shortest text, has almost no unique material (only 3%, in part superfluous adjectives and Aramaic translations) and is almost entirely (94%) reproduced in Matthew. Matthew and Luke have their share of unique material (20% and 35%, respectively), which suggests independence, except for a "double tradition" of common material of about a quarter of the contents of each one, including notable passages such as the “Sermon on the Mount”. The parallelisms of these two gospels are found not only in their contents, but also in their arrangement, with most episodes described in the same order and, in case of displacements, with blocks of episodes moved together while preserving their internal order.

Previous studies

Such similarities were already noted in the first centuries of Christianity. This raises typical genealogical questions regarding topics such as priority (which gospel was written first) and dependence (which gospel was used as a source).

As for the first question, due to textual and theological evidence, a well-established majority of commentators favors the hypothesis of Marcan priority: that is, that the gospel of Mark is the oldest, and both Matthew and Luke used it as a source. As for the second question, a major point of dispute is the double tradition of Matthew and Luke, which can only be properly explained in terms either of descent or of a common ancestor. The two leading hypotheses are the one of a lost gospel (referred to as “Q”, after the German Quelle [“source”]), and the one by Austin Farrer, according to whom Matthew used Mark as its source and Luke then used both of them. But these are not the only hypotheses that have been proposed, as shown in the next set of diagrams (also from the Wikipedia article above).

Augustinian Theory
Q Hypothesis
Farrer Theory
Jerusalem School Hypothesis

The first fully developed theory was actually proposed by Augustine of Hippo back in the 5th century, which is essentially the one by Farrer, but with Matthew in place of Mark (i.e., supporting a Matthean priority). Given Augustine’s authority as a “Father of the Church”, his view was not disputed until the late 18th century, when Johann Jakob Griesbach published a synopsis of the three gospels and developed a new hypothesis, swapping Mark and Luke in the dominant explanation. Griesbach’s scientific approach led to the first application to Biblical problems of textual criticism, then in development in the German towns of Jena and Leipzig where he lived.

In 1838, Christian Weisse proposed the “Q” Hypothesis, mentioned above, asserting that Matthew and Luke were produced independently, both using Mark plus a lost source. This source was described as a lost collection of sayings of Jesus, along with feeble indirect evidence of its existence. This hypothesis was further developed by Burnett Streeter in 1924, with the proposal of “proto-versions” of both Mark and Luke — the wording of the canonical versions we have today would then be the product of later revisions, influenced by all of the texts.
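For readers used to thinking in graphs, the four hypotheses described so far can be written down as directed edges from source to dependent text. This is a minimal sketch based only on the descriptions in this post; the edge lists are my reading of those descriptions, and the helper names are invented for illustration.

```python
# Each hypothesis is a list of (source, dependent) pairs.
HYPOTHESES = {
    # Matthew first; Mark used Matthew; Luke used both.
    "Augustinian": [("Matthew", "Mark"), ("Matthew", "Luke"), ("Mark", "Luke")],
    # Augustinian scheme with Mark and Luke swapped.
    "Griesbach": [("Matthew", "Luke"), ("Matthew", "Mark"), ("Luke", "Mark")],
    # Matthew and Luke independently used Mark plus the lost source Q.
    "Q": [("Mark", "Matthew"), ("Mark", "Luke"), ("Q", "Matthew"), ("Q", "Luke")],
    # Mark first; Matthew used Mark; Luke used both.
    "Farrer": [("Mark", "Matthew"), ("Mark", "Luke"), ("Matthew", "Luke")],
}


def sources_of(edges):
    """Map every text to the set of texts it is said to depend on."""
    sources = {}
    for source, dependent in edges:
        sources.setdefault(source, set())
        sources.setdefault(dependent, set()).add(source)
    return sources


def roots(edges):
    """Texts with no source: the candidates for priority."""
    return sorted(text for text, srcs in sources_of(edges).items() if not srcs)


def reticulations(edges):
    """Texts with more than one source, i.e. where a tree becomes a network."""
    return sorted(text for text, srcs in sources_of(edges).items() if len(srcs) > 1)


for name, edges in HYPOTHESES.items():
    print(f"{name}: roots={roots(edges)}, reticulations={reticulations(edges)}")
```

Every one of these hypotheses contains at least one text with two sources, so none of them is a simple tree; they differ in which text comes first and in where the reticulation sits.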

During the past fifty years, due to advances in textual criticism and new manuscript analyses, the independence of Luke in relation to Matthew has been questioned, with diminishing support for the Q Hypothesis. A now leading position favors Farrer’s hypothesis, along with alternative trees such as the one by the Jerusalem School, according to which a lost Greek anthology “A” (postulated as the translation of a collection of sayings either in Hebrew or in Aramaic) was directly or indirectly used by all gospels, including John.


Considering the analogies between literary and genetic texts that we have already discussed on this blog, it is clear that this topic should be an interesting anecdote to share around phylogenetic water-coolers. The four texts can be divided into two “families” of gospels, the synoptic (taxa: Matthew, Mark, and Luke) and non-synoptic (taxon: John). Their similarities suggest a distant common ancestor, probably oral traditions, as reported by Christian writers of the first and second centuries such as Papias.

The relationship between the taxa of the first family, however, is far from clear, as their relative dates cannot be determined with confidence. We might be faced with processes that, by analogy with biology, can be explained as gene pool recombinations and horizontal gene transfers – even though the most likely explanation is the one of direct descent, possibly from unknown taxa.

In literary terms, we must also consider features such as Matthew clearly being written by someone highly familiar with aspects of Jewish law, possibly asserting the Jewish component of the preaching while perceiving a universal tendency for the new faith. We must also consider the fact that Mark provides no ancestral lineage for Jesus, while Matthew traces him from a line of kings and Luke from a line of commoners — clearly stating the theological point of view of each gospel. Other aspects are worth consideration, such as the idea that what we today identify as the Gospel of Luke is likely to have been the first part of a once single document that included what is now the book of the “Acts of the Apostles”.

While I must admit that my research has been limited to some googling of keywords, it is curious that a topic that has attracted so much attention for millennia, from serious academic scholarship to conspiracy theories, and from impressionistic reviews to advanced statistical modeling, does not seem to have been covered by phylogenetic analyses, so far. Given the range of data and literature, it should actually look like a prime candidate for such application, even from an outsider's point of view. This viewpoint is in fact discussed in a review by Christian P. Robert of a book called The Synoptic Problem and Statistics by Andris Abakuks:
The book by Abakuks goes […] through several modelling directions, from logistic regression using variable length Markov chains [to predict agreement between two of the three texts by regressing on earlier agreement] to hidden Markov models [representing, e.g., Matthew’s use of Mark], to various independence tests on contingency tables, sometimes bringing into the model an extra source denoted by Q. Including some R code for hidden Markov models. Once again, from my outsider viewpoint, this fragmented approach to the problem sounds problematic and inconclusive. And rather verbose in extensive discussions of descriptive statistics. Not that I was expecting a sudden Monty Python-like ray of light and booming voice to disclose the truth! Or that I crave for more p-values (some may be found hiding within the book). But I still wonder about the phylogeny… Especially since phylogenies are used in text authentication as pointed out to me by Robin Ryder for Chauncer’s [sic] Canterbury Tales.
We can certainly list among the reasons for such omission the diffidence of the textual community towards phylogenetic methods, especially when performed by people from outside the field; but the potential reception problems for texts of enormous religious significance cannot be ruled out. However, one of the reasons might be far more trivial: the fact that, just as in the case of historical linguistics, we don’t have digital structured databases of the trove of data about this topic. Most of the literature is not even properly digital, at best with scanned PDFs. Furthermore, the data are usually far from perfect for such usage, as in the case of the synopsis by Smith (2017), which looks more like a typed table than a true database.


In a future post, I will explore the problems of the synoptic gospels from a phylogenetic point of view, also releasing a minimal dataset (see The Synoptic Gospels problem: preparing a phylogenetic approach). Until then, those interested in the topic can find a lot of discussion on a mailing list devoted to the scholarly study of the synoptic gospels, Synoptic-L.


Abakuks, Andris (2014) The Synoptic Problem and Statistics. London: Chapman and Hall / CRC.

Goodacre, Mark (2001) The Synoptic Problem: a Way Through the Maze. New York: T & T Clark International. (available on Archive.org)

Robert, Christian P (2015) The synoptic problem and statistics [book review]. https://xianblog.wordpress.com/2015/03/20/the-synoptic-problem-and-statistics-book-review/

Orchard, Bernard; Longstaff, Thomas RW (1979) J.J. Griesbach: Synoptic and Text - Critical Studies 1776-1976. Cambridge: Cambridge University Press.

Smith, Mahlon H (2017) A Synoptic Gospels Primer. http://virtualreligion.net/primer/

Tuesday, November 14, 2017

Power laws and cryptocurrencies

The Power Law is used to describe phenomena where large occurrences are rare but small ones are quite common. For example, there are few billionaires while most people make only a modest income; there are few large cities but many small towns; there are few very frequent words but many rare words.

Mathematically, Power Laws are of interest because of what is known as "scale invariance", as well as the fact that, for small enough exponents, there is no well-defined average value. Furthermore, Power Laws are considered to be universal; you can read about this in Wikipedia. One of the more obvious places that we might expect to find them is in the exchange rates of currencies (their "worth"): there will be a few of great worth (the "major currencies") and lots of lesser worth.
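For those who like to see the two properties written out, here is a short sketch using the standard continuous form of a Power Law. The symbols (the exponent α, the lower cutoff x_min, and the rescaling factor c) are my notation and do not come from any of the sources linked here.

```latex
% Power-law density with exponent \alpha > 1 and lower cutoff x_{\min} > 0
p(x) = C\,x^{-\alpha}, \qquad x \ge x_{\min}, \qquad
C = (\alpha - 1)\,x_{\min}^{\alpha - 1}

% Scale invariance: rescaling x by a factor c only rescales the density
p(c\,x) = C\,(c\,x)^{-\alpha} = c^{-\alpha}\,p(x)

% The mean exists only when \alpha > 2; for \alpha \le 2 the integral diverges
\langle x \rangle = \int_{x_{\min}}^{\infty} x\,p(x)\,dx
                  = \frac{\alpha - 1}{\alpha - 2}\,x_{\min}
                  \qquad (\alpha > 2)
```

So the missing average is a feature of heavy-tailed cases, where α is 2 or less, rather than of every Power Law.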

For example, I recently read the headline: Bitcoin isn't "too expensive", says BTCC boss Bobby Lee. He was defending the price of the digital currency Bitcoin, which has increased in value more than 600 percent this year, claiming that this is not evidence of a financial bubble, but instead is evidence that the currency is proving its utility in the digital world. Obviously, I cannot let this claim pass without turning a quantitative eye upon it.


Bitcoin is the original cryptocurrency, established in 2009, just after the financial crash of that time. It is a digital currency, which by design has no central bank or regulatory authority supporting it. The coins don’t exist in a tangible form, but instead exist solely in a digital "wallet". Nevertheless, they can still be exchanged and used in transactions, just as with any fiat currency.

Bitcoin is based on a technology now referred to as the blockchain, which seriously has the potential to redefine future economic and legal transactions. Indeed, it is the blockchain idea that has proven to be of interest to financial and legal institutions, not the currency itself (which is just an example of using the blockchain). Blockchain is a distributed digital database, where every transaction is broadcast over the net and stored publicly, making it immutable as well as transparent. Compared to traditional financial and legal systems, this provides increased security, higher efficiency, greater error resistance, and reduced transaction costs. You can read about it in The ultimate 3500-word guide in plain English to understand Blockchain.

Bitcoin was launched for around $US0.005 (ie. half a cent). It was pretty much ignored for 4 years, but it has increased greatly in popularity over the past 4 years. Its exchange rate first exploded to a peak in late 2013, followed by a slow decline of nearly 90% (associated with the collapse of the Mt Gox digital currency exchange). It has achieved near-manic popularity in the past year, as shown in the first graph.

From CoinGecko
Bitcoin exchange rate with the US dollar

So, we now have headlines like this: Bitcoin just surged over $4000 and is near biggest financial crash in 400 years. The reference is to what is known as Tulip mania, in the Netherlands in 1636-1637, where the tulip bulb prices quickly went from 1 guilder to 60, exploded to 1,000 or more, and then crashed. This is the context within which Bobby Lee made his claim (quoted above) that the current Bitcoin price is not too high.

The important point for our purposes here is that Bitcoin has spawned a host of imitators. So, there are now, or have been, more than 1,000 cryptocurrencies in existence. Many of them are intended as genuine digital currencies, each one addressing one or more of the perceived limitations of the original Bitcoin (such as its inability to scale up to a large number of transactions, or to process transactions faster). Indeed, we may see Bitcoin as a proof of concept and/or pilot study for digital currencies.

Most of the so-called altcoins, however, are not intended as general-use currencies at all. Instead, they form a totally new mode of fundraising for start-up companies, which now sell custom cryptocurrencies in order to raise investment. That is, instead of issuing shares as an IPO (initial public offer) they have an ICO (initial coin offer), thus bypassing the traditional venture capital processes. There is a whole new world of digital finance emerging (see Cryptocurrency mania fuels hype and fear at venture firms).


In order to assess the comparative price of Bitcoin to the altcoins, I need the exchange rate of the current crop of cryptocurrencies. I took the CoinGecko rates at 14:25 UTC on 11 November 2017 (they change by the minute!). There were 735 coins listed, of which I took the top 100 exchange rates in US dollars. I then ignored the data for the Bit20 coin, which is actually related to an index fund, and thus has a price that is unrelated to the other currencies.

The next graph shows the currencies listed in the rank order of their value. This should illustrate a special case of the Power Law that is known as Zipf's Law, which refers to the "size" of each event relative to its rank order of size. The standard way to evaluate the Zipf pattern is to plot the data with both axes of the graph converted to logarithms, under which circumstances the data should form a straight line.
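The log-log check can be sketched in a few lines. This is a minimal illustration using an ordinary least-squares line and synthetic prices generated from an exact power law; it does not use the CoinGecko rates, and the function names are invented for this example.

```python
import math


def fit_log_log(values):
    """Fit a straight line to log10(value) against log10(rank).

    Returns (slope, intercept, r_squared). Values must be positive,
    and there must be at least two of them.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(value) for value in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared


def residuals(values, slope, intercept):
    """Vertical distance of each ranked value from the fitted line (log10 units).

    Positive means above the line, negative means below it.
    """
    ranked = sorted(values, reverse=True)
    return [math.log10(value) - (intercept + slope * math.log10(rank))
            for rank, value in enumerate(ranked, start=1)]


# Synthetic example only: 99 "prices" that follow value = 7000 * rank^-1.5.
prices = [7000.0 * rank ** -1.5 for rank in range(1, 100)]

slope, intercept, r_squared = fit_log_log(prices)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
print("largest deviation:",
      max(abs(r) for r in residuals(prices, slope, intercept)))
```

With real exchange rates the points will scatter around the line, and the residuals give a simple way to see which coins sit well above or below it.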

As you can see, the exchange rates do fit Zipf's Law very well. In particular, Bitcoin, which is the #1 ranked coin, is not over-priced relative to the other coins. Note that this does not address the question as to whether all of the coins are over-priced or not. That would be a separate question, about the intrinsic value of cryptocurrencies.

Note that the top 25 ranked coins do not fit the Power Law as well as do the remaining 75 coins. So, we might also look at these top coins separately. This is shown in the next graph.

These 25 coins also fit Zipf's Law very well, but the power exponent is clearly smaller than for the remaining coins. In this case, Bitcoin fits the Power Law even better than before. Like it or not, relative to the other coins, Bitcoin is, indeed, not "too expensive".

Very few of the coins appear to be over-priced (ie. far above the line), but a few of them might be considered under-priced (ie. far below the line). In particular, the #4 ranked coin is the SegWit2x [Futures]. This coin represents a controversial suggestion to split off from Bitcoin. It has not received a great deal of support from the Bitcoin community, and the proposed split was officially suspended only a few days ago. Whether it will go ahead eventually is unclear. The #5 ranked coin is Dash, which is often touted as a currency much more like cash, in the sense that the users can remain almost completely anonymous (which is actually a bit tricky with Bitcoin).

In the world of currency exchange, the big three pieces of information about each currency are (i) the Price of each coin, (ii) the Market Capitalization, which is the total coin supply multiplied by the coin price, and (iii) the Liquidity, which refers to how easy it is to buy and sell coins without causing a change in their price (it is used to measure the market share, market maturity and market acceptance). We could summarize this information for each coin by using a phylogenetic network.

So, I took the information as supplied by CoinGecko (see above) in US dollars, and log-transformed the numbers (economic worth is usually considered to be log-normally distributed). I then calculated the Manhattan distances pairwise between the currencies, and plotted this using a NeighborNet graph, as shown in the final figure. The 10 top-price currencies have their full name shown, while the remainder are labeled with their exchange abbreviation. As usual, coins that have similar financial characteristics are near each other in the network; and the further apart the coins are in the network then the more different are their characteristics.
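The distance step can be sketched as follows. The coin names and numbers are hypothetical placeholders, not CoinGecko data, and the NeighborNet itself is not computed here: the resulting matrix would be passed to a network program such as SplitsTree. How coins with a zero Market Capitalization were log-transformed is not stated above, so this sketch simply assumes that all values are positive.

```python
import math

# Hypothetical coins: (price in USD, market capitalization in USD, liquidity).
coins = {
    "AAA": (6500.0, 1.1e11, 9.0e8),
    "BBB": (300.0, 2.9e10, 4.0e8),
    "CCC": (0.2, 8.0e9, 1.5e8),
    "DDD": (450.0, 3.0e6, 2.0e4),
}


def log_profile(values):
    """Log10-transform each measurement (all values assumed positive)."""
    return tuple(math.log10(value) for value in values)


def manhattan(p, q):
    """Manhattan distance: the sum of absolute differences per measurement."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(data):
    """Return (names, matrix) of pairwise Manhattan distances on log profiles."""
    names = sorted(data)
    profiles = {name: log_profile(data[name]) for name in names}
    matrix = [[manhattan(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix


names, matrix = distance_matrix(coins)
print("     " + "  ".join(f"{name:>6}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>4} " + "  ".join(f"{value:6.2f}" for value in row))
```

In this toy example, the coin with a high price but low capitalization and liquidity ends up far from the coins that are high on all three measures.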

There are basically four neighborhoods in the graph, representing four different types of coins. Those coins at the top-right of the network all have a high Price, Capitalization and Liquidity. These are the coins that currently dominate the market. Moving leftwards from there in the graph, the Price, Capitalization and Liquidity all decrease, so that the coins in the middle of the network have low values of all three criteria. The coins at the top-left of the network have a relatively high Price but still have a low Capitalization and Liquidity. Those coins isolated at the bottom of the network currently have no Market Capitalization at all, even though they are available for trading and thus have a Price (this includes the SegWit2x Futures).


So, should you invest your hard-earned savings in cryptocurrencies? Plenty of people are doing so. For example, Coinbase, the largest cryptocurrency exchange in the USA, reportedly now has 12 million customers.

The general consensus seems to be "yes" to investment only if you like a bit of a gamble, because you may win big, but otherwise the answer is currently "no". The attributes that currently make cryptocurrencies such a speculative investment, such as their big price swings, their volatility and unpredictability, and their potentially lucrative payoffs, actually make them pretty useless as currencies. If you are looking for a long-term investment, then you probably need to find an altcoin that is either useful as a transaction medium, or provides an innovative application of the blockchain technology.

Tuesday, November 7, 2017

PhyloNetworks: a package for phylogenetic networks

Recently, another computer package was released that is of relevance to this blog. This is described in a forthcoming paper:
Claudia Solís-Lemus, Paul Bastide, Cécile Ané (2017) PhyloNetworks: a package for phylogenetic networks. Molecular Biology and Evolution (in press) 12: 3292-3298.
The authors describe the package this way:
PhyloNetworks is a Julia package for the inference, manipulation, visualization and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression.

The SNaQ analysis is described in a previous paper:
Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLOS Genetics 12: e1005896.
The phylogenetic model used incorporates: mutations (as usual), incomplete lineage sorting of alleles in ancestral populations (using the coalescent), and horizontal inheritance of genes (ie. reticulations in the network). The likelihood is decomposed into quartets, which makes the likelihood calculations relatively fast, and also allows the analyses to be scaled up to many species and many genes.
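To give a feel for why the quartet decomposition keeps the calculations manageable, here is a minimal sketch of the general idea in Python. It is not the PhyloNetworks code (which is written in Julia), and the function names are invented for illustration. It assumes that, for every set of four taxa, we have counts of gene trees supporting each of the three possible quartet resolutions, plus the proportions expected under some candidate network; in SNaQ those expected proportions come from the coalescent model, whereas here they are simply supplied by the caller.

```python
import math
from itertools import combinations


def quartets(taxa):
    """All 4-taxon subsets; there are n-choose-4 of them for n taxa."""
    return list(combinations(sorted(taxa), 4))


def log_pseudolikelihood(observed_counts, expected_proportions):
    """Sum of per-quartet multinomial log-likelihood terms.

    observed_counts[q]       -> three gene-tree counts, one per resolution of q
    expected_proportions[q]  -> three proportions (summing to 1) for q
    Constant multinomial coefficients are dropped, since they do not
    depend on the candidate network.
    """
    total = 0.0
    for quartet, counts in observed_counts.items():
        for count, proportion in zip(counts, expected_proportions[quartet]):
            if count:  # a zero count contributes nothing
                total += count * math.log(proportion)
    return total


taxa = ["A", "B", "C", "D", "E"]
four_taxon_sets = quartets(taxa)
print(len(four_taxon_sets), "quartets for", len(taxa), "taxa")

# Toy data (hypothetical): the same counts and proportions for every quartet.
observed = {q: (2, 1, 1) for q in four_taxon_sets}
expected = {q: (0.5, 0.25, 0.25) for q in four_taxon_sets}
print("log pseudolikelihood:", log_pseudolikelihood(observed, expected))
```

Each quartet contributes a small, independent term, so the total is just a sum over 4-taxon sets rather than a calculation over the whole network at once.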

The PhyloNetworks software is open source, and is available with documentation at:
Have fun learning to use the Julia system, which I had never even heard of before investigating this new package!

Note: In spite of the similarity in name, this new package has nothing to do with Luay Nakhleh's PhyloNet package, nor with the Phylogenetic Networks blog.