Monday, March 3, 2014

Has phylogenetics reached its apogee?


Few people had heard of phylogenetics before 1970. It was during that decade that explicit methods for constructing phylogenetic trees came to prominence, although such methods had first appeared in the late 1950s. These methods appeared first in systematics, based on parsimony (1970s), and then in genetics, based on likelihood (1980s). These days, phylogenetics is seen as ubiquitous in biology, but it is interesting to consider whether this idea can be quantified.

Joseph Hughes (2011. TreeRipper web application: towards a fully automated optical tree recognition software. BMC Bioinformatics 12:178) had a go at this by trying to extract information from the PubMed bibliographic database. Here, I have expanded on this approach.

I searched PubMed for the string phylogen*, thus including words like "phylogeny" and "phylogenetics", as well as unusual variations on these words. I ran the search twice: once across the full bibliographic record (including the abstract), and once restricted to the Title field. I did this for every calendar year from 1970–2012 inclusive (the 2013 data are currently still incomplete in the database).


The results are shown in the first graph, and the second graph shows the details of the title search alone. The data are expressed as a percentage of the total number of PubMed records for each year.
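For anyone wanting to repeat the exercise, here is a minimal sketch of how the yearly queries and percentages could be put together. It is not the procedure I actually used: it assumes the NCBI E-utilities `esearch` endpoint and PubMed's `[Title]` and `[dp]` (date of publication) field tags, the function names are hypothetical, and the counts in the last line are made up purely to show the arithmetic.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities esearch (not necessarily what the post used).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_term(year, field=None):
    """Build a PubMed query for phylogen* within one publication year.

    field=None searches the whole record; field="Title" restricts to titles.
    """
    word = "phylogen*" + (f"[{field}]" if field else "")
    return f"{word} AND {year}[dp]"


def count_url(term):
    """URL that asks esearch for the number of matching records only."""
    query = urlencode({"db": "pubmed", "term": term, "rettype": "count"})
    return ESEARCH + "?" + query


def percent_of_records(hits, total):
    """Express a hit count as a percentage of all records for that year."""
    return 100.0 * hits / total


print(pubmed_term(2009, "Title"))
print(count_url(pubmed_term(2009)))
# Hypothetical counts, for illustration only (not real PubMed numbers).
print(percent_of_records(154, 10000))
```

The denominator for each year would come from the same kind of query with the search word dropped (just the year and `[dp]`).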


So, less than 2% of the current papers in biology mention phylogenetics in their titles or abstracts. This does not, of course, mean that the remaining papers don't mention the topic at all, as they could do so under some other name (e.g. "evolutionary tree", "genealogy", etc.), or do so in a way that does not make it into the abstract. Still, it seems to me that this is a rather low number.

The erratic nature of the data before 1975 is probably a by-product of the quality of the PubMed data for that time. However, the clear upper asymptote in the data this century is not artifactual, but real. The average maximum value for the "All" data is ~1.54%, reached in 2009, while the average for "Title only" is ~0.17%, reached in 2004. This seems to imply that phylogenetics has now saturated the market, and is as ubiquitous as it will be, unless something new comes along to change it.

The initial rise in usage of the phylogenetic methods coincided with the release of computer programs that implemented them. Wagner78 was released for mainframe computers in 1978, followed by Phylip in 1980. Phylip was the first to be ported to microcomputers; but it was the release of the PC version of PAUP (v. 2.4) in December 1985 that came to dominate the next 10 years. Hennig86, the successor to Wagner78, was released in 1988.

However, the rapid growth in usage coincided with the growth of molecular genetics. The patent applications for PCR were filed in 1985, and the first paper based on it was also published that year. The technology started to be used for human diagnostics during 1986, and PCR became a basic research tool in molecular biology from c. 1989. (Science selected PCR as the major scientific development of 1989.) The journal Molecular Biology and Evolution was founded in 1983, and Molecular Phylogenetics and Evolution in 1992.

The inflection point in the graph is c. 1999, which marks where growth began to slow down. Coincidentally, it was in 1999 that the Journal of Molecular Evolution announced that it would henceforth exclude molecular phylogenetics (and research on the origin of life), except in cases that have "a special significance and impact." Phylogenetics was now seen as a tool of evolutionary analysis rather than an end in itself.

By this stage, Bayesian methods were being proposed, and MrBayes was released in 2001, rapidly becoming the predominant program. However, this was simply a transformation of the existing methodology, rather than being a major new component of data analysis in the way the very first programs were. Furthermore, the rise in usage of genome data seems also to be a transformation, rather than a major addition to data collection the way sequence data were.

Thus, it took 30 years (c. 1978–2008) for the phylogenetics revolution to be complete. Mind you, it had already taken about 100 years from 1859 for quantitative methods to first be proposed.
