Monday, May 5, 2014

Word cloud of publications on phylogenetic networks

There are several ways to analyze word frequency and usage in a block of text and to display the result as a diagram. Word clouds, for example, use font size to represent how often each word occurs in the text.
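As a rough illustration of the font-size idea, here is a minimal sketch that maps word counts linearly onto a range of font sizes. The linear scaling, the size limits and the counts are all assumptions made up for the example. This is not how Wordle itself computes sizes, and the numbers are not from the PubMed search.

```python
def font_sizes(counts, min_size=10, max_size=60):
    """Map word counts to font sizes by linear interpolation (hypothetical scheme)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (n - lo) * (max_size - min_size) / span)
        for word, n in counts.items()
    }

# Made-up counts, for illustration only.
counts = {"network": 120, "phylogenetic": 100, "species": 40, "human": 10}
print(font_sizes(counts))
# {'network': 60, 'phylogenetic': 51, 'species': 24, 'human': 10}
```

The most frequent word gets the largest size and the least frequent gets the smallest. A real word-cloud tool may use a different (for example, logarithmic) scaling to stop a few very common words from dominating the picture.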

I thought that it would be interesting to look at the recurring words in the abstracts of papers about phylogenetic networks. So, I searched PubMed for the phrase "phylogenetic network" to obtain the relevant abstracts, using the ebot server to produce a Perl script that performed the search, and then used the Wordle server to generate a word cloud. The search produced 1,285 publications. Before producing the diagram, I deleted the irrelevant information that PubMed adds to each record, and I also deleted common words from the authors' addresses.
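The clean-up step described above (dropping common words and address words before counting) can be sketched roughly as follows. This assumes a simple regular-expression tokenizer and a small, hand-made stop list. Both are hypothetical and are not the ebot-generated script or Wordle's own word filtering, and the sample abstract is invented.

```python
import re
from collections import Counter

# Hypothetical stop lists: common English words, plus words typical of author addresses.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "we",
              "that", "with", "are", "by", "on", "from", "this"}
ADDRESS_WORDS = {"university", "department", "institute", "usa", "email"}

def word_counts(text, stop_words=STOP_WORDS | ADDRESS_WORDS):
    """Lower-case the text, split it into alphabetic words, and count those not in the stop list."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Invented abstract text, for illustration only.
abstract = ("We infer a phylogenetic network from gene sequences of several species. "
            "The network is compared with a phylogenetic tree. "
            "Department of Biology, University of Example.")
counts = word_counts(abstract)
print(counts.most_common(2))
# [('phylogenetic', 2), ('network', 2)]
```

The resulting counts are what a word-cloud tool would then turn into font sizes.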

Many of the words are not very revealing about the subject. Nevertheless, the diagram suggests that the papers more often involve gene sequences from different species, whereas population studies less often refer to their diagrams as "phylogenetic networks", particularly studies of mitochondrial data (which probably use "haplotype network" instead). Other than that, the only organism named in the diagram is "human". "Phylogeny", "tree/s" and "clade" all make an appearance, revealing the strong link to the past.

