Friday, November 30, 2012

Description, explanation and prediction in phylogenetics

My recent post on the relationship between phylogenetic trees and networks (Are phylogenetic networks as scientific as trees?) has generated some comment, particularly with regard to the way in which description, explanation and prediction apply to phylogenetics.

By way of illustration, I give here one specific example each of description, explanation and prediction using phylogenetic trees. They all come from my studies of one particular taxonomic group.


The phylum Apicomplexa (sometimes also known as Sporozoa) forms a large and diverse group of unicellular protists with a wide environmental distribution. They are obligate intracellular parasites, being the only large taxonomic group whose members are entirely parasitic. The phylum is traditionally considered to contain four clearly defined groups: the Coccidians, the Gregarines, the Haemosporidians and the Piroplasmids. The phylogenetic tree shown here (from Morrison 2009) is based on complete 18S rDNA sequences.

This tree is, in one sense, nothing more than a mathematical summary of some of the patterns in the aligned nucleotide data. However, if we accept the idea that this data summary represents the evolutionary history of the organisms (ie. the data summary represents the gene history and the gene history represents the organismal history), then the tree is also a quantitative description of that history.
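To make the idea of a "mathematical summary" concrete, here is a minimal sketch of one such summary: the pairwise p-distance, which is the proportion of aligned sites at which two sequences differ. The sequences and taxon names are invented for illustration and are not real 18S rDNA data. A distance matrix is only one possible starting point for tree building, and nothing here is meant to reproduce the analysis behind the tree in Morrison (2009).

```python
from itertools import combinations

# Toy aligned sequences (hypothetical; not real 18S rDNA data).
alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGTTC",
    "taxon_3": "ACGAACGTTC",
    "taxon_4": "TCGAACCTTG",
}


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ.

    Sites where either sequence has a gap ('-') are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


# Pairwise distance for every pair of taxa in the alignment.
distances = {
    (s, t): p_distance(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}

for (s, t), d in distances.items():
    print(f"{s} vs {t}: {d:.2f}")
```

In this toy alignment, taxon_1, taxon_2 and taxon_3 differ from each other at only one or two sites, while taxon_4 differs from each of them at several more. A tree-building method would summarise that pattern by grouping the first three taxa together.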

In this particular example, however, the description is likely to be wrong, in at least some details. For example, it seems improbable that the Haemosporidians (Plasmodium and Hepatocystis) are derived from within the Gregarines. This placement is more likely to be the result of long-branch attraction, so that the data summary is in error (as the consequence of a mathematical artefact), which leads to an inaccurate description of the evolutionary history.


Cryptosporidium causes cryptosporidiosis in mammals. It has traditionally been classified with the Coccidians (see Ellis et al. 1998), a placement first suggested in 1907, based on features of the life-cycle, the macro- and microgamonts, and the oocysts (see Beĭer 2000). However, drugs that help treat coccidial infections (such as coccidiosis, toxoplasmosis, neosporosis and sarcocystosis in vertebrates) do not work on Cryptosporidium, an observation that has long puzzled parasitologists.

The earliest phylogenetic analyses of 18S rDNA from Apicomplexans called this taxonomic placement into question (Johnson et al. 1990), and this was repeatedly confirmed by later analyses (eg. Morrison & Ellis 1997). However, these analyses did not include representatives of all of the Apicomplexan groups (ie. they sampled only Coccidians, Haemosporidians and Piroplasmids), and the first analyses to also include the Gregarines (which infect invertebrates) indicated a sister-group relationship between Cryptosporidium and the Gregarines (Carreno et al. 1999). This phylogenetic placement of Cryptosporidium as sister to the Gregarines is the currently accepted one (Barta & Thompson 2006, Leander 2007, Morrison 2009).

Thus, the currently accepted phylogeny explains why the anti-coccidial drugs do not work on Cryptosporidium — it is not a Coccidian. The traditional taxonomy does not provide any such explanation.


Taxon sampling has been almost entirely opportunistic within the Apicomplexa, as it almost always is in parasitology. Opportunities for sampling arise principally from studies of medical diseases (eg. malaria, cryptosporidiosis and toxoplasmosis) and of veterinary diseases (eg. coccidiosis, neosporosis and babesiosis). This can create practical problems (eg. in epidemiology), such as when dealing with parasites that have a two-host life cycle but where only one of the hosts is known.

Sarcocystis is part of the Coccidia, causing sarcocystosis in vertebrates. It has a two-host (or indirect) life cycle: the definitive host (in which sexual reproduction occurs) is usually a carnivore, while the intermediate host (where asexual reproduction occurs) is usually a herbivore. Sometimes, parasites have been collected only from the intermediate host, and thus we need to predict the definitive host species, in order to direct the search for it. (Importantly, targeted searches use fewer experimental animals.) This prediction can be done using a phylogeny, as the prediction then comes from the known hosts of the other parasite species within the same clade (monophyletic group).

The 18S rDNA phylogeny shown here is for part of Sarcocystis (it is taken from Morrison et al. 2004), and it also shows the known host species for each parasite species. This phylogeny can be used to predict that the most likely definitive host for Sarcocystis species V would be the same as the host for the other species in the monophyletic group labelled A, which would thus be a canid. Similarly, the predicted definitive host for Sarcocystis sinensis would be the same as the host for the other species in the monophyletic group labelled B, which is thus probably humans but possibly a felid.
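The logic of this kind of prediction can be sketched in a few lines of code. The following is a simplified illustration only: it assumes that clade membership has already been read off the tree, and it ranks candidate hosts by how often they occur among the other members of the clade. The species labels (sp_a1, sp_query and so on), the host assignments and the function name predict_host are hypothetical, and are not taken from Morrison et al. (2004).

```python
from collections import Counter


def predict_host(clade_hosts, target):
    """Rank candidate definitive hosts for `target`.

    `clade_hosts` maps each parasite species in one monophyletic group
    to its known definitive host (or None if unknown). Candidates are
    ranked by their frequency among the other clade members.
    """
    known = [
        host
        for species, host in clade_hosts.items()
        if species != target and host is not None
    ]
    if not known:
        return []  # no basis for a prediction
    counts = Counter(known)
    return [(host, n / len(known)) for host, n in counts.most_common()]


# Hypothetical clades: every known host in the first is a canid,
# while the second mixes two host types.
clade_a = {"sp_a1": "canid", "sp_a2": "canid", "sp_a3": "canid", "sp_unknown": None}
clade_b = {"sp_b1": "human", "sp_b2": "human", "sp_b3": "felid", "sp_query": None}

print(predict_host(clade_a, "sp_unknown"))
print(predict_host(clade_b, "sp_query"))
```

When all of the known hosts in a clade agree, there is a single candidate. When they differ, the result is a ranked list, which corresponds to a statement of the form "probably X but possibly Y". A simple frequency count ignores branch lengths and the position of the query species within the clade, both of which a more careful analysis would take into account.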

In three cases this form of prediction of the definitive host of Sarcocystis species was tested by subsequent experimental infection studies (Dahlgren & Gjerde 2010; Gjerde & Dahlgren 2010), and the predictions were all confirmed to be correct.


Barta JR, Thompson RCA (2006) What is Cryptosporidium? Reappraising its biology and phylogenetic affinities. Trends in Parasitology 22: 463-468.

Beĭer TV (2000) [Further comment on the coccidian nature of cryptosporidia (Sporozoa: Apicomplexa)]. [Article in Russian, with English abstract.] Parazitologiia 34: 183-195.

Carreno RA, Martin DS, Barta JR (1999) Cryptosporidium is more closely related to the Gregarines than to Coccidia as shown by phylogenetic analysis of Apicomplexan parasites inferred using small-subunit ribosomal RNA gene sequences. Parasitology Research 85: 899-904.

Dahlgren SS, Gjerde B (2010) The red fox (Vulpes vulpes) and the arctic fox (Vulpes lagopus) are definitive hosts of Sarcocystis alces and Sarcocystis hjorti from moose (Alces alces). Parasitology 137: 1547-1557.

Ellis JT, Morrison DA, Jeffries AC (1998) The phylum Apicomplexa: an update on the molecular phylogeny. In GH Coombs, K Vickerman, MA Sleigh, A Warren (eds) Evolutionary Relationships Among Protozoa (Kluwer, Dordrecht) pp. 255-274.

Gjerde B, Dahlgren SS (2010) Corvid birds (Corvidae) act as definitive hosts for Sarcocystis ovalis in moose (Alces alces). Parasitology Research 107: 1445-1453.

Johnson AM, Fielke R, Lumb R, Baverstock PR (1990) Phylogenetic relationships of Cryptosporidium determined by ribosomal RNA sequence comparison. International Journal for Parasitology 20: 141-147.

Leander BS (2007) Marine Gregarines: evolutionary prelude to the Apicomplexan radiation? Trends in Parasitology 24: 60-67.

Morrison DA (2009) Evolution of the Apicomplexa: where are we now? Trends in Parasitology 25: 375-382.

Morrison DA, Bornstein S, Thebo P, Wernery U, Kinne J, Mattsson JG (2004) The current status of the small subunit rRNA phylogeny of the Coccidia (Sporozoa). International Journal for Parasitology 34: 501-514.

Morrison DA, Ellis JT (1997) Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Molecular Biology and Evolution 14: 428-441.


  1. Thanks for explaining, David. I can now see what you mean by phylogenetic prediction.

    The interesting question you raised in the first post was: What if the prediction fails?

    Suppose the tree was not deduced from 18S rDNA but from some other stretch of DNA, and S. sinensis happened to somehow end up in taxon A. Then the prediction would have been a canid definitive host and that would have turned out to be false.

    My hunch is that you wouldn't therefore reject the idea that small chunks of DNA have phylogenies with tree patterns.

    What of the idea that phylogenies of many chunks of DNA show a network pattern - Would a prediction as above still be possible? Would we reject the idea of networks if it was not (or at least consider network phylogenies less scientific)?

  2. Joachim, I don't think that I would reject the idea of a tree or a network just because a prediction fails. Indeed, as Jonathan Losos has noted (cited in the previous post), many predictions from phylogenetic trees will fail for very good reasons, even when the organism tree is known. Nor would I reject trees or networks just because predictions are not possible under some circumstances.

    My own interest is in whether we can expect network predictions to be better / worse / as good as those from trees — that is, the relationships between networks and trees. My conclusion, so far, is that we may need to know more about the network than about the tree in order to make a prediction (ie. we might need to know whether to base the prediction on the vertical evolution or the horizontal evolution component). This weakens their practicality somewhat, but is certainly no reason to forgo networks.

  3. Though I am a layman on the whole issue, that was my initial hunch too. The exercise seems to demand more expertise (more training and sophistication) and therefore to become more, not less, scientific.

    Anyway, I'm a bit sceptical about some forms of 'obsession' with prediction, as you can gather from my last post at my blog.