Wednesday, June 12, 2013

Cophylogenetic networks

I want to talk about relationships between phylogenetic networks with respect to the cophylogeny problem. This is a problem in which two or more phylogenetic histories, typically trees, share some ecological link, which may be quite strict or relatively weak, such that their evolutionary dynamics are not independent. Given that nothing evolves in isolation, that there are more parasites than non-parasites, and that each organism houses many genes, each of which can have its own phylogenetic history, this is a ubiquitous scenario and one worthy of analysis.

A simple case illustrates this:

The tree on the left is labelled H for the "host" phylogeny, but it could equally well be called the "species" phylogeny; the tree on the right is labelled P for the "parasite" or "pathogen" phylogeny, but could also be thought of as the "gene" phylogeny. Reconciling these two, given the observed associations at the tips, is already a computationally horrible problem (Libeskind-Hadas & Charleston 2010, Ovadia et al. 2011). The first proof, which Ran Libeskind-Hadas and I did (he mostly did!), allowed the host phylogeny to be a network, but the later proof by Ovadia et al. didn't. The network case is interesting, though, because host species do undergo hybridization events, whereas individual genes do so much less often.

The cophylogeny problem is best attacked as a mapping problem (yes, this is my opinion; there are others). Given P, H and the leaf associations φ, what is the minimal-cost mapping of P into H that preserves φ and the structure of P and H, and which is interpretable in a well-defined way? (See the Cophylogeny blog for more details.)

We typically have four event types:

codivergence: a generalisation of cospeciation, where a node in P bifurcates at the same moment as its host node in H;

duplication: where the parasite bifurcates without a corresponding host bifurcation, as in a gene duplication;

host switch: a parasite/pathogen establishes itself on a new host lineage; and

loss: we fail to see a parasite/pathogen where we expected it, caused by "missing the boat" / lineage sorting, extinction or sampling failure.
We could in principle extend this to deal with failure-to-codiverge events, but these cause new headaches.
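To make the codivergence/duplication/loss vocabulary a bit more concrete, here is a minimal sketch of a much simpler, classic special case: the LCA (least common ancestor) reconciliation used for gene trees inside species trees. It allows no host switches (and no failure-to-codiverge), so it is not the method behind the maps discussed in this post; it just maps each internal node of P to the LCA in H of its children's hosts, calls the node a duplication if it lands on the same host as one of its children and a codivergence otherwise, and counts losses from depth gaps in H. It assumes both P and H are rooted binary trees stored as child-to-parent dictionaries; `reconcile`, `phi` and the toy trees are hypothetical names for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a rooted binary tree as child -> parent (root -> None).
Parent = Dict[str, Optional[str]]


def depth(tree: Parent, node: str) -> int:
    """Number of edges from node up to the root."""
    d = 0
    parent = tree[node]
    while parent is not None:
        d += 1
        parent = tree[parent]
    return d


def lca(tree: Parent, a: str, b: str) -> str:
    """Least common ancestor of a and b in a rooted tree."""
    ancestors = set()
    walk: Optional[str] = a
    while walk is not None:
        ancestors.add(walk)
        walk = tree[walk]
    while b not in ancestors:  # the root is always an ancestor, so this terminates
        b = tree[b]
    return b


def children_of(tree: Parent) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = {n: [] for n in tree}
    for child, parent in tree.items():
        if parent is not None:
            kids[parent].append(child)
    return kids


def reconcile(P: Parent, H: Parent, phi: Dict[str, str]) -> Tuple[Dict[str, str], int, int, int]:
    """LCA mapping of P into H with no host switches.

    Returns (mapping, codivergences, duplications, losses).
    """
    kids = children_of(P)
    mapping: Dict[str, str] = {}
    event: Dict[str, str] = {}

    def visit(p: str) -> str:
        if not kids[p]:  # leaf of P: use the observed association phi
            mapping[p] = phi[p]
            event[p] = "leaf"
            return mapping[p]
        c1, c2 = kids[p]  # assumes P is binary
        m1, m2 = visit(c1), visit(c2)
        mapping[p] = lca(H, m1, m2)
        # same host as a child => the parasite split without its host splitting
        event[p] = "duplication" if mapping[p] in (m1, m2) else "codivergence"
        return mapping[p]

    root = next(n for n, par in P.items() if par is None)
    visit(root)

    losses = 0
    for c, p in P.items():
        if p is None:
            continue
        gap = depth(H, mapping[c]) - depth(H, mapping[p])
        # a codivergence at p already accounts for one host edge below mapping[p]
        losses += gap - 1 if event[p] == "codivergence" else gap
    codiv = sum(e == "codivergence" for e in event.values())
    dups = sum(e == "duplication" for e in event.values())
    return mapping, codiv, dups, losses


# Toy example: H = ((A,B)x,C)r and a perfectly congruent P = ((a,b)y,c)q.
H = {"r": None, "x": "r", "A": "x", "B": "x", "C": "r"}
P = {"q": None, "y": "q", "a": "y", "b": "y", "c": "q"}
phi = {"a": "A", "b": "B", "c": "C"}
print(reconcile(P, H, phi)[1:])  # (2, 0, 0)

# Two parasite tips on the same host A force a duplication and a loss (in B).
P_dup = {"q": None, "y": "q", "a1": "y", "a2": "y", "c": "q"}
phi_dup = {"a1": "A", "a2": "A", "c": "C"}
print(reconcile(P_dup, H, phi_dup)[1:])  # (1, 1, 1)
```

Once host switches are allowed, the simple LCA map is no longer enough, which is one reason the general problem becomes so much harder.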

Mike Steel once very usefully asked me: what are the desired properties of cophylogeny maps? In trying my best to answer him, I realised that none of the properties really needs either phylogeny to be a tree. Nodes in P are mapped to nodes or edges in H, and the evolutionary history they imply is based on the route through H from the image of a parent node p' to the image of its child node p in P. If H is a tree then this route is unique; otherwise there can be ambiguity and a potential explosion in the number of solutions. But it does mean that we could potentially solve the mapping problem moderately well so long as P and H are at least DAGs. While this is possible in principle, in practice it's pretty much impossible.
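The route ambiguity can be seen by simply counting routes. The sketch below counts directed paths between two nodes of a DAG given as hypothetical adjacency lists (`count_routes` and `diamond_chain` are illustrative names). In a tree there is at most one route between an ancestor and a descendant; in the toy network built by `diamond_chain`, where each pair of sister lineages merges again in a reticulation, every reticulation doubles the number of routes, so k of them give 2^k. Route counting is only one ingredient of the blow-up in solutions, not the whole story.

```python
from functools import lru_cache
from typing import Dict, List


def count_routes(children: Dict[str, List[str]], src: str, dst: str) -> int:
    """Number of directed paths from src to dst in a DAG (adjacency lists)."""

    @lru_cache(maxsize=None)  # memoise so shared descendants are counted once
    def routes(node: str) -> int:
        if node == dst:
            return 1
        return sum(routes(c) for c in children.get(node, []))

    return routes(src)


def diamond_chain(k: int) -> Dict[str, List[str]]:
    """Toy network: v_i splits into L_i and R_i, which merge again into v_{i+1}."""
    children: Dict[str, List[str]] = {}
    for i in range(k):
        children[f"v{i}"] = [f"L{i}", f"R{i}"]
        children[f"L{i}"] = [f"v{i + 1}"]
        children[f"R{i}"] = [f"v{i + 1}"]
    children[f"v{k}"] = []
    return children


tree = {"r": ["x", "C"], "x": ["A", "B"]}
print(count_routes(tree, "r", "B"))                 # 1: unique in a tree
print(count_routes(diamond_chain(10), "v0", "v10"))  # 1024 = 2**10
```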
It's hard enough just with trees:

Figure from Ramsden et al. 2009 Supp. material

The figure above was cut from the manuscript and relegated to supplementary material because it was kind of unnecessary, but it's very pretty, so it should see the light of day. It shows a consensus of 15 solutions (maps) we found that could best explain the relationships between hosts and circoviruses, with thicker lines for more frequently occurring components and different colours for different groups of maps.
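The "thicker lines for more frequent components" idea can be sketched in a few lines. This is only an illustration under assumptions of mine: each solution is treated as a set of components, here hypothetical (parasite node, host position) pairs, and line width is taken to be linear in the fraction of solutions containing a component. The toy solutions are made up and are not the circovirus maps, and the colouring by groups of maps is not attempted.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical component: (parasite node, host node or edge it maps to).
Component = Tuple[str, str]


def component_frequencies(solutions: List[FrozenSet[Component]]) -> Counter:
    """How many solutions contain each component."""
    counts: Counter = Counter()
    for sol in solutions:
        counts.update(sol)
    return counts


def line_widths(solutions: List[FrozenSet[Component]],
                min_w: float = 1.0, max_w: float = 5.0) -> Dict[Component, float]:
    """Width grows linearly with the fraction of solutions using a component."""
    if not solutions:
        return {}
    counts = component_frequencies(solutions)
    n = len(solutions)
    return {comp: min_w + (max_w - min_w) * c / n for comp, c in counts.items()}


# Toy input: three made-up solutions that agree on y -> x but not on q.
solutions = [
    frozenset({("y", "x"), ("q", "r")}),
    frozenset({("y", "x"), ("q", "x")}),
    frozenset({("y", "x"), ("q", "r")}),
]
print(line_widths(solutions)[("y", "x")])  # 5.0: present in every solution
```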

The cophylogeny problem is a fun one that presents lots of modelling, computational and representational challenges.
