Wednesday, October 21, 2015

Studying gene flow using genomes

Continuing the recent blog theme of researchers analyzing potentially reticulate relationships without explicitly using networks (Are networks actually used to explore reticulate histories?; Problems with manually constructing networks), there is this just-published paper:
Nater A, Burri R, Kawakami T, Smeds L, Ellegren H (2015) Resolving evolutionary relationships in closely related species with whole-genome sequencing data. Systematic Biology 64: 1000-1017.
The authors note:
Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results ... Although gene tree incongruences caused by ILS are still fully compatible with a strictly bifurcating species tree, gene flow among species requires a more complex representation of evolutionary histories, resembling reticulate networks rather than trees.
Unfortunately, this is the sole mention of the word "network" in the text.

The authors addressed the issues of incomplete lineage sorting and interspecific gene flow using whole-genome sequence data from 198 individuals of four flycatcher species, plus two outgroup genomes. They found that no single one of the 15 possible rooted gene-tree topologies dominated across the genome: the most frequent gene tree occurred only 17.7% of the time, the second most frequent 14.3%, and the third 10.5%.
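The figure of 15 comes from the standard count of (2n - 3)!! rooted bifurcating topologies for n labelled taxa, which for n = 4 is 1 × 3 × 5 = 15. The sketch below checks this by stepwise addition of taxa, using placeholder labels A to D for the four ingroup species (the labels and function names are illustrative assumptions, not taken from the paper). It also prints the share each topology would have if all 15 were equally common, about 6.7%; that is only a crude reference point for reading the observed 17.7%, not a coalescent expectation.

```python
def count_rooted_topologies(n):
    """Number of rooted bifurcating topologies for n >= 2 labelled taxa: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`.

    Trees are nested 2-tuples; a bare string is a single leaf.
    """
    yield (tree, leaf)  # attach on the branch directly above this (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def enumerate_rooted_topologies(taxa):
    """Build all rooted topologies by adding the taxa one at a time."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for tree in trees for t in insertions(tree, leaf)]
    return trees


taxa = ["A", "B", "C", "D"]  # placeholder labels for the four ingroup species
trees = enumerate_rooted_topologies(taxa)
print(len(trees), count_rooted_topologies(len(taxa)))  # 15 15
print(f"uniform share per topology: {1 / len(trees):.1%}")  # 6.7%
```

Stepwise addition generates each labelled rooted topology exactly once, so the enumeration and the closed-form count agree.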

They investigated this gene-tree diversity using four approaches that attempt to resolve a species tree in the context of incomplete lineage sorting and the coalescent: MP-EST, SNAPP, fastsimcoal2, and approximate Bayesian computation (ABC). The latter two also allow for post-divergence gene flow. None of the four can readily handle 200 genomes, so in each case either only a subset of the data was analyzed or only a subset of the possible species trees was tested. All four produced the same species tree, which was also the most commonly encountered gene tree.
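Why incomplete lineage sorting alone produces discordant gene trees can be seen in the simplest case, three taxa, where the multispecies coalescent gives closed-form gene-tree probabilities. The sketch below assumes a species tree ((A,B),C) with an internal branch of length t in coalescent units (generations divided by 2N for diploids) and no gene flow; the taxon labels are placeholders. For three taxa the concordant gene tree is always the most probable one and the two discordant trees are equally frequent, so a consistent imbalance between them hints at gene flow; with four or more taxa, as in the flycatcher data, the calculation is more involved and the most frequent gene tree need not match the species tree.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Rooted gene-tree probabilities under the multispecies coalescent with no
    gene flow, for species tree ((A,B),C) with internal branch length t
    (in coalescent units)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # A discordant gene tree requires the A and B lineages to fail to coalesce
    # along the internal branch (probability exp(-t)) and then to coalesce in
    # one particular order out of three equally likely ones.
    discordant_each = math.exp(-t) / 3
    concordant = 1 - 2 * discordant_each
    return {"((A,B),C)": concordant,
            "((A,C),B)": discordant_each,
            "((B,C),A)": discordant_each}


# Short internal branches give near-uniform gene-tree frequencies.
for t in (0.1, 0.5, 1.0, 2.0):
    probs = three_taxon_gene_tree_probs(t)
    print(t, {tree: round(p, 3) for tree, p in probs.items()})
```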

Unfortunately, the authors found almost no evidence of gene flow using these methods, even though their detailed gene-tree analyses do suggest that it occurred. This points to problems with the methods themselves. Perhaps the main problem is that the authors approached their analyses almost exclusively in the context of a species tree rather than a network. There are other methods that one could try, including the one used by researchers studying introgression in archaic hominoids (as discussed in Are networks actually used to explore reticulate histories?).
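One widely used genome-scale test for introgression is Patterson's D statistic (the ABBA-BABA test); this sketch does not claim it is exactly the method referred to above. For taxa ordered (P1, P2, P3, outgroup) with species tree (((P1,P2),P3),O), incomplete lineage sorting alone predicts equal numbers of ABBA sites (P2 and P3 share the derived allele) and BABA sites (P1 and P3 share it), so D should be near zero; an excess of one pattern suggests gene flow between P3 and P2 or P1. The site counts below are made up for illustration, and the block-jackknife significance test normally applied across the genome is omitted.

```python
def patterson_d(site_patterns):
    """Patterson's D (ABBA-BABA) statistic.

    Each site pattern is a 4-letter string for taxa ordered
    (P1, P2, P3, outgroup), with 'A' = ancestral and 'B' = derived allele.
    """
    abba = sum(1 for s in site_patterns if s == "ABBA")  # P2 and P3 share derived allele
    baba = sum(1 for s in site_patterns if s == "BABA")  # P1 and P3 share derived allele
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / (abba + baba)


# Hypothetical counts, for illustration only (not data from the paper).
sites = ["ABBA"] * 60 + ["BABA"] * 40 + ["BBAA"] * 200
print(patterson_d(sites))  # 0.2
```

Note that the test depends on calling alleles ancestral or derived, which is usually done with an outgroup.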

In addition, the authors seem to be unclear about their concept of what a species is. For example, they note that "gene flow among lineages in the species tree can confound the true order of speciation events", which seems to preclude use of the biological species concept. Furthermore, they note that "lack of species monophyly is common in this study system", which seems to preclude the phylogenetic species concept. What then constitutes speciation?

Finally, the authors seem to share a common misconception about ancestral character states. Their methods include this statement: "If both outgroup individuals were monomorphic for the same allele, this allele was considered ancestral." This argument has been repeatedly rejected in the literature; see, for example, Crisp MD, Cook LG (2005) Do early branching lineages signify ancestral traits? Trends in Ecology and Evolution 20: 122-128.
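To make explicit what the quoted rule does, here is a minimal sketch of it, assuming (hypothetically) that each outgroup individual is represented as a diploid genotype, a pair of allele letters, and reading "monomorphic" as homozygous. The function name is illustrative. The rule simply equates the outgroups' shared state with the ancestral state, which is the inference questioned above; sites where the outgroups disagree or are heterozygous are left unpolarized.

```python
def polarize_site(outgroup_genotypes):
    """Return the allele treated as 'ancestral' under the quoted rule,
    or None when the rule does not apply at this site.

    outgroup_genotypes: one diploid genotype per outgroup individual,
    e.g. [("A", "A"), ("A", "A")].
    """
    alleles = {allele for genotype in outgroup_genotypes for allele in genotype}
    # Both outgroups homozygous for the same allele -> that allele is called
    # ancestral. This assumes the outgroup lineages have not themselves changed.
    return next(iter(alleles)) if len(alleles) == 1 else None


print(polarize_site([("A", "A"), ("A", "A")]))  # A
print(polarize_site([("A", "G"), ("A", "A")]))  # None
print(polarize_site([("A", "A"), ("G", "G")]))  # None
```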
